- Path: sparky!uunet!cs.utexas.edu!usc!news!netlabs!lwall
- From: lwall@netlabs.com (Larry Wall)
- Newsgroups: comp.lang.perl
- Subject: Re: Fast String Operations?
- Message-ID: <1992Aug27.002721.1322@netlabs.com>
- Date: 27 Aug 92 00:27:21 GMT
- References: <1992Aug25.151625.3134@IDA.ORG>
- Sender: news@netlabs.com
- Organization: NetLabs, Inc.
- Lines: 85
- Nntp-Posting-Host: scalpel.netlabs.com
-
- In article <1992Aug25.151625.3134@IDA.ORG> rlg@IDA.ORG (Randy garrett) writes:
- : I'm looking for the fastest way to perform the following operation.
- : Basically, this is a conversion of one database format to another.
- : If I have a NULL field, indicated by 2 pipe symbols next to
- : each other, I want to insert either a -1 or a ~ between the
- : two pipes, depending upon whether the type of that field
- : is an integer or a character. I know the type of the field
- : because I've already pre-filled that array with the correct
- : types from the Data Dictionary (Thanks Sybase to Perl Interface!).
- :
- : So, I read in a series of lines from a file. If I find 2 pipe
- : symbols adjacent to each other, "||", in the input string, I want
- : to insert either a ~ or a -1 depending on the type of that field,
- : which I get from the @name array.
-
- There's a very important piece of information that you left out. Namely,
- what percentage of the lines do you expect will have a null field? If
- many lines don't have null fields, you want to say this:
-
- while (<INPUT>) {
-     next unless /\|\|/;
-     # replacement algorithm goes here
- }
-
- This sort of short circuit can save you oodles of time, regardless of
- how inefficient your replacement algorithm is. Presuming, of course,
- that most lines will be rejected immediately.
-
- If most lines have null fields, then it's a waste of time to do the
- short circuit, and you can just go straight for one of the split
- methods previously posted (preferably the correct one).
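For readers who don't have the earlier posts to hand, a split-based version might look roughly like the sketch below. It is not one of the previously posted methods, just an illustration written for a current Perl 5. The helper name fill_nulls is made up, the @default list stands in for the types from the data dictionary, and the code assumes every field (the last one included) is followed by a '|'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed per-field fill values: -1 for integer fields, '~' for character
# fields. A real script would build this list from the data dictionary.
my @default = (-1, '~', -1, -1, '~', -1, '~', -1);

# Hypothetical helper: return one input line with its empty fields filled.
# Assumes each field, including the last, is terminated by a '|'.
sub fill_nulls {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields, which split otherwise drops.
    my @fields = split /\|/, $line, -1;
    pop @fields;    # discard the empty piece after the final '|'
    for my $i (0 .. $#fields) {
        # Fill only positions we have a default for.
        $fields[$i] = $default[$i] if $fields[$i] eq '' && $i < @default;
    }
    return join('', map { "$_|" } @fields) . "\n";
}

# Small demonstration on two made-up lines.
print fill_nulls($_) for "1||3|4|x|6|y|8|\n", "|b|3|||6|g|8|\n";
# prints:
# 1|~|3|4|x|6|y|8|
# -1|b|3|-1|~|6|g|8|
```

To run it over a real file, the demonstration line could be swapped for something like `print fill_nulls($_) while <>;`.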
-
- Alternately, (there's always an alternately) you can do something
- fancy like this:
-
- #!/usr/bin/perl
-
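- $/ = '|';                 # read one pipe-terminated field at a time
- @default = (-1,'~',-1,-1,'~',-1,'~',-1);  # fill value for each field, by position
-
- while (<>) {
-     # A leading newline means the previous line just ended.
-     $field = 0, print "\n" if s/^\n//;
-     next unless /^\|/;    # non-empty field: nothing to insert
-     print $default[$field];
- }
- continue {
-     print;                # emit the field (and its trailing pipe)
-     ++$field;
- }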
-
- This presumes there's always a | right before the "\n". It seemed from
- your description that this is so, but it wasn't explicit.
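To see why that presumption matters, it may help to look at what the records actually are once $/ is '|'. The sketch below is only an illustration for a current Perl 5: the helper name records and the sample data are made up, and it reads from an in-memory filehandle rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into records the way <> would
# with the input record separator set to '|'.
sub records {
    my ($data) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    local $/ = '|';
    my @recs = <$fh>;
    close $fh;
    return @recs;
}

# First sample has a '|' right before each newline; the second does not.
for my $data ("1||3|\n|b||\n", "1||3\n|b|\n") {
    my @shown;
    for my $rec (records($data)) {
        (my $text = $rec) =~ s/\n/\\n/g;    # make newlines visible
        push @shown, "[$text]";
    }
    print "@shown\n";
}
# prints:
# [1|] [|] [3|] [\n|] [b|] [|] [\n]
# [1|] [|] [3\n|] [b|] [\n]
```

In the first case every newline lands at the start of a record, so s/^\n// can spot it and reset the field counter. In the second, the newline is buried in the middle of "3\n|", so the counter is never reset and the empty first field of the next line goes unnoticed.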
-
- Anyhoo, try them out on your data, and see which one is faster. I can't
- predict without seeing the data.
-
- By the way. Just between you and me and the gatepost, this *might*
- be a better job for C than for Perl, if you only have to support this
- on one architecture. C is better than Perl at character crawling:
-
- #include <stdio.h>
-
- char *def[] = {"-1","~","-1","-1","~","-1","~","-1"};
-
- int main(void) {
-     int ch;
-     int lastch = EOF;   /* no previous character yet */
-     int field = 0;
-
-     while ((ch = getc(stdin)) != EOF) {
-         if (ch == '\n')
-             field = 0;
-         else if (ch == '|') {
-             if (lastch == '|')
-                 fputs(def[field], stdout);
-             field++;
-         }
-         putc(ch, stdout);
-         lastch = ch;
-     }
-     return 0;
- }
-
- Or sumph'n like that.
-
- Larry
-