- Path: sparky!uunet!wupost!usc!news!netlabs!lwall
- From: lwall@netlabs.com (Larry Wall)
- Newsgroups: comp.lang.perl
- Subject: Re: Processing multi-line entities with perl
- Message-ID: <1992Aug25.224548.11228@netlabs.com>
- Date: 25 Aug 92 22:45:48 GMT
- References: <1992Aug24.125350.3407@cas.org>
- Sender: news@netlabs.com
- Organization: NetLabs, Inc.
- Lines: 79
- Nntp-Posting-Host: scalpel.netlabs.com
-
- In article <1992Aug24.125350.3407@cas.org> lvirden@cas.org (Larry W. Virden) writes:
- : WARNING! NOVICE QUESTION APPROACHES!
- :
- :
- : Okay. I have occasion to deal with files containing one or more entities of
- : the format:
- :
- :
- : [number1,number2]
- : text0
- :
- : text1
- : text2
- : text3
- :
- : text4
- : possible lines text5 to textn
- :
- :
- : What I want to do is sort such entities by info in various places in the entry.
- : For instance, sometimes it may be by [number1,number2], other times
- : by substrings within text0, etc.
- :
- : Does anyone have some tips on where to begin processing such a file? I
- : have seen shell, awk, etc. code which would redefine record separators to
- : be a newline. But in my case, there is no unique line separating the
- : entities. Each entity DOES begin with [some number, some other number],
- : but that value must remain as part of the entity.
- :
- : Any tips that you might have would be appreciated.
-
- There are several approaches you might take. I usually just keep $_ as a
- lookahead line:
-
-     while (<>) {
-         # A header line means the previous entity is complete.
-         &do_something if /^\[\d+,\d+]/;
-         $text .= $_;
-     }
-     &do_something;    # flush the last entity
-
-     sub do_something {
-         if ($text ne '') {
-             # do something
-             $text = '';
-         }
-     }
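
To make the lookahead idea concrete for the sorting problem in the question, here is a minimal sketch in modern Perl 5. It is only an illustration: the sub names (collect_entities, header_numbers, sort_by_header) and the sample lines are made up, and it assumes every entity starts with a "[number1,number2]" line and that the whole input fits in memory.

```perl
use strict;
use warnings;

# Group lines into entities; each entity starts at a "[n,m]" header line.
sub collect_entities {
    my @lines = @_;
    my @entities;
    my $text = '';
    for my $line (@lines) {
        # A header line closes the entity accumulated so far.
        if ($line =~ /^\[\d+,\d+\]/ && $text ne '') {
            push @entities, $text;
            $text = '';
        }
        $text .= $line;
    }
    push @entities, $text if $text ne '';    # flush the final entity
    return @entities;
}

# Sort key: the two numbers in the header line.
sub header_numbers {
    my ($entity) = @_;
    return $entity =~ /^\[(\d+),(\d+)\]/ ? ($1, $2) : (0, 0);
}

# Order entities numerically by number1, then by number2.
sub sort_by_header {
    my @entities = @_;
    return map  { $_->[2] }
           sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] }
           map  { [ header_numbers($_), $_ ] } @entities;
}

# Made-up sample input, one element per line.
my @sample = ("[10,2]\n", "beta\n", "\n", "more beta\n", "[2,7]\n", "alpha\n");
print sort_by_header(collect_entities(@sample));
```

Sorting on something inside text0 instead would only mean swapping header_numbers for a different key function; the collecting loop stays the same.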
-
- You could also play around with a record separator of "\n[", but
- that's a little messier. (But it may also be a lot more efficient!)
-
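-     $/ = "\n[";
-
-     while (!eof()) {
-         $_ .= <>;
-         # Drop the "[" that belongs to the next record. The last record
-         # has no trailing "[", so a blind chop; chop; would eat real text.
-         s/\n\[\z/\n/;
-         # do something
-     }
-     continue {
-         $_ = "[";
-     }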
-
- That assumes you don't have any text lines that start with "[". If
- you do, then you have to do a lookahead of some sort, unless...
-
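- If the file's not too large, you can just slurp it in and split it
- on /^(\[\d+,\d+])\n/m. The parentheses make split hand back each
- bracketed header along with the text that follows it, and /m lets ^
- match at the start of every line rather than only at the start of
- the string:
-
-     {
-         local($/) = undef;
-         @stuff = split(/^(\[\d+,\d+])\n/m, <>);
-     }
-     shift(@stuff);    # discard the (normally empty) field before the first header
-     while (($brackets,$text) = splice(@stuff, 0, 2)) {
-         # do something
-     }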
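
Here is a rough, self-contained sketch of the slurp-and-split idea applied to the "sort by something in text0" case from the question. It assumes modern Perl 5; the names split_entities and first_line and the sample string are invented for illustration, and taking "the first line of the body" as the sort key is just one guess at what sorting by text0 might mean.

```perl
use strict;
use warnings;

# Split a slurped string into [header, body] pairs.
sub split_entities {
    my ($data) = @_;
    # /m lets ^ match after every embedded newline; the parentheses make
    # split return each header along with the text that follows it.
    my @stuff = split(/^(\[\d+,\d+\])\n/m, $data);
    shift @stuff;    # drop the field before the first header
    my @pairs;
    while (my ($brackets, $text) = splice(@stuff, 0, 2)) {
        # A header with no body leaves $text undefined.
        push @pairs, [ $brackets, defined $text ? $text : '' ];
    }
    return @pairs;
}

# Hypothetical sort key: the first line of the body (text0 in the question).
sub first_line {
    my ($pair) = @_;
    my ($line) = $pair->[1] =~ /^(.*)$/m;
    return defined $line ? $line : '';
}

# Made-up sample data standing in for the slurped file.
my $data   = "[10,2]\nzebra\n\nstripes\n[2,7]\napple\ncore\n";
my @pairs  = split_entities($data);
my @sorted = sort { first_line($a) cmp first_line($b) } @pairs;
print "$_->[0]\n$_->[1]" for @sorted;
```

Keeping the header and body as a pair means the header can be printed back exactly as it was read, which the question asks for.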
-
- And sometimes it's better to change the thing that spits out the files
- in the first place. Then you could set $/ to a delimiter you know will
- not be in the text.
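
As a sketch of that last suggestion: suppose the generating program could be changed to end every entity with a line containing only "%%". That delimiter is purely hypothetical, as are the sub name read_records and the sample data; the in-memory filehandle needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Read records that are each terminated by a known delimiter string.
sub read_records {
    my ($fh, $delimiter) = @_;
    local $/ = $delimiter;           # record separator for this sub only
    my @records;
    while (defined(my $record = <$fh>)) {
        chomp $record;               # removes the trailing delimiter, if present
        push @records, $record if $record ne '';
    }
    return @records;
}

# Made-up sample: two entities, each followed by the hypothetical "%%" line.
my $data = "[10,2]\nbeta\n%%\n[2,7]\nalpha\n%%\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my @records = read_records($fh, "%%\n");
print scalar(@records), " records\n";
```

Note that $/ is a plain string and not a pattern, so "%%\n" would also split a text line that merely ends in "%%". The delimiter has to be something the text really cannot contain.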
-
- Larry
-