NetNews Usenet Archive 1992 #19

Internet Message Format | 1992-08-25 | 2.3 KB

Path: sparky!uunet!wupost!usc!news!netlabs!lwall
From: lwall@netlabs.com (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: Processing multi-line entities with perl
Message-ID: <1992Aug25.224548.11228@netlabs.com>
Date: 25 Aug 92 22:45:48 GMT
References: <1992Aug24.125350.3407@cas.org>
Sender: news@netlabs.com
Organization: NetLabs, Inc.
Lines: 79
Nntp-Posting-Host: scalpel.netlabs.com

In article <1992Aug24.125350.3407@cas.org> lvirden@cas.org (Larry W. Virden) writes:
: WARNING!  NOVICE QUESTION APPROACHES!
:
:
: Okay.  I have occasion to deal with files containing one or more entities of
: the format:
:
:
: [number1,number2]
: text0
:
: text1
: text2
: text3
:
: text4
: possible lines text5 to textn
:
:
: What I want to do is sort such entities by info in various places in the entry.
: For instance, sometimes it may be by [number1,number2], other times
: by substrings within text0, etc.
:
: Does anyone have some tips on where to begin processing such a file?  I
: have seen shell, awk, etc. code which would redefine record separators to
: be a newline.  But in my case, there is no unique line separating the
: entities.  Each entity DOES begin with [some number, some other number],
: but that value must remain as part of the entity.
:
: Any tips that you might have would be appreciated.

There are several approaches you might take.  I usually just keep $_
as a lookahead line:

    while (<>) {
        &do_something if /^\[\d+,\d+]/;     # a new header ends the pending entity
        $text .= $_;
    }
    &do_something;                          # flush the last entity

    sub do_something {
        if ($text ne '') {
            # do something
            $text = '';
        }
    }

You could also play around with a record separator of "\n[", but that's
a little messier.  (But it may also be a lot more efficient!)

    $/ = "\n[";
    while (!eof()) {
        $_ .= <>;
        s/\n\[?$//;     # strip the separator; the last record ends in a bare "\n"
        # do something
    }
    continue {
        $_ = "[";       # put back the "[" that the separator swallowed
    }

That assumes you don't have any text lines that start with "[".  If you
do, then you have to do a lookahead of some sort, unless...

If the file's not too large, you can just slurp it in and split it
on /^\[\d+,\d+]\n/.  The pattern needs capturing parentheses so that split
returns the bracketed headers too, and ^ has to be allowed to match at
every line start (the /m modifier, or $* = 1 in older perls):

    {
        local($/) = undef;      # slurp mode: read the whole input at once
        @stuff = split(/^(\[\d+,\d+])\n/m, <>);
    }
    shift(@stuff);              # drop whatever precedes the first header (normally '')
    while (($brackets,$text) = splice(@stuff, 0, 2)) {
        # do something
    }

And sometimes it's better to change the thing that spits out the files
in the first place.  Then you could set $/ to a delimiter you know will
not be in the text.

Larry
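
For the sorting that the original question asks about, the lookahead approach can be combined with a sort on the bracketed numbers.  What follows is only a minimal sketch, not code from the post.  It assumes Perl 5 syntax (my, array references), which postdates this thread, and it assumes every entity begins with a [number1,number2] line.  The names read_entities and sort_by_numbers are hypothetical.

```perl
use strict;
use warnings;

# Break a string into entities.  Each entity starts at a line that
# begins with "[number1,number2]" and runs up to the next such line.
# (Hypothetical helper; the lookahead idea is the one described above.)
sub read_entities {
    my ($input) = @_;
    my @entities;
    my $text = '';
    for my $line (split /^/, $input) {          # one line at a time, newlines kept
        if ($line =~ /^\[\d+,\d+\]/ && $text ne '') {
            push @entities, $text;              # a new header closes the pending entity
            $text = '';
        }
        $text .= $line;
    }
    push @entities, $text if $text ne '';       # flush the final entity
    return @entities;
}

# Sort entities numerically by number1, then by number2.
# Chunks that do not start with a header are dropped.
sub sort_by_numbers {
    my @keyed = map { /^\[(\d+),(\d+)\]/ ? [ $1, $2, $_ ] : () } @_;
    return map  { $_->[2] }
           sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @keyed;
}

# Made-up sample data, for illustration only.
my $sample = "[10,2]\nbeta\n\nmore beta\n[2,7]\nalpha\n[10,1]\ngamma\n";
print sort_by_numbers(read_entities($sample));
```

With the sample data, the entities come out in the order [2,7], [10,1], [10,2].  Sorting on a substring of text0 instead would only mean extracting a different key in sort_by_numbers.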