NetNews Usenet Archive 1992 #20

Newsgroups: comp.lang.perl
Path: sparky!uunet!newshost!root
From: mmelling@Trirex.com(Michael Mellinger)
Subject: Parsing RTF
Message-ID: <1992Sep15.035214.13097@Trirex.COM>
Sender: root@Trirex.COM (Operator)
Organization: Trirex Systems Inc.
Date: Tue, 15 Sep 1992 03:52:14 GMT
Lines: 23

I'm interested in writing a small RTF parser in Perl. Given RTF text
like that shown below, what is the best way to extract tokens from the
text?

{\rtf0\ansi{\fonttbl\f0\fswiss Helvetica;}
\margl120
\margr120
{{\attachment0 telephonedirectory2.wp }
\pard\tx533\tx1067\tx1601\tx2135\tx2668\tx3202\tx3736\tx4270\tx4803\tx5337
\f0\b0\i0\ul0\fs36 This is the body of the message.

Keywords like \rtf and \tx (tab settings) are followed by numbers,
and, as can be seen, keywords don't need to be separated by spaces. At
the moment I just want to extract the keywords, but later I anticipate
wanting to do more.

For those who don't know anything about RTF (Rich Text Format), all
keywords begin with a \ and groups of keywords (like stylesheets) are
enclosed in {}.

-Mike
mmelling@Trirex.com
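
One possible approach to the keyword extraction described above is a single regular expression that recognizes a backslash, a run of letters, and an optional numeric parameter, with braces and plain text as separate alternatives. The sketch below is in Python rather than Perl, and the token grammar is a simplification assumed from the description in the post, not the full RTF specification. The names TOKEN_RE, tokenize and keywords are hypothetical.

```python
import re

# Assumed, simplified token grammar (not the full RTF specification):
#   keyword : backslash, letters, optional signed integer, optional one space
#   symbol  : backslash followed by a single non-letter character
#   braces  : group start and group end
#   text    : anything else up to the next backslash or brace
TOKEN_RE = re.compile(
    r"""
    \\(?P<word>[A-Za-z]+)(?P<param>-?\d+)?[ ]?
  | \\(?P<symbol>[^A-Za-z])
  | (?P<open>\{)
  | (?P<close>\})
  | (?P<text>[^\\{}]+)
    """,
    re.VERBOSE,
)


def tokenize(rtf):
    """Return a list of (kind, value, parameter) tuples for an RTF string."""
    tokens = []
    for m in TOKEN_RE.finditer(rtf):
        if m.group("word") is not None:
            param = m.group("param")
            number = int(param) if param is not None else None
            tokens.append(("keyword", m.group("word"), number))
        elif m.group("symbol") is not None:
            tokens.append(("symbol", m.group("symbol"), None))
        elif m.group("open") is not None:
            tokens.append(("group_start", "{", None))
        elif m.group("close") is not None:
            tokens.append(("group_end", "}", None))
        else:
            text = m.group("text")
            if text.strip():  # skip runs that are only whitespace
                tokens.append(("text", text, None))
    return tokens


def keywords(rtf):
    """Return only the (name, parameter) pairs of the keywords."""
    return [(name, num) for kind, name, num in tokenize(rtf) if kind == "keyword"]


if __name__ == "__main__":
    sample = r"\pard\tx533\tx1067 \f0\b0\i0\ul0\fs36 This is the body of the message."
    for name, num in keywords(sample):
        print(name, num)
```

Because the letters and the digits are captured separately, adjacent keywords such as \tx533\tx1067 split correctly without any spaces between them. Keeping the brace tokens in the output of tokenize leaves room for group handling later, which only the keyword filter discards.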