- Newsgroups: comp.lang.perl
- Path: sparky!uunet!newshost!root
- From: mmelling@Trirex.com(Michael Mellinger)
- Subject: Parsing RTF
- Message-ID: <1992Sep15.035214.13097@Trirex.COM>
- Sender: root@Trirex.COM (Operator)
- Organization: Trirex Systems Inc.
- Date: Tue, 15 Sep 1992 03:52:14 GMT
- Lines: 23
-
- I'm interested in writing a small RTF parser in Perl. Given RTF text
- like the sample below, what is the best way to extract tokens from it?
-
- {\rtf0\ansi{\fonttbl\f0\fswiss Helvetica;}
- \margl120
- \margr120
- {{\attachment0 telephonedirectory2.wp
- }
- \pard\tx533\tx1067\tx1601\tx2135\tx2668\tx3202\tx3736\tx4270\tx4803\tx5337
- \f0\b0\i0\ul0\fs36 This is the body of the message.
-
- Keywords like \rtf and \tx (tab settings) are followed by numbers, and,
- as the sample shows, keywords don't need to be separated by spaces. At
- the moment I just want to extract the keywords, but I anticipate wanting
- to do more later.
-
- For those who don't know anything about RTF (Rich Text Format): all
- keywords begin with a \, and groups of keywords (like stylesheets) are
- enclosed in {}.
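
The rules just described (keywords start with a \, may carry a number, and groups sit inside {}) can be sketched as a small regex-driven tokenizer. This is only one possible approach, not a complete RTF parser. It assumes a Perl that supports \G with the /gc modifiers, plus 5.10 or later for the // operator in the usage lines. The name rtf_tokens and the [type, name, parameter] token layout are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split an RTF string into tokens.
# Each token is an array ref: [type, value, optional numeric parameter].
sub rtf_tokens {
    my ($rtf) = @_;
    my @tokens;
    for ($rtf) {    # alias the string to $_ so \G tracks the position in it
        while (1) {
            # Keyword: backslash, letters, optional number, optional single space
            if    (/\G\\([a-zA-Z]+)(-?\d+)? ?/gc) { push @tokens, ['keyword', $1, $2] }
            # Control symbol: backslash followed by one non-letter (e.g. \{ or \\)
            elsif (/\G\\([^a-zA-Z])/gcs)          { push @tokens, ['symbol', $1] }
            # Group delimiters
            elsif (/\G([{}])/gc)                  { push @tokens, ['group', $1] }
            # Anything else up to the next backslash or brace is plain text
            elsif (/\G([^\\{}]+)/gc)              { push @tokens, ['text', $1] }
            else                                  { last }
        }
    }
    return @tokens;
}

# Usage: list only the keywords from the start of the sample above.
my $sample = '{\rtf0\ansi{\fonttbl\f0\fswiss Helvetica;}' . "\n" . '\margl120';
my @keywords = map  { $_->[1] . ($_->[2] // '') }
               grep { $_->[0] eq 'keyword' } rtf_tokens($sample);
print join(' ', @keywords), "\n";    # prints: rtf0 ansi fonttbl f0 fswiss margl120
```

Keeping the group and text tokens, rather than discarding everything that is not a keyword, leaves room for the "do more later" case: a later pass could track brace depth to tell which keywords belong to which group.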
-
- -Mike
- mmelling@Trirex.com
-