- Path: sparky!uunet!dtix!darwin.sura.net!convex!convex!connolly
- From: connolly@convex.com (Dan Connolly)
- Newsgroups: comp.lang.perl
- Subject: Re: Parsing RTF
- Message-ID: <1992Sep15.182241.6455@news.eng.convex.com>
- Date: 15 Sep 92 18:22:41 GMT
- References: <1992Sep15.035214.13097@Trirex.COM>
- Sender: usenet@news.eng.convex.com (news access account)
- Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
- Lines: 64
- Nntp-Posting-Host: pixel.convex.com
- X-Disclaimer: This message was written by a user at CONVEX Computer
- Corp. The opinions expressed are those of the user and
- not necessarily those of CONVEX.
-
- In article <1992Sep15.035214.13097@Trirex.COM> mmelling@Trirex.com(Michael Mellinger) writes:
- >I'm interested in writing a small RTF parser in Perl. Given rtf text,
- >like that shown below, what is the best way to extract tokens from the
- >text?
- >
-
- Here's some Perl code that tokenizes RTF and writes parenthesized
- output that a Lisp reader could digest.
-
- #!/usr/local/bin/perl
- #
-
- while(<>){
-     while($_ ne ''){
-         if(s/^\{//){                               # open {
-             print "( ";
-         }elsif(s/^\}//){                           # close }
-             print ")\n";
-         }elsif(s/^\\//){                           # control sequence
-             # control word: letters, optional signed number, one delimiter space
-             # (a lone "-" with no digits is left in the text, not eaten as a parameter)
-             if(s/^([a-zA-Z]+)(-?[0-9]+|) ?//){
-                 if($2 ne ''){                      # with parameter
-                     print "($1 $2) ";
-                 }else{
-                     print $1, " ";
-                 }
-             }else{                                 # special control sequence
-                 if(s/^\'([0-9a-fA-F][0-9a-fA-F])//){   # hex encoded char
-                     print "#x$1 ";
-                 }elsif(s/^[:{}\\]//){              # single char escape
-                     print &lisp_string($&);
-                 }elsif(s/^\|//){
-                     print "rtfFormula ";
-                 }elsif(s/^\~//){
-                     print "rtfNoBrkSpace ";
-                 }elsif(s/^\_//){
-                     print "rtfNoReqHyphen ";
-                 }elsif(s/^[\n\r]//){               # backslash-newline acts like \par
-                     print "par ";
-                 }elsif(s/^\*//){
-                     print "rtfOptDest ";
-                 }else{
-                     s/.//;
-                     warn "look this one up: $& ", ord($&);
-                 }
-             }
-         }else{
-             if(s/^\t//){
-                 print "TAB ";
-             }else{                                 # plain text up to the next special char
-                 s/^[^\t\\{}]+// && print &lisp_string($&);
-             }
-         }
-     }
- }
-
- sub lisp_string{
-     local($_) = @_;
-
-     s/[\r\n]//g;             # drop raw line breaks (CR as well as LF)
-     return '' if $_ eq '';
-     s/([\"\\])/\\$1/g;       # escape backslashes as well as quotes
-     return '"' . $_ . '" ';
- }
-
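
The control-word rule is the fiddliest part of the tokenizer above: a word is a run of letters, optionally followed by a signed decimal parameter, and a single trailing space is a delimiter rather than text. Here is a minimal sketch of that one rule in isolation, written in Perl 5 syntax. The helper name control_word and the sample strings are made up for illustration and are not part of the posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one RTF control word off the front of a string.
# Returns (name, parameter, rest); parameter is undef when none is present.
# Returns an empty list if the string does not start with a control word.
sub control_word {
    my ($text) = @_;
    if ($text =~ s/^\\([a-zA-Z]+)(-?[0-9]+)? ?//) {
        return ($1, $2, $text);
    }
    return;
}

# Sample inputs (assumptions, chosen to exercise each case).
for my $sample ('\fs24 Hello', '\par-item', '\li-720\b') {
    my ($name, $param, $rest) = control_word($sample);
    printf "%s => name=%s param=%s rest=[%s]\n",
        $sample, $name, defined $param ? $param : '(none)', $rest;
}
```

In the second sample the hyphen is not followed by digits, so it stays in the remaining text instead of being consumed as a parameter. In the third, the minus sign is part of the parameter because digits follow it.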