Acorn User 10

home *** CD-ROM | disk | FTP | other *** search

/ Acorn User 10 / AU_CD10.iso / Updates / Perl / RPC / !Perl / lib / site_perl / POD / Parser.pm next >

Wrap

Perl POD Document | 1997-07-14 | 35.8 KB | 1,108 lines

############################################################################# # Parser.pm -- package which defines a base class for parsing pod docs. # # Based on Tom Christiansen's Pod::Text module # (with extensive modifications). # # Copyright (C) 1996 Tom Christiansen. All rights reserved. # This file is part of "PodParser". PodParser is free software; # you can redistribute it and/or modify it under the same terms # as Perl itself. ############################################################################# package Pod::Parser; $VERSION = 1.00; ## Current version of this package require 5.002; ## requires Perl version 5.002 or later =head1 NAME Pod::Parser - base class for creating pod filters and translators =head1 SYNOPSIS use Pod::Parser; package MyParser; @ISA = qw(Pod::Parser); sub new { ## constructor code ... } ## implementation of appropriate subclass methods ... package main; $parser = new MyParser; @ARGV = ('-') unless (@ARGV > 0); for (@ARGV) { $parser->parse_from_file($_); } =head1 DESCRIPTION B<Pod::Parser> is an abstract base class for implementing filters and/or translators to parse pod documentation into other formats. It handles most of the difficulty of parsing the pod sections in a file and leaves it to the subclasses to override various methods to provide the actual translation. The other thing that B<Pod::Parser> provides is the ability to process only selected sections of pod documentation from the input. =head2 SECTION SPECIFICATIONS Certain methods and functions provided by B<Pod::Parser> may be given one or more "section specifications" to restrict the text processed to only the desired set of sections and their corresponding subsections. A section specification is a string containing one or more Perl-style regular expressions separated by forward slashes ("/"). If you need to use a forward slash literally within a section title you can escape it with a backslash ("\/"). The formal syntax of a section specification is: =over 4 =item I<head1-title-regexp>/I<head2-title-regexp>/... =back Any omitted or empty regular expressions will default to ".*". Please note that each regular expression given is implicitly anchored by adding "^" and "$" to the beginning and end. Also, if a given regular expression starts with a "!" character, then the expression is negated (so C<!foo> would match anything I<except> C<foo>). Some example section specifications follow. =over 4 =item Match the C<NAME> and C<SYNOPSIS> sections and all of their subsections: C<NAME|SYNOPSIS> =item Match only the C<Question> and C<Answer> subsections of the C<DESCRIPTION> section: C<DESCRIPTION/Question|Answer> =item Match the C<Comments> subsection of I<all> sections: C</Comments> =item Match all subsections of C<DESCRIPTION> I<except> for C<Comments>: C<DESCRIPTION/!Comments> =item Match the C<DESCRIPTION> section but do I<not> match any of its subsections: C<DESCRIPTION/!.+> =item Match all top level sections but none of their subsections: C</!.+> =back =cut ############################################################################# use strict; # use diagnostics; use Carp; ## Maximum number of heading levels supported for '=headN' directives $Pod::Parser::MAX_HEAD_LEVEL = 3; ##--------------------------------------------------------------------------- =head1 FUNCTIONS B<Pod::Parser> provides the following functions (please note that these are functions and I<not> methods, they do not take an object reference as an implicit first parameter): =cut ##--------------------------------- ## Function definitions begin here ##--------------------------------- =head2 version() Return the current version of this package. =cut sub version { no strict; return $VERSION; } ##--------------------------------------------------------------------------- =head1 INSTANCE METHODS B<Pod::Parser> provides several methods, some of which should be overridden by subclasses. They are as follows: =cut ##------------------------------- ## Method definitions begin here ##------------------------------- =head2 new() This is the the constructor for the base class. You should only use it if you want to create an instance of a B<Pod::Parser> instead of one of its subclasses. The constructor for this class and all of its subclasses should return a blessed reference to an associative array (hash). =cut sub new { my $this = shift; my $class = ref($this) || $this; my %params = @_; my $self = {%params}; bless $self, $class; $self->initialize(); return $self; } =head2 initialize() This method performs any necessary base class initialization. It takes no arguments (other than the object instance of course). If subclasses override this method then they I<must> be sure to invoke the superclass' B<initialize()> method. =cut sub initialize { my $self = shift; return; } =head2 select($section_spec1, $section_spec2, ...) This is the method that is used to select the particular sections and subsections of pod documentation that are to be printed and/or processed. If the I<first> I<argument> is the string "+", then the remaining section specifications are I<added> to the current list of selections; otherwise the given section specifications will I<replace> the current list of selections. Each of the C<$section_spec> arguments should be a section specification as described in L<"SECTION SPECIFICATIONS">. The section specifications are parsed by this method and the resulting regular expressions are stored in the array referenced by C<$self-E<gt>{SELECTED}> (please see the description of this member variable in L<"INSTANCE DATA">). This method should I<not> normally be overridden by subclasses. =cut sub select { my $self = shift; my @sections = @_; my $add = ($sections[0] eq "+") ? shift(@sections) : ""; ## reset the set of sections to use unless (@sections > 0) { undef $self->{SELECTED} unless ($add); return; } $self->{SELECTED} = [] unless ($add && defined $self->{SELECTED}); local($_); my ($spec, $errors); my (@re, $i, $negated, %exprs); ## Parse each spec for $spec (@sections) { $_ = $spec; s|\\\\|\001|go; ## handle escaped backward slashes s|\\/|\002|go; ## handle escaped forward slashes ## Extract the regexps for the heading titles @re = split('/', $_, $Pod::Parser::MAX_HEAD_LEVEL); ## Modify the regexps as needed and check their syntax for ($i = 0; $i < $Pod::Parser::MAX_HEAD_LEVEL; ++$i) { $re[$i] = '.*' if ((! defined $re[$i]) || $re[$i] eq ""); $re[$i] .= '.+' if ($re[$i] eq '!'); ## Put back any escape sequences we "handled" $re[$i] =~ s|\001|\\\\|go; $re[$i] =~ s|\002|\\/|go; ## Check for negation $negated = ($re[$i] =~ /^\!/o); $re[$i] = $' if ($negated); ## Check regexp syntax eval "/$re[$i]/"; if ($@) { ++$errors; carp "Invalid regular expression /$re[$i]/ in \"$spec\": $@\n"; } else { ## Add the forward and rear anchors (and put the negator back) $re[$i] = '^' . $re[$i] unless ($re[$i] =~ /^\^/o); $re[$i] = $re[$i] . '$' unless ($re[$i] =~ /\$$/o); $re[$i] = '!' . $re[$i] if ($negated); } } if ($errors) { carp "Ignoring section spec \"$spec\"!\n"; } else { ## Store them in our sections array push(@{$self->{SELECTED}}, [ @re ]); } } } =head2 want_section($head1_title, $head2_title, ...) Returns a value of true if the given section and subsection titles match any of the section specifications passed to the B<select()> method (or if no section specifications were given). Returns a value of false otherwise. If C<$headN_title> is ommitted then it defaults to the current C<headN> section title in the input. This method should I<not> normally be overridden by subclasses. =cut sub want_section { my $self = shift; my (@heads) = @_; ## Return true if no restrictions were explicitly specified return 1 unless ((defined $self->{SELECTED}) && (@{$self->{SELECTED}} > 0)); ## default any unspecified sections to the current one my $i; for ($i = 0; $i < $Pod::Parser::MAX_HEAD_LEVEL; ++$i) { $heads[$i] = $self->{HEADINGS}->[$i] unless (defined $heads[$i]); } ## Look for a match against the specified section expressions my ($sec_spec, $regexp, $negated, $match); for $sec_spec (@{$self->{SELECTED}}) { $match = 1; for ($i = 0; $i < $Pod::Parser::MAX_HEAD_LEVEL; ++$i) { $regexp = $sec_spec->[$i]; $negated = ($regexp =~ /^\!/o); $regexp = $' if ($negated); $match &= ($negated ? ($heads[$i] !~ /${regexp}/) : ($heads[$i] =~ /${regexp}/)); last unless ($match); } return 1 if ($match); } return 0; ## no match } =head2 begin_input() This method is invoked by B<parse_from_filehandle()> immediately I<before> processing input from a filehandle. The base class implementation does nothing but subclasses may override it to perform any per-file intializations. =cut sub begin_input { my $self = shift; #---------------------------------------------------- # Subclasses may wish to make use of some of the # commented-out code below for initializing pragmas #---------------------------------------------------- # $self->{PRAGMAS} = { # FILL => 'on', # STYLE => 'plain', # INDENT => 0, # }; # ## Initialize all PREVIOUS_XXX pragma values # my ($name, $value); # for (($name, $value) = each %{$self->{PRAGMAS}}) { # $self->{PRAGMAS}->{"PREVIOUS_${name}"} = $value; # } #---------------------------------------------------- return; } =head2 end_input() This method is invoked by B<parse_from_filehandle()> immediately I<after> processing input from a filehandle. The base class implementation does nothing but subclasses may override it to perform any per-file cleanup actions. =cut sub end_input { my $self = shift; return; } =head2 preprocess_line($text) This methods should be overridden by subclasses that wish to perform any kind of preprocessing for each I<line> of input (I<before> it has been determined whether or not it is part of a pod paragraph). The parameter C<$text> is the input line and the value returned should correspond to the new text to use in its place. If the empty string or an undefined value is returned then no further process will be performed for this line. If desired, this method can call the B<parse_paragraph()> method directly with any preprocessed text and return an empty string (to indicate that no further processing is needed). Please note that the B<preprocess_line()> method is invoked I<before> the B<preprocess_paragraph()> method. After all (possibly preprocessed) lines in a paragraph have been assembled together and it has been determined that the paragraph is part of the pod documentation from one of the selected sections, then B<preprocess_paragraph()> is invoked. The base class implementation of this method returns the given text. =cut sub preprocess_line($text) { my $self = shift; my $text = shift; return $text; } =head2 preprocess_paragraph($text) This method should be overridden by subclasses that wish to perform any kind of preprocessing for each block (paragraph) of pod documentation that appears in the input stream. The parameter C<$text> is the pod paragraph from the input file and the value returned should correspond to the new text to use in its place. If the empty string is returned or an undefined value is returned, then the given C<$text> is ignored (not processed). This method is invoked by B<parse_paragraph()>. After it returns, B<parse_paragraph()> examines the current cutting state (which is stored in C<$self-E<gt>{CUTTING}>). If it evaluates to false then input text (including the given C<$text>) is cut (not processed) until the next pod directive is encountered. Please note that the B<preprocess_line()> method is invoked I<before> the B<preprocess_paragraph()> method. After all (possibly preprocessed) lines in a paragraph have been assembled together and it has been determined that the paragraph is part of the pod documentation from one of the selected sections, then B<preprocess_paragraph()> is invoked. The base class implementation of this method returns the given text. =cut sub preprocess_paragraph { my $self = shift; my $text = shift; return $text; } =head2 parse_pragmas($cmd, $text, $sep) This method is called when an C<=pod> directive is encountered. When such a pod directive is seen in the input, this method is called and is passed the command name C<$cmd> (which should be "pod") and the remainder of the text paragraph C<$text> which appeared immediately after the command name. If desired, the text which separated the C<=pod> directive from its corresponding text may be found in C<$sep>. Each word in C<$text> is examined to see if it is a pragma specification. Pragma specifications are of the form C<pragma_name=pragma_value>. Unless the given object is an instance of the B<Pod::Parser> class, the base class implementation of this method will invoke the B<pragma()> method for each pragma specification in C<$text>. I<If and only if> the given object I<is> an instance of the B<Pod::Parser> class, the base class version of this method will simply reproduce the C<=pod> command exactly as it appeared in the input. Derived classes should I<not> usually need to reimplement this method. =cut sub parse_pragmas { my $self = shift; my $cmd = shift; my $text = shift; my $sep = shift; $cmd = '' unless (defined $cmd); $text = '' unless (defined $text); $sep = ' ' unless (defined $sep); local($_); if (ref($self) eq 'Pod::Parser') { ## If and only if this is an instance of the base class, then ## dump the '=pod' paragraph exactly as it appeared in the input. my $out_fh = $self->{OUTPUT}; print $out_fh "=${cmd}${sep}${text}"; } else { ## otherwise invoke the pragma command for each word in $text my @pragmas = split(" ", $text); for (@pragmas) { $self->pragma(lc $`, $') if (/=/o); } } } =head2 pragma($pragma_name, $pragma_value) This method is invoked for each pragma encountered inside an C<=pod> paragraph (see the description of the B<parse_pragmas()> method). The pragma name is passed in C<$pragma_name> (which should always be lowercase) and the corresponding value is C<$pragma_value>. The base class implementation of this method does nothing. Derived class implementations of this method should be able to recognize at least the following pragmas and take any necessary actions when they are encountered: =over 4 =item B<fill=value> The argument I<value> should be one of C<on>, C<off>, or C<previous>. Specifies that "filling-mode" should set to 1, 0, or its previous value (respectively). If I<value> is omitted then the default is C<on>. Derived classes may use this to decide whether or not to perform any filling (wrapping) of subsequent text. =item B<style=value> The argument I<value> should be one of C<bold>, C<italic>, C<code>, C<plain>, or C<previous>. Specifies that the current default paragraph font should be set to C<bold>, C<italic>, C<code>, the empty string C<>, or its previous value (respectively). If I<value> is omitted then the default is C<plain>. Derived classes may use this to determine the default font style to use for subsequent text. =item B<indent=value> The argument I<value> should be an integer value (with an optional sign). Specifies that the current indentation level should be reset to the given value. If a plus (minus) sign precedes the number then the indentation level should be incremented (decremented) by the given number. If only a plus or minus sign is given (without a number) then the current indentation level is incremented or decremented by some default amount (to be determined by subclasses). =back The value returned will be 1 if the pragma name was recognized and 0 if it wasnt (in which case the pragma was ignored). Derived classes should override this method if they wish to implement any pragmas. The base class implementation of this method does nothing but it does contain some commented-out code which subclasses may want to make use of when implementing pragmas. =cut sub pragma { my $self = shift; ## convert remaining args to lowercase my $name = lc shift; my $value = lc shift; my $rc = 1; local($_); #---------------------------------------------------- # Subclasses may wish to make use of some of the # commented-out code below for processing pragmas #---------------------------------------------------- # my ($abbrev, %abbrev_table); # if ($name eq 'fill') { # %abbrev_table = ('on' => 'on', # 'of' => 'off', # 'p' => 'previous'); # $value = 'on' unless ((defined $value) && ($value ne '')); # return $rc unless ($value =~ /^(on|of|p)/io); # $abbrev = $1; # $value = $abbrev_table{$abbrev}; # if ($value eq 'previous') { # $self->{PRAGMAS}->{FILL} = $self->{PRAGMAS}->{PREVIOUS_FILL}; # } # else { # $self->{PRAGMAS}->{PREVIOUS_FILL} = $self->{PRAGMAS}->{FILL}; # $self->{PRAGMAS}->{FILL} = $value; # } # } # elsif ($name eq 'style') { # %abbrev_table = ('b' => 'bold', # 'i' => 'italic', # 'c' => 'code', # 'pl' => 'plain', # 'pr' => 'previous'); # $value = 'plain' unless ((defined $value) && ($value ne '')); # return $rc unless ($value =~ /^(b|i|c|pl|pr)/io); # $abbrev = $1; # $value = $abbrev_table{$abbrev}; # if ($value eq 'previous') { # $self->{PRAGMAS}->{STYLE} = $self->{PRAGMAS}->{PREVIOUS_STYLE}; # } # else { # $self->{PRAGMAS}->{PREVIOUS_STYLE} = $self->{PRAGMAS}->{STYLE}; # $self->{PRAGMAS}->{STYLE} = $value; # } # } # elsif ($name eq 'indent') { # return $rc unless ((defined $value) && ($value =~ /^([-+]?)(\d*)$/o)); # my ($sign, $number) = ($1, $2); # $value .= "4" unless ((defined $number) && ($number ne '')); # $self->{PRAGMAS}->{PREVIOUS_INDENT} = $self->{PRAGMAS}->{INDENT}; # if ($sign) { # $self->{PRAGMAS}->{INDENT} += (0 + $value); # } # else { # $self->{PRAGMAS}->{INDENT} = $value; # } # } # else { # $rc = 0; # } #---------------------------------------------------- return $rc; } =head2 command($cmd, $text, $sep) This method should be overridden by subclasses to take the appropriate action when a pod command paragraph (denoted by a line beginning with "=") is encountered. When such a pod directive is seen in the input, this method is called and is passed the command name C<$cmd> and the remainder of the text paragraph C<$text> which appears immediately after the command name. If desired, the text which separated the command from its corresponding text may be found in C<$sep>. Note that this method is I<not> called for C<=pod> paragraphs. The base class implementation of this method simply prints the raw pod command to the output filehandle and then invokes the B<textblock()> method, passing it the C<$text> parameter. =cut sub command { my $self = shift; my $cmd = shift; my $text = shift; my $sep = shift; $cmd = '' unless (defined $cmd); $text = '' unless (defined $text); $sep = ' ' unless (defined $sep); my $out_fh = $self->{OUTPUT}; print $out_fh "=${cmd}${sep}"; $self->textblock($text); } =head2 verbatim($text) This method may be overridden by subclasses to take the appropriate action when a block of verbatim text is encountered. It is passed the text block C<$text> as a parameter. The base class implementation of this method simply prints the textblock (unmodified) to the output filehandle. =cut sub verbatim { my $self = shift; my $text = shift; my $out_fh = $self->{OUTPUT}; print $out_fh $text; } =head2 textblock($text) This method may be overridden by subclasses to take the appropriate action when a normal block of pod text is encountered (although the base class method will usually do what you want). It is passed the text block C<$text> as a parameter. In order to process interior sequences, subclasses implementations of this method will probably want invoke the B<interpolate()> method, passing it the text block C<$text> as a parameter and then perform any desired processing upon the returned result. The base class implementation of this method simply prints the text block as it occurred in the input stream). =cut sub textblock { my $self = shift; my $text = shift; local($_) = $self->interpolate($text); print $_; } =head2 interior_sequence($seq_cmd, $seq_arg) This method should be overridden by subclasses to take the appropriate action when an interior sequence is encountered. An interior sequence is an embedded command within a block of text which appears as a command name (usually a single uppercase character) followed immediately by a string of text which is enclosed in angle brackets. This method is passed the sequence command C<$seq_cmd> and the corresponding text $seq_arg and is invoked by the B<interpolate()> method for each interior sequence that occurs in the string that it is passed. It should return the desired text string to be used in place of the interior sequence. Subclass implementationss of this method may wish to examine the the array referenced by C<$self-E<gt>{SEQUENCES}> which is a stack of all the interior sequences that are currently being processed (they may be nested). The current interior sequence (the one given by C<$seq_cmdE<lt>$seq_argE<gt>>) should always be at the top of this stack. The base class implementation of the B<interior_sequence()> method simply returns the raw text of the of the interior sequence (as it occurred in the input) to the output filehandle. =cut sub interior_sequence { my $self = shift; my $seq_cmd = shift; my $seq_arg = shift; return "${seq_cmd}<${seq_arg}>"; } =head2 interpolate($text, $end_re) This method will translate all text (including any embedded interior sequences) in the given text string C<$text> and return the interpolated result. If a second argument is given, then it is taken to be a regular expression that indicates when to quit interpolating the string. Upon return, the C<$text> parameter will have been modified to contain only the un-processed portion of the given string (which will I<not> contain any text matched by C<$end_re>). This method should probably I<not> be overridden by subclasses. It should be noted that this method invokes itself recursively to handle any nested interior sequences. =cut sub interpolate { my $self = shift; my ($text, $end_re) = @_; $text = '' unless (defined $text); $end_re = '$' unless ((defined $end_re) && ($end_re ne '')); local($_) = $text; my $result = ''; my ($seq_cmd, $seq_arg, $end) = ('', '', undef); while (($_ ne '') && /([A-Z])<|($end_re)/) { $result .= $`; ## Append text before the match to the result $_ = $'; ## Only text after the match remains to be processed ## See if we matched an interior sequence or an end-expression ($seq_cmd, $end) = ($1, $2); last if (defined $end); ## Saw the end - quit loop here ## At this point we have found an interior sequence, ## we need to obtain its argument push(@{$self->{SEQUENCES}}, $seq_cmd); $seq_arg = $self->interpolate($_, '>'); ## Now process the interior sequence $result .= $self->interior_sequence($seq_cmd, $seq_arg); pop(@{$self->{SEQUENCES}}); } ## Handle whatever is left if we didnt match the ending regexp unless ((defined $end) && ($end_re ne '$')) { $result .= $_; $_ = ''; } ## Modify the input parameter to consume the text that was ## processed so far. $_[0] = $_; ## Return the processed-text return $result; } =head2 parse_paragraph($text) This method takes the text of a pod paragraph to be processed and invokes the appropriate method (one of B<command()>, B<verbatim()>, or B<textblock()>). This method does I<not> usually need to be overridden by subclasses. =cut sub parse_paragraph { my $self = shift; local $_ = shift; my ($para); my ($cmd, $arg, $sep); ## Keep track of current paragraph number ++$self->{PARAGRAPH} if ((defined $_) && ($_ ne '')); ## Ignore up until next pod directive if we are cutting if ($self->{CUTTING}) { return 1 unless /^=/o; $self->{CUTTING} = 0; } ## Keep track of current sections and their titles my ($level, $title); if (/^=((?:sub)*)(?:head(?:ing)?|sec(?:tion)?)(\d*)\s+(.*)\s*$/o) { ## This is a section heading command ($level, $title) = ($2, $3); $level = 1 + (length($1) / 3) if (($level eq '') || ($1 ne '')); ## Reset the current section heading at this level $self->{HEADINGS}->[$level - 1] = $title; ## Reset subsection headings of this one to empty my $i; for ($i = $level; $i < $Pod::Parser::MAX_HEAD_LEVEL; ++$i) { $self->{HEADINGS}->[$i] = ''; } } ## Ignore this block if it isnt in one of the selected sections $self->{CUTTING} = 1 unless ($self->want_section()); return 1 if ($self->{CUTTING}); ## Perform any desired preprocessing $para = $self->preprocess_paragraph($_); next unless ((defined $para) && ($para ne "")); $_ = $para; return 1 if ($self->{CUTTING}); ## Look for one of the three types of paragraphs if (s/^=//o) { ## Looks like a command paragraph ($cmd, $sep, $_) = /\s+/o ? ($`, $&, $') : ($_, '', ''); if ($cmd eq 'cut') { $self->{CUTTING} = 1; } elsif ($cmd eq 'pod') { ## This is a pod pragma paragraph $self->parse_pragmas($cmd, $_, $sep); } else { ## Some other command $self->command($cmd, $_, $sep); } } elsif (/^\s+/o) { ## Indented text - must be a verbatim paragraph $self->verbatim($_); } else { ## Looks like an ordinary block of text $self->textblock($_); } return 1; } =head2 parse_from_filehandle($infilehandle, $outfilehandle) This method takes a glob to a filehandle (which is assumed to already be opened for reading) and reads the entire input stream looking for blocks (paragraphs) of pod documentation to be processed. For each block of pod documentation encountered it will call the B<parse_paragraph()> method. If a second argument is given then it should be a filehandle glob where output should be sent (otherwise the default output filehandle is C<STDOUT>). If no first argument is given the default input filehandle C<STDIN> is used. The input filehandle that is currently in use is stored in the member variable whose key is "INPUT" (e.g. C<$self-E<gt>{INPUT}>). The output filehandle that is currently in use is stored in the member variable whose key is "OUTPUT" (e.g. C<$self-E<gt>{OUTPUT}>). Input is read line-by-line and assembled into paragraphs (which are separated by lines containing nothing but whitespace). The current line number is stored in the member variable whose key is "LINE" (e.g. C<$self-E<gt>{LINE}>) and the current paragraph number is stored in the member variable whose key is "PARAGRAPH" (e.g. C<$self-E<gt>{PARAGRAPH}>). This method does I<not> usually need to be overridden by subclasses. =cut sub parse_from_filehandle { my $self = shift; my $infilehandle = shift; my $outfilehandle = shift; local($_); $infilehandle = \*STDIN unless (defined $infilehandle); $outfilehandle = \*STDOUT unless (defined $outfilehandle); ## Initialize stuff for this input stream $self->{INPUT} = $infilehandle; $self->{OUTPUT} = $outfilehandle; $self->{CUTTING} = 1; ## Keep track of when we are cutting $self->{SEQUENCES} = []; ## Keep track of nested interior sequences ## Initialize section heading titles { $self->{HEADINGS} = []; my $i; for ($i = 0; $i < $Pod::Parser::MAX_HEAD_LEVEL; ++$i) { $self->{HEADINGS}->[$i] = ''; } } $self->begin_input(); my $paragraph = ''; while (<$infilehandle>) { ++$self->{LINE}; ## Keep track of current line number $_ = $self->preprocess_line($_); next unless ((defined $_) && ($_ ne '')); if (! /^\s*$/o) { ## Append this line to the current paragraph $paragraph .= $_; next; } ## This line is blank line and ends the current paragraph next if ($paragraph eq ''); $paragraph .= "\n"; $self->parse_paragraph($paragraph); $paragraph = ''; } ## Dont forget about the last paragraph in the file $self->parse_paragraph($paragraph) unless ($paragraph eq ''); $self->end_input(); } =head2 parse_from_file($filename, $outfile) This method takes a filename and does the following: =over 4 =item * opens the input and output files for reading (creating the appropriate filehandles) =item * invokes the B<parse_from_filehandle()> method passing it the corresponding input and output filehandles. =item * closes the input and output files. =back If the special input filename "-" or "<&STDIN" is given then the STDIN filehandle is used for input (and no open or close is performed). If no input filename is specified then "-" is implied. If a reference is passed instead of a filename then it is assumed to be a glob-style reference to a filehandle. If a second argument is given then it should be the name of the desired output file. If the special output filename "-" or ">&STDOUT" is given then the STDOUT filehandle is used for output (and no open or close is performed). If the special output filename ">&STDERR" is given then the STDERR filehandle is used for output (and no open or close is performed). If no output filename is specified then "-" is implied. If a reference is passed instead of a filename then it is assumed to be a glob-style reference to a filehandle. The name of the input file that is currently being read is stored in the member variable whose key is "INFILE" (e.g. C<$self-E<gt>{INFILE}>). The name of the output file that is currently being written is stored in the member variable whose key is "OUTFILE" (e.g. C<$self-E<gt>{OUTFILE}>). This method does I<not> usually need to be overridden by subclasses. =cut sub parse_from_file { my $self = shift; my $infile = shift; my $outfile = shift; my ($in_fh, $in_fh_name) = (undef, undef); my ($out_fh, $out_fh_name) = (undef, undef); $infile = '-' unless ((defined $infile) && ($infile ne '')); $outfile = '-' unless ((defined $outfile) && ($outfile ne '')); if (($infile eq '-') || ($infile =~ /^<&(STDIN|0)$/o)) { $in_fh = \*STDIN; $self->{INFILE} = "<standard input>"; } elsif (ref $infile) { $in_fh = $infile; $self->{INFILE} = ${$infile};; } else { $self->{INFILE} = $in_fh_name = $infile; $in_fh_name =~ s/\W/_/g; no strict "refs"; open($in_fh_name, "<$infile") || croak "Can't open $infile for reading: $!\n"; $in_fh = \*$in_fh_name; } if (($outfile eq '-') || ($outfile =~ /^>&?(STDOUT|1)$/o)) { $out_fh = \*STDOUT; $self->{OUTFILE} = "<standard output>"; } elsif ($outfile =~ /^>&(STDERR|2)$/o) { $out_fh = \*STDERR; $self->{OUTFILE} = "<standard error>"; } elsif (ref $outfile) { $out_fh = $outfile; $self->{OUTFILE} = ${$outfile};; } else { $self->{OUTFILE} = $out_fh_name = $outfile; $out_fh_name =~ s/\W/_/g; no strict "refs"; open($out_fh_name, ">$outfile") || croak "Can't open $outfile for writing: $!\n"; $out_fh = \*$out_fh_name; } $self->parse_from_filehandle($in_fh, $out_fh); if (defined $in_fh_name) { close($in_fh) || croak "Can't close $infile after reading: $!\n"; } if (defined $out_fh_name) { close($out_fh) || croak "Can't close $outfile after writing: $!\n"; } } ##--------------------------------------------------------------------------- =head1 INSTANCE DATA B<Pod::Parser> uses the following data members for each of its instances (where C<$self> is a reference to such an instance): =head2 $self->{INPUT} The current input filehandle. =head2 $self->{OUTPUT} The current output filehandle. =head2 $self->{INFILE} The name of the current input file. =head2 $self->{OUTFILE} The name of the current output file. =head2 $self->{LINE} The current line number from the input stream. =head2 $self->{PARAGRAPH} The current paragraph number from the input stream (which includes input paragraphs that are I<not> part of the pod documentation). =head2 $self->{HEADINGS} A reference to an array of the current section heading titles for each heading level (note that the first heading level title is at index 0). =head2 $self->{SELECTED} A reference to an array of references to arrays. Each subarray is a list of anchored regular expressions (preceded by a "!" if the regexp is to be negated). The index of the expression in the subarray should correspond to the index of the heading title in B<$self-E<gt>{HEADINGS}> that it is to be matched against. =head2 $self->{CUTTING} A boolean-valued scalar which evaluates to true if text from the input file is currently being "cut". =head2 $self->{SEQUENCES} An array reference to the stack of interior sequence commands that are currently in the middle of being processed. =head1 NOTES To create a pod translator to translate pod documentation to some other format, you usually only need to create a subclass of B<Pod::Parser> which overrides the base class implementation for the following methods: =over 4 =item * B<pragma()> =item * B<command()> =item * B<verbatim()> =item * B<textblock()> =item * B<interior_sequence()> =back You may also want to implement the B<begin_input()> and B<end_input()> methods for your subclass (to perform any needed per-file intialization or cleanup). If you need to perform any preprocesssing of input before it is parsed you may want to implement one or both of the B<preprocess_line()> and/or B<preprocess_paragraph()> methods. Also, don't forget to make sure your subclass constructor invokes the base class' B<initialize()> method. Sometimes it may be necessary to make more than one pass over the input files. This isn't a problem as long as none of the input files correspond to C<STDIN>. You can override either the B<parse_from_filehandle()> method or the B<parse_from_file()> method to make the first pass yourself to collect all the information you need and then invoke the base class method to do the rest of the standard processing. Feel free to add any member data fields you need to keep track of things like current font, indentation, horizontal or vertical position, or whatever else you like. For the most part, the B<Pod::Parser> base class should be able to do most of the input parsing for you and leave you free to worry about how to intepret the commands and translate the result. =head1 AUTHOR Brad Appleton E<lt>Brad_Appleton-GBDA001@email.mot.comE<gt> Based on code for B<Pod::Text> written by Tom Christiansen E<lt>tchrist@mox.perl.comE<gt> =cut 1;