This (preferred) form of the ValueAttr option requires you to specify both
the element and the attribute names. This is not only safer, it also allows
the original XML to be reconstructed by C<XMLout()>.
Note: You probably don't want to use this option and the NoAttr option at the
same time.
=head2 Variables => { name => value } I<# in - handy>
This option allows variables in the XML to be expanded when the file is read.
(There is no facility for putting the variable names back if you regenerate
XML using C<XMLout()>.)
A 'variable' is any text of the form C<${name}> which occurs in an attribute
value or in the text content of an element. If 'name' matches a key in the
supplied hashref, C<${name}> will be replaced with the corresponding value from
the hashref. If no matching key is found, the variable will not be replaced.
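The expansion rule above can be sketched as follows. This is a minimal illustration only: the element names, the C<root> key and the path are made up for the example and are not part of the module.

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes stop Perl from interpolating ${root} before XMLin() sees it.
my $xml = '<opt><logdir>${root}/logs</logdir><owner>${unknown}</owner></opt>';

# 'root' and '/var/app' are hypothetical values chosen for this sketch.
my $config = XMLin($xml, Variables => { root => '/var/app' });

print "$config->{logdir}\n";   # ${root} had a matching key, so it was expanded
print "$config->{owner}\n";    # no 'unknown' key was supplied, so the text is untouched
```

Because unmatched variables are left in place rather than removed, a misspelt variable name shows up verbatim in the resulting data.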
=head2 VarAttr => 'attr_name' I<# in - handy>
In addition to the variables defined using C<Variables>, this option allows
variables to be defined in the XML. A variable definition consists of an
element with an attribute called 'attr_name' (the value of the C<VarAttr>
option). The value of the attribute will be used as the variable name and the
text content of the element will be used as the value. A variable defined in
this way will override a variable defined using the C<Variables> option. For
example:
XMLin( '<opt>
<dir name="prefix">/usr/local/apache</dir>
<dir name="exec_prefix">${prefix}</dir>
<dir name="bindir">${exec_prefix}/bin</dir>
</opt>',
VarAttr => 'name', ContentKey => '-content'
);
produces the following data structure:
{
dir => {
prefix => '/usr/local/apache',
exec_prefix => '/usr/local/apache',
bindir => '/usr/local/apache/bin',
}
}
=head2 XMLDecl => 1 or XMLDecl => 'string' I<# out - handy>
If you want the output from C<XMLout()> to start with the optional XML
declaration, simply set the option to '1'. The default XML declaration is:
<?xml version='1.0' standalone='yes'?>
If you want some other string (for example to declare an encoding value), set
the value of this option to the complete string you require.
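Both forms of the option might be used like this. The data, the C<RootName> value and the encoding string are assumptions chosen purely for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data for this sketch.
my $data = { name => 'demo' };

# XMLDecl => 1 asks for the default declaration.
my $default = XMLout($data, RootName => 'config', XMLDecl => 1);

# Any other value is used as the complete declaration string.
my $custom = XMLout(
    $data,
    RootName => 'config',
    XMLDecl  => '<?xml version="1.0" encoding="UTF-8"?>',
);

print $default;
print $custom;
```

Note that the string form only changes the declaration text; it does not by itself re-encode the output.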
=head1 OPTIONAL OO INTERFACE
The procedural interface is both simple and convenient; however, there are a
couple of reasons why you might prefer to use the object-oriented (OO)
interface:
=over 4
=item *
to define a set of default values which should be used on all subsequent calls
to C<XMLin()> or C<XMLout()>
=item *
to override methods in B<XML::Simple> to provide customised behaviour
=back
The default values for the options described above are unlikely to suit
everyone. The OO interface allows you to effectively override B<XML::Simple>'s
defaults with your preferred values. It works like this:
First create an XML::Simple parser object with your preferred defaults:
my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
then call C<XMLin()> or C<XMLout()> as a method of that object:
my $ref = $xs->XMLin($xml);
my $xml = $xs->XMLout($ref);
You can also specify options when you make the method calls and these values
will be merged with the values specified when the object was created. Values
specified in a method call take precedence.
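As a small sketch of that merging behaviour (the XML snippet and variable names are invented for the example):

```perl
use strict;
use warnings;
use XML::Simple;

# Object-level defaults, applied to every call made through $xs.
my $xs = XML::Simple->new(ForceArray => 1, KeyAttr => []);

my $xml = '<opt><item>a</item></opt>';

# Uses the object's defaults: <item> is forced into an array.
my $forced = $xs->XMLin($xml);

# A per-call option overrides the default; KeyAttr => [] is still inherited.
my $relaxed = $xs->XMLin($xml, ForceArray => 0);

print ref($forced->{item}), "\n";   # an ARRAY reference
print $relaxed->{item}, "\n";       # a plain scalar
```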
Overriding methods is a more advanced topic but might be useful if, for
example, you wished to provide an alternative routine for escaping character
data (the C<escape_value> method) or for building the initial parse tree (the
C<build_tree> method).
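A subclass that overrides C<escape_value> might look roughly like the following. The package name C<My::ApostropheEscaper> is hypothetical, and the sketch assumes the method is called as C<< $self->escape_value($data) >> and returns the escaped string; check the source of your installed version before relying on it.

```perl
use strict;
use warnings;

{
    # Hypothetical subclass name, used only for this illustration.
    package My::ApostropheEscaper;
    use parent 'XML::Simple';

    # Assumed calling convention: ($self, $data) in, escaped string out.
    sub escape_value {
        my ($self, $data) = @_;

        # Let the parent class do its normal escaping first ...
        my $escaped = $self->SUPER::escape_value($data);

        # ... then additionally escape apostrophes.
        $escaped =~ s/'/&apos;/g;
        return $escaped;
    }
}

my $xs  = My::ApostropheEscaper->new();
my $xml = $xs->XMLout({ note => "it's" }, RootName => 'opt');

print $xml;
```

Delegating to C<SUPER::escape_value> keeps the standard escaping intact, so the subclass only has to describe what it does differently.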
Note: when called as methods, the C<XMLin()> and C<XMLout()> routines may be
called as C<xml_in()> or C<xml_out()>. The method names are aliased so the
only difference is the aesthetics.
=head1 STRICT MODE
If you import the B<XML::Simple> routines like this:
use XML::Simple qw(:strict);
the following common mistakes will be detected and treated as fatal errors:
=over 4
=item *
Failing to explicitly set the C<KeyAttr> option - if you can't be bothered
reading about this option, turn it off with: KeyAttr => [ ]
=item *
Failing to explicitly set the C<ForceArray> option - if you can't be bothered
reading about this option, set it to the safest mode with: ForceArray => 1
=item *
Setting ForceArray to an array, but failing to list all the elements from the
KeyAttr hash.
=item *
Data error - KeyAttr is set to say { part => 'partnum' } but the XML contains
one or more E<lt>partE<gt> elements without a 'partnum' attribute (or nested
element). Note: if strict mode is not set but -w is, this condition triggers a
warning.
=item *
Data error - as above, but value of key attribute (eg: partnum) is not a
scalar string (due to nested elements etc). This will also trigger a warning
if strict mode is not enabled.
=back
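The first two checks can be seen in a short sketch like this one. The XML is invented for the example, and the exact wording of the error message is deliberately not shown because it may differ between versions.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><a>1</a></opt>';

# Both options set explicitly: accepted in strict mode.
my $ok = eval { XMLin($xml, ForceArray => 1, KeyAttr => []) };

# Neither option set: strict mode treats this as a fatal error.
my $bad = eval { XMLin($xml) };
my $err = $@;

print "explicit options: ", (defined $ok ? "parsed\n" : "failed\n");
print "missing options:  ", ($err ? "died: $err" : "parsed\n");
```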
=head1 SAX SUPPORT
From version 1.08_01, B<XML::Simple> includes support for SAX (the Simple API
for XML) - specifically SAX2.
In a typical SAX application, an XML parser (or SAX 'driver') module generates
SAX events (start of element, character data, end of element, etc) as it parses
an XML document and a 'handler' module processes the events to extract the
required data. This simple model allows for some interesting and powerful
possibilities:
=over 4
=item *
Applications written to the SAX API can extract data from huge XML documents
without the memory overheads of a DOM or tree API.
=item *
The SAX API allows for plug and play interchange of parser modules without
having to change your code to fit a new module's API. A number of SAX parsers
are available with capabilities ranging from extreme portability to blazing
performance.
=item *
A SAX 'filter' module can implement both a handler interface for receiving
data and a generator interface for passing modified data on to a downstream
handler. Filters can be chained together in 'pipelines'.
=item *
One filter module might split a data stream to direct data to two or more
downstream handlers.
=item *
Generating SAX events is not the exclusive preserve of XML parsing modules.
For example, a module might extract data from a relational database using DBI
and pass it on to a SAX pipeline for filtering and formatting.
=back
B<XML::Simple> can operate at either end of a SAX pipeline. For example,
you can take a data structure in the form of a hashref and pass it into a
SAX pipeline using the 'Handler' option on C<XMLout()>:
use XML::Simple;
use Some::SAX::Filter;
use XML::SAX::Writer;
my $ref = {
.... # your data here
};
my $writer = XML::SAX::Writer->new();
my $filter = Some::SAX::Filter->new(Handler => $writer);
my $simple = XML::Simple->new(Handler => $filter);
$simple->XMLout($ref);
You can also put B<XML::Simple> at the opposite end of the pipeline to take
advantage of the simple 'tree' data structure once the relevant data has been
isolated through filtering:
use XML::SAX;
use Some::SAX::Filter;
use XML::Simple;
my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
my $filter = Some::SAX::Filter->new(Handler => $simple);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
my $ref = $parser->parse_uri('some_huge_file.xml');
print $ref->{part}->{'555-1234'};
You can build a filter by using an XML::Simple object as a handler and setting
its DataHandler option to point to a routine which takes the resulting tree,
modifies it and sends it off as SAX events to a downstream handler:
my $writer = XML::SAX::Writer->new();
my $filter = XML::Simple->new(
DataHandler => sub {
my $simple = shift;
my $data = shift;
# Modify $data here
$simple->XMLout($data, Handler => $writer);
}
);
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
$parser->parse_uri($filename);
I<Note: In this last example, the 'Handler' option was specified in the call to
C<XMLout()> but it could also have been specified in the constructor>.
=head1 ENVIRONMENT
If you don't care which parser module B<XML::Simple> uses then skip this
section entirely (it looks more complicated than it really is).
B<XML::Simple> will default to using a B<SAX> parser if one is available or
B<XML::Parser> if SAX is not available.
You can dictate which parser module is used by setting either the environment
variable 'XML_SIMPLE_PREFERRED_PARSER' or the package variable
$XML::Simple::PREFERRED_PARSER to contain the module name. The following rules
are used:
=over 4
=item *
The package variable takes precedence over the environment variable if both
are defined. To force B<XML::Simple> to ignore the environment settings and use
its default rules, you can set the package variable to an empty string.
=item *
If the 'preferred parser' is set to the string 'XML::Parser', then
L<XML::Parser> will be used (or C<XMLin()> will die if L<XML::Parser> is not
installed).
=item *
If the 'preferred parser' is set to some other value, then it is assumed to be
the name of a SAX parser module and is passed to L<XML::SAX::ParserFactory>.
If L<XML::SAX> is not installed, or the requested parser module is not
installed, then C<XMLin()> will die.
=item *
If the 'preferred parser' is not defined at all (the normal default
state), an attempt will be made to load L<XML::SAX>. If L<XML::SAX> is
installed, then a parser module will be selected according to
L<XML::SAX::ParserFactory>'s normal rules (which typically means the last SAX
parser installed).
=item *
If the 'preferred parser' is not defined and B<XML::SAX> is not
installed, then B<XML::Parser> will be used. C<XMLin()> will die if
L<XML::Parser> is not installed.
=back
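For instance, the package variable could be set just for one call, as in this sketch. The XML is invented for the example, and the C<eval> is there because C<XMLin()> will die if L<XML::Parser> is not installed.

```perl
use strict;
use warnings;
use XML::Simple;

my $data = eval {
    # 'local' restores the previous preference when the block exits.
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
    XMLin('<opt host="localhost" />');
};
my $err = $@;

if (defined $data) {
    print "Parsed using XML::Parser: host=$data->{host}\n";
}
else {
    print "XML::Parser could not be used: $err";
}
```

Using C<local> keeps the preference from leaking into other code that shares the same process.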
Note: The B<XML::SAX> distribution includes an XML parser written entirely in
Perl. It is very portable but it is not very fast. You should consider
installing L<XML::LibXML> or L<XML::SAX::Expat> if they are available for your
platform.
=head1 ERROR HANDLING
The XML standard is very clear on the issue of non-compliant documents. An
error in parsing any single element (for example a missing end tag) must cause
the whole document to be rejected. B<XML::Simple> will die with an appropriate
message if it encounters a parsing error.
If dying is not appropriate for your application, you should arrange to call
C<XMLin()> in an eval block and look for errors in $@. eg:
my $config = eval { XMLin() };
PopUpMessage($@) if($@);
Note, there is a common misconception that use of B<eval> will significantly
slow down a script. While that may be true when the code being eval'd is in a
string, it is not true of code like the sample above.
=head1 EXAMPLES
When C<XMLin()> reads the following very simple piece of XML:
<opt username="testuser" password="frodo"></opt>
it returns the following data structure:
{
'username' => 'testuser',
'password' => 'frodo'
}
The identical result could have been produced with this alternative XML:
<opt username="testuser" password="frodo" />
Or this (although see 'ForceArray' option for variations):
<opt>
<username>testuser</username>
<password>frodo</password>
</opt>
Repeated nested elements are represented as anonymous arrays:
<opt>
<person firstname="Joe" lastname="Smith">
<email>joe@smith.com</email>
<email>jsmith@yahoo.com</email>
</person>
<person firstname="Bob" lastname="Smith">
<email>bob@smith.com</email>
</person>
</opt>
{
'person' => [
{
'email' => [
'joe@smith.com',
'jsmith@yahoo.com'
],
'firstname' => 'Joe',
'lastname' => 'Smith'
},
{
'email' => 'bob@smith.com',
'firstname' => 'Bob',
'lastname' => 'Smith'
}
]
}
Nested elements with a recognised key attribute are transformed (folded) from
an array into a hash keyed on the value of that attribute (see the C<KeyAttr>