(many combinations of a dozen other data formats, see below)
=head1 DESCRIPTION
The DBD::AnyData module provides a DBI/SQL interface to data in many formats and from many sources.
Currently supported formats include general format flatfiles (CSV, Fixed Length, Tab or Pipe "delimited", etc.), specific formats (passwd files, web logs, etc.), a variety of other kinds of formats (XML, Mp3, HTML tables), and, for some operations, any DBI accessible database. The number of supported formats will continue to grow rapidly since there is an open API making it easy for any author to create additional format parsers which can be plugged in to AnyData.
Data in these various formats can come from local files, from remote files, or from perl data structures such as strings and arrays.
Regardless of the format or source of the data, it may be accessed and/or modified using all standard DBI methods and a subset of SQL syntax.
In addition to standard database access to files, the module also supports in-memory tables which allow you to create temporary views; to combine data from a number of sources; to quickly prototype database systems; and to display or save the data in any of the supported formats (e.g. to display data in a CSV file as an HTML table). These in-memory tables can be created from any combination of DBI databases or files of any format. They may also be created from perl data structures which means it's possible to quickly prototype a database system without any file access or rdbms backend.
The module also supports converting files between any of the supported formats (e.g. save selected data from MySQL or Oracle to an XML file).
The ad_convert() method works on the entire database. If you need to convert only a selected portion of the databse, use ad_import() with a SELECT clause and then ad_export() it to the new format.
The ad_import method by default closes the connection for the imported database. If you need to continue using the handle for the other datbase, pass the flag {keep_connection=>1}:
my $dbh = DBI->connect('dbi:AnyData:(RaiseError=>1)');
my $mysql_dbh = DBI->connect('dbi:mysql:database=test:(RaiseError=>1)'),
$dbh->func(
'cars',
'DBI',
$mysql_dbh,
{ keep_connection=>1 },
'ad_import');
#...
$mysql_dbh->disconnect;
=head2 Passwd, Weblog, Mp3
=head2 Other Formats
DBD::AnyData supports an open API that allows other authors to build support for other formats. This means that the list of supported formats will continually grow. At the moment Wes Hardaker is working on AnyData::Format::SNMP and Earl Cahill is working on AnyData::Format::Storable. Anyone who is interested in working on a new format module, please contact me (mailto:jeff@vpservices.com)
=head1 FURTHER DETAILS
=head2 Converting between formats
The $dbh->func(...,'ad_convert') method provides a one-step way to
convert between any of the data formats supported by DBD::AnyData.
For example: read a CSV file and save it as an XML file or vice versa.
See the section below on "convert" for details. See the section on
"Working with other DBI databases" for information on converting data
from ORACLE, or MySQL or almost any other database into XML, CSV, or
any of the DBD::AnyData formats.
=head2 Using remote files
You can import remote files accessible by FTP or HTTP directly into a
DBD::AnyData in memory database using 'ad_import' or you can use ad_convert
to print the remote files as strings or save them to a local file.
If the $file parameter of ad_import or ad_convert starts with "ftp" or "http", DBD::AnyData will call LWP behind the scenes and fetch the file.
This will fetch the remote file, parse its XML, and provide you with
an in-memory table which you can query with DBI/SQL or save to a local
file:
$dbh->func(
'news',
'XML',
'http://www.somewhere.org/files/news.xml',
'ad_import');
This will fetch the remote file, parse its XML, and print it out
as an HTML table:
print $dbh->func(
'XML',
'http://www.somewhere.org/files/news.xml',
'HTMLtable',
'ad_convert');
If the remote file requires authorization, you can include values for
"user" and "pass" in the $flags parameter:
$dbh->func(
'news',
'XML',
'http://www.somewhere.org/news.xml',
{ user => 'fred', passwd => 'x9y77d' },
'ad_import');
=head2 Working with in-memory tables
In addition to normal file storage databases, DBD::AnyData supports databases that are stored and modified in-memory. You may either simply query the databases and then close them, or you can use the ad_export method to display data to the screen or save it to a file. There are a variety of reasons you might want to work with in-memory databases, including:
Prototyping: quickly create a database from a string, an array, or the DATA section of a script without needing any file access or rdbms.
Creating Views: pull selected columns and selected rows from an ORACLE or MySQL database en masse and work with them in memory rather than having to use the full database.
Combining Data from multiple formats: create a single in-memory table by importing selected columns and rows from e.g. an XML file, an Oracle database, and a CSV file.
Rollback/Commit: You can make multiple changes to the in-memory database and then, depending on the sucess or failure of those changes either commit by using export to save the changes to disk or skip export which effectively rolls back the database to its state before the import.
In-memory tables may be modified with DBI/SQL commands and can then be either printed to the screen or saved as a file in any of the AnyData formats. (see the ad_export method below)
In-memory tables may be created in several ways:
1. Create and populate the table from one or more local or remote files
2. Create and populate the table from a string
3. Create and populate the table from an array
4. Use DBI/SQL commands to create & populate the table
=over 3
=item Creating in-memory tables from local or remote files
You can create an in-memory table from a string in a specified format,
Note: the string should be enclosed in square brackets.
This reads a CSV file into an in-memory table. Further access and
modification takes place in-memory without further file access unless
you specifically use ad_export to save the table to a file.
You can also import columns from several different formats into a single table. For example this imports data from an XML file, a CSV file and a Pipe delimited file into a single in-memory database. Note that the $table parameter is the same in each call so the data from each import will be appended into that one table.
When you import more than one table into a single table like this, the resulting table will be a cross join unless you supply a lookup_key flag. If a lookup_key is supplied, then a the resulting table will be a full outer join on that key column. This feature is experimental for the time being but should work as expected unless there are columns other than the key column with the same names in the various tables. You can specify that the joined table will only contain certain columns by creating a blank empty table before doing the imports. You can specify only certain rows with the sql flag. For example:
* a valid file name (relative or absolute) on the local computer
* a valid absolute FTP or HTTP URL
* an arrayref containing arrayrefs of rows with column names first
[
['make','model'],
['Honda','Odyssy'],
['Ford','Suburban'],
]
* an arrayref containing a string in a specified format
CSV : ["id,phrase\n1,foo\n2,bar"]
Pipe : ["id|phrase\n1|foo\n2|bar"]
* a reference to the DATA section of a file
[<DATA>]
* a DBI Database handle
DBI->connect('dbi:mysql:database=...)
The ad_catalog method is the standard way to treat files as databases. Each time you access data, it is read from the file and each time you modify data, it is written to the file. The entire file is never read en masse into memory unless you explicitly request it.
The ad_import method can import data from local or remote files,
from any other DBI accessible database, from perl data structures such
as arrays and strings. You may import an entire table or only the columns and rows you specify. If the data is imported from a file, all of the data you select is read into memory when you call ad_import so this should not be done with selections larger than will fit in your memory. :-). All accessing and modification is done in memory. If you want to save the results of any changes, you will need to call ad_export explicitly.
Not all formats and data sources will work with all methods. Here is a
summary of what will work. "all sources" includes local files, remote files,
any DBI accessible database, perl arrayrefs, perl strings.
Import From all formats, all sources
Convert From all formats, all sources
Convert To all formats except DBI, local files, arrays or strings only
Export To all formats except DBI, local files, arrays or strings only
Catalog all formats except DBI, XML, HTMLtable, Mp3, ARRAY,
local files only
=head2 connect
The DBI->connect call
=head2 ad_catalog
PURPOSE:
Creates an association betweeen a table name, a data format, and a file.
# CONVERT A SELECTION FROM A MySQL DATABASE TO XML
# AND SAVE IT IN A FILE
#
$dbh->func(
'DBI',
$mysql_dbh,
'XML',
'data.xml',
{sql=>"SELECT make,model FROM CARS where year > 1996"}
'ad_convert');
REMARKS
The format 'DBI' (any DBI accessible database) may be used as the
source of a conversion, but not as the target of a conversion.
The format 'ARRAY' may be used to indicate that the source of the
conversion is a reference to an array. Or that the result of the
conversion should be returned as an array reference. (See above,
working with in-memory database for information on the structure of
the array reference).
=head2 Data Sources
The ad_import and ad_convert methods can take data from many
sources, including local files, remote files, strings, arrays,
any DBI accessible database, the DATA section of a script.
The $data_source parameter to ad_import and ad_convert will
vary depending on the specific data source, see below.
Local Files
A string containing the name of a local file. It may either
be a full path, or a path or file relative to the currently
defined f_dir (see ?);
e.g. '/users/me/data.xml'
Remote Files
A string containing the url of the data. Must start with
'ftp://' or 'http://'
e.g. 'http://www.somewhere.org/misc/news.xml'
Arrays of Arrays
A reference to an array of data. Each row of the data is
a reference to an array of values. The first row is the
column names. E.G.:
[
['make','model'],
['Honda','Odyssy'],
['Ford','Suburban'],
]
Strings
A string in the specified format including all field and record
separators. The string should be the only row in an array reference
(i.e. it should be enclosed in square brackets)
e.g. a CSV string
["id,phrase\n1,foo\n2,bar"]
or in Pipe Delimited string
["id|phrase\n1|foo\n2|bar"]
The DATA section of a file
A reference to the array obtained from the lines after
__END__ in a script.
[<DATA>]
DBI Databases
A database handle for a specified rdbms.
DBI->connect('dbi:mysql:database=...)
=head2 ad_clear
PURPOSE:
Clears an in-memory table (deletes it from memory)
SYNTAX:
$dbh->func( $table, 'ad_clear' )
PARAMETERS:
$table = the name of the in-memory table to clear
REMARKS:
In-memory tables will be deleted from memory automatically when the
database handle used to create them goes out of scope. They will also
be deleted if you call $dbh->disconnect() on the database handle
used to create them. The ad_clear method is a way to free up memory
if you intend to keep using the database handle but no longer need a
given table. As with other (all?) Perl memory operations, this frees
memory for the remainder of your perl script to use but does not decrease
the total amount of system memory used by the script.
=head2 SQL Syntax
Currently only a limited subset of SQL commands are supported.
Only a single table may be used in each command. This means
That there are *no joins*, but see the section above on simulating
joins. In coming months additional SQL capabilities will be added,
so keep your eyes out for ANNOUNCE message on usenet or the dbi-users
mailing list (see below "Getting More Help").
Here is a brief synopsis, please see the documentation for
SQL::Statement for a more complete description of these commands.
CREATE TABLE $table
( $col1 $type1, ..., $colN $typeN,
[ PRIMARY KEY ($col1, ... $colM) ] )
DROP TABLE $table
INSERT INTO $table
[ ( $col1, ..., $colN ) ]
VALUES ( $val1, ... $valN )
DELETE FROM $table
[ WHERE $wclause ]
UPDATE $table
SET $col1 = $val1, ... $colN = $valN
[ WHERE $wclause ]
SELECT [DISTINCT] $col1, ... $colN
FROM $table
[ WHERE $wclause ]
[ ORDER BY $ocol1 [ASC|DESC], ... $ocolM [ASC|DESC] ]
$wclause [NOT] $col $op $val|$col
[ AND|OR $wclause2 ... AND|OR $wclauseN ]
$op = | <> | < | > | <= | >=
| IS NULL | IS NOT NULL | LIKE | CLIKE
The "CLIKE" operator works exactly like "LIKE" but is case insensitive.
=head1 GETTING MORE HELP
=head1 ACKNOWLEDGEMENTS
Many people have contributed ideas and code, found bugs, and generally been supportive including Tom Lowery, Andy Duncan, Randal Schwartz, Michel Rodriguez, Wes Hardraker, Bob Starr, Earl Cahill, Bryan Fife, Matt Sisk, Matthew Wickline, Wolfgang Weisseberg. Thanks to Jochen Weidmann for DBD::File and SQL::Statement and of course Tim Bunce and Alligator Descartes for DBI and its documentation.