-
- =============================
- The sgmlop accelerator module
- =============================
-
- sgmlop contains an optimized SGML/XML parser, designed as an add-on to
- the sgmllib/htmllib and xmllib modules shipped with Python 1.5.
-
- with empty callbacks, the replacement xmllib driver is about 6 times
- faster than the original xmllib implementation. when using sgmlop
- directly, it can be more than 50 times faster. for more information
- on benchmarking sgmlop, see below.
-
- Enjoy /F
-
- fredrik@pythonware.com
- http://www.pythonware.com
-
- --------------------------------------------------------------------
- Copyright (c) 1998 by Secret Labs AB.
-
- Permission to use, copy, modify, and distribute this software and
- its associated documentation for any purpose and without fee is
- hereby granted. This software is provided as is.
- --------------------------------------------------------------------
-
-
- release info
- ------------
-
- This is the third public release. Changes include:
-
- - added a starttag attribute parser written in C. this gives
-   a considerable speedup on files using lots of tag attributes.
-
- - the callback object can now have an sgmllib/xmllib interface
-   (finish/handle) *or* a saxlib interface (see saxhack.py for
-   an example).
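
to make the "finish/handle" callback style concrete, here is a minimal sketch of a callback object that collects everything it is told about. the method names follow the sgmllib convention (finish_starttag, finish_endtag, handle_data). the Collector class and the replay() helper are hypothetical: replay() only stands in for the parser so that the example runs without sgmlop installed, and it is not part of this distribution.

```python
class Collector:
    """Callback object using sgmllib-style method names (hypothetical example)."""

    def __init__(self):
        self.events = []

    def finish_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, as in sgmllib
        self.events.append(("start", tag, attrs))

    def finish_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def replay(target, tokens):
    """Hypothetical stand-in for the parser: dispatch pre-tokenized events."""
    for token in tokens:
        kind, args = token[0], token[1:]
        if kind == "start":
            target.finish_starttag(*args)
        elif kind == "end":
            target.finish_endtag(*args)
        else:
            target.handle_data(*args)


collector = Collector()
replay(collector, [
    ("start", "p", [("class", "x")]),
    ("data", "hello"),
    ("end", "p"),
])
print(collector.events)
```

a real parser would call the same three methods while scanning the input; the only contract shown here is the set of method names and their arguments.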
-
-
- contents
- --------
-
- README          this file
-
- sgmllib.py      a drop-in replacement for the sgmllib.py module
-                 distributed with Python 1.5
-
- xmllib.py       a drop-in replacement for the xmllib.py module
-                 distributed with Python 1.5
-
- saxhack.py      illustrates how to implement the SAX DocumentHandler
-                 interface directly with native sgmlop. this is over
-                 30 times faster than a corresponding parser based on
-                 the original xmllib.
-
- sgmlop.dll      a precompiled version for python 1.5 on win32
-
- sgmlop.c        accelerator source code
-
- sgmlop.mak      makefile for MSVC++ 5.0 generated by opal/pymake.
-                 make sure to change the directory names before you
-                 use it on your own machine.
-
- bench*.py       various test files and benchmarks
- test*.py
-
-
- benchmarks
- ----------
-
- benchmarking the sgmlop parser is non-trivial; if you don't install
- any callbacks, it's some 300 times faster than the original xmllib (it
- can parse more than 10 MB/s on a fast Pentium II). this means that in
- a typical test, far more time is lost on the Python method call
- overhead than on the parsing proper.
-
- my earlier benchmarks used a 'collecting' parser, which stored all
- tags and elements in a list. with that setup, sgmlop is roughly 5
- times faster than the original implementation.
-
- the benchxml.py script provided with this release uses empty parsers
- instead (that is, all callbacks exist, but they contain only a 'pass'
- statement), in order to measure the parser and Python call overhead
- only.
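
the idea of timing a parser with empty callbacks, and again with no callbacks at all, can be sketched as follows. this is not benchxml.py: sgmlop and the old xmllib are not assumed to be installed, so the sketch swaps in the standard library's xml.parsers.expat as the parser under test. the generated document, the function names and the repeat count are all assumptions made for illustration, and the numbers it prints say nothing about sgmlop itself.

```python
import time
import xml.parsers.expat


def make_document(n_elements=20000):
    """Build a synthetic XML document (hypothetical test data)."""
    items = "".join(
        '<item id="%d">text %d</item>' % (i, i) for i in range(n_elements)
    )
    return ("<root>" + items + "</root>").encode("utf-8")


# "dummy" callbacks: they exist, but do nothing
def start(tag, attrs):
    pass


def end(tag):
    pass


def data(text):
    pass


def parse(document, with_callbacks):
    """Parse the document once, with or without callbacks installed."""
    parser = xml.parsers.expat.ParserCreate()
    if with_callbacks:
        parser.StartElementHandler = start
        parser.EndElementHandler = end
        parser.CharacterDataHandler = data
    parser.Parse(document, True)


def best_time(func, *args, repeat=5):
    """Return the fastest of several runs, in seconds."""
    best = None
    for _ in range(repeat):
        t0 = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - t0
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    doc = make_document()
    dummy = best_time(parse, doc, True)
    null = best_time(parse, doc, False)
    print("dummy callbacks: %.4f s" % dummy)
    print("no callbacks:    %.4f s" % null)
```

the difference between the two timings is a rough estimate of the Python call overhead; the "no callbacks" figure is closer to the cost of the parsing proper.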
-
- here's a typical test run (with the time for the original xmllib
- implementation set to 1):
-
- parser              time
- --------------------------------------------------------------------
- slow xmllib         1.0
- fast xmllib         0.156 (6.4x)
- sgmlop dummy        0.019 (53.5x)
- sgmlop null         0.003 (297.8x)
-
- the null time is obtained by running the parser without any callbacks
- installed.
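
a note on reading the table: the factors in parentheses are not exactly 1 divided by the printed time (1/0.003 is about 333, not 297.8). a likely explanation, and it is only an assumption, is that the factors were computed from the unrounded timings while the times were rounded to three decimals for display. the short check below shows that each printed factor does fall inside the range allowed by that rounding. the function name and the row list are made up for this example; the numbers are copied from the table.

```python
# (parser, printed relative time, printed speedup factor), copied from the table
ROWS = [
    ("fast xmllib", 0.156, 6.4),
    ("sgmlop dummy", 0.019, 53.5),
    ("sgmlop null", 0.003, 297.8),
]


def speedup_bounds(shown_time, digits=3):
    """Range of 1/time values compatible with a time rounded to `digits` decimals."""
    half = 0.5 * 10 ** -digits
    return 1.0 / (shown_time + half), 1.0 / (shown_time - half)


for name, shown, factor in ROWS:
    low, high = speedup_bounds(shown)
    status = "consistent" if low <= factor <= high else "inconsistent"
    print("%-14s %7.1fx lies in [%.1f, %.1f]: %s" % (name, factor, low, high, status))
```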
-