- <TITLE>pickle -- Python library reference</TITLE>
- <H1>3.4. Standard Module <CODE>pickle</CODE></H1>
- The <CODE>pickle</CODE> module implements a basic but powerful algorithm for
- ``pickling'' (a.k.a. serializing, marshalling or flattening) nearly
- arbitrary Python objects. This is the act of converting objects to a
- stream of bytes (and back: ``unpickling'').
- This is a more primitive notion than
- persistency --- although <CODE>pickle</CODE> reads and writes file objects,
- it does not handle the issue of naming persistent objects, nor the
- (even more complicated) area of concurrent access to persistent
- objects. The <CODE>pickle</CODE> module can transform a complex object into
- a byte stream and it can transform the byte stream into an object with
- the same internal structure. The most obvious thing to do with these
- byte streams is to write them onto a file, but it is also conceivable
- to send them across a network or store them in a database. The module
- <CODE>shelve</CODE> provides a simple interface to pickle and unpickle
- objects on ``dbm''-style database files.
- Unlike the built-in module <CODE>marshal</CODE>, <CODE>pickle</CODE> handles the
- following correctly:
- <UL>
- <LI>• recursive objects (objects containing references to themselves)
- <P>
- <LI>• object sharing (references to the same object in different places)
- <P>
- <LI>• user-defined classes and their instances
- <P>
- </UL>
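- The first two points can be illustrated with a short sketch. It assumes a
- current Python 3 interpreter, where <CODE>dumps()</CODE> returns a
- <CODE>bytes</CODE> object rather than a string; the variable names are
- invented for the example.

```python
import pickle

# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# Object sharing: the same inner list appears in two places.
inner = ["shared"]
pair = [inner, inner]

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The copies keep the same internal structure as the originals.
recursion_kept = loop_copy[2] is loop_copy
sharing_kept = pair_copy[0] is pair_copy[1]
```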
- The data format used by <CODE>pickle</CODE> is Python-specific. This has
- the advantage that there are no restrictions imposed by external
- standards such as CORBA (which probably can't represent pointer
- sharing or recursive objects); however it means that non-Python
- programs may not be able to reconstruct pickled Python objects.
- <P>
- The <CODE>pickle</CODE> data format uses a printable ASCII representation.
- This is slightly more voluminous than a binary representation.
- However, small integers actually take <I>less</I> space when
- represented as minimal-size decimal strings than when represented as
- 32-bit binary numbers, and strings are only much longer if they
- contain many control characters or 8-bit characters. The big
- advantage of using printable ASCII (and of some other characteristics
- of <CODE>pickle</CODE>'s representation) is that for debugging or recovery
- purposes it is possible for a human to read the pickled file with a
- standard text editor. (I could have gone a step further and used a
- notation like S-expressions, but the parser
- (currently written in Python) would have been
- considerably more complicated and slower, and the files would probably
- have become much larger.)
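- To see the printable representation, one can ask for it explicitly. The
- sketch below assumes a current Python 3 interpreter, where the ASCII format
- described here is selected with <CODE>protocol=0</CODE> and the default
- format is binary; the sample data is made up.

```python
import pickle

# Protocol 0 is the printable ASCII format; newer protocols are binary.
text_form = pickle.dumps(42, protocol=0)
record = pickle.dumps({"name": "spam", "count": 3}, protocol=0)

# Decoding succeeds because every byte is plain ASCII.
readable = record.decode("ascii")
print(text_form)
print(readable)
```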
- <P>
- The <CODE>pickle</CODE> module doesn't handle code objects, which the
- <CODE>marshal</CODE> module does. I suppose <CODE>pickle</CODE> could, and maybe
- it should, but there's probably no great need for it right now (as
- long as <CODE>marshal</CODE> continues to be used for reading and writing
- code objects), and at least this avoids the possibility of smuggling
- Trojan horses into a program.
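- The difference from <CODE>marshal</CODE> can be checked directly. This is
- a sketch for a current Python 3 interpreter; the exception class raised by
- <CODE>pickle</CODE> for a code object may vary between versions, so two
- candidates are caught.

```python
import marshal
import pickle

code = compile("1 + 2", "<example>", "eval")

# marshal handles code objects.
restored = marshal.loads(marshal.dumps(code))
result = eval(restored)

# pickle refuses them; the exact exception class is version-dependent.
try:
    pickle.dumps(code)
    pickle_accepted_code = True
except (pickle.PicklingError, TypeError):
    pickle_accepted_code = False
```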
- For the benefit of persistency modules written using <CODE>pickle</CODE>, it
- supports the notion of a reference to an object outside the pickled
- data stream. Such objects are referenced by a name, which is an
- arbitrary string of printable ASCII characters. The resolution of
- such names is not defined by the <CODE>pickle</CODE> module --- the
- persistent object module will have to implement a method
- <CODE>persistent_load</CODE>. To write references to persistent objects,
- the persistent module must define a method <CODE>persistent_id</CODE> which
- returns either <CODE>None</CODE> or the persistent ID of the object.
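- A minimal sketch of such a persistent module follows. It assumes the
- current Python 3 interface, in which the two methods are supplied by
- subclassing <CODE>Pickler</CODE> and <CODE>Unpickler</CODE>; the
- <CODE>registry</CODE> dictionary and both class names are hypothetical.

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle stream.
registry = {"config": {"debug": True}}

class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for objects held in the registry, None for the rest.
        for name, stored in registry.items():
            if obj is stored:
                return name
        return None

class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # Resolve the name back to the live object.
        return registry[name]

buf = io.BytesIO()
RegistryPickler(buf).dump(["job-1", registry["config"]])
buf.seek(0)
job = RegistryUnpickler(buf).load()
```

- Only the name <CODE>config</CODE> travels in the stream; the dictionary it
- refers to is looked up again when the data is loaded.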
- <P>
- There are some restrictions on the pickling of class instances.
- <P>
- First of all, the class must be defined at the top level in a module.
- <P>
- Next, it must normally be possible to create class instances by
- calling the class without arguments. Usually, this is best
- accomplished by providing default values for all arguments to its
- <CODE>__init__</CODE> method (if it has one). If this is undesirable, the
- class can define a method <CODE>__getinitargs__()</CODE>, which should
- return a <I>tuple</I> containing the arguments to be passed to the
- class constructor (<CODE>__init__()</CODE>).
- Classes can further influence how their instances are pickled --- if the class
- defines the method <CODE>__getstate__()</CODE>, it is called and the returned
- state is pickled as the contents for the instance, and if the class
- defines the method <CODE>__setstate__()</CODE>, it is called with the
- unpickled state. (Note that these methods can also be used to
- implement copying class instances.) If there is no
- <CODE>__getstate__()</CODE> method, the instance's <CODE>__dict__</CODE> is
- pickled. If there is no <CODE>__setstate__()</CODE> method, the pickled
- object must be a dictionary and its items are assigned to the new
- instance's dictionary. (If a class defines both <CODE>__getstate__()</CODE>
- and <CODE>__setstate__()</CODE>, the state object needn't be a dictionary
- --- these methods can do what they want.) This protocol is also used
- by the shallow and deep copying operations defined in the <CODE>copy</CODE>
- module.
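- The sketch below shows a class that defines both methods and uses a
- non-dictionary state. It assumes a current Python 3 interpreter; the
- <CODE>Counter</CODE> class and its attributes are hypothetical.

```python
import copy
import pickle

class Counter:
    # Hypothetical class: 'cache' is derived data that should not be stored.
    def __init__(self, start=0):
        self.value = start
        self.cache = {"doubled": start * 2}

    def __getstate__(self):
        # The state need not be a dictionary; a one-element tuple is used.
        return (self.value,)

    def __setstate__(self, state):
        (self.value,) = state
        self.cache = {"doubled": self.value * 2}

original = Counter(21)
restored = pickle.loads(pickle.dumps(original))

# The copy module goes through the same pair of methods.
clone = copy.copy(original)
```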
- Note that when class instances are pickled, their class's code and
- data are not pickled along with them. Only the instance data are
- pickled. This is done on purpose, so you can fix bugs in a class or
- add methods and still load objects that were created with an earlier
- version of the class. If you plan to have long-lived objects that
- will see many versions of a class, it may be worthwhile to put a version
- number in the objects so that suitable conversions can be made by the
- class's <CODE>__setstate__()</CODE> method.
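- One possible way to carry a version number is sketched here, assuming a
- current Python 3 interpreter. The <CODE>Account</CODE> class, its
- attribute names and the version history are all invented for the
- example, and the old state is simulated by calling
- <CODE>__setstate__()</CODE> directly rather than by loading an old file.

```python
import pickle

class Account:
    VERSION = 2  # bump when the attribute layout changes

    def __init__(self, owner="", balance_cents=0):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the instance data plus the version that wrote it.
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Assumed history: version 1 stored a 'balance' in whole units.
            state["balance_cents"] = state.pop("balance") * 100
        self.__dict__.update(state)

# Simulate state written by the older version of the class.
old = Account.__new__(Account)
old.__setstate__({"owner": "ann", "balance": 5})

# A round trip with the current version needs no conversion.
new = pickle.loads(pickle.dumps(Account("bob", 250)))
```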
- <P>
- When a class itself is pickled, only its name is pickled --- the class
- definition is not pickled, but re-imported by the unpickling process.
- Therefore, the restriction that the class must be defined at the top
- level in a module applies to pickled classes as well.
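- That only the name is stored can be seen by pickling a class from the
- standard library. The sketch assumes a current Python 3 interpreter and
- uses <CODE>fractions.Fraction</CODE> purely as a convenient top-level class.

```python
import pickle
from fractions import Fraction

# Protocol 0 keeps the output readable: it holds the module and class names.
pickled_class = pickle.dumps(Fraction, protocol=0)

# Unpickling imports the module and looks the class up again.
same_class = pickle.loads(pickled_class)
```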
- <P>
- The interface can be summarized as follows.
- <P>
- To pickle an object <CODE>x</CODE> onto a file <CODE>f</CODE>, open for writing:
- <P>
- <UL COMPACT><CODE>p = pickle.Pickler(f)<P>
- p.dump(x)<P>
- </CODE></UL>
- A shorthand for this is:
- <P>
- <UL COMPACT><CODE>pickle.dump(x, f)<P>
- </CODE></UL>
- To unpickle an object <CODE>x</CODE> from a file <CODE>f</CODE>, open for reading:
- <P>
- <UL COMPACT><CODE>u = pickle.Unpickler(f)<P>
- x = u.load()<P>
- </CODE></UL>
- A shorthand is:
- <P>
- <UL COMPACT><CODE>x = pickle.load(f)<P>
- </CODE></UL>
- The <CODE>Pickler</CODE> class only calls the method <CODE>f.write</CODE> with a
- string argument. The <CODE>Unpickler</CODE> calls the methods <CODE>f.read</CODE>
- (with an integer argument) and <CODE>f.readline</CODE> (without argument),
- both returning a string. It is explicitly allowed to pass non-file
- objects here, as long as they have the right methods.
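- As a sketch of this, the two classes below provide just those methods and
- nothing else. They are hypothetical names, and the example assumes a
- current Python 3 interpreter, where the data passed around is
- <CODE>bytes</CODE> rather than a string.

```python
import pickle

class ChunkSink:
    """Collects whatever the Pickler writes; only write() is provided."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

class ChunkSource:
    """Serves bytes to the Unpickler through read() and readline()."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, size=-1):
        end = len(self.data) if size is None or size < 0 else self.pos + size
        chunk = self.data[self.pos:end]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        newline = self.data.find(b"\n", self.pos)
        end = len(self.data) if newline < 0 else newline + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

sink = ChunkSink()
pickle.Pickler(sink).dump({"answer": 42})
value = pickle.Unpickler(ChunkSource(b"".join(sink.chunks))).load()
```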
- The following types can be pickled:
- <UL>
- <LI>• <CODE>None</CODE>
- <P>
- <LI>• integers, long integers, floating point numbers
- <P>
- <LI>• strings
- <P>
- <LI>• tuples, lists and dictionaries containing only picklable objects
- <P>
- <LI>• classes that are defined at the top level in a module
- <P>
- <LI>• instances of such classes whose <CODE>__dict__</CODE> or the result of
- <CODE>__getstate__()</CODE> is picklable
- <P>
- </UL>
- Attempts to pickle unpicklable objects will raise the
- <CODE>PicklingError</CODE> exception; when this happens, an unspecified
- number of bytes may have been written to the file.
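- A sketch of handling such a failure follows. It assumes a current Python 3
- interpreter, where some unpicklable objects raise <CODE>TypeError</CODE> or
- <CODE>AttributeError</CODE> instead of <CODE>PicklingError</CODE>, so all
- three are caught; the lambda is just a convenient unpicklable value.

```python
import io
import pickle

buf = io.BytesIO()
unpicklable = lambda: None  # a lambda cannot be looked up again by name

try:
    pickle.Pickler(buf).dump(["fine", unpicklable])
    failed = False
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    failed = True
    error_name = type(exc).__name__

# Do not rely on the buffer being empty or complete after a failure.
bytes_written = len(buf.getvalue())
```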
- <P>
- It is possible to make multiple calls to the <CODE>dump()</CODE> method of
- the same <CODE>Pickler</CODE> instance. These must then be matched to the
- same number of calls to the <CODE>load()</CODE> method of the
- corresponding <CODE>Unpickler</CODE> instance. If the same object is
- pickled by multiple <CODE>dump()</CODE> calls, the <CODE>load()</CODE> calls will all
- yield references to the same object. <I>Warning</I>: this is intended
- for pickling multiple objects without intervening modifications to the
- objects or their parts. If you modify an object and then pickle it
- again using the same <CODE>Pickler</CODE> instance, the object is not
- pickled again --- a reference to it is pickled and the
- <CODE>Unpickler</CODE> will return the old value, not the modified one.
- (There are two problems here: (a) detecting changes, and (b)
- marshalling a minimal set of changes. I have no answers. Garbage
- Collection may also become a problem here.)
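- The warning can be reproduced with a few lines. The sketch assumes a
- current Python 3 interpreter and uses an in-memory <CODE>io.BytesIO</CODE>
- buffer in place of a file; the variable names are illustrative.

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf)

items = ["a", "b"]
pickler.dump(items)
items.append("c")      # modification between the two dump() calls
pickler.dump(items)    # only a reference to the first pickle is written

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()

# A fresh Pickler has no memory of earlier dumps and stores the new value.
fresh = pickle.loads(pickle.dumps(items))
```

- Here <CODE>second</CODE> is the same object as <CODE>first</CODE> and lacks
- the appended element, while <CODE>fresh</CODE> contains it.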
- <P>
- Apart from the <CODE>Pickler</CODE> and <CODE>Unpickler</CODE> classes, the
- module defines the following functions, and an exception:
- <P>
- <DL><DT><B>dump</B> (<VAR>object</VAR>, <VAR>file</VAR>) -- function of module pickle<DD>
- Write a pickled representation of <VAR>object</VAR> to the open file object
- <VAR>file</VAR>. This is equivalent to <CODE>Pickler(file).dump(object)</CODE>.
- </DL>
- <DL><DT><B>load</B> (<VAR>file</VAR>) -- function of module pickle<DD>
- Read a pickled object from the open file object <VAR>file</VAR>. This is
- equivalent to <CODE>Unpickler(file).load()</CODE>.
- </DL>
- <DL><DT><B>dumps</B> (<VAR>object</VAR>) -- function of module pickle<DD>
- Return the pickled representation of the object as a string, instead
- of writing it to a file.
- </DL>
- <DL><DT><B>loads</B> (<VAR>string</VAR>) -- function of module pickle<DD>
- Read a pickled object from a string instead of a file. Characters in
- the string past the pickled object's representation are ignored.
- </DL>
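- A short sketch of the two string-based functions, assuming a current
- Python 3 interpreter (where the "string" is a <CODE>bytes</CODE> object);
- the sample value is arbitrary.

```python
import pickle

payload = pickle.dumps(("spam", [1, 2, 3]))

# Bytes after the end of the pickle are not examined by loads().
value = pickle.loads(payload + b"trailing bytes")
```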
- <DL><DT><B>PicklingError</B> -- exception of module pickle<DD>
- This exception is raised when an unpicklable object is passed to
- <CODE>Pickler.dump()</CODE>.
- </DL>
-