Deflate and Inflate

The classes of the "Big Three" Simulation objects (experiments, metabs and pulse sequences) and their subobjects all implement methods called deflate and inflate. This document explains the purpose of those methods and how they work.

Nomenclature

The names are from the Wikipedia article on serialization, which describes what these functions are for. In short, deflate reduces an object to raw data in some other format (e.g. XML) while inflate reverses the process.

The names serialize and deserialize would probably have worked just as well or even better. When I chose deflate/inflate I had in mind that the functions might do more than just convert the objects to/from a disk-friendly format. That has not and probably will not happen, so the more standard terms serialize and deserialize might be a better choice.

Formats

The functions recognize two formats: Python dictionaries and XML stored in an ElementTree object (more on this below). Both are used by inflate while deflate uses only the ElementTree format.

ElementTree Format

The default format is a Python standard library ElementTree. This is a very useful representation as it's dead easy to turn it into an XML file. The library even handles encoding issues, which is nice. Since this is the default and also the format on which Simulation relies for all of its exports and imports, considerable time & attention has been invested in the code.
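
To illustrate why an ElementTree is convenient, here is a minimal sketch using only the standard library. The element names (experiment, name, investigator) and the values are made up for illustration and are not necessarily what Simulation's deflate methods produce.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and values, for illustration only.
root = ET.Element("experiment")
ET.SubElement(root, "name").text = "Fred's Experiment"
ET.SubElement(root, "investigator").text = "José"

# With an explicit encoding, tostring() returns bytes in that encoding,
# so non-ASCII text needs no manual handling.
xml_bytes = ET.tostring(root, encoding="utf-8")

# Parsing the bytes back recovers the same values.
restored = ET.fromstring(xml_bytes)
restored_name = restored.find("name").text
restored_investigator = restored.find("investigator").text
```

The same tree could be written straight to a file with ET.ElementTree(root).write(filename, encoding="utf-8").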

Dictionary Format

The other format supported by inflate/deflate is that of Python dictionaries. In practice, the deflate-to-dict path is unused (see below for why); only inflate uses dicts. Dictionaries are passed to inflate when reconstituting an object from the database.

Side note: The previous sentence is not strictly true, but is "true enough" thanks to Python's duck typing. SQLite returns row objects to our database code. SQLite's row objects are tuple-ish with one dict-like feature. Specifically, the columns are accessible by column name. We improved the dict-ishness of SQLite's default row object with our _BetterRow class which is implemented in db.py. It's actually _BetterRow objects which get passed to inflate, but _BetterRow objects are sufficiently dict-like that the inflate code can treat them as dicts. One could also pass real dicts to inflate, and early versions of Simulation did just that but we don't do so anymore.
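
The tuple-ish and dict-ish behaviour described above can be seen in a small sketch. It assumes the row objects in question are the standard library's sqlite3.Row; the table and its contents are invented for illustration, and _BetterRow itself is not shown here.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Ask sqlite3 for Row objects instead of plain tuples.
connection.row_factory = sqlite3.Row
connection.execute("CREATE TABLE experiments (name TEXT, investigator TEXT)")
connection.execute(
    "INSERT INTO experiments VALUES (?, ?)", ("Fred's Experiment", "Wilma")
)

row = connection.execute("SELECT name, investigator FROM experiments").fetchone()

by_position = row[0]       # tuple-like access by index
by_name = row["name"]      # dict-like access by column name
column_names = row.keys()  # a list of the column names

# A real dict built from the row, for code that needs a full mapping.
as_dict = {key: row[key] for key in row.keys()}
```

Note that sqlite3.Row offers keys() and lookup by name but not the rest of the dict interface (there is no get() or items(), for example), which is the sort of gap a wrapper class can fill.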

Here's a subtle but important point – when inflating from a dict, inflate compares the dict keys with the object's attribute names. When it finds a match, it assigns the value associated with the key to the attribute of the same name. Keys that don't match any attribute names are ignored, and attributes that don't match any key names are left untouched.

For example, suppose an Experiment object's inflate method is passed this dict:

{ 
    "name" : "Fred's Experiment", 
    "foo"  : 42 
}

Since Experiment objects have a "name" attribute and the dict has a key of the same name, "Fred's Experiment" is assigned to experiment.name. The experiment object's other attributes (created, investigator, etc.) are unchanged because their names aren't represented in the dict keys. And the "foo" key is ignored because Experiment objects have no attribute called foo.
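
The matching rule can be sketched in a few lines. This is not Simulation's actual inflate code; the Experiment class below is a hypothetical stand-in with only three attributes, and it assumes that matching means comparing keys against instance attribute names.

```python
class Experiment:
    """Hypothetical stand-in for the real Experiment class."""

    def __init__(self):
        self.name = ""
        self.created = None
        self.investigator = ""

    def inflate(self, source):
        # Copy a value only when its key names an existing instance attribute.
        # Unmatched keys are skipped; unmatched attributes keep their values.
        for key in source.keys():
            if key in vars(self):
                setattr(self, key, source[key])


experiment = Experiment()
experiment.investigator = "Wilma"
experiment.inflate({"name": "Fred's Experiment", "foo": 42})
```

After the call, experiment.name holds the new value, experiment.investigator is still "Wilma", and no foo attribute has been created. Because the sketch only uses keys() and lookup by key, it works with any sufficiently dict-like object, not just real dicts.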

As mentioned above, SQLite returns rows keyed by column names. In general, our database column names match our object attribute names, so the dict-like rows returned by SQLite can be passed directly to inflate without alteration. For instance, the SQL statement SELECT name FROM experiments will produce rows containing keys called "name".

Occasionally this relationship falters. A good example is in the PulseSequenceParameter object which has an attribute called default. That's fine for Python, but default is a reserved word in SQL so the corresponding column is called default_value. In this case, the database code renames the key from default_value to default before passing the dict to inflate.
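
A rename like this might look something like the sketch below. The helper name row_to_inflatable and the sample values are hypothetical; the real database code may do the rename differently.

```python
def row_to_inflatable(row):
    """Hypothetical helper: copy a dict-like row into a real dict,
    renaming the default_value key to default along the way."""
    data = {key: row[key] for key in row.keys()}
    if "default_value" in data:
        data["default"] = data.pop("default_value")
    return data


# A plain dict stands in for a database row here.
params = row_to_inflatable({"name": "pulse_width", "default_value": "1.5"})
```

The resulting dict has a default key that matches the attribute name, so it can be handed to inflate like any other row.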

Why We Don't Deflate to Dicts

Our code inflates and deflates using ElementTrees and inflates using dicts, so why doesn't it also deflate using dicts? What's with the inconsistency?

Just as deflating to an ElementTree happens when an object is about to be written to XML, the natural place for deflating to a dict would be when the object is about to be written to the database. We could do that, and it would make the code more consistent. However, it wouldn't provide much benefit.

Whether the database code is passed a dict or one of our custom Vespa objects, it still has to know a bit about the object in order to represent it correctly in the database. In that light, it doesn't make much difference whether the data comes from a dict or a Vespa object. And since the data is already in a Vespa object, deflating it into a dict would just be an extra step.