Import Subtleties
Python's import
can behave in a way that might surprise you. This document explains when that can happen, and what the effects are, and how to avoid it.
Minimalist Practical Summary
Code in vespa/common
should always import with the form import vespa.common.foo
rather than the lazier import foo
.
Introduction
We use two kinds of import statements. I'll call them qualified and local imports. Qualified imports start with vespa.
and specify the location of the target module inside the vespa package. Here's three examples. Note that the util_xml
alias doesn't change the fact that what it's aliasing is qualified –
#!python
import vespa.common.constants
import vespa.common.util.misc
import vespa.common.util.xml_ as util_xml
What I call a local import relies on the fact that Python includes the current directory in the search path for a module reference. That's why modules in vespa/common
can do this –
#!python
import constants
import mrs_metabolite
import util.xml_ as util_xml
Subtle Side Effect Number 1
When Python encouters an import statement for a module that's already been imported, it checks sys.modules
to see if the module is already imported. If it is, then the import is essentially a no-op. So in Vespa where we have import wx
and import numpy
at the top of many, many files, we still only load one copy of wx
and one copy of numpy
.
However, the files in vespa/common
can be referenced in two ways. Files outside of that directory tree use qualified references, like in the examples above. Files in that tree can use qualified or local references. Both work fine.
However, importing via both qualified and local references imports the module twice.
We can demonstrate this with an interactive Python session started in the vespa/common
directory.
#!python
>>> import mrs_metabolite
>>> import vespa.common.mrs_metabolite
>>> mrs_metabolite is vespa.common.mrs_metabolite
False
>>> import sys
>>> sys.modules["mrs_metabolite"]
<module 'mrs_metabolite' from 'mrs_metabolite.pyc'>
>>> sys.modules["vespa.common.mrs_metabolite"]
<module 'vespa.common.mrs_metabolite' from '/Users/philip/w/duke/src/vespa/common/mrs_metabolite.pyc'>
>>>
The is
comparison in the example above proves that Python considers these different modules. mrs_metabolite
has been loaded twice.
Subtle Side Effect Number 2
So far we've demonstrated an inefficiency, but nothing worse. Now we'll build on the previous scenario to show how this creates a real problem. Let's pick up where we left off in the interactive Python session.
#!python
>>> mrs_metabolite.Metabolite is vespa.common.mrs_metabolite.Metabolite
False
>>> metab = mrs_metabolite.Metabolite()
>>> isinstance(metab, mrs_metabolite.Metabolite)
True
>>> isinstance(metab, vespa.common.mrs_metabolite.Metabolite)
False
>>>
What the session above shows is that although we might think of the modules as being the same, to Python they're different. Not only are the modules different, Python considers the classes inside the module to be different. That's why the first isinstance()
returns True
and the second False
.
Practical Consequences of Multiple Imports
- It breaks
isinstance()
. This is a headache for us. - It's inefficient. In the grand scheme of things, the inefficiences probably don't matter much. But add enough small inefficiences together and you get trouble.
Recommendation
When we have a choice between using a local or qualified import, we should use the latter. Since only files in vespa/common
have the choice, we just need to be careful that those files always use qualified imports.
Side Note
Python offers absolute and relative imports. Don't confuse the issue described in this document with those Python features! Absolute and relative imports determine how Python resolves ambiguous names. For instance, if there's a module called draw.py
in your local directory and also a draw
package in your site-packages
directory, which should Python grab when your code says import draw
? That's what absolute and relative imports are concerned with.