How to generate a module object from a code object in Python

Given that I have the code object for a module, how do I get the corresponding module object?
It looks like moduleNames = {}; exec code in moduleNames does something very close to what I want: it executes the code and collects the module's globals into a dictionary. But if I want the actual module object, how do I get it?
EDIT:
It looks like you can roll your own module object. The module type isn't conveniently documented, but you can do something like this:
import sys
module = sys.__class__          # the module type (same as types.ModuleType)
del sys
foo = module('foo', 'Doc string')
foo.__file__ = 'foo.pyc'
exec code in foo.__dict__       # Python 2 syntax; exec(code, foo.__dict__) in Python 3

As a comment already indicates, in today's Python the preferred way to instantiate types that don't have built-in names is to call the type obtained via the types module from the standard library:
>>> import types
>>> m = types.ModuleType('m', 'The m module')
Note that this does not automatically insert the new module into sys.modules:
>>> import sys
>>> sys.modules['m']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'm'
That's a task you must perform by hand:
>>> sys.modules['m'] = m
>>> sys.modules['m']
<module 'm' (built-in)>
This can be important, since a module's code object normally executes after the module has been added to sys.modules -- for example, it's perfectly correct for such code to refer to sys.modules[__name__], and that would fail (KeyError) if you forgot this step. After this step, and after setting m.__file__ as you already have in your edit,
>>> code = compile("a=23", "m.py", "exec")
>>> exec code in m.__dict__
>>> m.a
23
(or the Python 3 equivalent where exec is a function, if Python 3 is what you're using, of course;-) is correct (of course, you'll normally have obtained the code object by subtler means than compiling a string, but that's not material to your question;-).
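For reference, the whole recipe in Python 3 form looks like this (a minimal sketch reusing the toy code object from above; in real use you'd substitute your own code object):
import sys
import types

code = compile("a = 23", "m.py", "exec")   # stand-in for your real code object
m = types.ModuleType('m', 'The m module')
m.__file__ = 'm.py'
sys.modules['m'] = m                       # register before executing the module body
exec(code, m.__dict__)                     # in Python 3, exec is a function
print(m.a)                                 # prints 23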
In older versions of Python you would have used the new module instead of the types module to make a new module object at the start, but new is deprecated since Python 2.6 and removed in Python 3.

Related

Why does Python 2 try to get a module as a package attribute when using "import ... as ..."?

EDIT: closely related to Imports in __init__.py and import as statement. That question deals with the behaviour of import ... as ... up to Python 3.6. The change in behaviour I'm describing below was introduced in Python 3.7, with the intention of fixing the bug described in that other question. I'm more interested in where the change is documented (or where the two different behaviours, for Py2 up to Py3.6 vs Py3.7+, are respectively documented) than in how exactly this behaviour arises, as I already mostly understand that from experimenting in preparation for this question.
Consider the following directory structure:
.
└── package
    ├── __init__.py
    ├── a.py
    └── b.py
The __init__.py file is empty. The two modules package.a and package.b contain, respectively:
# package.a
import sys
print('package.b' in sys.modules)
import package.b as b
spam = 'ham'
print("{} says b is {}".format(__name__, b))
# package.b
import package.a
print("{} says package.a.spam is {}".format(__name__, repr(package.a.spam)))
With Python 3.x (specifically 3.8), when I run python -c "from __future__ import print_function; import package.b" from the root directory, I get
True
package.a says b is <module 'package.b' from 'C:\\[...]\\package\\b.py'>
package.b says package.a.spam is 'ham'
but with Python 2.x (specifically 2.7) I get
True
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "package\b.py", line 1, in <module>
    import package.a
  File "package\a.py", line 4, in <module>
    import package.b as b
AttributeError: 'module' object has no attribute 'b'
The question is: what warrants this difference? Where is this change documented, e.g. the Python docs, a PEP or similar?
I get that package.b hasn't finished initialising when package.a is imported, so the module object for package.b hasn't yet been added as an attribute of the module object for package. And yet the module object itself exists (it has already been added to sys.modules), so there shouldn't be any trouble binding the name b to that object, which is what Python 3 does, I believe. Python 2 seems not to bind the name directly to the module object, but rather to try to fetch it by getting an attribute named 'b' from the module object for package.
As far as I can see, there is no such specification in the documentation.
Import statement (Python 3.8):
If the requested module is retrieved successfully, it will be made available in the local namespace in one of three ways:
If the module name is followed by as, then the name following as is bound directly to the imported module.
[...]
Import statement (Python 2.7):
The first form of import statement binds the module name in the local namespace to the module object, and then goes on to import the next identifier, if any. If the module name is followed by as, the name following as is used as the local name for the module.
Notes:
Using from package import b in package/a.py yields the same, only with a different error (i.e. ImportError instead of AttributeError). I suspect the ImportError is just wrapping the underlying AttributeError.
Using import package.b in package/a.py doesn't give the AttributeError upon import in Py2. But, of course, referencing package.b later in the print call produces an AttributeError in both Py2 and Py3.
If you do
import package.submodule
and then try to access package.submodule, that access is an attribute lookup on the module object for package, and it will find whatever object is bound to the submodule attribute (or fail if that attribute is unset). For consistency,
import package.submodule as whatever
performs the same attribute lookup to find the object to bind to whatever.
This was changed in Python 3.7 to fall back to a sys.modules['package.submodule'] lookup if the attribute lookup fails. This change was made for consistency with a previous change in Python 3.5 that made from package import submodule fall back to a sys.modules lookup, and that change was made to make relative imports more convenient.
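To make the fallback concrete, here is roughly what the name binding for import package.submodule as whatever amounts to in 3.7+ (an illustrative sketch, not CPython's actual implementation):
import importlib
import sys

importlib.import_module('package.submodule')      # load the module (or find it in sys.modules)
try:
    whatever = sys.modules['package'].submodule   # attribute lookup, the pre-3.7 behaviour
except AttributeError:
    whatever = sys.modules['package.submodule']   # the 3.7+ fallback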

Class definition from shelve

I'm using Python 3.5. Fairly new to Python but not new to programming. I have three source files as follows (a much simplified version of what I'm actually doing):
c.py
class C:
    def __init__(self, x):
        self.x = x

    def method(self):
        print(self.x)
init.py
import shelve
from c import C
db = shelve.open("DB")
db['key1'] = C("test")
db.close()
test.py
import shelve
db = shelve.open("DB")
obj = db['key1']
obj.method() # this works
C.method(obj) # this doesn't -- 'C' not defined
db.close()
So I run init.py to set up my shelved database, then I run test.py. It is happy with executing obj.method(), so it seems to know about class C even though I haven't explicitly imported it (Lutz says something about it being stored in the database). But if I try to do C.method(obj) (not that I'd necessarily need to call it this way, but using C as a class, for example to create new objects, has its usefulness), it says 'C' is not defined. Yet if I add 'from c import C' to test.py then it works. So in one way it seems to know about C's definition, but then again it doesn't. I am curious to know why this is.
When shelve serializes an object (I believe by pickling it), it stores the import path to the class for unpickling later. When it retrieves the object, pickle imports the module (c) and returns an instance of C back to you (that is equivalent to the one that was originally serialized).
So far, this isn't anything new to you based on your observations. However, when it imports c, it doesn't import it into your current namespace. In fact, it imports it into a very localized namespace. So while the c module has been imported (you can find it in sys.modules), you won't find c or C in your current namespace because you haven't imported it there.
Put another way, simply importing something doesn't make it accessible to every module in your program. It only makes it accessible to the module (actually scope) where it was imported. I can import os, but just because os imports sys doesn't mean that I immediately have access to sys. I need to import it too before I can start using it.
>>> import os
>>> sys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
>>> os.sys
<module 'sys' (built-in)>
>>> import sys
>>> sys
<module 'sys' (built-in)>
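You can see the same thing with the shelve example: after loading the object in test.py, the c module is present in sys.modules, but nothing named c or C is in your namespace (a sketch of what you would observe, assuming the files above):
import sys
obj = db['key1']
print('c' in sys.modules)   # True -- pickle imported the module behind the scenes
print('C' in globals())     # False -- nothing was bound in your namespace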
What you have doesn't work for the reasons stated in mgilson's answer.
A work-around for the problem would be to manually import the class from the module and assign the class to the name C, something along these lines (tested with Python 3.5.1):
import shelve
db = shelve.open("DB")
obj = db['key1']
obj.method() # this works
## Begin added code ##
classname = obj.__class__.__name__
module_name = obj.__module__
module = __import__(module_name, globals(), locals(), [classname], 0)
globals()[classname] = getattr(module, classname)
## Added code end ##
C.method(obj) # this also works now
db.close()
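If you prefer, the same work-around can be written more readably with importlib (a sketch under the same assumptions as above):
import importlib

classname = obj.__class__.__name__
module = importlib.import_module(obj.__module__)
globals()[classname] = getattr(module, classname)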

Why is __import__() returning the package instead of the module?

I have this file structure (where the dot is my working directory):
.
+-- testpack
    +-- __init__.py
    +-- testmod.py
If I load the testmod module with the import statement, I can call a function declared within it:
>>> import testpack.testmod
>>> testpack.testmod.testfun()
hello
but if I try to do the same using the __import__() function, it doesn't work:
>>> __import__("testpack.testmod").testfun()
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    __import__("testpack.testmod").testfun()
AttributeError: 'module' object has no attribute 'testfun'
indeed, it returns the package testpack instead of the module testmod:
>>> __import__("testpack.testmod").testmod.testfun()
hello
How come?
This behaviour is given in the docs:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist argument
is given, the module named by name is returned.
...
The statement import spam.ham results in this call:
spam = __import__('spam.ham', globals(), locals(), [], -1)
Note how __import__() returns the toplevel module here because this is
the object that is bound to a name by the import statement.
Also note the warning at the top:
This is an advanced function that is not needed in everyday Python
programming, unlike importlib.import_module().
And then later:
If you simply want to import a module (potentially within a package)
by name, use importlib.import_module().
So the solution here is to use importlib.import_module().
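Applied to your layout (and, for completeness, the non-empty fromlist behaviour the docs describe):
>>> import importlib
>>> testmod = importlib.import_module("testpack.testmod")
>>> testmod.testfun()
hello
>>> __import__("testpack.testmod", fromlist=["testfun"]).testfun()
hello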
It's worth noting that double underscores on either side of a name in Python imply that the object at hand isn't meant to be used directly most of the time, just as you should generally use len(x) over x.__len__() or vars(x)/dir(x) over x.__dict__. Unless you know why you need to use it, reaching for such a name is generally a sign something is wrong.

In python after I import a module is there a way of finding what physical file it was loaded from?

I think something's acting up in my math package, and I want to ensure I'm loading the correct module. How do I check the physical file location of loaded modules in Python?
Use the __file__ attribute:
>>> import numpy
>>> numpy.__file__
'/usr/lib/python2.7/dist-packages/numpy/__init__.pyc'
Note that built-in modules written in C and statically linked to the interpreter do not have this attribute:
>>> import math
>>> math.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute '__file__'
Another way to obtain the path to the file is inspect.getfile. It raises TypeError if the object passed is a built-in module, class or function.
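For example (illustrative; the exact paths and error message depend on your installation):
>>> import inspect
>>> import numpy
>>> inspect.getfile(numpy)
'/usr/lib/python2.7/dist-packages/numpy/__init__.pyc'
>>> import math
>>> inspect.getfile(math)   # raises TypeError: math is a built-in module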
On a side note, you should avoid using names that conflict with language built-ins or standard library modules. So I'd suggest you rename your math package to something else, or, if it is part of a package like mypackage.math, avoid importing it directly and refer to it as mypackage.math instead.
Check themodule.__file__.
import urllib
print urllib.__file__
>>> import math
>>> math.__file__
'/usr/lib/python2.7/lib-dynload/math.so'

In Python, is there a way to find the module that contains a variable or other object from the object itself?

As an example, say I have a variable defined in a module where there may be multiple
from __ import *
from ____ import *
etc.
Is there a way to figure out where one of the variables in the namespace is defined?
EDIT:
Thanks, but I already understand that import * is often considered poor form. That wasn't the question though, and in any case I didn't write it. It'd just be nice to have a way to find where the variable came from.
This is why it is considered bad form to use from __ import * in Python in most cases. Either use from __ import myFunc or else import __ as myLib. Then, when you need something from myLib, it doesn't overlap with something else.
For help finding things in the current namespace, check out the pprint library, the dir builtin, the locals builtin, and the globals builtin.
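For example, to dump what is currently in scope:
>>> import pprint
>>> pprint.pprint(dir())       # names visible in the current scope
>>> pprint.pprint(globals())   # name -> object mapping; the reprs can hint at origins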
No, the names defined by from blah import * don't retain any information about where they came from. The values might have a clue: classes, for example, have a __module__ attribute, but that records where they were defined, which may not be the module you imported them from, so you can't count on it telling you what you expect.
Sort-of, for example:
>>> from zope.interface.common.idatetime import *
>>> print IDate.__module__
zope.interface.common.idatetime
>>> print Attribute.__module__
zope.interface.interface
The module of the Attribute may seem surprising since that is not where you imported it from, but it is where the Attribute type was defined. Looking at zope/interface/common/idatetime.py, we see:
from zope.interface import Interface, Attribute
which explains the value of __module__. You'll also run into problems with instances of types imported from other modules. Suppose that you create an Attribute instance named att:
>>> att = Attribute('foo')
>>> print att.__module__
zope.interface.interface
Again, you're learning where the type came from, but not where the variable was defined.
Quite possibly the biggest reason to not use wildcard imports is that you don't know what you're getting and they pollute your namespace and possibly clobber other types/variables.
>>> class Attribute(object):
...     foo = 9
...
>>> print Attribute.foo
9
>>> from zope.interface.common.idatetime import *
>>> print Attribute.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'Attribute' has no attribute 'foo'
Even if today the import * works without collision, there is no guarantee that it won't happen with future updates to the package being imported.
If you evaluate the object itself in the interpreter, its repr will often tell you which module it came from.
For example:
>>> from collections import *
>>> deque
<type 'collections.deque'>
