Sandboxing Python code is notoriously difficult due to the power of the reflection facilities built into the language. At a minimum one has to take away the import mechanism and most of the built-in functions and global variables, and even then there are holes ({}.__class__.__base__.__subclasses__(), for instance).
In both Python 2 and 3, the 'sys' module is built into the interpreter and preloaded before user code begins to execute (even in -S mode). If you can get a handle to the sys module, then you have access to the global list of loaded modules (sys.modules) which enables you to do all sorts of naughty things.
So, the question: Starting from an empty module, without using the import machinery at all (no import statement, no __import__, no imp library, etc.), and also without using anything normally found in __builtins__ unless you can get a handle to it some other way, is it possible to acquire a reference to either sys or sys.modules? (Each points to the other.) I am interested in both 2.x and 3.x answers.
__builtins__ can usually be recovered, giving you a path back to __import__ and thus to any module.
For Python 3 this comment from eryksun works, for example:
>>> f = [t for t in ().__class__.__base__.__subclasses__()
... if t.__name__ == 'Sized'][0].__len__
>>> f.__globals__['__builtins__']['__import__']('sys')
<module 'sys' (built-in)>
In Python 2, you just look for a different object:
>>> f = [t for t in ().__class__.__base__.__subclasses__()
... if t.__name__ == 'catch_warnings'][0].__exit__.__func__
>>> f.__globals__['__builtins__']['__import__']('sys')
<module 'sys' (built-in)>
Either method looks for subclasses of a built-in type you can create with literal syntax (here a tuple), then references a function object on one of those subclasses. Function objects have a __globals__ dictionary reference, which gives you the __builtins__ mapping back.
Note that it isn't enough to forbid __import__ by name, because it is part of __builtins__ anyway.
However, many of those __globals__ objects are bound to have sys present already. Searching for a sys module on Python 3, for example, gives me access to one in a flash:
>>> next(getattr(c, f).__globals__['sys']
... for c in ().__class__.__base__.__subclasses__()
... for f in dir(c)
... if isinstance(getattr(c, f, None), type(lambda: None)) and
... 'sys' in getattr(c, f).__globals__)
<module 'sys' (built-in)>
The Python 2 version only needs to unwrap the unbound methods found on the classes to get the same result:
>>> next(getattr(c, f).__func__.__globals__['sys']
... for c in ().__class__.__base__.__subclasses__()
... for f in dir(c)
... if isinstance(getattr(c, f, None), type((lambda: 0).__get__(0))) and
... 'sys' in getattr(c, f).__func__.__globals__)
<module 'sys' (built-in)>
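Once sys is recovered by either route, sys.modules (and the builtins module inside it) opens up everything else. For example, a Python 3 sketch building on the search above (the fallback to __import__ is only needed if 'os' hasn't been loaded yet, and the printed path is illustrative):
>>> sys_mod = next(getattr(c, f).__globals__['sys']
...                for c in ().__class__.__base__.__subclasses__()
...                for f in dir(c)
...                if isinstance(getattr(c, f, None), type(lambda: None)) and
...                   'sys' in getattr(c, f).__globals__)
>>> os_mod = sys_mod.modules.get('os') or sys_mod.modules['builtins'].__import__('os')
>>> os_mod.getcwd()
'/home/myuser'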
Related
In a Python program, if a name exists in the namespace of the program, is it possible to find out if the name is imported from some module, and if yes, which module it is imported from?
You can see which module a function has been defined in via the __module__ attribute. From the Python Data model documentation on __module__:
The name of the module the function was defined in, or None if unavailable.
Example:
>>> from re import compile
>>> compile.__module__
're'
>>> def foo():
... pass
...
>>> foo.__module__
'__main__'
>>>
The Data model later mentions that classes have the same attribute as well:
__module__ is the module name in which the class was defined.
>>> from datetime import datetime
>>> datetime.__module__
'datetime'
>>> class Foo:
... pass
...
>>> Foo.__module__
'__main__'
>>>
You can also do this with builtin names such as int and list. You can access them from the builtins module.
>>> int.__module__
'builtins'
>>> list.__module__
'builtins'
>>>
I can use int and list without from builtins import int, list. So how do int and list become available in my program?
That is because int and list are builtin names. You don't have to explicitly import them for Python to be able to find them in the current namespace. You can see this for yourself in the CPython virtual machine source code. As @user2357112 mentioned, builtin names are accessed when global lookup fails. Here's the relevant snippet:
if (v == NULL) {
    v = PyDict_GetItem(f->f_globals, name);
    Py_XINCREF(v);
    if (v == NULL) {
        if (PyDict_CheckExact(f->f_builtins)) {
            v = PyDict_GetItem(f->f_builtins, name);
            if (v == NULL) {
                format_exc_check_arg(
                            PyExc_NameError,
                            NAME_ERROR_MSG, name);
                goto error;
            }
            Py_INCREF(v);
        }
        else {
            v = PyObject_GetItem(f->f_builtins, name);
            if (v == NULL) {
                if (PyErr_ExceptionMatches(PyExc_KeyError))
                    format_exc_check_arg(
                                PyExc_NameError,
                                NAME_ERROR_MSG, name);
                goto error;
            }
        }
    }
}
In the code above, CPython first searches for a name in the global scope. If that fails, then it falls back and attempts to get the name from a mapping of builtin names attached to the frame object it is currently executing. That's what f->f_builtins is.
You can observe this mapping from the Python level using sys._getframe():
>>> import sys
>>> frame = sys._getframe()
>>>
>>> frame.f_builtins['int']
<class 'int'>
>>> frame.f_builtins['list']
<class 'list'>
>>>
sys._getframe() returns the frame at the top of the call stack. In this case, it's the frame for the module scope. And as you can see from above, the f_builtins mapping for the frame contains both the int and list classes, so Python has automatically made those names available to you. In other words, it's "built" them into the scope; hence the term "builtins".
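As a quick sanity check (a small sketch, assuming CPython and code running at module level or in the REPL), the frame's f_builtins is simply the dictionary of the builtins module, and f_globals is the same dict that globals() returns:
>>> import builtins
>>> import sys
>>> frame = sys._getframe()
>>> frame.f_builtins is vars(builtins)
True
>>> frame.f_globals is globals()
True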
If for some reason the source is unavailable, you could use getmodule from inspect, which tries its best to find the module by grabbing __module__ if it exists and then falling back to other alternatives.
If everything goes OK, you get back a module object. From that, you can grab __name__ to get the actual name of the module:
>>> from inspect import getmodule
>>> from collections import OrderedDict
>>> from math import sin
>>> getmodule(getmodule).__name__
'inspect'
>>> getmodule(OrderedDict).__name__
'collections'
>>> getmodule(sin).__name__
'math'
If it doesn't find anything, it returns None so you'd have to special case this. In general this encapsulates the logic for you so you don't need to write a function yourself to actually grab __module__ if it exists.
This doesn't work for objects that don't have this information attached. You can, as a fall-back, try and pass in the type to circumvent it:
o = OrderedDict()
getmodule(o)                 # None
getmodule(type(o)).__name__  # 'collections'
but that won't always yield the correct result:
from math import pi
getmodule(type(pi)).__name__
'builtins'
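If you want both behaviours in one place, a small helper sketch (the name module_of is made up here) can try __module__ first and fall back to inspect.getmodule:
from inspect import getmodule

def module_of(obj):
    # Prefer the __module__ attribute when the object (or its class) carries one.
    name = getattr(obj, '__module__', None)
    if name:
        return name
    # Otherwise let inspect.getmodule try its other strategies.
    mod = getmodule(obj)
    return mod.__name__ if mod is not None else None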
Some objects (but far from all) have an attribute __module__.
Unless code is doing something unusual like updating globals directly, the source code should indicate where every variable came from:
x = 10 # Assigned in the current module
from random import randrange # Loaded from random
from functools import * # Loads everything listed in functools.__all__
You can also look at globals(): it returns a dict with all the names Python uses by default, but also the variables, modules and functions you declared inside the namespace.
>>> x = 10
>>> import os
>>> globals() # too big to display here, but it ends with
# {... 'x': 10, 'os': <module 'os' from '/usr/local/lib/python2.7/os.py'>}
Therefore, you can test whether a variable was declared like this:
if globals()['x']:
    print('x exists')
try:
    print(globals()['y'])
except KeyError:
    print('but y does not')

# x exists
# but y does not
It also works with modules:
print(globals()['os'])  # <module 'os' from '/usr/local/lib/python2.7/os.py'>
try:
    print(globals()['math'])
except KeyError:
    print('but math is not imported')

# <module 'os' from '/usr/local/lib/python2.7/os.py'>
# but math is not imported
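Equivalently, dict.get avoids the try/except entirely, returning None for names that were never bound:
print(globals().get('os'))    # <module 'os' from '/usr/local/lib/python2.7/os.py'>
print(globals().get('math'))  # None, since math was never imported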
Can a python module have a __repr__? The idea would be to do something like:
import mymodule
print mymodule
EDIT: precision: I mean a user-defined repr!
Short answer: basically the answer is no.
But can't you find the functionality you are looking for using docstrings?
testmodule.py
""" my module test does x and y
"""
class myclass(object):
...
test.py
import testmodule
print testmodule.__doc__
Long answer:
You can define your own __repr__ on the module level (just provide a def __repr__(...)), but then you'd have to do:
import mymodule
print mymodule.__repr__()
to get the functionality you want.
Have a look at the following python shell session:
>>> import sys # we import the module
>>> sys.__repr__() # works as usual
"<module 'sys' (built-in)>"
>>> sys.__dict__['__repr__'] # but it's not in the modules __dict__ ?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: '__repr__'
>>> sys.__class__.__dict__['__repr__'] # __repr__ is provided on the module type as a slot wrapper
<slot wrapper '__repr__' of 'module' objects>
>>> sys.__class__.__dict__['__repr__'](sys) # which we should feed an instance of the module type
"<module 'sys' (built-in)>"
So I believe the problem lies within these slot wrapper objects, which have the effect of bypassing the usual Python way of looking up instance attributes.
For these class methods CPython returns C pointers to the corresponding methods on these objects (which then get wrapped in the slot wrapper objects to be callable from the Python side).
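To make the point concrete, here is a hypothetical sketch: a module-level __repr__ is perfectly legal and callable by hand, but repr() and print ignore it because they go through the slot on the module type:
# mymodule.py (hypothetical)
def __repr__():
    return 'custom repr'

>>> import mymodule
>>> mymodule.__repr__()   # callable explicitly
'custom repr'
>>> print(mymodule)       # still the default module repr
<module 'mymodule' from '.../mymodule.py'>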
You can achieve this effect--if you're willing to turn to the Dark Side of the Force.
Add this to mymodule.py:
import sys
class MyReprModule(sys.__class__):  # sys.__class__ is the module type
    def __init__(self, other):
        for attr in dir(other):
            setattr(self, attr, getattr(other, attr))

    def __repr__(self):
        return 'ABCDEFGHIJKLMNOQ'

# THIS LINE MUST BE THE LAST LINE IN YOUR MODULE
sys.modules[__name__] = MyReprModule(sys.modules[__name__])
Lo and behold:
>>> import mymodule
>>> print mymodule
ABCDEFGHIJKLMNOQ
I dimly remember, in previous attempts at similarly evil hacks, having trouble setting special attributes like __class__. I didn't have that trouble when testing this. If you run into that problem, just catch the exception and skip that attribute.
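If you do hit that problem, a defensive variant of the class above (same idea, just wrapping the copy in a try/except as described) would be:
import sys

class MyReprModule(sys.__class__):
    def __init__(self, other):
        for attr in dir(other):
            try:
                setattr(self, attr, getattr(other, attr))
            except (AttributeError, TypeError):
                pass  # skip special attributes that refuse assignment

    def __repr__(self):
        return 'ABCDEFGHIJKLMNOQ'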
Modules can have a __repr__ function, but it isn't invoked when getting the representation of a module.
So no, you can't do what you want.
As a matter of fact, many modules do [have a __repr__]!
>>> import sys
>>> print(sys)
<module 'sys' (built-in)>  # see the edit below; however, this output didn't come from __repr__!
Also try dir(sys) to see that __repr__ is there, along with __name__, etc.
Edit:
__repr__ does seem to be present on modules in Python 3.0 and up.
As indicated by Ned Batchelder, this method is not used by Python when it prints out a module. (A quick experiment in which the __repr__ attribute was re-assigned showed that.)
No, because __repr__ is a special method (I call it a capability), and it is only ever looked up on the class. Your module is just another instance of the module type, so however you would manage to define a __repr__, it would not be called!
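The same rule is easy to demonstrate on an ordinary class (a tiny sketch, nothing module-specific; the printed address is illustrative):
>>> class C:
...     pass
...
>>> c = C()
>>> c.__repr__ = lambda: 'instance repr'   # attached to the instance only
>>> repr(c)                                # still uses type(c).__repr__
'<__main__.C object at 0x7f...>'
>>> c.__repr__()                           # but callable explicitly
'instance repr'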
I'm trying to replicate from foo.bar import object using the __import__ function and I seem to have hit a wall.
A simpler case from glob import glob is easy: glob = __import__("glob").glob
The problem I'm having is that I am importing a name from a subpackage (i.e. from foo.bar):
So what I'd like is something like
string_to_import = "bar"
object = __import__("foo." + string_to_import).object
But this just imported the top-level foo package, not the foo.bar subpackage:
__import__("foo.bar")
<module 'foo' from 'foo/__init__.pyc'>
How to use Python's __import__() function properly?
There are two kinds of uses:
direct importing
a hook to alter import behavior
For the most part, you don't really need to do either.
For user-space importing
Best practice is to use importlib instead. But if you insist:
Trivial usage:
>>> sys = __import__('sys')
>>> sys
<module 'sys' (built-in)>
Complicated:
>>> os = __import__('os.path')
>>> os
<module 'os' from '/home/myuser/anaconda3/lib/python3.6/os.py'>
>>> os.path
<module 'posixpath' from '/home/myuser/anaconda3/lib/python3.6/posixpath.py'>
If you want the rightmost child module in the name, pass a nonempty list, e.g. [None], to fromlist:
>>> path = __import__('os.path', fromlist=[None])
>>> path
<module 'posixpath' from '/home/myuser/anaconda3/lib/python3.6/posixpath.py'>
Or, as the documentation declares, use importlib.import_module:
>>> importlib = __import__('importlib')
>>> futures = importlib.import_module('concurrent.futures')
>>> futures
<module 'concurrent.futures' from '/home/myuser/anaconda3/lib/python3.6/concurrent/futures/__init__.py'>
Documentation
__import__ has possibly the most confusing documentation of all the builtin functions.
__import__(...)
__import__(name, globals=None, locals=None, fromlist=(), level=0) -> module
Import a module. Because this function is meant for use by the Python
interpreter and not for general use it is better to use
importlib.import_module() to programmatically import a module.
The globals argument is only used to determine the context;
they are not modified. The locals argument is unused. The fromlist
should be a list of names to emulate ``from name import ...'', or an
empty list to emulate ``import name''.
When importing a module from a package, note that __import__('A.B', ...)
returns package A when fromlist is empty, but its submodule B when
fromlist is not empty. Level is used to determine whether to perform
absolute or relative imports. 0 is absolute while a positive number
is the number of parent directories to search relative to the current module.
If you read it carefully, you get the sense that the API was originally intended to allow for lazy loading of functions from modules. However, this is not how CPython works, and I am not aware of any other Python implementation that has managed to do this.
Instead, CPython executes all of the code in the module's namespace on its first import, after which the module is cached in sys.modules.
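You can verify the caching behaviour directly (a quick check; json is just an arbitrary stdlib module here):
>>> import sys
>>> import json                  # first import executes json's module code
>>> 'json' in sys.modules
True
>>> import json as json2         # second import is served from the cache
>>> json2 is sys.modules['json']
True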
__import__ can still be useful. But understanding what it does based on the documentation is rather hard.
Full Usage of __import__
To adapt the full functionality to demonstrate the current __import__ API, here is a wrapper function with a cleaner, better documented, API.
def importer(name, root_package=False, relative_globals=None, level=0):
    """ We only import modules, functions can be looked up on the module.
    Usage:

    from foo.bar import baz
    >>> baz = importer('foo.bar.baz')

    import foo.bar.baz
    >>> foo = importer('foo.bar.baz', root_package=True)
    >>> foo.bar.baz

    from .. import baz (level = number of dots)
    >>> baz = importer('baz', relative_globals=globals(), level=2)
    """
    return __import__(name, locals=None,  # locals has no use
                      globals=relative_globals,
                      fromlist=[] if root_package else [None],
                      level=level)
To demonstrate, e.g. from a sister package to baz:
baz = importer('foo.bar.baz')
foo = importer('foo.bar.baz', root_package=True)
baz2 = importer('bar.baz', relative_globals=globals(), level=2)
assert foo.bar.baz is baz is baz2
Dynamic access of names in the module
To dynamically access globals by name from the baz module, use getattr. For example:
for name in dir(baz):
print(getattr(baz, name))
Hook to alter import behavior
You can use __import__ to alter or intercept importing behavior. In this case, let's just print the arguments it gets to demonstrate we're intercepting it:
old_import = __import__

def noisy_importer(name, globals, locals, fromlist, level):
    # parameters follow __import__'s actual order: (name, globals, locals, fromlist, level)
    print(f'name: {name!r}')
    print(f'fromlist: {fromlist}')
    print(f'level: {level}')
    return old_import(name, globals, locals, fromlist, level)

import builtins
builtins.__import__ = noisy_importer
And now when you import you can see these important arguments.
>>> from os.path import join as opj
name: 'os.path'
fromlist: ('join',)
level: 0
>>> opj
<function join at 0x7fd08d882618>
Perhaps in this context getting the globals or locals could be useful, but no specific uses for this immediately come to mind.
The __import__ function will return the top level module of a package, unless you pass a nonempty fromlist argument:
_temp = __import__('foo.bar', fromlist=['object'])
object = _temp.object
See the Python docs on the __import__ function.
You should use importlib.import_module; __import__ is not advised for use outside the interpreter.
In __import__'s docstring:
Import a module. Because this function is meant for use by the Python
interpreter and not for general use it is better to use
importlib.import_module() to programmatically import a module.
It also supports relative imports.
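For example (a sketch; foo.bar and its baz submodule are hypothetical names standing in for your package):
import importlib

bar = importlib.import_module('foo.bar')                  # like `import foo.bar`, but returns foo.bar itself
obj = getattr(bar, 'object')                              # then: from foo.bar import object
baz = importlib.import_module('.baz', package='foo.bar')  # relative: imports foo.bar.baz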
Rather than use the __import__ function I would use the getattr function:
model = getattr(module, model_s)
where module is the module to look in and model_s is your model string. The __import__ function is not meant to be used loosely, whereas this function will get you what you want.
In addition to these excellent answers, I use __import__ for convenience, to call a one-liner on the fly. Examples like the following can also be saved as auto-triggered snippets in your IDE.
Plant an ipdb break-point (triggered with "ipdb")
__import__("ipdb").set_trace(context=9)
Print prettily (triggered with "pp")
__import__("pprint").pprint(<cursor-position>)
This way, you get a temporary object that is not referenced by anything, and you call an attribute on it on the spot. Also, you can easily comment, uncomment or delete a single line.
Given that I have the code object for a module, how do I get the corresponding module object?
It looks like moduleNames = {}; exec code in moduleNames does something very close to what I want: it collects the globals declared by the module's code into a dictionary. But if I want the actual module object, how do I get it?
EDIT:
It looks like you can roll your own module object. The module type isn't conveniently documented, but you can do something like this:
import sys
module = sys.__class__
del sys
foo = module('foo', 'Doc string')
foo.__file__ = 'foo.pyc'
exec code in foo.__dict__
As a comment already indicates, in today's Python the preferred way to instantiate types that don't have built-in names is to call the type obtained via the types module from the standard library:
>>> import types
>>> m = types.ModuleType('m', 'The m module')
Note that this does not automatically insert the new module into sys.modules:
>>> import sys
>>> sys.modules['m']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'm'
That's a task you must perform by hand:
>>> sys.modules['m'] = m
>>> sys.modules['m']
<module 'm' (built-in)>
This can be important, since a module's code object normally executes after the module's added to sys.modules -- for example, it's perfectly correct for such code to refer to sys.modules[__name__], and that would fail (KeyError) if you forgot this step. After this step, and setting m.__file__ as you already have in your edit,
>>> code = compile("a=23", "m.py", "exec")
>>> exec code in m.__dict__
>>> m.a
23
(or the Python 3 equivalent where exec is a function, if Python 3 is what you're using, of course;-) is correct (of course, you'll normally have obtained the code object by subtler means than compiling a string, but that's not material to your question;-).
In older versions of Python you would have used the new module instead of the types module to make a new module object at the start, but new is deprecated since Python 2.6 and removed in Python 3.
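For reference, the whole sequence in Python 3 looks like this (a compact sketch of the steps above; the module name m and its one-line body are placeholders):
import sys
import types

m = types.ModuleType('m', 'The m module')   # create the empty module object
m.__file__ = 'm.py'                         # optional metadata
sys.modules['m'] = m                        # register it before executing its code
code = compile("a = 23", "m.py", "exec")
exec(code, m.__dict__)                      # exec is a function in Python 3
assert m.a == 23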