I want to load an object from a file using eval. The object is dumped to the file as a valid Python expression - all types are given with their fully-qualified names, like this:
mod1.Class1(
    attr1=mod2.Class2(a=1,b=2),
    attr2=[1,2,3,4],
    attr3=mod1.submod1.Class3(),
)
When I feed this into eval, not all of those modules are imported in the scope where eval is called, so I get either NameError: name 'mod1' is not defined for top-level modules, or, once those are imported, AttributeError: 'module' object has no attribute 'submod1' for sub-modules.
Is there a graceful way to handle that? I can parse the NameError, run __import__ and re-try eval, but I am at a loss as to how to work out what went wrong from the AttributeError.
Could I feed the expression to compile, walk the AST, and import whatever is necessary? I have never worked with the AST, though - is there an example of that?
Note that I am not concerned about security here.
Why not use pickle for this? You can even define __getstate__ and __setstate__ methods on your classes to control aspects of the serialization and instantiation. That seems seriously better than rolling your own eval() thing.
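For instance (the Session class here is just an invented example), __getstate__ and __setstate__ let you drop and rebuild anything awkward:

import os
import pickle

class Session:
    def __init__(self, path):
        self.path = path
        self.handle = open(path)              # not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['handle']                   # drop the unpicklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.handle = open(self.path)         # recreate it on load

restored = pickle.loads(pickle.dumps(Session(os.devnull)))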
Otherwise, how controlled are the values in your serialization format? I.e. maybe you can just predict what modules are going to be needed.
If you're wedded to using full Python (rather than something more easily parseable like JSON or YAML) for your data, walking the AST sounds fairly feasible. You'd want to implement an ast.NodeVisitor and keep track of the Attribute nodes visited.
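Something along these lines might work (an untested sketch; DottedNameCollector and eval_with_imports are just names I made up):

import ast
import importlib

class DottedNameCollector(ast.NodeVisitor):
    """Collect dotted names like mod1.submod1.Class3 from the expression."""
    def __init__(self):
        self.dotted_names = set()

    def visit_Attribute(self, node):
        parts = []
        cur = node
        while isinstance(cur, ast.Attribute):
            parts.append(cur.attr)
            cur = cur.value
        if isinstance(cur, ast.Name):
            parts.append(cur.id)
            self.dotted_names.add('.'.join(reversed(parts)))
        self.generic_visit(node)

def eval_with_imports(text):
    tree = ast.parse(text, mode='eval')
    collector = DottedNameCollector()
    collector.visit(tree)
    scope = {}
    for dotted in collector.dotted_names:
        parts = dotted.split('.')
        # Import progressively longer prefixes; stop at the first one that
        # isn't a module (mod1, then mod1.submod1, then mod1.submod1.Class3 fails).
        for i in range(1, len(parts) + 1):
            try:
                importlib.import_module('.'.join(parts[:i]))
            except ImportError:
                break
        scope[parts[0]] = importlib.import_module(parts[0])
    return eval(text, scope)

Importing mod1.submod1 is enough to make it reachable as an attribute of mod1, which is why only the top-level names need to go into the eval scope.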
I have an application that dynamically generates a lot of Python modules with class factories, to eliminate a lot of redundant boilerplate that makes the code hard to debug across similar implementations. It works well, except that the dynamic generation of the classes across the modules (hundreds of them) takes more time than simply importing from a file would. So I would like to find a way to save the modules to files after generation (unless reset) and then load from those files, to cut down on bootstrap time for the platform.
Does anyone know how I can save/export auto-generated Python modules to a file for re-import later? I already know that pickling and exporting as a JSON object won't work, because they make use of thread locks and other dynamic state variables, and because the classes must be defined before they can be pickled. I need to save the actual class definitions, not instances. The classes are defined with the type() function.
If you have ideas or knowledge on how to do this, I would really appreciate your input.
You’re basically asking how to write a compiler whose input is a module object and whose output is a .pyc file. (One plausible strategy is of course to generate a .py and then byte-compile that in the usual fashion; the following could even be adapted to do so.) It’s fairly easy to do this for simple cases: the .pyc format is very simple (but note the comments there), and the marshal module does all of the heavy lifting for it. One point of warning that might be obvious: if you’ve already evaluated, say, os.getcwd() when you generate the code, that’s not at all the same as evaluating it when loading it in a new process.
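To make the format half concrete, here is a rough sketch of writing a .pyc (starting from source text for simplicity; building code objects without source is the hard part discussed below). It assumes the CPython 3.7+ PEP 552 header layout - older versions use a shorter header:

import importlib.util
import marshal
import struct
import time

def write_pyc(source, py_name, pyc_path):
    code = compile(source, py_name, 'exec')          # the module's code object
    with open(pyc_path, 'wb') as f:
        f.write(importlib.util.MAGIC_NUMBER)         # 4-byte version magic
        f.write(struct.pack('<I', 0))                # flags: 0 = timestamp-validated
        f.write(struct.pack('<I', int(time.time()))) # "source" mtime
        f.write(struct.pack('<I', len(source)))      # "source" size
        marshal.dump(code, f)                        # marshal does the heavy lifting

write_pyc("X = type('X', (), {'answer': 42})\n", 'generated.py', 'generated.pyc')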
The “only” other task is constructing the code objects for the module and each class: this requires concatenating a large number of boring values from the dis module, and will fail if any object encountered is non-trivial. These might be global/static variables/constants or default argument values: if you can alter your generator to produce modules directly, you can probably wrap all of these (along with anything else you want to defer) in function calls by compiling something like
my_global=(lambda: open(os.devnull,'w'))()
so that you actually emit the function and then a call to it. If you can’t so alter it, you’ll have to have rules to recognize values that need to be constructed in this fashion so that you can replace them with such calls.
Another detail that may be important is closures: if your generator uses local functions/classes, you’ll need to create the cell objects, perhaps via “fake” closures of your own:
def cell(x): return (lambda: x).__closure__[0]
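For instance, such a cell can be combined with types.FunctionType to rebuild a closure by hand (a toy example):

import types

def cell(x): return (lambda: x).__closure__[0]

def outer():
    y = 10
    def inner(): return y
    return inner

code = outer().__code__                                       # has free variable 'y'
rebuilt = types.FunctionType(code, globals(), 'inner', None, (cell(99),))
print(rebuilt())                                              # prints 99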
In the Python C API, I already know how to import a module via PyImport_ImportModule, as described in Python Documentation: Importing Modules. I also know that there are many ways to create, allocate, or initialize a module, and several functions for operating on a module, as described in Python Documentation: Module Objects.
But how can I get a function from a module (and call it), get a type/class from a module (and instantiate it), or get an object from a module (and operate on it) - in short, get anything from a module and do whatever I want with it?
I think this may be a foolish question, but I really cannot find any tutorial or documentation. The only way I can think of to achieve this is to use PyModule_GetDict to get the module's __dict__ and fetch what I want from it, as described in the latter documentation I mentioned. But that documentation also recommends that one should not use this function to operate on the module directly.
So is there any "official way" or best practice for getting something from a module?
According to the documentation for PyModule_GetDict:
It is recommended extensions use other PyModule_*() and PyObject_*() functions rather than directly manipulate a module’s __dict__.
The functions you need are generic object functions (PyObject_*) rather than module functions (PyModule_*), and I suspect this is where you were looking in the wrong place.
You want to use PyObject_GetAttr or PyObject_GetAttrString.
If I use a function to read the contents of a file in one module:
def get_objectstore():
    with open(os.getcwd() + "\\data.store", "rb") as infile:
        objA = cPickle.load(infile)
        objectstore = eval((str(objA)).translate(string.maketrans(coder, alpha)))
    return objectstore
and I call this function from my main program like this:
from main_vars import get_objectstore
objectstore = get_objectstore()
Now objectstore has all the images and sound used by my program. How can I use objectstore in all other modules loaded into the main program?
This is one of those things that you can do, but almost certainly shouldn't… so first I'll explain how, then explain why not.
If you want something to be directly available in every module, without having to import it in each module, just as if it were a builtin… the answer is to add it to the builtins. So, how do you do that?
Well, technically, there's no guaranteed safe way to do it, but practically, I believe monkeypatching the builtins module works in every version of every major implementation. So:
import builtins
builtins.objectstore = objectstore
Note that in Python 2.x the module is called __builtin__, but it works the same way.
This doesn't work in a few cases, e.g., inside code being run by an exec with a custom globals that provides a custom __builtins__. If you need to handle that… well, you can't handle it portably. But what if you only care about CPython (and I think PyPy, but not Jython, and I don't know about Iron…)? In CPython, it's guaranteed that every global environment, even the ones created for compile, exec, etc., will contain something named __builtins__ that's either the builtin dict for that environment, or some object whose namespace is the builtin dict for the environment. If there are no builtins at all, or you're looking at the global environment for the builtins module itself, it may be missing. So, you can write this:
try:
    __builtins__['objectstore'] = objectstore    # __builtins__ is a dict here
except TypeError:
    __builtins__.objectstore = objectstore       # __builtins__ is a module here
(That doesn't handle the case where you're running code inside the builtins namespace itself, because… I'm not sure what you'd want to do there, to be honest.)
Of course that's ugly. That's probably on purpose.
Now, why don't you want to do that?
Well, for one thing, it makes your code a lot harder to read. Anyone looking at your code can't tell where objectstore was defined, except by searching some completely unrelated module that isn't even referenced in the current module.
For another, it can lead to problems that are hard to debug. If you later change the list of which modules import which other modules, some module that depends on objectstore being patched in may run before it's been patched in and therefore fail.
Also, the fact that it's not actually documented that you can monkeypatch builtins, or that doing so affects the builtin environment, is a good sign that it's not something you should be relying on.
It's better to make what you're doing explicit. Have objectstore be a module global for some module that every other module imports from. Then it's immediately clear what's happening, on even a cursory glance.
For example, why not add that objectstore = get_objectstore() to main_vars, and then have every module do from main_vars import objectstore? Or maybe you even want to make it a singleton, so it's safe for anyone to call get_objectstore() and know that they'll all get back a single, shared value. Without knowing exactly what you're trying to accomplish, it's hard to suggest the best solution. All I can say for sure is that making objectstore a builtin-like cross-module global is very unlikely to be the best solution for almost anything you might be trying to accomplish.
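A minimal sketch of the singleton flavor, reusing the names from the question (the _load_objectstore helper stands in for the cPickle/eval code shown above):

# main_vars.py
_objectstore = None

def _load_objectstore():
    # stand-in for the cPickle/eval loading code in the question
    return {"images": [], "sounds": []}

def get_objectstore():
    global _objectstore
    if _objectstore is None:            # load once, on first use
        _objectstore = _load_objectstore()
    return _objectstore

# any other module
from main_vars import get_objectstore
objectstore = get_objectstore()         # every caller gets the same object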
I'm attempting to broadcast a module to other python processes with MPI. Of course, a module itself isn't pickleable, but the __dict__ is. Currently, I'm pickling the __dict__ and making a new module in the receiving process. This worked perfectly with some simple, custom modules. However, when I try to do this with NumPy, there's one thing that I can't pickle easily: the ufunc.
I've read this thread, which suggests pickling the __name__ and __module__ of the ufunc, but it seems to rely on having numpy fully built and present before rebuilding it. I need to avoid using the import statement altogether in the receiving process, so I'm curious whether the getattr(numpy, name) approach mentioned would work with a module that doesn't have ufuncs included yet.
Also, I don't see a __module__ attribute on the ufunc in the NumPy documentation:
http://docs.scipy.org/doc/numpy/reference/ufuncs.html
Any help or suggestions, please?
EDIT: Sorry, I forgot to include the thread mentioned above: http://mail.scipy.org/pipermail/numpy-discussion/2007-January/025778.html
Pickling a function in Python only serializes its name and the module it comes from. It does not transport code over the wire, so when unpickling you need to have the same libraries available as when pickling. On unpickling, Python simply imports the module in question, and grabs the items via getattr. (This is not limited to Numpy, but applies to pickling in general.)
Ufuncs don't pickle cleanly, which is a wart. Your options mainly are then to pickle just the __name__ (and maybe the __class__) of the ufunc, and reconstruct them later on manually. (They are not actually Python functions, and do not have a __module__ attribute.)
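One way to do that manual reconstruction is a copyreg hook, so that plain pickle handles ufuncs by name; note this still assumes numpy is importable (or already rebuilt) on the receiving side:

import copyreg
import pickle
import numpy

def _rebuild_ufunc(name):
    return getattr(numpy, name)              # look the ufunc up again by name

def _reduce_ufunc(ufunc):
    return _rebuild_ufunc, (ufunc.__name__,)

copyreg.pickle(numpy.ufunc, _reduce_ufunc)   # register the reducer for all ufuncs

data = pickle.dumps(numpy.add)
assert pickle.loads(data) is numpy.add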
I am working on my program, GarlicSim, in which a user creates a simulation, manipulates it as they desire, and then saves it to a file.
I recently tried implementing the saving feature. The natural thing that occurred to me was to pickle the Project object, which contains the entire simulation.
The problem is that the Project object also includes a module - the "simulation package", a package/module that contains several critical objects, mostly functions, that define the simulation. I need to save them together with the simulation, but it seems that it is impossible to pickle a module, as I discovered when I tried to pickle the Project object and an exception was raised.
What would be a good way to work around that limitation?
(I should also note that the simulation package gets imported dynamically in the program.)
If the project somehow has a reference to a module with stuff you need, it sounds like you might want to refactor the use of that module into a class within the module. This is often better anyway, because the use of a module for stuff smells of a big fat global. In my experience, such an application structure will only lead to trouble.
(Of course the quick way out is to save the module's dict instead of the module itself.)
If you have the original code for the simulation package modules, which I presume are dynamically generated, then I would suggest serializing that and reconstructing the modules when loaded. You would do this in the Project.__getstate__() and Project.__setstate__() methods.
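Roughly, that could look like this (the attribute names simpack and simpack_source are assumptions about how Project might store things):

import types

class Project:
    def __init__(self, simpack_source):
        self.simpack_source = simpack_source            # keep the original source text
        self.simpack = self._build_simpack()

    def _build_simpack(self):
        module = types.ModuleType('simpack')
        exec(self.simpack_source, module.__dict__)      # reconstruct the module
        return module

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['simpack']                            # the module itself can't be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.simpack = self._build_simpack()            # rebuild it on load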