Python: Alternatives to pickling a module - python

I am working on my program, GarlicSim, in which a user creates a simulation, manipulates it as desired, and can then save it to a file.
I recently tried implementing the saving feature. The natural thing that occurred to me is to pickle the Project object, which contains the entire simulation.
Problem is, the Project object also includes a module: the "simulation package", a package/module that contains several critical objects, mostly functions, that define the simulation. I need to save them together with the simulation, but it seems impossible to pickle a module: when I tried to pickle the Project object, an exception was raised.
What would be a good way to work around that limitation?
(I should also note that the simulation package gets imported dynamically in the program.)

If the project somehow has a reference to a module with stuff you need, it sounds like you might want to refactor the use of that module into a class within the module. This is often better anyway, because the use of a module for stuff smells of a big fat global. In my experience, such an application structure will only lead to trouble.
(Of course the quick way out is to save the module's dict instead of the module itself.)

If you have the original code for the simulation package modules, which I presume are dynamically generated, then I would suggest serializing that and reconstructing the modules when loaded. You would do this in the Project.__getstate__() and Project.__setstate__() methods.
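A minimal sketch of that approach, assuming the simulation package's source text is available as a string. The `Project` class here is a hypothetical stand-in, and the attribute names `package_source` and `simpack` are invented for illustration; they are not GarlicSim's real names.

```python
import pickle
import types


class Project:
    """Hypothetical stand-in for the Project object described above."""

    def __init__(self, package_name, package_source):
        self.package_name = package_name
        self.package_source = package_source  # source text, kept so it can be saved
        self.state = {}
        self.simpack = self._build_module()

    def _build_module(self):
        # Recreate the module by executing the stored source in a fresh namespace.
        module = types.ModuleType(self.package_name)
        exec(compile(self.package_source, self.package_name, "exec"), module.__dict__)
        return module

    def __getstate__(self):
        # Copy the instance dict and drop the module object, which cannot be pickled.
        state = self.__dict__.copy()
        del state["simpack"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the module from its source.
        self.__dict__.update(state)
        self.simpack = self._build_module()


source = "def step(x):\n    return x + 1\n"
project = Project("simpack", source)
project.state["value"] = project.simpack.step(41)

restored = pickle.loads(pickle.dumps(project))
```

After the round trip, `restored.simpack` is a freshly built module, so any state the original module accumulated at runtime is not carried over; only what the source defines is.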

Related

Accessing A Python Module + General Library/Module Structure

Below is a screenshot of part of an article explaining how to access the example Python module dataset.py, for which they provide the following line:
import my_model.training.dataset
I'd like to know whether the two forms below are equivalent to it and accomplish the same thing:
from my_model.training import dataset
from my_model import training.dataset
I have a library where I've been accumulating all of my .py files over time. I'm trying to organize it into something neater, but I'm having trouble deciding how to do that.
The library (or rather, the folder I'm dumping everything in) is meant to be just a collection of independent modules, but some of the modules have cross dependencies. It would be easier if I had a systematic way to group functions/classes within certain files, i.e. modules. Should they be grouped by purpose?
Keep in mind these aren't even packages for projects; they are the building blocks for other packages. It's just my own personal collection of classes and functions, but it's starting to get hard to manage, so I could use some advice.
Thanks

How can I save a dynamically generated module and reimport it from file?

I have an application that dynamically generates a lot of Python modules with class factories, to eliminate redundant boilerplate that makes the code hard to debug across similar implementations. It works well, except that dynamically generating the classes across the modules (hundreds of them) takes more time than simply importing from a file. So I would like to find a way to save the modules to a file after generation (unless reset), then load from those files to cut down on bootstrap time for the platform.
Does anyone know how I can save/export auto-generated Python modules to a file for re-import later? I already know that pickling and exporting as a JSON object won't work, because they make use of thread locks and other dynamic state variables, and the classes must be defined before they can be pickled. I need to save the actual class definitions, not instances. The classes are defined with the type() function.
If you have ideas or knowledge on how to do this, I would really appreciate your input.
You’re basically asking how to write a compiler whose input is a module object and whose output is a .pyc file. (One plausible strategy is of course to generate a .py and then byte-compile that in the usual fashion; the following could even be adapted to do so.) It’s fairly easy to do this for simple cases: the .pyc format is very simple (but note the comments there), and the marshal module does all of the heavy lifting for it. One point of warning that might be obvious: if you’ve already evaluated, say, os.getcwd() when you generate the code, that’s not at all the same as evaluating it when loading it in a new process.
The “only” other task is constructing the code objects for the module and each class: this requires concatenating a large number of boring values from the dis module, and will fail if any object encountered is non-trivial. These might be global/static variables/constants or default argument values: if you can alter your generator to produce modules directly, you can probably wrap all of these (along with anything else you want to defer) in function calls by compiling something like
my_global=(lambda: open(os.devnull,'w'))()
so that you actually emit the function and then a call to it. If you can’t so alter it, you’ll have to have rules to recognize values that need to be constructed in this fashion so that you can replace them with such calls.
Another detail that may be important is closures: if your generator uses local functions/classes, you’ll need to create the cell objects, perhaps via “fake” closures of your own:
def cell(x): return (lambda: x).__closure__[0]
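To make the simpler route mentioned above concrete, here is a rough sketch that starts from generated source text rather than from live objects: it compiles the text, round-trips the code object through `marshal`, and executes it in a new module. The source string and the module name `generated` are made up for illustration. The bytes produced here are only the marshalled code object; a real .pyc file also carries a header, which is omitted, and marshal data is specific to the Python version that wrote it.

```python
import marshal
import types

# Hypothetical generated source; a real generator would emit this text itself.
# my_global is wrapped in a call so that it is evaluated at load time, not at generation time.
source = (
    "import os\n"
    "my_global = (lambda: os.getcwd())()\n"
    "class Generated:\n"
    "    x = 1\n"
)

code = compile(source, "<generated>", "exec")
payload = marshal.dumps(code)  # bytes that could be written to disk

# Later, possibly in another process running the same Python version:
restored_code = marshal.loads(payload)
module = types.ModuleType("generated")
exec(restored_code, module.__dict__)


def cell(x):
    # The "fake closure" trick from the answer: returns a cell object holding x.
    return (lambda: x).__closure__[0]


c = cell(5)
```

Here `module.Generated` is the class defined by the restored code, and `c.cell_contents` gives back the wrapped value.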

avoiding a circular dependency in packaging

I'm cleaning up some Python (2.7) code that I inherited, and have come across a circular import scenario that I'd like to get rid of. The code currently runs (by abusing the import function), but it's messy and causes issues when other code doesn't access it in a specific way.
The file structure is essentially this:
/deep/nested/path/__init__.py
/deep/nested/path/objects.py
/deep/nested/path/api.py
objects is a collection of data models
api exposes developer interface with functions to get/create instances of objects.
the circular import occurs because some objects need to invoke api functions to create child objects.
This section of code handles analytics and is executed a lot (many objects, deep recursion). The package namespace is fairly nested too, so using the package path has a tangible effect on performance.
I'm very tempted to just move the factory functions needed by objects into that file, and then import them back into api for general use. That would solve my problems (and eliminate a dot), but lose some of the code organization (which is actually pretty decent). I'm hoping for another set of eyes to give some input.
While there are several questions about circular imports already here, I'm not concerned with getting this to work (which it does). I'm concerned with minimizing the dot notation. api.factory and objects.foo work, but package.api.factory won't.
Perhaps it would be better to move those factory functions into a third module. Then objects can import it to create its objects; api can import it if needed; other modules can import it if they need what it contains.

Does importing a Python module affect performance?

When searching for a solution, it's common to come across several methods. I often use the solution that most closely aligns with syntax I'm familiar with. But sometimes the far-and-away most upvoted solution involves importing a module new to me, as in this thread.
I'm already importing various modules in a large script that will be looping 50K times. Does importing additional modules affect processing time, or otherwise affect the script's efficiency? Do I need to worry about the size of the module being called? I'm seeking guidance on whether, generally, it's worth the extra time/effort to find solutions using methods contained in modules I'm already using.
Every bytecode in Python affects performance. However, unless that code is on a critical path and repeated a high number of times, the effect is so small as to not matter.
Using import consists of two distinct steps: loading the module (done just once per process), and binding names (where the imported name is added to your namespace to refer to something loaded by the module, or the module object itself). Binding names is almost costless. Because loading a module happens just once, its cost is not multiplied by the number of loop iterations, so it won't noticeably affect your performance.
Focus instead on what the module functionality can do to help you solve your problem efficiently.
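A small sketch of the load-once behaviour described above, using the standard `json` module purely as an example. The timing it prints depends on the machine, so no particular number is claimed.

```python
import sys
import timeit

import json            # first import: loads the module and caches it in sys.modules
import json as json2   # second import: only binds another name to the cached module

# Both names refer to the single cached module object.
same = json is json2 and sys.modules["json"] is json

# Re-running the import statement after the first load is a cache lookup plus a name binding.
per_import = timeit.timeit("import json", number=100_000) / 100_000
print(f"re-import cost: {per_import * 1e9:.0f} ns per statement")
```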

Save Workspace - save all variables to a file. Python doesn't have it

I cannot understand it. This is very simple and obvious functionality:
You have code in any programming language and you run it. In this code you generate variables, then you save them (the values, the names, everything) to a file with one command. Once it's saved, you can open such a file in your code, also with a simple command.
It works perfectly in Matlab (save Workspace, load Workspace). In Python there's some weird "pickle" protocol, which produces errors all the time, while all I want to do is save a variable and load it again in another session.
For example, you cannot save a class with variables (in Matlab there's no problem).
You cannot load arrays in cPickle (but you can save them).
Why not make it easier?
Is there a way to save the current variables with their values, and then load them?
What you are describing is a feature of the Matlab environment, not of a programming language.
What you need is a way to store the serialized state of some object, which can be done easily in almost any programming language. In the Python world, pickle is the easiest way to achieve it, and if you could provide more details about the errors it produces for you, people would probably be able to give you more specific help.
In general, for object-oriented languages (including Python) it is a good approach to encapsulate your state into a single object that can be serialized and de-serialized, and then store/load an instance of that class. Pickling and unpickling of such objects works perfectly for many developers, so this must be something specific to your implementation.
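As a rough illustration of that advice, the sketch below gathers a session's variables into one object and round-trips it through a file with `pickle`. The `Workspace` class, its attributes, and the file name are hypothetical, chosen only for the example.

```python
import os
import pickle
import tempfile


class Workspace:
    """Hypothetical container for the variables of one session."""

    def __init__(self):
        self.samples = [1.0, 2.5, 4.0]
        self.label = "run-1"
        self.params = {"gain": 0.5}


ws = Workspace()

# Save the whole object with one call.
path = os.path.join(tempfile.mkdtemp(), "workspace.pkl")
with open(path, "wb") as f:
    pickle.dump(ws, f)

# Load it back, for example in a later session.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

The class definition has to be importable when loading, because pickle stores a reference to the class rather than its code.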
Since you're talking about Matlab, you probably want to try out IPython, which is a shell for Python offering much more functionality than the standard interpreter shell you get when executing Python.
Among this functionality is the ability to load/save workspace sessions, create macros out of session input etc., which is probably more like what you are used to in Matlab (I actually use both and find IPython to be much more elegant, but YMMV):
http://ipython.scipy.org
PiCloud has implemented a fancier pickle, but I can't find the code. I saw a poster session.
Generally in Python, instantiated objects don't have any one way to recreate them, and in some cases it's particularly difficult (like an open file), as it takes several steps to recreate.
I take issue with the statement that the saving of variables in Matlab is an environment function. The "save" statement in Matlab is a function and part of the Matlab language, not just a command. It is a very useful function, as you don't have to worry about the trivial minutiae of file I/O, and it handles all sorts of variables: scalars, matrices, objects, structures.
It's an old thread, but I thought I should throw it out there anyway: Spyder, the Scientific Python development environment, allows you to do just this through the Variable explorer. There's a "Save data" button there that packs your whole workspace up in a .spydata file that you can later reload. Works like a charm when you're switching between projects!
