Why can't I change variables from cached modules in IronPython? - python

Disclaimer: I am new to Python and IronPython, so sorry if this is obvious.
We have a C# application that uses IronPython to execute scripts. There are a few common modules/scripts, and then a lot of little scripts that define parameters, do setup, and then call functions in the core modules. After some recent additions made the common modules larger, performance took a hit on the imports. I attempted to fix this by making sure we create only one engine and create a scope for each script to run in. I've seen information that compiled code is cached on the engine, but that is apparently not so, since importing the common modules continued to take excessive time, so the caching must happen in the scope. I then used THIS blog entry to create a custom shared dictionary so that I could precompile the common modules at app load and reuse them. Everything was working fine until I realized that variables were not changing on subsequent runs. After creating a scope in which to run a script, I would add a required variable...
currentScope.SetVariable("agr", aggregator);
The first time this runs, agr works fine in the scripts; call it instance A. On subsequent runs a new scope is created, and a new aggregator (call it B) is created and set as agr, but when the underlying modules reference agr they get not aggregator B but aggregator A, which is no longer valid. I have even tried to force it by adding this to the main script...
CommonModule.agr = agr
#Do Work
CommonModule.agr = None
to no avail. agr itself is not stored in the shared symbol dictionary, but CommonModule is, and it has a variable for agr. What do I have to do to change this variable, and why is it cached in this manner?
UPDATE FOR CLARIFICATION: Sorry about the confusion, but it's a combination of so much code across C# and Python that it would be hard to include. Let me see if I can clarify a little. Every time I run a script I need to set the value of 'agr' to a new object, which is created in C# prior to Python execution and passed in with scope.SetVariable(). Some core modules are imported and compiled into a cached scope. On script execution, a new temporary scope is created using a SharedSymbolDictionary built from the shared scope (to avoid importing the core modules every time), and the script executes in that temporary scope.
The problem is that 'agr' is set correctly the first time, both in the main script and in the core (precompiled) scripts. On subsequent script executions, however, 'agr' is correct in the main script, but when the core scripts reference 'agr' it still points to the 'agr' created during the first execution and NOT the new 'agr' object created for the current execution, and most of its references are now null.

All the comments without the code are a bit confusing.
But taking just the last paragraph: if you would like to modify a module-level variable from C#, you can:
// scope is the SharedSymbolDictionary
var module = scope.GetVariable("CommonModule") as PythonModule;
module.Get__dict__()["agr"] = "new value";
The second observation: variables provided to the SharedSymbolDictionary as the sharedScope can be changed within an individual run, but those changes disappear on subsequent runs. If you would like changes made during a script run to persist, you need to change TrySetExtraValue to something like this:
protected override bool TrySetExtraValue(string key, object value) {
    lock (_sharedScope) {
        if (_sharedScope.ContainsVariable(key)) {
            _sharedScope.SetVariable(key, value);
            return true;
        }
        return false;
    }
}
Note: I work with IronPython 2.7 and .NET 4.0. The signature of TrySetExtraValue is a bit different from the one in the blog.

So I don't have a solid explanation, but I found a simple solution. Originally 'agr' was the variable name used everywhere: in scope.SetVariable(), in the top-level scripts, and in the precompiled core scripts.
For the fix, I changed the C# side to use the variable name 'aggregator' in SetVariable(). I then created a module imported by all the top-level main scripts, e.g. sharedModule, and in each main script used...
sharedModule.agr = aggregator
Then I changed all the core scripts to use sharedModule.agr instead of just 'agr', and that works the way I want.
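For reference, a minimal sketch of that indirection (file names other than the identifiers above are illustrative; in the real setup 'aggregator' is injected from C# via scope.SetVariable() before the main script runs):

# sharedModule.py - hypothetical holder module; everything reads agr through here
agr = None

# main script (runs in the per-execution scope; 'aggregator' was injected from C#)
import sharedModule
sharedModule.agr = aggregator    # rebind the shared attribute for this run

# core (precompiled) module - always dereferences through the holder module
import sharedModule

def do_work():
    # looked up at call time, so each run sees the aggregator set for that run
    return sharedModule.agr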

Related

Choose Python classes to instantiate at runtime based on either user input or on command line parameters

I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than running sequentially. But there is so much overlap between the two code paths that it makes sense to have a unified codebase and defer the parallel/sequential behavior to a certain group of classes.
Coming from the C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate the other classes. In Python there is no compile time, so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great if the code determined whether the user is running sequentially or in parallel and selected the classes automatically. So if the user runs mpirun -np 4 python __main__.py, the code should behave entirely differently than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime; there has to be a much more elegant way to do this. In short, I would like to avoid:
if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I could just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose the other classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do it?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all the shared modules. This would allow me to keep all the classes that differentiate parallel/serial behavior under the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test, because some test files will load the parallel modules and other test files will load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
Regarding the "automagic" thing depending on execution context: if you decide to go for it, by all means make sure that 1) both implementations can still be explicitly imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2) the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you point out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name, because the target OS is known when the runtime is compiled, but it could be an environment variable, a command-line argument, a value in a config file, etc.). The point was about conditional import of modules with the same API but different implementations, while still providing direct, explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and eventually as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitely select an implementation (by explicitely importing the right submodule - serial or parallel and using it directly) OR - still explicitely - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can run either serial or parallel without the user having to care about it, while still keeping the ability to force one or the other where it makes sense.
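A minimal sketch of what such a bridge module might look like, assuming a SOMEFLAG environment variable as the selection criterion (the module name and flag are illustrative, not something prescribed here):

# myPackage/automagic.py - hypothetical bridge that picks an implementation at import time;
# the env-var check is just one possible criterion (a config value or CLI flag would work too)
import os

if os.environ.get("SOMEFLAG", "serial") == "parallel":
    from myPackage.parallel import *
else:
    from myPackage.serial import *

Application code would then import myPackage.automagic (or names from it), while tests keep importing myPackage.serial and myPackage.parallel explicitly.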
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen? Remember that Python has no "process-global" namespace ("globals" are really module-level only) and that Python's import is absolutely nothing like C/C++ includes.
import loads a module object (which can be built directly from Python source code, from compiled C code, or even created dynamically; remember that at runtime a module is an object, an instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple of caveats, but those are to be considered error cases) to be imported only once for a given process (and then cached), so importing the same module twice in the same process will yield the same object (IOW a module is a singleton).
All this means that given something like
# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2
and
# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2
It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar, and B.foo will only ever call B.bar (unless you explicitly monkeypatch them, of course, but that's not the point).
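A quick illustration, assuming the two snippets above are saved as A.py and B.py:

# demo.py - each module's foo resolves bar in its own namespace
import A
import B

print(A.foo())   # 84: A.foo always calls A.bar
print(B.foo())   # 16 on Python 2 (integer division), 16.5 on Python 3: B.foo always calls B.bar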
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), i.e. the user would for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np4 python main.py
or
python main.py serial
vs
mpirun -np4 python main.py parallel
(whichever works best for your needs and is the most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
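For the command-line variant, a minimal sketch (the flag handling and the impl.run() entry point are assumptions for illustration, not part of the original answer):

# main.py - select the implementation from an explicit command-line switch
import sys

mode = sys.argv[1] if len(sys.argv) > 1 else "serial"

if mode == "parallel":
    from myPackage import parallel as impl
else:
    from myPackage import serial as impl

impl.run()   # hypothetical entry point exposed by both submodules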
I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()
Whether that's very useful depends on your classes and the effort it takes to make sure they have the same API, so that they're compatible when replacing each other. Maybe inheritance à la ParallelObject => SerialObject is even possible, or at least a common (virtual) base class to hold all the shared code. But that's just the same as in C++.
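For instance, a common base class could hold the shared code and leave only the differing step to the subclasses (a sketch with made-up class and method names):

class BaseObject(object):
    """Shared code lives here; only compute() differs between implementations."""

    def run(self):
        data = self.load()
        return self.compute(data)

    def load(self):
        # shared setup, identical for serial and parallel
        return list(range(10))

    def compute(self, data):
        raise NotImplementedError


class SerialObject(BaseObject):
    def compute(self, data):
        return [x * 2 for x in data]


class ParallelObject(BaseObject):
    def compute(self, data):
        # the real version would distribute the work (e.g. over MPI ranks);
        # this stand-in just keeps the sketch self-contained
        return [x * 2 for x in data]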

python reduce sqlite3 db lookups

I am trying to reduce sqlite3 DB lookups in Python. The system I am implementing this on has only 1 GB of RAM. I want to store the current DB values somewhere I can retrieve them from without consulting the DB again and again. One thing to keep in mind is that the trigger point of each of my Python scripts (processes) is different and there is no master script; in other words, I am not controlling all of my scripts from one point.
What I know so far:
I don't want to save/retrieve the data via a file, as I don't want file read/write operations. In a nutshell, I don't want to go through a file at all (simply saying no to the pickle and shelve Python modules).
I also cannot use in-memory cache modules like memcached and beaker because of the memory-size limitation, and because these modules are intended for server-side development while I am working on standalone scripts (an IoT device).
I cannot use singleton classes because of the limitations of namespaces and scope. As soon as the scope of one script ends, the singleton instance vanishes with it, so I cannot persist a singleton instance across all of my Python scripts. I am not able to use static variables and static methods either, because the instance does not stick around between scripts and everything goes back to its initialized value, instead of the current DB values, every time I import the singleton-class script in any of my other scripts.
As the trigger point of each of my Python scripts is different, it is also impossible to use global variables. Global variables have to be initialized with some value, whereas I want them to hold the current DB values.
I also cannot do manual memory segmentation, as Python does not allow me to do so.
What else can I do?
Is there any Python library (or a library in another language) that allows me to store the current DB values so that, instead of looking them up in the SQLite3 DB, I can get them from there without doing any read/write operation? (By read/write operation I mean loading from the hard drive or SD card.)
Thanks in advance, any help from you is highly appreciated.

Save A Reloaded Python Module For Testing Purposes

I have a Python module that I am testing, and because of the way the module works (it does some initialization upon import) I have been reloading the module during each unittest that tests the initialization. The reload is done in the setUp method, so all tests actually reload the module, which is fine.
This all works great if I am only running tests from that file during any given Python session, because I never need a reference to the previous instance of the module. But when I use PyDev or unittest's discover I get errors as seen here, because other tests that import this module have lost their references to objects in the module, since those were imported before all of the reloading business in my tests.
There are similar questions around SO like this one, but those all deal with updating objects after reloads have occurred. What I would like to do is save the state of the module after the initial import, run my tests that do all of the reloading, and then in the test tearDown put the initial reference to the module back, so that downstream tests that use the module still have the correct reference. Note that I am not making any changes to the module; I am only reloading it to test some initialization pieces that it does.
There are also some solutions that involve hooks in the module code, which I am not interested in; I don't want to ask developers to push things into the codebase just so tests can run. I am using Python 2.6 and unittest. I see that some projects like process-isolation exist, and while I am not sure whether it does entirely what I am asking for, it does not work for Python 2.6 and I don't want to add new packages to our stack if possible. Stub code follows:
import unittest

import mypackage.mymodule

saved_module = mypackage.mymodule

class SomeTestThatReloads(unittest.TestCase):

    def setUp(self):
        reload(mypackage.mymodule)

    def tearDown(self):
        # What to do here with saved_module?
        pass

    def test_initialization(self):
        # testing scenario code
        pass
Unfortunately, there is no simple way to do that. If your module's initialization has side effects (and by the looks of it it does -- hooks, etc.), there is no automated way to undo them, short of entirely restarting the Python process.
Similarly, if anything in your code imports something from your module rather than the module itself (e.g. from my_package.my_module import some_object instead of import my_package.my_module), reloading the module won't do anything to the imported objects (some_object will refer to whatever my_package.my_module.some_object referred to when the import statement was executed, regardless of what you reload and what's on the disk).
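A small sketch of that caveat, assuming Python 2 (where reload is a builtin) and a my_package.my_module that rebinds some_object every time its top-level code runs:

import my_package.my_module
from my_package.my_module import some_object

reload(my_package.my_module)   # re-executes the module's top-level code

# Attribute access through the module sees the freshly created object...
print(my_package.my_module.some_object is some_object)   # False
# ...but the name bound by "from ... import" still points at the old one.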
The problem this all comes down to is that Python's module system works by executing the modules (which is full of side effects, the definition of classes/functions/variables being only one of many) and then exposing the top-level variables they created, and the Python VM itself treats modules as one big chunk of global state with no isolation.
Therefore, the general solution to your problem is to restart a new Python process after each test (which sucks :( ).
If your modules' initialization side effects are limited, you can try running your tests with Nose instead of Unittest (the tests are compatible, you don't have to rewrite anything), whose Isolate plugin attempts to do what you want: http://nose.readthedocs.org/en/latest/plugins/isolate.html
But it's not guaranteed to work in the general case, because of what I said above.

Import statement: Config file Python

I'm maintaining a dictionary that is loaded inside the config file. The dictionary is loaded from a JSON file.
In config.py
import json

name_dict = json.load(open(dict_file))
I'm importing this config file in several other scripts (file1.py, file2.py, ..., filen.py) using the
import config
statement. My question is: when will the config.py script be executed? I'm sure it won't be executed for every import call made in my multiple scripts. But what exactly happens when an import statement is executed?
The top-level code in a module is executed once, the first time you import it. After that, the module object will be found in sys.modules, and the code will not be re-executed to re-generate it.
There are a few exceptions to this:
reload, obviously.
Accidentally importing the same module under two different names (e.g., if the module is in a package, and you've got some directory in the middle of the package in sys.path, you could end up with mypackage.mymodule and mymodule being two copies of the same thing, in which case the code gets run twice).
Installing import hooks/custom importers that replace the standard behavior.
Explicitly monkeying with sys.modules.
Directly calling functions out of imp/importlib or the like.
Certain cases with multiprocessing (and modules that use it indirectly, like concurrent.futures).
For Python 3.1 and later, this is all described in detail under The import system. In particular, look at the Searching section. (The multiprocessing-specific cases are described for that module.)
For earlier versions of Python, you pretty much have to infer the behavior from a variety of different sources and either reading the code or experimenting. However, the well-documented new behavior is intended to work like the old behavior except in specifically described ways, so you can usually get away with reading the 3.x docs even for 2.x.
Note that in general, you don't want to rely on whether top-level code in the module is run once or multiple times. For example, given a top-level function definition, as long as you never compare function objects, or rebind any globals that it (meaning the definition itself, not just the body) depends on, it doesn't make any difference. However, there are some exceptions to that, and loading start-time config files is a perfect example of an exception.
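To make the run-once behavior concrete, a small sketch using the question's config module (the prints show what you'd expect under normal import rules):

import sys

import config    # first import: config.py's top-level code runs, the JSON is loaded
import config    # already cached in sys.modules, nothing is re-executed

print('config' in sys.modules)            # True
print(sys.modules['config'] is config)    # True: one shared module object per process

# Only an explicit reload re-runs the top-level code (and re-reads the JSON file):
# reload(config)               # Python 2
# importlib.reload(config)     # Python 3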

Reloading global Python variables in a long running process

I have celery Python worker processes that are restarted every day or so. They execute Python/Django programs.
I have set certain quasi-global values that should persist in memory for the duration of the process. Namely, I have certain MySQL querysets that do not change often and are therefore evaluated one time and stored as a CONSTANT as soon as the process starts (a bad example being PROFILE = Profile.objects.get(user_id=5)).
Let's say that I want to reset this value in the celery process without exec-ing a whole new program.
This value is imported (and used) in a number of different modules. I'm assuming I'd have to go through each one in sys.modules that imports the CONSTANT and delete/reset the key? Is that right?
This seems very hacky. I usually use external services like Memcached to coordinate memory among multiple processes, but every once in a while I figure local memory is preferable to over-the-network calls to a NoSQL store.
It's a bit hard to say without seeing some code, but importing just sets a reference, exactly as variable assignment does: that is, if the data changes, the references see the change too. Naturally, though, this only works if it's the parent context (the module) that you've imported; otherwise assignment will rebind the reference rather than update the value.
In other words, if you do this:
from mypackage import mymodule
do_something_with(mymodule.MY_CONSTANT)

# elsewhere
mymodule.MY_CONSTANT = 'new_value'
then all references to mymodule.MY_CONSTANT will get the new value. But if you did this:
from mypackage.mymodule import MY_CONSTANT
# elsewhere
mymodule.MY_CONSTANT = 'new_value'
the original reference won't get the new value, because you've rebound the module's MY_CONSTANT to something else while the name you imported is still pointing at the old object.
