Multiple scripts accessing the same module with the same data in Python?

Recently I have been trying to make a makeshift "disk space" reader. I made a library that stores values in a list (the "disk"), but when I launch a second script with subprocess to write to the "disk" and check whether the values change on the display, nothing happens. I realized that any time you import a module, the module is effectively cloned for that script alone.
I want scripts to be able to import the same module so that if one script changes a value, another script can see that value.
Here is my code for the "disk" system
import time

ram = []
space = 256000
lastspace = 0

for i in range(0, space + 1):
    ram.append('')

def read(location):
    try:
        if ram[int(location)] == '':
            return "ERR_NO_VALUE"
        else:
            return ram[int(location)]
    except:
        return "ERR_OUT_OF_RANGE"

def write(location, value):
    try:
        ram[int(location)] = value
    except:
        return "ERR_OUT_OF_RANGE"

def getcontents():
    contents = []
    for i in range(0, 256001):
        contents.append([str(i) + '- ', ram[i]])
    return contents

def getrawcontents():
    contents = []
    for i in range(0, 256001):
        contents.append(ram[i])
    return contents

def erasechunk(beg, end):
    try:
        for i in range(int(beg), int(end) + 1):
            ram[i] = ''
    except:
        return "ERR_OUT_OF_RANGE"

def erase(location):
    ram[int(location)] = ''

def reset():
    ram = []
    times = space/51200
    tc = 0
    for i in range(0, round(times)):
        for x in range(0, 51201):
            ram.append('')
            tc += 1
            print("Byte " + str(tc) + " of " + " Bytes")
        for a in range(0, 100):
            print('\a', end='')
    return [len(ram), ' bytes']

def wipe():
    for i in range(0, 256001):
        ram[i] = ''
    return "WIPED"

def getspace():
    x = 0
    for i in range(0, len(ram)):
        if ram[i] != "":
            x += 1
    return [x, 256000]

The shortest answer to your question, which I'm understanding as "if I import the same function into two (or more) Python namespaces, can they interact with each other?", is no. What actually happens when you import a module is that Python uses the source script to 'build' those functions in the namespace you're importing them into; there is no lasting link to "where the module came from", since that original module isn't actually running in a Python process anywhere. When you import those functions into multiple scripts, each script simply gets its own copy of those pseudo-global variables (in your case ram) alongside the functions you're importing.
Python import docs: https://docs.python.org/3/reference/import.html
The whole page on Python's data model, including what __globals__ means for functions and modules: https://docs.python.org/3/reference/datamodel.html
Explanation:
To go into a bit more depth, when you import any of the functions from this script (let's assume it's called 'disk.py'), you'll get an object in that function's __globals__ dict called ram, which will indeed work as you expect for these functions in your current namespace:
from disk import read,write
write(13,'thing')
print(read(13)) #prints 'thing'
We might assume, since these functions are accurately accessing our ram object, that the ram object is being modified somehow in the namespace of the original script, which could then be accessed by a different script (a different Python process). Looking at the namespace of our current script using dir() might support that notion, since we only see read and write, and not ram. But the secret is that ram is hidden in those functions' __globals__ dict (mentioned above), which is how the functions are interacting with ram:
from disk import read,write
print(type(write.__globals__['ram'])) #<class 'list'>
print(write.__globals__['ram'] is read.__globals__['ram']) #True
write(13,'thing')
print(read(13)) #'thing'
print(read.__globals__['ram'][13]) #'thing'
As you can see, ram actually is a variable defined in the namespace of our current Python process, hidden in the functions' __globals__ dict, which is actually the exact same dictionary for any function imported from the same module; read.__globals__ is write.__globals__ evaluates to True (even if you don't import them at the same time!).
So, to wrap it all up, ram is contained in the __globals__ dict for the disk module, which is created separately in the namespace of each process you import into:
Python interpreter #1:
from disk import read,write
print(id(read.__globals__),id(write.__globals__)) #139775502955080 139775502955080
Python interpreter #2:
from disk import read,write
print(id(read.__globals__),id(write.__globals__)) #139797009773128 139797009773128
Solution hint:
There are many practical approaches to this that are beyond the scope of this answer, but I will suggest pickle, which is the standard way to send objects between Python interpreters using files and has a very simple interface. You can write, read, etc. your ram object using a pickle file. To write:
import pickle

with open('./my_ram_file.pkl','wb') as ram_f:
    pickle.dump(ram,ram_f)
To read:
import pickle

with open('./my_ram_file.pkl','rb') as ram_f:
    ram = pickle.load(ram_f)
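For a second script, a minimal sketch could look like this; it assumes the disk module from the question and the pickle file name above, and each process still only sees a snapshot of the data until it loads the file again:
# other_script.py -- a sketch only; the file name follows the answer above
import pickle
import disk

# pull the shared state last written by the other script
with open('./my_ram_file.pkl','rb') as ram_f:
    shared = pickle.load(ram_f)

# copy it into this process's own copy of disk.ram, then modify it
disk.ram[:] = shared
disk.write(13, 'updated by the second script')

# persist the change so other processes can pick it up on their next load
with open('./my_ram_file.pkl','wb') as ram_f:
    pickle.dump(disk.ram, ram_f)
If both scripts can write at the same time, you would also need some locking or a write-then-rename convention, which is beyond this sketch.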


Dask delayed function call with non-passed parameters

I am seeking to better understand the following behavior when using dask.delayed to call a function that depends on parameters. The issue seems to arise when parameters are specified in a parameters file read by configparser. Here is a complete example:
parameter file:
#zpar.ini: parameter file for configparser
[my pars]
my_zpar = 2.
parser:
#zippy_parser
import configparser

def read(_rundir):
    global rundir
    rundir = _rundir
    cp = configparser.ConfigParser()
    cp.read(rundir + '/zpar.ini')
    #[my pars]
    global my_zpar
    my_zpar = cp['my pars'].getfloat('my_zpar')
and the main python file:
# dask test with configparser
import dask
from dask.distributed import Client
import zippy_parser as zpar

def my_func(x, y):
    # print stuff
    print("parameter from main is: {}".format(main_par))
    print("parameter from configparser is: {}".format(zpar.my_zpar))
    # do stuff
    return x + y

if __name__ == '__main__':
    client = Client(n_workers = 4)
    #read parameters from input file
    rundir = '/path/to/parameter/file'
    zpar.read(rundir)
    #test zpar
    print("zpar is {}".format(zpar.my_zpar))
    #define parameter and call my_func
    main_par = 5.
    z = dask.delayed(my_func)(1., 2.)
    z.compute()
    client.close()
The first print statement in my_func() executes just fine, but the second print statement raises an exception. The output is:
zpar is 2.0
parameter from main is: 5.0
distributed.worker - WARNING - Compute Failed
Function: my_func
args: (1.0, 2.0)
kwargs: {}
Exception: AttributeError("module 'zippy_parser' has no attribute 'my_zpar'",)
I am new to dask. I suppose this has something to do with the serialization, which I do not understand. Can someone enlighten me and/or point to relevant documentation? Thanks!
I will try to keep this brief.
When a function is serialised in order to be sent to the workers, Python also sends the local variables and functions it needs (its "closure"). However, modules it references are stored by name only; Python does not try to serialise your whole runtime.
This means that zippy_parser is imported in the worker, not deserialised. Since the function read has never been called in the worker, the global variable my_zpar is never initialised there.
So you could call read in the workers, as part of your function or otherwise, but the pattern of setting module-global variables from within a function probably isn't great anyway. Dask's delayed mechanism prefers functional purity: the result you get should not depend on the current state of the runtime.
(note that if you had created the client after calling read in the main script, the workers might have got the in-memory version, depending on how subprocesses are configured to be created on your system)
I encourage you to pass in all parameters to your dask delayed functions explicitly, rather than relying on the global namespace.
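A minimal sketch of passing everything explicitly, reusing the names from the example above (the extra arguments to my_func are just one way to wire it up):
# sketch: every value my_func needs arrives as an argument,
# so nothing depends on module-level state in the worker
import dask
from dask.distributed import Client
import zippy_parser as zpar

def my_func(x, y, main_par, my_zpar):
    print("parameter from main is: {}".format(main_par))
    print("parameter from configparser is: {}".format(my_zpar))
    return x + y

if __name__ == '__main__':
    client = Client(n_workers = 4)
    zpar.read('/path/to/parameter/file')  # only runs in the main process
    main_par = 5.
    z = dask.delayed(my_func)(1., 2., main_par, zpar.my_zpar)
    print(z.compute())
    client.close()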

CANoe: How to select and start test cases from XML Test Module from Python using CANoe COM interface?

currently I am able to:
start CANoe application
load a CANoe configuration file
load a test setup file
def load_test_setup(self, canoe_test_setup_file: str = None) -> None:
    logger.info(
        f'Loading CANoe test setup file <{canoe_test_setup_file}>.')
    if self.measurement.Running:
        logger.info(
            f'Simulation is currently running, so new test setup could not be loaded!')
        return
    self.test_setup.TestEnvironments.Add(canoe_test_setup_file)
    test_environment = self.test_setup.TestEnvironments.Item(1)
    logger.info(f'Loaded test environment is <{test_environment.Name}>.')
How can I access the XML Test Module loaded with the test setup (tse) file and select tests to be executed?
The second-to-last line in your snippet is most probably causing the issue.
I have been trying to fix this issue for quite some time now and finally found the solution.
Somehow, when you execute the line self.test_setup.TestEnvironments.Item(1), win32com creates an object of type TestSetupItem, which doesn't have the necessary properties or methods to access the test cases. Instead we want objects of the collection types TestSetupFolders or TestModules. win32com creates a TestSetupItem object even though there is only a single XML Test Module (called AutomationTestSeq) in the Test Environment.
There are three possible solutions that I found.
1. Manually clearing the generated cache before each run.
Using win32com.client.DispatchWithEvents or win32com.client.gencache.EnsureDispatch generates a bunch of python files that describe CANoe's object model.
If you had used either of those before, TestEnvironments.Item(1) will always return TestSetupItem instead of the more appropriate type objects.
To remove the cache you need to delete the C:\Users\{username}\AppData\Local\Temp\gen_py\{python version} folder.
Doing this by hand every time is of course not very practical (a short sketch for scripting it follows the example script below).
2. Force win32com to always use dynamic dispatch.
You can do this by using:
canoe = win32com.client.dynamic.Dispatch("CANoe.Application")
Any objects you create using canoe from now on, will be dynamically dispatched.
Forcing dynamic dispatch is easier than manually clearing the cache folder every time. This gave me good results always. But doing this will not let you have any insight into the objects. You won't be able to see the acceptable properties and methods for the objects.
3. Typecast TestSetupItem to TestSetupFolders or TestModules.
This has the risk that if you typecast incorrectly, you will get unexpected results. But has worked well for me so far.
In short: win32.CastTo(test_env, "ITestEnvironment2"). This will ensure that you are using the recommended object hierarchy as per CANoe technical reference.
Note that you will also have to typecast TestSequenceItem to TestCase to be able to access test case verdict and enable/disable test cases.
Below is a decent example script.
"""Execute XML Test Cases without a pass verdict"""
import sys
from time import sleep
import win32com.client as win32
CANoe = win32.DispatchWithEvents("CANoe.Application")
CANoe.Open("canoe.cfg")
test_env = CANoe.Configuration.TestSetup.TestEnvironments.Item('Test Environment')
# Cast required since test_env is originally of type <ITestEnvironment>
test_env = win32.CastTo(test_env, "ITestEnvironment2")
# Get the XML TestModule (type <TSTestModule>) in the test setup
test_module = test_env.TestModules.Item('AutomationTestSeq')
# {.Sequence} property returns a collection of <TestCases> or <TestGroup>
# or <TestSequenceItem> which is more generic
seq = test_module.Sequence
for i in range(1, seq.Count+1):
# Cast from <ITestSequenceItem> to <ITestCase> to access {.Verdict}
# and the {.Enabled} property
tc = win32.CastTo(seq.Item(i), "ITestCase")
if tc.Verdict != 1: # Verdict 1 is pass
tc.Enabled = True
print(f"Enabling Test Case {tc.Ident} with verdict {tc.Verdict}")
else:
tc.Enabled = False
print(f"Disabling Test Case {tc.Ident} since it has already passed")
CANoe.Measurement.Start()
sleep(5) # Sleep because measurement start is not instantaneous
test_module.Start()
sleep(1)
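As a footnote to option 1 above, the cache removal can also be scripted rather than done by hand. This is only a sketch: whether win32com exposes its cache location as __gen_path__ is an assumption to verify, so the Temp-folder path given earlier is used as a fallback.
# clear_gen_py.py -- sketch for scripting option 1 (clearing the cache)
import os
import shutil
import tempfile
import win32com

# __gen_path__ is an assumption about pywin32 internals; fall back to Temp
gen_py_dir = getattr(win32com, '__gen_path__', None)
if not gen_py_dir:
    gen_py_dir = os.path.join(tempfile.gettempdir(), 'gen_py')

if os.path.isdir(gen_py_dir):
    shutil.rmtree(gen_py_dir)
    print(f"Removed win32com cache at {gen_py_dir}")
else:
    print(f"No win32com cache found at {gen_py_dir}")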
Just continue what you have done.
The TestEnvironment contains the TestModules. Each TestModule contains a TestSequence which in turn contains the TestCases.
Keep in mind that you cannot start individual TestCases, only the TestModule. But you can enable and disable individual TestCases before execution by using the COM API.
(typing this from the top of my head, might not work 100%)
test_module = test_environment.TestModules.Item(1) # or 2 or whatever
test_sequence = test_module.Sequence

for i in range(1, test_sequence.Count + 1):
    test_case = test_sequence.Item(i)
    if ...:
        test_case.Enabled = False # or True

test_module.Start()
You have to keep in mind that a TestSequence can also contain other TestSequences (i.e. a TestGroup). This depends on how your TestModule is setup. If so, you have to take care of that in your loop and descend into these TestGroups while searching for your TestCase of interest.
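A sketch of such a descent, following the CastTo pattern from the answer above; the "ITestGroup" cast target and the way the nested sequence is reached are assumptions to check against the CANoe technical reference:
import win32com.client as win32

def enable_unpassed_cases(sequence):
    """Walk a test sequence, descending into nested TestGroups,
    and enable every test case that has not passed yet."""
    for i in range(1, sequence.Count + 1):
        item = sequence.Item(i)
        try:
            tc = win32.CastTo(item, "ITestCase")
        except Exception:
            # not a test case -- assume a TestGroup and recurse
            group = win32.CastTo(item, "ITestGroup")  # cast target is an assumption
            enable_unpassed_cases(getattr(group, 'Sequence', group))
        else:
            tc.Enabled = tc.Verdict != 1  # verdict 1 is pass
Calling enable_unpassed_cases(test_module.Sequence) before test_module.Start() would then also cover TestCases nested inside TestGroups.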

How to create a variable whose value persists across file reload?

Common Lisp has defvar which
creates a global variable but only sets it if it is new: if it already
exists, it is not reset. This is useful when reloading a file from a long running interactive process, because it keeps the data.
I want the same in Python.
I have file foo.py which contains something like this:
cache = {}

def expensive(x):
    try:
        return cache[x]
    except KeyError:
        # do a lot of work
        cache[x] = res
        return res
When I do imp.reload(foo), the value of cache is lost, which I want to avoid.
How do I keep cache across reload?
PS. I guess I can follow How do I check if a variable exists? :
if 'cache' not in globals():
    cache = {}
but it does not look "Pythonic" for some reason...
If it is the right thing, please tell me so!
Answering comments:
I am not interested in cross-invocation persistence; I am already handling that.
I am painfully aware that reloading changes class meta-objects and I am already handling that.
The values in cache are huge, I cannot go to disk every time I need them.
Here are a couple of options. One is to use a temporary file as persistent storage for your cache, and try to load every time you load the module:
# foo.py
import tempfile
import pathlib
import pickle

_CACHE_TEMP_FILE_NAME = '__foo_cache__.pickle'
_CACHE = {}

def expensive(x):
    try:
        return _CACHE[x]
    except KeyError:
        # do a lot of work
        _CACHE[x] = res
        _save_cache()
        return res

def _save_cache():
    tmp = pathlib.Path(tempfile.gettempdir(), _CACHE_TEMP_FILE_NAME)
    with tmp.open('wb') as f:
        pickle.dump(_CACHE, f)

def _load_cache():
    global _CACHE
    tmp = pathlib.Path(tempfile.gettempdir(), _CACHE_TEMP_FILE_NAME)
    if not tmp.is_file():
        return
    try:
        with tmp.open('rb') as f:
            _CACHE = pickle.load(f)
    except pickle.UnpicklingError:
        pass

_load_cache()
The only issue with this is that you need to trust the environment not to write anything malicious in place of the temporary file (the pickle module is not secure against erroneous or maliciously constructed data).
Another option is to use another module for the cache, one that does not get reloaded:
# foo_cache.py
Cache = {}
And then:
# foo.py
import foo_cache

def expensive(x):
    try:
        return foo_cache.Cache[x]
    except KeyError:
        # do a lot of work
        foo_cache.Cache[x] = res
        return res
Since the whole point of a reload is to ensure that the executed module's code is run a second time, there is essentially no way to avoid some kind of "reload detection."
The code you use appears to be the best answer from those given in the question you reference.
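If you do keep the reload detection inside foo.py itself, a compact variant of the guard from the question looks like this; it works because reload() re-executes the module in its existing namespace, so the old dict is still visible at the top of the file (the squaring is just a stand-in for the real work):
# foo.py -- reuse the existing cache dict if the module is being reloaded
cache = globals().get('cache', {})

def expensive(x):
    try:
        return cache[x]
    except KeyError:
        res = x * x  # stand-in for the expensive computation
        cache[x] = res
        return res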

netcdf4-python: memory increasing with numerous calls to slice data from netcdf object

I'm trying to read data slices from a netcdf4 file using netcdf4-python. This is the first time using python and I am running into memory issues. Below is a simplified version of the code. On each iteration of the loop memory jumps by the equivalent of the data slice I read. How can I clean up the memory as I iterate over each variable?
#!/usr/bin/env python
from netCDF4 import Dataset
import os
import sys
import psutil

process = psutil.Process(os.getpid())

def print_memory_usage():
    nr_mbytes = process.get_memory_info()[0] / 1048576.0
    sys.stdout.write("{}\n".format(nr_mbytes))
    sys.stdout.flush()

# open input file and gather variable info
rootgrp_i = Dataset('data.nc','r')
vargrp_i = rootgrp_i.variables

# lets create a dictionary to store the metadata in
subdomain = {}

for suff in range(1000):
    for var in vargrp_i:
        v_i = vargrp_i[var]
        if v_i.ndim == 1:
            a = v_i[:]
        elif v_i.ndim == 2:
            a = v_i[0:20, 0:20]
        elif v_i.ndim == 3:
            a = v_i[0, 0:20, 0:20]
        elif v_i.ndim == 4:
            a = v_i[0, 0:75, 0:20, 0:20]
        else:
            a = v_i[0]
        del a
        print_memory_usage()

rootgrp_i.close()
I think the problem is a misinterpretation of what del a means.
According to Python Language Reference:
Deletion of a name removes the binding of that name from the local or global namespace, depending on whether the name occurs in a global statement in the same code block.
This means that del a removes the reference held by the name a, but it doesn't imply the memory is released immediately; that depends on how the garbage collector works. You can ask the garbage collector to collect new garbage by calling collect():
import gc
gc.collect()
This related post can be useful.
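Applied to the loop in the question, a minimal sketch (same names as above, with the dimension-dependent slicing abbreviated to a stand-in):
import gc

for suff in range(1000):
    for var in vargrp_i:
        v_i = vargrp_i[var]
        a = v_i[0:20]      # stand-in for the dimension-dependent slicing above
        del a
        gc.collect()       # explicitly ask for a collection after dropping the slice
        print_memory_usage()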

how to find list of modules which depend upon a specific module in python

In order to reduce the development time of my Python-based web application, I am trying to use reload() for the modules I have recently modified. The reload() happens through a dedicated web page (part of the development version of the web app) which lists the recently modified modules (those whose .py file has a newer timestamp than the corresponding .pyc file). The full list of modules is obtained from sys.modules (and I filter the list to focus on only those modules which are part of my package).
Reloading individual Python files seems to work in some cases and not in others. I guess all the modules which depend on a modified module should be reloaded, and the reloading should happen in the proper order.
I am looking for a way to get the list of modules imported by a specific module. Is there any way to do this kind of introspection in Python?
I understand that my approach might not be 100% guaranteed and the safest way would be to reload everything, but if a fast approach works for most cases, it would be good enough for development purposes.
Response to comments regarding the Django autoreloader
@Glenn Maynard, thanks, I had read about Django's autoreloader. My web app is based on Zope 3 and, with the amount of packages and a lot of ZCML-based initializations, a full restart takes about 10 to 30 seconds, or more if the database is bigger. I am attempting to cut down on the time spent during restart. When I feel I have made a lot of changes, I usually prefer a full restart, but more often I am changing a couple of lines here and there, for which I do not wish to spend so much time. The development setup is completely independent of the production setup, and usually if something goes wrong in a reload it becomes obvious, since the application pages start showing illogical information or throwing exceptions. I am very much interested in exploring whether selective reload would work or not.
So, this answers "find a list of modules which depend on a given one", rather than the question as initially phrased, which I answered above.
As it turns out, this is a bit more complex: one has to find the dependency tree for all loaded modules, and invert it for each module, while preserving a loading order that will not break things.
I had also posted this to the Brazilian Python wiki at:
http://www.python.org.br/wiki/RecarregarModulos
#! /usr/bin/env python
# coding: utf-8
# Author: João S. O. Bueno
# Copyright (c) 2009 - Fundação CPqD
# License: LGPL V3.0

from types import ModuleType, FunctionType, ClassType
import sys

def find_dependent_modules():
    """gets a one level inverted module dependence tree"""
    tree = {}
    for module in sys.modules.values():
        if module is None:
            continue
        tree[module] = set()
        for attr_name in dir(module):
            attr = getattr(module, attr_name)
            if isinstance(attr, ModuleType):
                tree[module].add(attr)
            elif type(attr) in (FunctionType, ClassType):
                tree[module].add(attr.__module__)
    return tree

def get_reversed_first_level_tree(tree):
    """Creates a one level deep straight dependence tree"""
    new_tree = {}
    for module, dependencies in tree.items():
        for dep_module in dependencies:
            if dep_module is module:
                continue
            if not dep_module in new_tree:
                new_tree[dep_module] = set([module])
            else:
                new_tree[dep_module].add(module)
    return new_tree

def find_dependants_recurse(key, rev_tree, previous=None):
    """Given a one-level dependence tree dictionary,
    recursively builds a non-repeating list of all dependant
    modules
    """
    if previous is None:
        previous = set()
    if not key in rev_tree:
        return []
    this_level_dependants = set(rev_tree[key])
    next_level_dependants = set()
    for dependant in this_level_dependants:
        if dependant in previous:
            continue
        tmp_previous = previous.copy()
        tmp_previous.add(dependant)
        next_level_dependants.update(
            find_dependants_recurse(dependant, rev_tree,
                                    previous=tmp_previous,
                                    ))
    # ensures reloading order on the final list
    # by postponing the reload of modules in this level
    # that also appear later on the tree
    dependants = (list(this_level_dependants.difference(
                       next_level_dependants)) +
                  list(next_level_dependants))
    return dependants

def get_reversed_tree():
    """
    Returns a dictionary mapping all loaded modules to
    lists of the tree of modules that depend on them, in an order
    that can be used for reloading
    """
    tree = find_dependent_modules()
    rev_tree = get_reversed_first_level_tree(tree)
    compl_tree = {}
    for module, dependant_modules in rev_tree.items():
        compl_tree[module] = find_dependants_recurse(module, rev_tree)
    return compl_tree

def reload_dependences(module):
    """
    reloads given module and all modules that
    depend on it, directly and otherwise.
    """
    tree = get_reversed_tree()
    reload(module)
    for dependant in tree[module]:
        reload(dependant)
This worked nicely in all tests I made here, but I would not recommend abusing it.
But for updating a running Zope 2 server after editing a few lines of code, I think I would use this myself.
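Usage would then be along these lines; the module names here are only placeholders:
# assuming the functions above were saved as reload_helpers.py
from reload_helpers import reload_dependences
import mypackage.views   # the module you just edited

reload_dependences(mypackage.views)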
You might want to take a look at Ian Bicking's Paste reloader module, which does what you want already:
http://pythonpaste.org/modules/reloader?highlight=reloader
It doesn't give you specifically a list of dependent files (which is only technically possible if the packager has been diligent and properly specified dependencies), but looking at the code will give you an accurate list of modified files for restarting the process.
Some introspection to the rescue:
from types import ModuleType

def find_modules(module, all_mods = None):
    if all_mods is None:
        all_mods = set([module])
    for item_name in dir(module):
        item = getattr(module, item_name)
        if isinstance(item, ModuleType) and not item in all_mods:
            all_mods.add(item)
            find_modules(item, all_mods)
    return all_mods
This gives you a set with all loaded modules; just call the function with your first module as the sole parameter. You can then iterate over the resulting set, reloading each module, as simply as:
[reload(m) for m in find_modules(<module>)]
