Python dynamic module loading with multiprocessing

The following code works if the module "user.py" is in the same directory as the code, but fails if it is in a different directory. The error message I get is "ModuleNotFoundError: No module named 'user'".
import multiprocessing as mp
import imp

class test():
    def __init__(self, pool):
        pool.processes = 1
        usermodel = imp.load_source('user', 'D:\\pool\\test\\user.py').userfun
        # file D:\\pool\\test\\user.py looks like this:
        # def userfun():
        #     return 1
        vec = []
        for i in range(10):
            vec.append([usermodel, i])
        pool.map(self.myfunc, vec)

    def myfunc(self, A):
        userfun = A[0]
        i = A[1]
        print(i, userfun())
        return

if __name__ == '__main__':
    pool = mp.Pool()
    test(pool)
If the function myfunc is called without the pooled process, the code is fine regardless of whether user.py is in the same directory as the main code or in \test. Why can't the pooled process find user.py in a separate directory? I have tried different methods, such as modifying my path and then import user, and importlib, all with the same results.
I am using Windows 7 and Python 3.6.

multiprocessing tries to pretend it's just like threading, but the abstraction leaks like a sieve. One of the ways it leaks is that communicating with worker processes involves a lot of implicit pickling and data copying.
When you try to send usermodel to a worker, multiprocessing implicitly pickles it and tries to have the worker unpickle the pickle. Functions are pickled by recording the module name and function name, so the worker just thinks it's supposed to do from user import userfun to access userfun. It doesn't know that user needs to be loaded with imp.load_source from a specific filesystem location, so it can't reconstruct usermodel.
The way this problem manifests is OS-dependent, because if multiprocessing uses the fork start method, the workers inherit the user module from the master process. fork is the default on Unix, but unavailable on Windows.
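One way around it (a sketch only, not part of the original answer): send the file path to the workers instead of the function object, and load user.py inside each worker with importlib.util, so nothing has to be pickled except a plain string.

import multiprocessing as mp
import importlib.util

def load_userfun(path):
    # Load user.py from an explicit filesystem location in *this* process.
    # importlib.util is the modern replacement for the deprecated imp.load_source.
    spec = importlib.util.spec_from_file_location('user', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.userfun

def myfunc(args):
    path, i = args
    userfun = load_userfun(path)   # re-load inside the worker; only the path string was pickled
    print(i, userfun())

if __name__ == '__main__':
    path = 'D:\\pool\\test\\user.py'
    with mp.Pool(processes=1) as pool:
        pool.map(myfunc, [(path, i) for i in range(10)])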

Related

Passing imports to a script from inside an external method

I have kind of a tricky question, one that is difficult to even describe.
Suppose I have this script, which we will call master:
# in master.py
import slave as slv

def import_func():
    import time

slv.method(import_func)
I want to make sure method in slave.py, which looks like this:
# in slave.py
def method(import_func):
    import_func()
    time.sleep(10)
actually runs as if I had imported the time package. Currently it does not work; I believe this is because the import exists only in the scope of import_func().
Keep in mind that the rules of the game are:
- I cannot import anything in slave.py outside method.
- I need to pass the imports which method needs through import_func() in master.py.
- The procedure must work for a variable number of imports inside method. In other words, method cannot know how many imports it will receive, but it needs to work nonetheless.
- The procedure needs to work for any possible import, so options like pyforest are not suitable.
I know it can theoretically be done through importlib, but I would prefer a more straightforward approach, because if we have a lot of imports with different 'as' labels it would become extremely tedious and convoluted with importlib.
I know it is kind of a quirky question, but I'd really like to know if it is possible. Thanks.
What you can do is this in the master file:
# in master.py
import slave as slv

def import_func():
    import time
    return time

slv.method(import_func)
Now use the returned time value in the slave file:
# in slave.py
def method(import_func):
    time = import_func()
    time.sleep(10)
Why do you have to do this? It's a scoping issue. Inside import_func(), import time binds the name time as a local variable of that function, so method() in slave.py never sees it; once import_func() returns, that local name is gone. (The module object itself is not garbage-collected; it stays cached in sys.modules, it just isn't reachable by name from method().)
By returning time from import_func(), you hand the module object to method(), which can bind it to its own name and use it.
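A quick illustrative check of that point (not from the original answer): the module object survives in sys.modules; only the local name from the import statement disappears.

import sys

def import_func():
    import time        # binds "time" as a local name and caches the module in sys.modules
    return time

t = import_func()
print('time' in sys.modules)      # True - the module was never unloaded
print(t is sys.modules['time'])   # True - the returned object is that cached module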
Now, what if you need to import more modules? Simple: return a list with multiple modules inside, or maybe a dictionary for simple access. That's one way of doing it.
[Edit] Using a dictionary and importlib to pass multiple imports to slave.py:
master.py:
import test2 as slv
import importlib

def master_import(packname, imports={}):
    imports[packname] = importlib.import_module(packname)

def import_func():
    imports = {}
    master_import('time', imports)
    return imports

slv.method(import_func)
slave.py:
# in slave.py
def method(import_func):
    imports = import_func()
    imports['time'].sleep(10)
This way, you can import literally any module you want on the master.py side using the master_import() function, and pass them all to the slave script.
Check this answer on how to use importlib.

Shared global state in a qt5 python 3.6 program

I am attempting to use a module named "global_data" to save global state information, without success. The code is getting large already, so I'll try to post only the bare essentials.
from view import cube_control
from ioserver.ioserver import IOServer
from manager import config_manager, global_data

if __name__ == "__main__":
    # sets up initial data
    config_manager.init_manager()
    # modifies data
    io = IOServer()
    # verify global data modified from IOServer.__init__
    global_data.test()  # success
    # start pyqt GUI
    cube_control.start_view()
So far so good. However, in the last line, cube_control.start_view(), it enters this code:
# inside cube_control.py
def start_view():
    # verify global data modified from IOServer.__init__
    global_data.test()  # fail ?!?!
    app = QApplication(sys.argv)
    w = MainWindow()
    sys.exit(app.exec_())
Running global_data.test() in this case fails. Printing the entire global state reveals it has somehow reverted back to the data set up by config_manager.init_manager().
How is this possible?
While Qt is running, I have a scheduler called every 10 seconds that also reports a failed test.
However, once the Qt GUI is stopped (clicking "x") and I run the test from the console, it succeeds again.
Inside the global_data module I've attempted to store the data in a dict inside both a simple Python object and a ZODB in-memory database:
# inside global_data
import ZODB  # (import added here for completeness)

state = {
    "units": {}
}

db = ZODB.DB(None)  # creates an in-memory db

def test(identity="no-id"):
    con = db.open()
    r = con.root()
    print("test online: ", r["units"]["local-test"]["online"], identity)
    con.close()
Both have the exact same problem. Above, the test is only done using the db.
The reason I attempted to use a db is that I understand threads can create a completely new global dictionary. However, the first two tests are in the same thread. The cyclic one runs in its own thread and could potentially create such a problem...?
File organization
If it helps, my program is organized with the following structure:
There is also a "view" folder with some qt5 GUI files.
The IOServer attempts to connect to a bunch of OPC-UA servers using the opcua module. No threads are manually started there, although I suppose the opcua module does so to stay connected.
global_data id()
I attempted to also print(id(global_data)) together with the tests and found that the ID is the same in IOServer AND the top-level code, but changes inside cube_control.py#start_view. Shouldn't these always refer to the same module?
I'm still not sure what exactly happened, but apparently this was solved by removing the __init__.py file inside the folder named manager. Now all imports of the module named "global_data" point to the same ID.
How using an __init__.py file caused a second instance of the same module remains a mystery.
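For illustration only (the names mirror the question, but the exact import paths are assumptions): the usual way a module ends up with two different IDs is being imported under two different names, which gives two independent module objects with separate state.

import sys

import manager.global_data as a    # cached in sys.modules as "manager.global_data"
sys.path.append('manager')         # hypothetical: the package folder itself becomes importable
import global_data as b            # loaded a second time, cached as top-level "global_data"

print(a is b)                      # False - two separate module objects, two separate states
print(id(a), id(b))                # two different IDs, as seen in the question
print([name for name in sys.modules if 'global_data' in name])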

Importing function from module takes very long

When I import a self-written function from a module in a Python script, it takes about 6 seconds to load. The function contains only about 50 lines of code, but that shouldn't even matter since it has not been executed yet, right?
This is the script that loads the function:
#!/usr/bin/env python
import time
print(time.clock())
from os import chdir
print(time.clock())
from os.path import abspath, dirname
print(time.clock())
from Project.controllers.SpiderController import RunSpider
print(time.clock())
And the output is as follows:
0.193569
0.194114
0.194458
6.315348
I also tried to import the whole module but the result was the same.
What could be the cause of that?
Some side notes:
- I use Python 2.7.9.
- The module uses the scrapy framework.
- The Python script is running on a Raspberry Pi 1 Model B.
but that shouldn't even matter since it has not been executed yet right?
The code of the function itself is not executed, but the code in the file is. This is logical, since that file might contain decorators, library calls, inner constants, etc. It is even possible that the function is built dynamically (so that an algorithm constructs the function).
With from <module> import <item> you do an almost normal import, but you create only one reference to an item in that package.
So it can take a long time if there is a program written in the module (one that is not guarded by an if __name__ == '__main__':), or when the module imports a large number of additional libraries.
It is for instance possible to construct a function like:
def foo(x):
    return x + 5

def bar(y):
    return y * 2

def qux(x):
    return foo(bar(x))
If you then run from module import qux, it will first have to define foo and bar, since qux depends on them.
Furthermore, although the code itself is not executed, the interpreter will still analyze the function: it transforms the source code into a syntax tree and does some analysis (which variables are local, etc.).
Finally, note that a package typically has an __init__.py file that initializes the package. That file is also executed and can take considerable time as well. For instance, some packages that set up a database connection will already connect to that database at import time, and it can take some time before the database responds.
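A small illustration of the module-level point (the module name slowmod and the sleep are made up): importing a single function still executes everything at the top level of that file.

# slowmod.py (hypothetical module)
import time
time.sleep(3)                  # module-level work: runs once, at first import

def quick():
    return 42

# main.py
import time
t0 = time.time()
from slowmod import quick      # executes all of slowmod.py, including the sleep
print(time.time() - t0)        # ~3 seconds, even though quick() was never called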

Sharing CherryPy's BackgroundTaskQueue object between request handlers

I'm using cherrypy to build a web service. I came across the BackgroundTaskQueue plugin and I want to use it to handle specific time-consuming operations on a separate thread.
The documentation states the usage should be like the following:
import cherrypy
from complicated_logging import log

bgtask = BackgroundTaskQueue(cherrypy.engine)
bgtask.subscribe()

class Root(object):
    def index(self):
        bgtask.put(log, "index was called", ip=cherrypy.request.remote.ip)
        return "Hello, world!"
    index.exposed = True
But, IMHO, using the bgtask object like this isn't very elegant. I would like handlers from other Python modules to use this object too.
Is there a way to subscribe this plugin once, and then "share" the bgtask object among other handlers (for example, by saving it in cherrypy.request)?
How is this done? Does this require writing a CherryPy tool?
Place
queue = BackgroundTaskQueue(cherrypy.engine)
in a separate file named, for instance, tasks.py. This way you create a module named tasks.
Now you can import tasks in other modules, and queue is a single shared instance.
For example, in a file called test.py:
import tasks

def test():
    print('works!')

tasks.queue.put(test)
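A hedged sketch of the same idea from another request-handler module (the Reports class and slow_job function are made up for illustration, and queue.subscribe() is assumed to have been called once at startup, e.g. in tasks.py):

# in reports.py (hypothetical handler module)
import cherrypy
import tasks

def slow_job(ip):
    # placeholder for the time-consuming work
    print('crunching numbers for', ip)

class Reports(object):
    @cherrypy.expose
    def index(self):
        # same queue instance as in every other module that imports tasks
        tasks.queue.put(slow_job, cherrypy.request.remote.ip)
        return "job queued"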

Object not working properly when called from child module

Hello generous SO'ers,
This is a somewhat complicated question, but hopefully relevant to the more general use of global objects from a child-module.
I am using some commercial software that provides a python library for interfacing with their application through TCP. (I don't think I can post the code for their library.)
I am having an issue with calling an object from a child module that I think is more generally related to global variables or some such. Basically, the object's state is as expected when the child module is in the same directory as all the other modules (including the module that creates the object).
But when I move the offending child module into a subfolder, it can still access the object, but the state appears to have been altered, and the object's connection to the commercial app doesn't work anymore.
Following some advice from this question on global vars, I have organized my module's files as so:
scriptfile.py
pyFIMM/
    __init__.py          # imports all the other files
    __globals.py         # creates the connection object used in most other modules
    __pyfimm.py          # main module functions, such as pyFIMM.connect()
    __Waveguide.py       # there are many of these files with various classes and functions
    (...other files...)
    PhotonDesignLib/
        __init__.py      # imports all files in this folder
        pdPythonLib.py   # the commercial library
    proprietary/
        __init__.py      # imports all files in this folder
        University.py    # <-- The offending child-module with issues
pyFIMM/__init__.py imports the sub-files like so:
from __globals import *    # import global vars & create FimmWave connection object `fimm`
from __pyfimm import *     # import the main module
from __Waveguide import *
(...import the other files...)
from proprietary import *  # imports the subfolder containing `University.py`
The __init__.py's in the subfolders "PhotonDesignLib" & "proprietary" both cause all files in those subfolders to be imported, so, for example, scriptfile.py would access my proprietary files as: import pyFIMM.proprietary.University. This is accomplished via this hint, coded as follows in proprietary/__init__.py:
import os, glob
__all__ = [ os.path.basename(f)[:-3] for f in glob.glob(os.path.dirname(__file__)+"/*.py")]
(Numerous coders from a few different institutions will have their own proprietary code, so we can share the base code but keep our proprietary files/functions to ourselves this way, without having to change any base code/import statements. I now realize that, for the more static PhotonDesignLib folder, this is overkill.)
The file __globals.py creates the object I need to use to communicate with their commercial app, with this code (this is all the code in this file):
import PhotonDesignLib.pdPythonLib as pd # the commercial lib/object
global fimm
fimm = pd.pdApp()  # <-- this is the offending global object
All of my sub-modules contain a from __globals import * statement, and are able to access the object fimm without specifically declaring it as a global var and without any issue.
So I run scriptfile.py, which has an import statement like from pyFIMM import *.
Most importantly, scriptfile.py initiates the TCP connection to the application via fimm.connect() right at the top, before issuing any commands that require the communication, and all the other modules call fimm.Exec(<commands for app>) in various routines, which has been working swimmingly well - the fimm object has so far been accessible to all modules, and keeps its connection state without issue.
The issue I am running into is that the file proprietary/University.py can only successfully use the fimm object when it's placed in the pyFIMM root-level directory (ie. the same folder as __globals.py etc.). But when University.py is imported from within the proprietary sub-folder, it gives me an "application not initialized" error when I use the fimm object, as if the object had been overwritten or re-initialized or something. The object still exists; it just isn't maintaining its connection state when called by this sub-module. (I've checked that it's not reinitialized in another module.)
If, after the script fails in proprietary/University.py, I use the console to send a command, e.g. pyFimm.fimm.Exec(<command to app>), it communicates just fine!
I set proprietary/University.py to print a dir(fimm) as a test right at the beginning, which works fine and looks like the fimm object exists as expected, but a subsequent call in the same file to fimm.Exec() indicates that the object's state is not correct, returning the "application not initialized" error.
This almost looks like there are two fimm objects - one that the main python console (and pyFIMM modules) see, which works great, and another that proprietary/University.py sees which doesn't know that we called fimm.connect() already. Again, if I put University.py in the main module folder "pyFIMM" it works fine - the fimm.Exec() calls operate as expected!
FYI proprietary/University.py imports the __globals.py file as so:
import sys, os, inspect
ScriptDir = inspect.currentframe().f_code.co_filename # get path to this module file
(ParentDir , tail) = os.path.split(ScriptDir) # split off top-level directory from path
(ParentDir , tail) = os.path.split(ParentDir) # split off top-level directory from path
sys.path.append(ParentDir) # add ParentDir to the python search path
from __globals import * # import global vars & FimmWave connection object
global fimm # This line makes no difference, was just trying it.
(FYI, Somewhere on SO it was stated that inspect was better than __file__, hence the code above.)
Why do you think having the sub-module in a sub-folder causes the object to lose its state?
I suspect the issue is either the way I instruct University.py to import __globals.py or the "import all files in this folder" method I used in proprietary/__init__.py. But I have little idea how to fix it!
Thank you for looking at this question, and thanks in advance for your insightful comments.
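For what it's worth, one quick way to test the "two fimm objects" hypothesis above (a diagnostic sketch under that assumption, not a confirmed fix; the names follow the layout described in the question) is to look at what is actually cached in sys.modules from both call sites:

# hypothetical diagnostic: run in both scriptfile.py and proprietary/University.py
import sys

dupes = [name for name in sys.modules if name.endswith('__globals')]
print(dupes)    # e.g. ['pyFIMM.__globals', '__globals'] would mean the file was loaded twice
for name in dupes:
    mod = sys.modules[name]
    print(name, id(mod), id(getattr(mod, 'fimm', None)))
# Two entries with different ids would suggest that the sys.path.append plus
# "from __globals import *" in University.py loaded a second, unconnected copy
# of __globals and therefore a second fimm object.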
