Why doesn't this code work in Parallel Python?

I tried to use pp (Parallel Python) like this:

import glob
import subprocess
import pp

def run(cmd):
    print cmd
    subprocess.call(cmd, shell=True)

job_server = pp.Server()
job_server.set_ncpus(8)
jobs = []
for a_file in glob.glob("./*"):
    cmd = "ls"
    jobs.append(job_server.submit(run, (cmd,)))
for j in jobs:
    j()
But I got an error saying that subprocess is not a global name:
An error has occured during the function execution
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pp-1.6.1-py2.7.egg/ppworker.py", line 90, in run
__result = __f(*__args)
File "<string>", line 3, in run
NameError: global name 'subprocess' is not defined
I've imported subprocess, why can't it be used here?
Following abarnert's suggestion, I changed my code to this:
import glob
import pp

def run(cmd):
    print cmd
    subprocess.call(cmd, shell=True)

job_server = pp.Server()
job_server.set_ncpus(8)
jobs = []
for a_file in glob.glob("./*"):
    cmd = "ls"
    jobs.append(job_server.submit(run, (cmd,), modules=("subprocess",)))
for j in jobs:
    j()
But it still doesn't work; it complains like this:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.6/dist-packages/pp-1.6.1-py2.6.egg/pp.py", line 721, in _run_local
job.finalize(sresult)
UnboundLocalError: local variable 'sresult' referenced before assignment

The documentation explains this pretty well, and each example shows you how to deal with it.
Among the parameters of the submit method is "modules - tuple with module names to import". Any modules you want to be available in the submitted job have to be listed here.
So, you can do this:
jobs.append(job_server.submit(run, (cmd,), (), ('subprocess',)))
Or this:
jobs.append(job_server.submit(run, (cmd,), modules=('subprocess',)))
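The empty tuple in the first form is there because modules is the fourth positional parameter, after args and depfuncs. As best I recall pp's submit signature (worth verifying against your version of the library), it looks roughly like this:

def submit(self, func, args=(), depfuncs=(), modules=(),
           callback=None, callbackargs=(), group='default', globals=None):
    # args      -- tuple of arguments passed to func
    # depfuncs  -- tuple of functions that func depends on
    # modules   -- tuple of module names to import in the worker
    ...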

Sorry, untested, but did you try:
from subprocess import call
Inside the 'run' function?
And then use "call" instead of "subprocess.call" ? That would make 'call' local to the function but accessible.
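In code, that suggestion would look something like this (untested sketch):

def run(cmd):
    # A local import travels with the function's source when pp ships it
    # to a worker, so the worker can resolve it without modules=.
    from subprocess import call
    print cmd
    call(cmd, shell=True)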

Related

Python - Fabric - Getting files

I am trying to write a simple Python script with Fabric to transfer a file from one host to another using the get() function, but I keep getting this error:
MacBook-Pro-3:PythonsScripts$ fab get:'/tmp/test','/tmp/test'
[hostname] Executing task 'get'
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/fabric/main.py", line 743, in main
*args, **kwargs
File "/Library/Python/2.7/site-packages/fabric/tasks.py", line 387, in execute
multiprocessing
File "/Library/Python/2.7/site-packages/fabric/tasks.py", line 277, in _execute
return task.run(*args, **kwargs)
File "/Library/Python/2.7/site-packages/fabric/tasks.py", line 174, in run
return self.wrapped(*args, **kwargs)
File "/Users/e0126914/Desktop/PYTHON/PythonsScripts/fabfile.py", line 128, in get
get('/tmp/test','/tmp/test') ***This line repeats many times then last error below***
RuntimeError: maximum recursion depth exceeded
My current code is:
from fabric.api import *
from getpass import getpass
from fabric.decorators import runs_once

env.hosts = ['hostname']
env.port = '22'
env.user = 'parallels'
env.password = "password"

def abc(remote_path, local_path):
    abc('/tmp/test','/tmp/')
Any help would be appreciated!
fabric.api.get is already a function. When you do from fabric.api import *, you are importing Fabric's get, and defining your own get shadows it, so the function ends up calling itself. You should rename your get function to avoid the conflict.
From inside the abc function, you need to call get
def abc(p1, p2):
    get(p1, p2)
EDIT:
When executing functions through Fabric, the arguments are passed on the command line, e.g.:
$ fab abc:string1,string2
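Putting both points together, a minimal fabfile.py sketch (untested; the host settings are copied from the question):

from fabric.api import env, get

env.hosts = ['hostname']
env.port = '22'
env.user = 'parallels'
env.password = "password"

def abc(remote_path, local_path):
    # Call Fabric's get(), not abc() itself, to avoid the infinite recursion.
    get(remote_path, local_path)

which would be invoked as:
$ fab abc:/tmp/test,/tmp/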

Getting IIS worker processes from WMI in Python

I'm trying to display the process IDs and pool names of IIS worker processes in Python.
Here is my Python code:
import wmi
c = wmi.WMI('.', namespace="root/WebAdministration")
c.query("select ProcessId from WorkerProcess")
It fails:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\wmi.py", line 1009, in query
return [ _wmi_object (obj, instance_of, fields) for obj in self._raw_query(wql) ]
File "C:\Python27\lib\site-packages\win32com\client\util.py", line 84, in next
return _get_good_object_(self._iter_.next(), resultCLSID = self.resultCLSID)
pywintypes.com_error: (-2147217389, 'OLE error 0x80041013', None, None)
I also tried:
for p in c.WorkerProcess:
    print p.ProcessId
which does not work either.
Now here is a very similar VBScript that works fine:
Set oWebAdmin = GetObject("winmgmts:root\WebAdministration")
Set processes = oWebAdmin.InstancesOf("WorkerProcess")
For Each w In processes
    WScript.Echo w.ProcessId
    WScript.Echo w.AppPoolName
Next
The documentation is here:
http://msdn.microsoft.com/en-us/library/microsoft.web.administration.workerprocess(v=vs.90).aspx
It looks like I'm supposed to instantiate something, but I cannot figure out how.
Any ideas how to get it to work in Python?
Actually, my code was correct. I just needed to run it with admin privileges.
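For completeness, here is a sketch of the working script with an upfront elevation check. IsUserAnAdmin is a standard Windows shell API call, and in the wmi module calling the class name with parentheses, c.WorkerProcess(), enumerates instances (which is likely what the bare c.WorkerProcess attempt above was missing):

import ctypes
import wmi

# The root/WebAdministration namespace rejects queries from
# non-elevated processes.
if not ctypes.windll.shell32.IsUserAnAdmin():
    raise SystemExit("Run this script from an elevated (admin) prompt.")

c = wmi.WMI('.', namespace="root/WebAdministration")
for wp in c.WorkerProcess():
    print wp.ProcessId, wp.AppPoolName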

Why does it throw "'module' object has no attribute XXX" when I call apply_async from multiprocessing.Pool?

The code is below. When I copy and paste it into my cmd prompt, it throws 'module' object has no attribute 'func', but when I save it as a .py file and run python test.py, it works fine.
import multiprocessing
import time

def func(msg):
    for i in xrange(3):
        print msg
        time.sleep(1)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    for i in xrange(5):
        msg = "hello %d" % (i)
        pool.apply_async(func, (msg, ))
    pool.close()
    pool.join()
    print "Sub-process(es) done."
Could anyone explain the difference between running Python code in the prompt versus in a file? Thanks a lot!
This is happening because on Windows, func needs to be pickled and sent to the child process via IPC. In order for the child to unpickle func, it needs to be able to import it from the parent's __main__ module. When this happens in a normal Python script, the child can re-import your script, and __main__ will contain all the functions declared at the top-level of your script, so it works fine.

However, in the interactive interpreter, functions you've defined while in the interpreter can't simply be re-imported from a file like in a normal script, so they will not be in __main__ in the child. This is more clear if you use multiprocessing.Process directly to recreate the issue:
>>> def f():
...     print "HI"
...
>>> import multiprocessing
>>> p = multiprocessing.Process(target=f)
>>> p.start()
>>> Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\python27\lib\multiprocessing\forking.py", line 381, in main
self = load(from_parent)
File "C:\python27\lib\pickle.py", line 1378, in load
return Unpickler(file).load()
File "C:\python27\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\python27\lib\pickle.py", line 1090, in load_global
klass = self.find_class(module, name)
File "C:\python27\lib\pickle.py", line 1126, in find_class
klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'f'
This way, it's more clear that pickle can't find the module. If you add some tracing to pickle.py you can see that 'module' is referring to __main__:
def load_global(self):
    module = self.readline()[:-1]
    name = self.readline()[:-1]
    print("module {} name {}".format(module, name))  # I added this.
    klass = self.find_class(module, name)
    self.append(klass)
Rerunning the same code with that extra print statement yields this:
module multiprocessing.process name Process
module __main__ name f
< same traceback as before>
It's worth noting that this example actually works fine on Posix platforms, because os.fork() is used to spawn the child processes, which means that any function defined prior to the Pool being created will be available in the child's __main__ module. So, while the above example will work, this one will still fail, because the worker function is defined after creating the Pool (which means after os.fork() is called):
>>> import multiprocessing
>>> p = multiprocessing.Pool(2)
>>> def f(a):
...     print(a)
...
>>> p.apply(f, "hi")
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 231, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.6/multiprocessing/pool.py", line 57, in worker
task = get()
File "/usr/lib64/python2.6/multiprocessing/queues.py", line 339, in get
return recv()
AttributeError: 'module' object has no attribute 'f'
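Following the explanation above, a common workaround for interactive use is to keep the worker function in a small importable module, so the child process can re-import it by name (sketch; worker.py is a hypothetical file name):

# worker.py -- hypothetical module holding the worker function
import time

def func(msg):
    for i in xrange(3):
        print msg
        time.sleep(1)

Then, from the interactive interpreter:

>>> import multiprocessing
>>> import worker
>>> pool = multiprocessing.Pool(processes=4)
>>> result = pool.apply_async(worker.func, ("hello",))
>>> pool.close()
>>> pool.join()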

Why does `pdb` report something unrelated and misleading?

My Python script reports where it goes wrong ("line 122" in myscript.py) when I run it in a shell:
$ toc2others.py -i toc -p pg
Traceback (most recent call last):
File "~/myscript.py", line 122, in <module>
p = re.match(keywords[index+1][0], inlines[n+1], re.IGNORECASE)
IndexError: list index out of range
It is because keywords[index+1] goes out of the index range of keywords.
When I run it under pdb, however, it doesn't report where it goes wrong, but says something unrelated (the error is reported as occurring at import re).
$ pdb ~/myscript.py -i toc -p pg
> /myscript.py(3)<module>()
-> import re
(Pdb) c
Traceback (most recent call last):
File "/usr/lib/python2.7/pdb.py", line 1314, in main
pdb._runscript(mainpyfile)
File "/usr/lib/python2.7/pdb.py", line 1233, in _runscript
self.run(statement)
File "/usr/lib/python2.7/bdb.py", line 387, in run
exec cmd in globals, locals
File "<string>", line 1, in <module>
File "~/myscript.py", line 3, in <module>
import re
IndexError: list index out of range
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
Why does pdb report something unrelated and misleading?
Can pdb report where the error actually occurs?
Thanks.
It's a bug, actually.
See issues:
http://bugs.python.org/issue16482
http://bugs.python.org/issue17277
This only happens if the exception is thrown at the module level of the executed file, i.e. not inside any function. So if you just put your code in a main() function, that will fix it. Or you can use ipython, which is much more fun for debugging:
ipython ~/myscript.py --pdb -- -i toc -p pg
This will run the script and only stop if there's an error, and it also does not suffer from the above bug.
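A sketch of the main() workaround; only the re.match line comes from the question's traceback, and the placeholder data is hypothetical, just enough to trigger the same IndexError:

import re

def main():
    keywords = [(r"chapter",)]            # placeholder data
    inlines = ["chapter one", "chapter two"]
    index, n = 0, 0
    # keywords[index + 1] is out of range, as in the original script
    p = re.match(keywords[index + 1][0], inlines[n + 1], re.IGNORECASE)

if __name__ == '__main__':
    main()

Run under pdb, the post-mortem now stops at the re.match line instead of at import re.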

Parallel Python: Passing a function written in another module to 'submit'

I am using the Parallel Python module (pp) and want to submit a job to a worker. However, the function that I want to execute is in another module (written with Cython), and I don't know how to import the function name into the new worker. The method suggested here, i.e. importing the module "walkerc" inside the function, cannot work, since walk itself is defined in walkerc, built from the file "walkerc.so":
import pp
from walkerc import walk
# Other stuff here
ser = pp.Server()
# Some more definitions
ser.submit(walk, (it, params))
ser.submit(walk, (1000, params), modules = ("walkerc",), globals = globals())
Both the statements above fail, I get the following error:
Traceback (most recent call last):
File "", line 1, in
ser.submit(walk, (1000, params), modules = ("walkerc",), globals = globals())
File "/usr/lib/python2.7/site-packages/pp.py", line 458, in submit
sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
File "/usr/lib/python2.7/site-packages/pp.py", line 629, in
__dumpsfunc
sources = [self.__get_source(func) for func in funcs]
File "/usr/lib/python2.7/site-packages/pp.py", line 696, in
__get_source
sourcelines = inspect.getsourcelines(func)[0]
File "/usr/lib/python2.7/inspect.py", line 690, in getsourcelines
lines, lnum = findsource(object)
File "/usr/lib/python2.7/inspect.py", line 526, in findsource
file = getfile(object)
File "/usr/lib/python2.7/inspect.py", line 420, in getfile
'function, traceback, frame, or code object'.format(object))
TypeError: <built-in function walk> is not a module, class, method,
function, traceback, frame, or code object
The function 'walk' itself is imported properly within the main program; it is the process of submitting it to a new worker that is problematic.
How can I specify the function name 'walk' properly?
I do not want to define 'walk' in the same file from which I call it, because I have written it in Cython for better performance. Is there an alternative?
Try renaming your walk function to something else, mywalk for example. As the exception text suggests, your environment seems to have a built-in function that goes by the name walk, so the inspect module gets confused.
I can successfully pass my imported walk function like this on my system; there is no conflict here and nothing more is needed, and the function gets executed with the given argument:
import pp
from walkerc import walk
pps = pp.Server()
pps.submit(walk, args=(1,))
But passing dir, which is a built-in function for sure:
pps.submit(dir)
I get the exact same error as you do:
Traceback (most recent call last):
File "parallel.py", line 9, in
pps.submit(dir)
...
File ".../lib/python2.7/inspect.py", line 420, in getfile
'function, traceback, frame, or code object'.format(object))
TypeError: <built-in function dir> is not a module, class, method, function, traceback, frame, or code object
Update after the discussion below:
So the problem here is that Python treats members that come from C extensions as built-ins. The code above works with a regular Python module, but I was able to replicate the OP's error when importing and passing the function from a C extension.
Therefore I wrapped the C extension function call inside a normal Python function, which does the trick. Note that the walkerc import has now moved into the wrapping function, so that it can construct its own context when dispatched.
import pp

def walk(n):
    import walkerc
    return walkerc.walk(n)

def print_callback(result):
    print('callback: ', result)

pps = pp.Server()
job = pps.submit(walk, args=(1,), callback=print_callback)
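To collect the return value without a callback, calling the job object blocks until the worker finishes, the same pattern as j() in the first question above:

result = job()  # blocks until the worker is done, then returns its result
print('direct result: ', result)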
