Errors using multiprocessing library in Python, the script dies - python

I'm new to Python and coding (I know my code may look ugly).
I have this little script that parses data from some network devices.
The inventory function works fine using a for loop, but that way it can only process one item at a time.
I have added the main function, which also works fine if I pass a very short list with a few items in it.
But if I add more devices, the whole script breaks and the interpreter raises:
UnboundLocalError: local variable 'net_connect' referenced before assignment
I really do not understand what's going on :/
How can I make the script process more than one item at a time? Any advice, please?
from datetime import datetime
from multiprocessing import Process, Queue

lista_dev = ["dev1", "dev2", "dev3", "dev4"]

def inventory(router):
    """Send commands to network devices and save the collected data as a python dictionary."""
    try:
        net_connect = jump(router, username, password)
    except:
        pass
    facts = net_connect.send_command("sho ver", use_textfsm=True)[0]
    serial = facts.get('serial')[0]
    config_register = facts.get('config_register')
    model = facts.get('hardware')[0]
    host = net_connect.send_command("show run | in hostname").split()[1]
    print(host)
    bootflash_content = net_connect.send_command("dir | in bin", use_textfsm=True)
    current_binaries = [i.get("name") for i in bootflash_content]
    net_connect.send_command("terminal length 0")
    space = net_connect.send_command("dir", use_textfsm=True)
    free_space = int(space[-1].get("total_free"))
    running_image = facts.get('running_image')
    cpld = net_connect.send_command("show platform diag | in CPLD").splitlines()[0].split(",")[0].split(" ")[-1]
    diccionario = {'device': host, 'model': model, 'current_version': running_image, 'serial': serial,
                   'config_reg': config_register, 'space_available_in_bytes': free_space, 'cpld_version': cpld,
                   'current_binaries': current_binaries}
    output_q.put(diccionario)

def main():
    """Process the list using a Queue to read multiple devices at a time."""
    start_time = datetime.now()
    output_q = Queue(maxsize=5)
    procs = []
    for a_device in lista_dev:
        my_proc = Process(target=device_inventory, args=(a_device, output_q))
        my_proc.start()
        procs.append(my_proc)
    # Make sure all processes have finished
    for a_proc in procs:
        a_proc.join()
    while not output_q.empty():
        # print(output_q.get())
        with open(json_file, "a", encoding='utf-8') as file:
            file.write(output_q.get() + '\n')

main()
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "inventory.py", line 38, in device_inventory
if net_connect is not None:
UnboundLocalError: local variable 'net_connect' referenced before assignment

I don't know what was wrong with my original code, but using the concurrent.futures library worked more smoothly than using Queues.
This link helped me figure it out:
https://beckernick.github.io/faster-web-scraping-python/#fromHistory
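For reference, a minimal sketch of what that can look like. This is only a sketch: it assumes inventory() is changed to return the dictionary instead of writing to a Queue, and the original UnboundLocalError almost certainly came from the bare except: pass leaving net_connect unassigned whenever jump() raised.

from concurrent.futures import ThreadPoolExecutor, as_completed
import json

def main():
    results = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        # one future per device; inventory() is assumed to return the dict
        futures = {executor.submit(inventory, dev): dev for dev in lista_dev}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                # surface per-device failures instead of silently skipping them
                print(f"{futures[future]} failed: {exc}")
    with open(json_file, "a", encoding="utf-8") as f:
        for item in results:
            f.write(json.dumps(item) + "\n")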

Related

P4Python run method does not work on empty folder

I want to search a Perforce depot for files.
I do this from a python script and use the p4python library command:
list = p4.run("files", "//mypath/myfolder/*")
This works fine as long as myfolder contains some files. I get a python list as a return value. But when there is no file in myfolder the program stops running and no error message is displayed. My goal is to get an empty python list, so that I can see that this folder doesn't contain any files.
Does anybody have any ideas? I could not find information in the p4 files documentation or on StackOverflow.
I'm going to guess you've got an exception handler around that command execution that's eating the exception and exiting. I wrote a very simple test script and got this:
C:\Perforce\test>C:\users\samwise\AppData\local\programs\python\Python36-32\python files.py
Traceback (most recent call last):
File "files.py", line 6, in <module>
print(p4.run("files", "//depot/no such path/*"))
File "C:\users\samwise\AppData\local\programs\python\Python36-32\lib\site-packages\P4.py", line 611, in run
raise e
File "C:\users\samwise\AppData\local\programs\python\Python36-32\lib\site-packages\P4.py", line 605, in run
result = P4API.P4Adapter.run(self, *flatArgs)
P4.P4Exception: [P4#run] Errors during command execution( "p4 files //depot/no such path/*" )
[Error]: "//depot/no such path/* - must refer to client 'Samwise-dvcs-1509687817'."
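If the goal is an empty list instead of an exception for an empty folder, one option (a rough sketch, using p4python's P4Exception class) is to catch the error explicitly rather than swallowing it:

from P4 import P4, P4Exception

p4 = P4()
p4.connect()
try:
    files = p4.run("files", "//mypath/myfolder/*")
except P4Exception:
    # an empty or unmapped path surfaces as an exception; treat it as "no files"
    # (note this also swallows other command errors, so narrow it if needed)
    files = []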
Try something like this?
import os

if len(os.listdir('//mypath/myfolder/')) == 0:  # Do not execute p4.run if directory is empty
    list = []
else:
    list = p4.run("files", "//mypath/myfolder/*")

How to pass classes into Pool.map as arguments - Pickling Error

I am trying to process a file by cutting it up into chunks and running them through a function which processes the chunks and returns a numpy array. After looking around, it seems the best method would be to use Pool.map, passing classes as the arguments. These classes are initialized with the chunk sections as one attribute and another attribute to store the numpy array. The output list of classes can then be parsed to get out the information I need to continue with the problem. Here is a simplified version of the script I am trying to write:
from multiprocessing import Pool

class container():
    def __init__(self, k):
        self.input_section = k
        self.output_answer = 0

def compute(object_class):
    # Main operation would go on in here....
    object_class.output_answer = object_class.input_section
    return object_class

def Main():
    # Create list of classes to pass as arguments
    sections = [container(k) for k in range(10)]
    # Create pool and compute modified classes
    with Pool(4) as p:
        results = p.map(compute, sections)
    # Decode here to get answers
    sections = [k.output_answer for k in results]
    # Print answers
    print(sections)

if __name__ == '__main__':
    Main()
This is the error that I get when I run the script:
Exception in thread Thread-9:
Traceback (most recent call last):
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner
self.run()
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 463, in _handle_results
task = get()
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'container' on <module '__main__' from
'C:\\Users\\rbernon\\AppData\\Local\\Continuum\\Anaconda3\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py'>
Any help would be greatly appreciated!
Keep in mind that every piece of data you want to have processed needs to be pickled and sent to the worker processes.
The overhead of this will reduce (and might even eliminate) the advantages of using multiple processes.
If the data file is large, it is probably better to send each worker a start and end offset as a 2-tuple of numbers, so each worker can read part of the file and process it.
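A minimal sketch of that offset-based approach (the filename and the per-chunk work are placeholders):

from multiprocessing import Pool
import os

DATA_FILE = "data.bin"  # hypothetical input file

def process_chunk(offsets):
    # offsets is a (start, end) 2-tuple; each worker reads only its slice
    start, end = offsets
    with open(DATA_FILE, "rb") as f:
        f.seek(start)
        chunk = f.read(end - start)
    return len(chunk)  # stand-in for the real per-chunk computation

if __name__ == "__main__":
    size = os.path.getsize(DATA_FILE)
    n_workers = 4
    step = size // n_workers
    offsets = [(i * step, size if i == n_workers - 1 else (i + 1) * step)
               for i in range(n_workers)]
    with Pool(n_workers) as p:
        print(p.map(process_chunk, offsets))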

Python3 filling a dictionary concurrently

I want to fill a dictionary in a loop. The iterations of the loop are independent of each other. I want to perform this on a cluster with thousands of processors. Here is a simplified version of what I tried and need to do.
import multiprocessing

class Worker(multiprocessing.Process):
    def setName(self, name):
        self.name = name

    def run(self):
        print('In %s' % self.name)
        return

if __name__ == '__main__':
    jobs = []
    names = dict()
    for i in range(10000):
        p = Worker()
        p.setName(str(i))
        names[str(i)] = i
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()
I tried this one in python3 on my own computer and received the following error:
..
In 249
Traceback (most recent call last):
File "test.py", line 16, in <module>
p.start()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/process.py", line 105, in start
In 250
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 66, in _launch
parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
Is there any better way to do this?
multiprocessing talks to its subprocesses via pipes. Each subprocess requires two open file descriptors, one for read and one for write. If you launch 10000 workers, you'll end up opening 20000 file descriptors, which exceeds the default limit on OS X (which your paths indicate you're using).
You can fix the issue by raising the limit. See https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1 for details - basically, it amounts to setting two sysctl knobs and upping your shell's ulimit setting.
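If you would rather raise the soft limit from inside the script, the standard library's resource module can do it. A sketch only: the value is illustrative and cannot exceed the hard limit or the system-wide caps set by those sysctl knobs.

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# request a higher soft limit for open file descriptors; must stay <= hard
resource.setrlimit(resource.RLIMIT_NOFILE, (25000, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))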
You are spawning 10000 processes at once at the moment. That really isn't a good idea.
The error you see is almost certainly raised because the multiprocessing module (seems to) use pipes for inter-process communication, and there is a limit on open pipes/FDs.
I suggest using a Python interpreter without a Global Interpreter Lock, like Jython or IronPython, and simply replacing the multiprocessing module with the threading one.
If you still want to use the multiprocessing module, you could use a process Pool like this to collect the return values:
from multiprocessing import Pool

def worker(params):
    name, someArg = params
    print('In %s' % name)
    # do something with someArg here
    return (name, someArg)

if __name__ == '__main__':
    jobs = []
    names = dict()
    # Spawn 100 worker processes
    pool = Pool(processes=100)
    # Fill with real data
    task_dict = dict(('name_{}'.format(i), i) for i in range(1000))
    # Process every task via our pool
    results = pool.map(worker, task_dict.items())
    # And convert the result to a dict
    results = dict(results)
    print(results)
This should work with minimal changes for the threading module, too.
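For example, multiprocessing.dummy provides a thread-backed Pool with the same interface, so (as a rough sketch) only the import needs to change:

from multiprocessing.dummy import Pool  # threads instead of processes, same API

def worker(params):
    name, someArg = params
    return (name, someArg)

if __name__ == '__main__':
    task_dict = dict(('name_{}'.format(i), i) for i in range(1000))
    with Pool(100) as pool:
        results = dict(pool.map(worker, task_dict.items()))
    print(len(results))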

import issue using exec(compile()) in thread

Windows 10, Python 3.5.1 x64 here.
This is weird... Let's say I have this script, called do.py. Please note the import string statement:
import string

# Please note that if the print statement is OUTSIDE 'main()', it works.
# It's as if 'main()' can't see the imported symbols from 'string'
def main():
    print(string.ascii_lowercase)

main()
I want to run it from a "launcher script", in a subthread, like this (launcher.py):
import sys
import threading

sys.argv.append('do.py')

def run(script, filename):
    exec(compile(script, filename, 'exec'))

with open(sys.argv[1], 'rb') as _:
    script = _.read()

# But this WORKS:
# exec(compile(script, sys.argv[1], 'exec'))

thread = threading.Thread(name='Runner', target=run, args=(script, sys.argv[1]))
thread.start()
thread.join()
It dies with the following error:
Exception in thread Runner:
Traceback (most recent call last):
File "C:\Program Files\Python35\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "C:\Program Files\Python35\lib\threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "tmpgui.py", line 7, in run
exec(compile(script, filename, 'exec'))
File "do.py", line 6, in <module>
main()
File "do.py", line 4, in main
print(string.ascii_lowercase)
NameError: name 'string' is not defined
That is, the exec'ed code is not importing string properly or something like that, and within main() the string module is not visible.
This is not the full code of my project, which is too big to post here, but the bare minimum I've created which mimics the problem.
Just in case someone is curious: I'm rewriting an old program of mine which imported the main() function of a script and ran that function with the standard output streams redirected to a tkinter text box. Instead of importing a function from the script, I want to load the script and run it. I don't want to use subprocess for a whole variety of reasons; I prefer to run the "redirected" code in a thread and communicate with the main thread, which is the one handling the GUI. That part works perfectly; the only problem I have is this, and I can't understand why it's happening!
My best bet: I should be passing something in the globals or locals dictionaries to exec, but I'm at a loss here...
Thanks a lot in advance!
exec(thing) is equivalent to exec(thing, globals(), locals()).
Thus,
the local symbol table of do.py is the local symbol table of the run function
the global symbol table of do.py is the global symbol table of launcher.py
import string imports the module and binds it to a name in the local namespace, which is the local namespace of the run function. You can verify this:
def run(script, filename):
    try:
        exec(compile(script, filename, 'exec'))
    finally:
        assert 'string' in locals(), "won't fail because 'import' worked properly"
main has a separate local scope, but it shares the global symbol table with do.py and, consequently, with launcher.py.
Python tried to find the name string in both the local symbol table of main (which is empty) and the global symbol table, failed, and raised the NameError.
Pass one empty dictionary in a call to exec:
def run(script, filename):
    exec(compile(script, filename, 'exec'), {})

Python multiprocessing, ValueError: I/O operation on closed file

I'm having a problem with the Python multiprocessing package. Below is a simple example code that illustrates my problem.
import multiprocessing as mp
import time

def test_file(f):
    f.write("Testing...\n")
    print f.name
    return None

if __name__ == "__main__":
    f = open("test.txt", 'w')
    proc = mp.Process(target=test_file, args=[f])
    proc.start()
    proc.join()
When I run this, I get the following error.
Process Process-1:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Ray\Google Drive\Programming\Python\tests\follow_test.py", line 24, in test_file
f.write("Testing...\n")
ValueError: I/O operation on closed file
Press any key to continue . . .
It seems that somehow the file handle is 'lost' during the creation of the new process. Could someone please explain what's going on?
I had similar issues in the past. I'm not sure whether it is done within the multiprocessing module or whether open sets the close-on-exec flag by default, but I know for sure that file handles opened in the main process are closed in the multiprocessing children.
The obvious workaround is to pass the filename as a parameter to the child process' init function and open it once within each child (if using a pool), or to pass it as a parameter to the target function and open/close on each invocation. The former requires the use of a global to store the file handle (not a good thing) - unless someone can show me how to avoid that :) - and the latter can incur a performance hit (but can be used with multiprocessing.Process directly).
Example of the former:
filehandle = None

def child_init(filename):
    global filehandle
    filehandle = open(filename,...)
    ../..

def child_target(args):
    ../..

if __name__ == '__main__':
    # some code which defines filename
    proc = multiprocessing.Pool(processes=1, initializer=child_init, initargs=[filename])
    proc.apply(child_target, args)
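For completeness, a sketch of the latter approach with multiprocessing.Process, passing the filename to the target and opening/closing the file inside the child (mirroring the code from the question):

import multiprocessing as mp

def test_file(filename):
    # open inside the child so no handle has to cross the process boundary
    with open(filename, "a") as f:
        f.write("Testing...\n")

if __name__ == "__main__":
    proc = mp.Process(target=test_file, args=["test.txt"])
    proc.start()
    proc.join()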
