My program needs an SNMP trap listener. It must continuously receive SNMP traps and perform further calculations on them. I am using Python's multiprocessing module.
Right now, my program looks something like this:
import multiprocessing
from snmpListener import snmpCompare

def main():
    execute()

def execute():
    try:
        p=Process(target=snmpCompare)
        p.start()
        #my code that runs here
        #it is basically sending commands to my server
        #which sends snmp alerts as response to my commands
        p.join()
    except (KeyboardInterrupt,SystemExit):
        p.terminate()
In snmpListener.py,
import multiprocessing

def trapListener():
    snmpTrap= receiveSnmpTrap()
    q.put(snmpTrap)

def snmpCompare():
    f=open('Alerts.txt','w')
    q=Queue()
    p=Process(target=trapListener, args=(q,))
    p.daemon=True
    p.start()
    while True:
        alert= p.get()
        f.write(alert)
        #perform calculation on 'alert'
    p.join()
    f.close()
However, the child process created in execute() runs as soon as it is created, and only afterwards are the commands in my parent process executed on the server. The child process and the parent process don't seem to be running simultaneously, so the alerts corresponding to the commands are not being received, i.e., the file "Alerts.txt" is empty.
I haven't been using Python's multiprocessing module for long. In fact, I have worked very little with multiprocessing. I don't know where I am going wrong and I am a little confused. Any advice would be welcome.
UPDATE: I am calling trapListener when creating a child process from execute function. My calculations are being done in trapListener itself. I have also made trapListener a daemon process. My code is working now. Also there was an error being generated in child process due to which it was getting terminated. Hence Alerts.txt was empty.
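For reference, here is a minimal sketch of the queue-based layout the original code was aiming for: one process puts traps on a Queue and the parent reads them. It is not the poster's final code. receive_trap is a hypothetical stand-in for receiveSnmpTrap(), and the count parameter and the None sentinel are assumptions added so the example terminates; a real listener would loop indefinitely.

```python
import multiprocessing as mp


def receive_trap(i):
    # Hypothetical stand-in for the question's receiveSnmpTrap().
    return "alert-%d" % i


def trap_listener(q, count):
    # The queue is passed in as an argument, so the function must accept it.
    for i in range(count):
        q.put(receive_trap(i))
    q.put(None)  # sentinel (assumption): tells the consumer to stop


def collect_alerts(count):
    q = mp.Queue()
    p = mp.Process(target=trap_listener, args=(q, count), daemon=True)
    p.start()
    alerts = []
    while True:
        alert = q.get()  # read from the queue, not from the process object
        if alert is None:
            break
        alerts.append(alert)
    p.join()
    return alerts


if __name__ == "__main__":
    print(collect_alerts(3))
```

The two points that differ from the code in the question are that trap_listener takes the queue as a parameter and that the consumer calls q.get() rather than p.get().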
Related
I'm trying to use multiprocessing to run multiple scripts. At the start, I launch a loading animation, however I am unable to ever kill it. Below is an example...
Animation: foo.py
import sys
import time
import itertools
# Simple loading animation that runs infinitely.
for c in itertools.cycle(['|', '/', '-', '\\']):
    sys.stdout.write('\r' + c)
    sys.stdout.flush()
    time.sleep(0.1)
Useful script: bar.py
from time import sleep
# Stand-in for a script that does something useful.
sleep(5)
Attempt to run them both:
import multiprocessing
from multiprocessing import Process
import subprocess
pjt_dir = "/home/solebay/path/to/project" # Setup paths..
foo_path = pjt_dir + "/foo.py" # ..
bar_path = pjt_dir + "/bar.py" # ..
def run_script(path):                   # Simple function that..
    """Launches python scripts."""      # ..allows me to set a..
    subprocess.run(["python", path])    # ..script as a process.
foo_p = Process(target=run_script, args=(foo_path,)) # Define the processes..
bar_p = Process(target=run_script, args=(bar_path,)) # ..
foo_p.start() # start loading animation
bar_p.start() # start 'useful' script
bar_p.join() # Wait for useful script to finish executing
foo_p.kill() # Kill loading animation
I get no error messages, and (my_venv) solebay#computer:~$ comes up in my terminal, but the loading animation persists (clipping over my name and environment). How can I kill it?
I've run into a similar situation before where I couldn't terminate the program using ctrl + c. The issue is (more or less) solved by using daemonic processes/threads (see multiprocessing doc). To do this, you simply change
foo_p = Process(target=run_script, args=(foo_path,))
to
foo_p = Process(target=run_script, args=(foo_path,), daemon=True)
and similarly for any other child processes that you would like to create.
That said, I am not sure whether this is the correct way to fix a multiprocessing program that cannot be terminated, or just an artifact that happens to help. I would suggest this thread, which discusses daemon threads in more detail. Essentially, from my understanding, daemonic threads and processes are terminated automatically when their parent process exits, regardless of whether they have finished. If a child is not daemonic, you have to wait for it to finish before the program can fully terminate.
You are creating too many processes. These two lines:
foo_p = Process(target=run_script, args=(foo_path,)) # Define the processes..
bar_p = Process(target=run_script, args=(bar_path,)) # ..
create two new processes. Let's call them "A" and "B". Each process consists of this function:
def run_script(path):                   # Simple function that..
    """Launches python scripts."""      # ..allows me to set a..
    subprocess.run(["python", path])    # ..script as a process.
which then creates another subprocess. Let's call those two processes "C" and "D". In all you have created 4 extra processes, instead of just the 2 that you need. It is actually process "C" that's producing the output on the terminal. This line:
bar_p.join()
waits for "B" to terminate, which implies that "D" has terminated. But this line:
foo_p.kill()
kills process "A" but orphans process "C". So the output to the terminal continues forever.
This is well documented: see the description of multiprocessing.Process.terminate(), which says:
"Note that descendant processes of the process will not be terminated – they will simply become orphaned."
The following program works as you intended, exiting gracefully from the second process after the first one has finished. (I renamed "foo.py" to useless.py and "bar.py" to useful.py, and made small changes so I could run it on my computer.)
import subprocess
import os
def run_script(name):
    s = os.path.join(r"c:\pyproj310\so", name)
    return subprocess.Popen(["py", s])

if __name__ == "__main__":
    useless_p = run_script("useless.py")
    useful_p = run_script("useful.py")
    useful_p.wait()  # Wait for useful script to finish executing
    useless_p.kill()  # Kill loading animation
You can't use subprocess.run() to launch the new processes since that function will block the main script until the process completes. So I used Popen instead. Also I placed the running code under an if __name__ == "__main__" which is good practice (and maybe necessary on Windows).
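The same idea can be sketched without hard-coded paths. The version below is only an illustration: it uses sys.executable with -c so that it runs anywhere, and the two inline scripts (SPINNER and USEFUL) are assumed stand-ins for the animation and the useful script, with no terminal output.

```python
import subprocess
import sys

# Inline stand-ins for the two scripts (assumptions for this sketch).
SPINNER = "import time\nwhile True:\n    time.sleep(0.1)\n"
USEFUL = "import time\ntime.sleep(0.5)\n"


def run_inline(code):
    """Start a Python child process and return immediately (no blocking)."""
    return subprocess.Popen([sys.executable, "-c", code])


def run_both():
    spinner_p = run_inline(SPINNER)
    useful_p = run_inline(USEFUL)
    useful_p.wait()    # block until the useful work is done
    spinner_p.kill()   # kills the process that is actually looping
    spinner_p.wait()   # reap it so that returncode is populated
    return useful_p.returncode, spinner_p.returncode


if __name__ == "__main__":
    print(run_both())
```

Because Popen returns a handle to the process that is actually looping, kill() acts on it directly and nothing is left orphaned.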
I am trying to terminate the processes belonging to a pool. The pool processes are carrying out calculations and a stop button in a GUI should end these calculations.
It seems the simple way to do this is by calling pool.terminate(). This option isn't available to me because I don't have access to the pool variable in my scope. It was created in a file that I'd rather not edit.
I tried an approach by terminating the processes by process ID. I get the pids from a list created by active_children. But it seems that os.kill has no effect as all the processes are still there. Where did I go wrong/how can I solve this? I'd appreciate any help.
Below is a minimal, reproducible example. Also, if my post indicates an obvious lack of knowledge, it's probably true and I apologize. Thank you.
from multiprocessing import Pool
from multiprocessing import active_children
import os, signal
if __name__ == '__main__':
    pool = Pool()
    print(active_children())
    for process in active_children():
        pid = process.pid
        os.kill(pid, signal.SIGTERM)
    print(active_children())  #same output as previous print statement
    pool.terminate()
    print(active_children())  #returns an empty list
Imports:
from dask.distributed import Client
import streamz
import time
Simulated workload:
def increment(x):
    time.sleep(0.5)
    return x + 1
Let's suppose I'd like to process some workload on a local Dask client:
if __name__ == "__main__":
    with Client() as dask_client:
        ps = streamz.Stream()
        ps.scatter().map(increment).gather().sink(print)
        for i in range(10):
            ps.emit(i)
This works as expected, but sink(print) will, of course, enforce waiting for each result, thus the stream will not execute in parallel.
However, if I use buffer() to allow results to be cached, then gather() does not seem to correctly collect all results anymore and the interpreter exits before getting results. This approach:
if __name__ == "__main__":
    with Client() as dask_client:
        ps = streamz.Stream()
        ps.scatter().map(increment).buffer(10).gather().sink(print)
        #                           ^
        for i in range(10):  # - allow parallel execution
            ps.emit(i)       # - before gather()
...does not print any results for me. The Python interpreter just exits shortly after starting the script and before buffer() emits its results, thus nothing gets printed.
However, if the main process is forced to wait for some time, the results are printed in parallel fashion (so they do not wait for each other, but are printed nearly simultaneously):
if __name__ == "__main__":
    with Client() as dask_client:
        ps = streamz.Stream()
        ps.scatter().map(increment).buffer(10).gather().sink(print)
        for i in range(10):
            ps.emit(i)
        time.sleep(10)  # <- force main process to wait while ps is working
Why is that? I thought gather() should wait for a batch of 10 results since buffer() should cache exactly 10 results in parallel before flushing them to gather(). Why does gather() not block in this case?
Is there a nice way to otherwise check if a Stream still contains elements being processed in order to prevent the main process from exiting prematurely?
"Why is that?": because the Dask distributed scheduler (which executes the stream mapper and sink functions) and your Python script run in different processes. When the "with" block context ends, your Dask Client is closed and execution shuts down before the items emitted to the stream are able to reach the sink function.
"Is there a nice way to otherwise check if a Stream still contains elements being processed": not that I am aware of. However: if the behaviour you want is (I'm just guessing here) the parallel processing of a bunch of items, then Streamz is not what you should be using, vanilla Dask should suffice.
I need to write a script that, on some condition, spawns a parallel process (worker) and has it do some IO job, and then closes that process when the job has finished.
But it looks like the processes do not tend to exit by default.
Here is my approach:
import multiprocessing
pool = multiprocessing.Pool(4)
def f(x):
    sleep(10)
    print(x)
    return True
r = pool.map_async(f, [1,2,3,4,5,6,7,8,9,10])
But if I run it in ipython and wait for all the prints, I can then run ps aux | grep ipython and see a lot of processes. So it looks like these workers are still alive.
Maybe I'm doing something wrong, but how can I make these processes terminate when they have finished their task? And what approach should I use if I want to spawn a lot of workers one by one (on receiving some rmq message, for example)?
Pool spawns worker processes when you declare the pool. They do not get killed until the pool is shut down. Instead, they wait there for more work to appear in the queue.
If you change your code to:
r = pool.map_async(f, [1,2,3,4,5,6,7,8,9,10])
pool.close()
pool.join()
print("check ps ax now")
sleep(10)
you will see the pool processes have disappeared.
Another thing: your program might not work as intended because you declare function f after you create the pool. I had to move pool = multiprocessing.Pool(4) so that it follows the declaration of f, but this may vary between Python versions. In any case, if you get odd "module has no attribute" exceptions, this is the reason.
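Putting those points together, a self-contained sketch might look like the following. It is an illustration under assumptions, not the original program: the sleep is shortened, f returns a value instead of printing, and the helper name run_and_shut_down is made up for the example.

```python
import multiprocessing as mp
import time


def f(x):
    time.sleep(0.1)  # short stand-in for the question's sleep(10)
    return x * 2


def run_and_shut_down(items):
    # The pool is created after f is defined, as suggested above.
    pool = mp.Pool(4)
    result = pool.map_async(f, items)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the tasks to finish and the workers to exit
    # After join(), no pool workers should remain as live children.
    return result.get(), mp.active_children()


if __name__ == "__main__":
    values, children = run_and_shut_down([1, 2, 3])
    print(values, children)
```

For workers that should appear one at a time in response to incoming messages, starting a separate multiprocessing.Process per message and joining it afterwards is one option; a long-lived pool is the other, at the cost of idle workers staying around.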
Hannu
I've been trying to write an interactive wrapper (for use in ipython) for a library that controls some hardware. Some calls are heavy on the IO so it makes sense to carry out the tasks in parallel. Using a ThreadPool (almost) works nicely:
from multiprocessing.pool import ThreadPool

class hardware():
    def __init__(self, IPaddress):
        connect_to_hardware(IPaddress)
    def some_long_task_to_hardware(self, wtime):
        wait(wtime)
        result = 'blah'
        return result

pool = ThreadPool(processes=4)
threads=[]
h=[hardware(IP1),hardware(IP2),hardware(IP3),hardware(IP4)]
for tt in range(4):
    task=pool.apply_async(h[tt].some_long_task_to_hardware,(1000,))
    threads.append(task)
alive = [True]*4
try:
    while any(alive) :
        for tt in range(4): alive[tt] = not threads[tt].ready()
        do_other_stuff_for_a_bit()
except:
    #some command I cannot find that will stop the threads...
    raise
for tt in range(4): print(threads[tt].get())
The problem comes if the user wants to stop the process or there is an IO error in do_other_stuff_for_a_bit(). Pressing Ctrl+C stops the main process but the worker threads carry on running until their current task is complete.
Is there some way to stop these threads without having to rewrite the library or have the user exit python? pool.terminate() and pool.join() that I have seen used in other examples do not seem to do the job.
The actual routine (instead of the simplified version above) uses logging and although all the worker threads are shut down at some point, I can see the processes that they started running carry on until complete (and being hardware I can see their effect by looking across the room).
This is in python 2.7.
UPDATE:
The solution seems to be to switch to using multiprocessing.Process instead of a thread pool. The test code I tried is to run foo_pulse:
class foo(object):
    def foo_pulse(self,nPulse,name): #just one method of *many*
        print('starting pulse for '+name)
        result=[]
        for ii in range(nPulse):
            print('on for '+name)
            time.sleep(2)
            print('off for '+name)
            time.sleep(2)
            result.append(ii)
        return result,name
If you try running this using ThreadPool then ctrl-C does not stop foo_pulse from running (even though it does kill the threads right away, the print statements keep on coming):
from multiprocessing.pool import ThreadPool
import time

def test(nPulse):
    a=foo()
    pool=ThreadPool(processes=4)
    threads=[]
    for rn in range(4) :
        r=pool.apply_async(a.foo_pulse,(nPulse,'loop '+str(rn)))
        threads.append(r)
    alive=[True]*4
    try:
        while any(alive) : #wait until all threads complete
            for rn in range(4):
                alive[rn] = not threads[rn].ready()
            time.sleep(1)
    except : #stop threads if user presses ctrl-c
        print('trying to stop threads')
        pool.terminate()
        print('stopped threads') # this line prints but output from foo_pulse carried on.
        raise
    else :
        for t in threads : print(t.get())
However a version using multiprocessing.Process works as expected:
import multiprocessing as mp
import time

def test_pro(nPulse):
    pros=[]
    ans=[]
    a=foo()
    for rn in range(4) :
        q=mp.Queue()
        ans.append(q)
        r=mp.Process(target=wrapper,args=(a,"foo_pulse",q),kwargs={'args':(nPulse,'loop '+str(rn))})
        r.start()
        pros.append(r)
    try:
        for p in pros : p.join()
        print('all done')
    except : #stop threads if user stops findRes
        print('trying to stop threads')
        for p in pros : p.terminate()
        print('stopped threads')
    else :
        print('output here')
        for q in ans :
            print(q.get())
    print('exit time')
Here I have defined a wrapper for the library foo (so that it did not need to be rewritten). If the return value is not needed then neither is this wrapper:
def wrapper(a,target,q,args=(),kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a,target)(*args,**kwargs))
From the documentation I see no reason why a pool would not work (other than a bug).
This is a very interesting use of parallelism.
However, if you are using multiprocessing, the goal is to have many processes running in parallel, as opposed to one process running many threads.
Consider these few changes to implement it using multiprocessing:
You have these functions that will run in parallel:
import time
import multiprocessing as mp

def some_long_task_from_library(wtime):
    time.sleep(wtime)

class MyException(Exception): pass

def do_other_stuff_for_a_bit():
    time.sleep(5)
    raise MyException("Something Happened...")
Let's create and start the processes, say 4:
procs = []  # this is not a Pool, it is just a way to handle the
            # processes instead of calling them p1, p2, p3, p4...
for _ in range(4):
    p = mp.Process(target=some_long_task_from_library, args=(1000,))
    p.start()
    procs.append(p)
mp.active_children()  # returns the list of live child processes (and reaps any that have already finished)
The processes are running in parallel, presumably each on a separate CPU core, but that is up to the OS to decide. You can check in your system monitor.
In the meantime you run a process that will break, and you want to stop the running processes, not leaving them orphan:
try:
    do_other_stuff_for_a_bit()
except MyException as exc:
    print(exc)
    print("Now stopping all processes...")
    for p in procs:
        p.terminate()
print("The rest of the process will continue")
If it doesn't make sense to continue with the main process when one or all of the subprocesses have terminated, you should handle the exit of the main program.
Hope it helps, and you can adapt bits of this for your library.
As for why Pool did not work: as noted in the documentation, the __main__ module needs to be importable by the child processes, and because of the nature of this project interactive Python is being used.
At the same time, it was not clear why ThreadPool would work, although the clue is right there in the name. ThreadPool creates its pool of workers using multiprocessing.dummy, which as noted here is just a wrapper around the threading module, so its workers are threads. Pool uses multiprocessing.Process. This can be seen by this test:
p=ThreadPool(processes=3)
p._pool[0]
<DummyProcess(Thread23, started daemon 12345)> #no terminate() method
p=Pool(processes=3)
p._pool[0]
<Process(PoolWorker-1, started daemon)> #has handy terminate() method if needed
As threads do not have a terminate method the worker threads carry on running until they have completed their current task. Killing threads is messy (which is why I tried to use the multiprocessing module) but solutions are here.
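The same check can be scripted. The sketch below is only illustrative and relies on the private _pool attribute used in the test above, which is an implementation detail and may change between Python versions; describe_workers is a name made up for the example.

```python
import threading
from multiprocessing.pool import ThreadPool


def describe_workers(n=2):
    """Return (is_thread, has_terminate) for each ThreadPool worker."""
    pool = ThreadPool(processes=n)
    try:
        # _pool is private: the same attribute inspected in the test above.
        workers = list(pool._pool)
        return [
            (isinstance(w, threading.Thread), hasattr(w, "terminate"))
            for w in workers
        ]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(describe_workers())
```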
The one warning about the solution using the above:
def wrapper(a,target,q,args=(),kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a,target)(*args,**kwargs))
is that changes to attributes inside the instance of the object are not passed back up to the main program. As an example the class foo above can also have methods such as:
def addIP(self, newIP):
    self.hardwareIP=newIP
A call to r=mp.Process(target=a.addIP,args=('127.0.0.1',)) does not update a in the main program, because the method runs on a copy of a in the child process.
The only way round this for a complex object seems to be shared memory using a custom manager, which can give access to both the methods and attributes of object a. For a very large complex object based on a library this may be best done using dir(foo) to populate the manager. If I can figure out how, I'll update this answer with an example (for my future self as much as others).
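One way this might look, as a rough sketch rather than a tested solution for the real library, is to register the class with a BaseManager subclass so that child processes work on a proxy to a single shared instance. The Hardware class, its method names and the IP strings below are all hypothetical. Note that the default proxy exposes public methods only, so attributes are read through a getter here.

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager


class Hardware:
    """Hypothetical stand-in for the library object 'a'."""

    def __init__(self, ip):
        self.hardware_ip = ip

    def add_ip(self, new_ip):
        self.hardware_ip = new_ip

    def get_ip(self):
        # Getter, because the default proxy does not expose attributes.
        return self.hardware_ip


class HardwareManager(BaseManager):
    pass


# Make the class available as manager.Hardware(...), returning a proxy.
HardwareManager.register("Hardware", Hardware)


def update_ip(proxy, new_ip):
    # Runs in the child; the call is forwarded to the manager's process.
    proxy.add_ip(new_ip)


def demo():
    with HardwareManager() as manager:
        hw = manager.Hardware("10.0.0.1")
        p = mp.Process(target=update_ip, args=(hw, "10.0.0.2"))
        p.start()
        p.join()
        return hw.get_ip()


if __name__ == "__main__":
    print(demo())
```

The real object lives in the manager's server process, so every method call is a round trip to it, which may matter for chatty hardware libraries.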
If for some reason using threads is preferable, the following approach can be used.
We can send a signal to the threads we want to terminate. The simplest signal is a global variable:
import time
from multiprocessing.pool import ThreadPool

_FINISH = False

def hang():
    while True:
        if _FINISH:
            break
        print('hanging..')
        time.sleep(10)

def main():
    global _FINISH
    pool = ThreadPool(processes=1)
    pool.apply_async(hang)
    time.sleep(10)
    _FINISH = True
    pool.terminate()
    pool.join()
    print('main process exiting..')

if __name__ == '__main__':
    main()
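A variation on the same idea, offered only as a sketch, replaces the global flag with a threading.Event. The event is passed to the worker explicitly, and its wait() doubles as an interruptible sleep, so the worker reacts as soon as the event is set instead of finishing a full time.sleep(). The short intervals are assumptions chosen to keep the example quick.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def hang(stop_event, poll_interval=0.05):
    """Loop until stop_event is set; return the number of iterations."""
    ticks = 0
    while True:
        ticks += 1
        # wait() returns True as soon as the event is set, so the loop
        # exits promptly instead of sleeping for the full interval.
        if stop_event.wait(poll_interval):
            break
    return ticks


def main():
    stop_event = threading.Event()
    pool = ThreadPool(processes=1)
    result = pool.apply_async(hang, (stop_event,))
    time.sleep(0.2)      # the main thread does other work here
    stop_event.set()     # ask the worker to stop
    ticks = result.get(timeout=5)
    pool.close()
    pool.join()
    return ticks


if __name__ == "__main__":
    print(main())
```

As with the global flag, this only works if the worker function cooperates by checking the event; it cannot interrupt a blocking call inside a third-party library.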