I've run into a multiprocessing problem. The code is included below. It executes as expected, but when I uncomment self.queue = multiprocessing.Queue(), the program exits immediately and it seems the subprocess is never started.
I don't know what happened. Could someone help me out? Many thanks!
import multiprocessing
import time

class Test:
    def __init__(self):
        self.pool = multiprocessing.Pool(1)
        # self.queue = multiprocessing.Queue()

    def subprocess(self):
        for i in range(10):
            print("Running")
            time.sleep(1)
        print("Subprocess Completed")

    def start(self):
        self.pool.apply_async(func=self.subprocess)
        print("Subprocess has been started")
        self.pool.close()
        self.pool.join()

    def __getstate__(self):
        self_dict = self.__dict__.copy()
        del self_dict['pool']
        return self_dict

    def __setstate__(self, state):
        self.__dict__.update(state)

if __name__ == '__main__':
    test = Test()
    test.start()
I can reproduce your issue, and indeed no traceback is raised.
This should raise the following error (I don't know why it does not surface):
RuntimeError: Queue objects should only be shared between processes through inheritance
Replace your line of code with:
m = multiprocessing.Manager()
self.queue = m.Queue()
Why does this happen?
A multiprocessing.Queue can only be shared with child processes through inheritance; it cannot be pickled. A multiprocessing.Pool, however, pickles everything it sends to its workers, including self here, so you have to use multiprocessing.Manager().Queue(), which hands out a picklable proxy.
Tested with Python: 3.4.2
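For reference, here is a minimal sketch of the fix applied to the class from the question (same structure, only the queue line changed; the shortened loop and the added result.get() are just to keep the demo quick and to surface errors):

import multiprocessing
import time

class Test:
    def __init__(self):
        self.pool = multiprocessing.Pool(1)
        # A Manager queue is a picklable proxy, so it survives the
        # pickling that Pool performs when self is sent to the worker.
        self.queue = multiprocessing.Manager().Queue()

    def subprocess(self):
        for i in range(3):
            print("Running")
            time.sleep(1)
        print("Subprocess Completed")

    def start(self):
        result = self.pool.apply_async(func=self.subprocess)
        result.get()  # re-raises any pickling error instead of hiding it
        self.pool.close()
        self.pool.join()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['pool']  # the Pool itself still cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

if __name__ == '__main__':
    Test().start()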
You use apply_async, which returns immediately, so you should wait for the result somewhere.
Under the hood, Python will pickle the function to be executed and send it to the child process, but self.subprocess as a bound method is not picklable here (because of the self.pool attribute; see the comment by ShadowRanger below).
import multiprocessing
import time

def subprocess():  # a plain old (picklable) function
    for i in range(10):
        print("Running")
        time.sleep(1)
    print("Subprocess Completed")
    return True

class Test:
    def __init__(self):
        self.pool = multiprocessing.Pool(1)

    def start(self):
        result = self.pool.apply_async(subprocess)
        print("Subprocess has been started")
        result.get()  # wait for the end of the subprocess
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    test = Test()
    test.start()
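This is also why the original version died silently: the pickling failure happens while the Pool dispatches the task, and nothing re-raises it unless the AsyncResult is collected. A minimal sketch illustrating how worker errors are surfaced (the fail function and its ValueError are made up for the demo; error_callback is Python 3 only):

import multiprocessing

def fail():
    raise ValueError("boom")

if __name__ == '__main__':
    with multiprocessing.Pool(1) as pool:
        # error_callback fires if the task raises in the worker
        result = pool.apply_async(fail, error_callback=print)
        try:
            result.get(timeout=5)  # re-raises the worker's exception here
        except ValueError as e:
            print("worker failed:", e)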
Related
I created a subclass of multiprocessing.Process.
Calling p.run() directly updates instance.ret_value from long_runtime_proc, but with p.start() the ret_value is never updated, even though long_runtime_proc is called and runs.
How can I get ret_value when using p.start()?
class myProcess(multiprocessing.Process):
    def __init__(self, pid, name, ret_value=0):
        multiprocessing.Process.__init__(self)
        self.id = pid
        self.ret_value = ret_value

    def run(self):
        self.ret_value = long_runtime_proc(self.id)
Calling Process.run() directly does not start a new process, i.e. the code in Process.run() is executed in the same process that invoked it. That's why changes to self.ret_value are effective. However, you are not supposed to call Process.run() directly.
When you start the subprocess with Process.start(), a new child process is created and the code in Process.run() is executed in that new process. When you assign the return value of long_runtime_proc to self.ret_value, this happens in the child process, not the parent, and thus the parent's ret_value is not updated.
What you probably need to do is to use a pipe or a queue to send the return value to the parent process. See the documentation for details. Here is an example using a queue:
import time
import multiprocessing

def long_runtime_proc():
    '''Simulate a long running process'''
    time.sleep(10)
    return 1234

class myProcess(multiprocessing.Process):
    def __init__(self, result_queue):
        self.result_queue = result_queue
        super(myProcess, self).__init__()

    def run(self):
        self.result_queue.put(long_runtime_proc())

if __name__ == '__main__':  # guard needed on platforms that spawn (e.g. Windows)
    q = multiprocessing.Queue()
    p = myProcess(q)
    p.start()
    ret_value = q.get()
    p.join()
With this code, ret_value will end up being assigned the value taken off the queue, which will be 1234.
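The same thing works with a pipe instead of a queue; a minimal sketch of that variant (the shortened long_runtime_proc is just a stand-in for the real work):

import multiprocessing

def long_runtime_proc():
    return 1234  # stand-in for the real long-running work

class MyProcess(multiprocessing.Process):
    def __init__(self, conn):
        super(MyProcess, self).__init__()
        self.conn = conn

    def run(self):
        self.conn.send(long_runtime_proc())
        self.conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = MyProcess(child_conn)
    p.start()
    ret_value = parent_conn.recv()  # blocks until the child sends
    p.join()
    print(ret_value)  # 1234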
I'm having the following problem: I want to implement a web crawler. So far this worked, but it was so slow that I tried to use multiprocessing for fetching the URLs.
Unfortunately I'm not very experienced in this field.
After some reading, the easiest way seemed to be to use the map method of multiprocessing.Pool.
But I constantly get the following error:
TypeError: Pickling an AuthenticationString object is disallowed for security reasons
I found very few cases with the same error, and they unfortunately did not help me.
I created a stripped-down version of my code which can reproduce the error:
import multiprocessing

class TestCrawler:
    def __init__(self):
        self.m = multiprocessing.Manager()
        self.queue = self.m.Queue()
        for i in range(50):
            self.queue.put(str(i))
        self.pool = multiprocessing.Pool(6)

    def mainloop(self):
        self.process_next_url(self.queue)
        while True:
            self.pool.map(self.process_next_url, (self.queue,))

    def process_next_url(self, queue):
        url = queue.get()
        print(url)

c = TestCrawler()
c.mainloop()
I would be very thankful for any help or suggestions!
Question: But I constantly get the following error:
The error you're getting is misleading. The real culprit is the Manager kept on the instance:
self.m = multiprocessing.Manager()
self.queue = self.m.Queue()
The Manager object holds a handle to its server process (including its authkey), which pickle refuses to serialize. Move the Queue instantiation outside the class TestCrawler.
This leads to another error:
NotImplementedError: pool objects cannot be passed between processes or pickled
The reason is:
self.pool = multiprocessing.Pool(6)
Both errors mean the same thing: when self is shipped to the worker processes, pickle cannot serialize these instance attributes.
Note: endless loop!
Your while loop below is an endless loop and will overload your system!
Furthermore, each pool.map(...) call here starts only one task:
while True:
    self.pool.map(self.process_next_url, (self.queue,))
I suggest reading the examples that demonstrate the use of a pool.
Change to the following:
import multiprocessing as mp

class TestCrawler:
    def __init__(self, tasks):
        # Assign the global task queue to a class member
        self.queue = tasks
        for i in range(50):
            self.queue.put(str(i))

    def mainloop(self):
        # Instantiate the pool locally
        pool = mp.Pool(6)
        for n in range(50):
            # .map requires an iterable of arguments; pass a dummy None
            pool.map(self.process_next_url, (None,))

    def process_next_url(self, dummy):
        # the dummy argument is ignored; urls come from the queue
        url = self.queue.get()
        print(url)

if __name__ == "__main__":
    # Create the Queue globally
    tasks = mp.Manager().Queue()
    # Pass the Queue to your class TestCrawler
    c = TestCrawler(tasks)
    c.mainloop()
This example starts 5 processes that share the 50 tasks (URLs) between them:

class TestCrawler2:
    def __init__(self, tasks):
        self.tasks = tasks

    def start(self):
        pool = mp.Pool(5)
        pool.map(self.process_url, self.tasks)

    def process_url(self, url):
        print('self.process_url({})'.format(url))

if __name__ == "__main__":
    tasks = ['url{}'.format(n) for n in range(50)]
    TestCrawler2(tasks).start()
Tested with Python: 3.4.2
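If the end goal is just to fetch many URLs in parallel, it is often simpler to keep the worker a plain module-level function, so nothing on self needs pickling at all. A sketch (fetch_url is a placeholder for the real request logic):

import multiprocessing as mp

def fetch_url(url):
    # placeholder: a real crawler would download and parse the page here
    return url, len(url)

if __name__ == "__main__":
    urls = ['url{}'.format(n) for n in range(50)]
    pool = mp.Pool(6)
    # imap_unordered yields results as soon as any worker finishes one
    for url, size in pool.imap_unordered(fetch_url, urls):
        print(url, size)
    pool.close()
    pool.join()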
I have this example code:
# some imports that I'm not including in the question

class daemon:
    def start(self):
        # do something; omitted here to keep the question short
        self.run()

    def run(self):
        """You should override this method when you subclass Daemon.
        It will be called after the process has been daemonized by
        start() or restart().
        """

class MyDaemon(daemon):
    def run(self):
        while True:
            time.sleep(1)

if __name__ == "__main__":
    daemonz = MyDaemon('/tmp/daemon-example.pid')
    daemonz.start()
def firstfunction():
    # do something
    secondfunction()

def secondfunction():
    # do something
    thirdfunction()

def thirdfunction():
    pass  # do something

# here are some variables set that I am not writing
firstfunction()
How can I exit from the run(self) method of the class "daemon" and go on to execute firstfunction(), as written in the last line? I'm a newbie with Python, and I'm trying to learn.
# EDIT
I managed to implement the daemon class on top of the threading class. But I'm in the same situation as before: the script stays inside the daemon class and doesn't execute the other lines.
import threading

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def daemonize(self):
        pass  # instructions to daemonize my script's process

    def run(self):
        self.daemonize()

def my_function():
    print("MyFunction executed")  # never executed

thread = MyThread()
thread.start()
my_function()  # the process is successfully daemonized, but
               # this function is never executed
You may use the break keyword to exit loops and continue to the next line; return can be used to exit functions.
class daemon:
    def start(self):
        self.run()

    def run(self):
        while True:
            break
        return
        print()  # This never executes
If you want MyDaemon to run alongside the rest of your code, you have to make it a process or thread. Code then automatically continues to the next line, while the MyDaemon class (thread/process) runs.
import threading

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        print("Thread started")
        while True:
            pass

def my_function():
    print("MyFunction executed")

thread = MyThread()
thread.start()  # executes run(self)
my_function()
This code produces the following result:
Thread started
MyFunction executed
To make thread a daemon, you can use thread.setDaemon(True); in modern Python, setting thread.daemon = True is preferred, since setDaemon() is deprecated. Either way, it must be done before the thread is started:
thread = MyThread()
thread.setDaemon(True)
thread.start()
my_function()
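The daemon flag can also be passed straight to the Thread constructor; a minimal sketch of the modern form:

import threading
import time

def background():
    while True:
        time.sleep(1)

# daemon=True: the thread no longer keeps the program alive
thread = threading.Thread(target=background, daemon=True)
thread.start()
print("MyFunction executed")  # runs immediately; the program can exit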
I have a code snippet taken from another stackoverflow post
Python Workers and Queues
from multiprocessing import Process
from Queue import Queue

class Worker(Process):
    def __init__(self, queue):
        super(Worker, self).__init__()
        self.queue = queue

    def run(self):
        print 'Worker started'
        # do some initialization here
        print 'Computing things!'
        for data in iter(self.queue.get, None):
            print(data)

if __name__ == '__main__':
    request_queue = Queue()
    for i in range(4):
        Worker(request_queue).start()
    for data in range(100):
        request_queue.put(data)
    # Sentinel objects to allow clean shutdown: 1 per worker.
    for i in range(4):
        request_queue.put(None)
Why does this process hang and not process the queue contents?
Found my error: I did not know there are two different Queue classes.
changing
from multiprocessing import Process
from Queue import Queue
to
from multiprocessing import Process,Queue
Now I have the expected behavior
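For context: Queue.Queue (queue.Queue in Python 3) lives in a single process and is only shared between threads, while multiprocessing.Queue is backed by a pipe that child processes inherit. A Python 3 version of the corrected snippet might look like this (a sketch, same sentinel pattern as above):

from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        # iter(get, None) stops once the None sentinel arrives
        for data in iter(self.queue.get, None):
            print(data)

if __name__ == '__main__':
    request_queue = Queue()
    workers = [Worker(request_queue) for _ in range(4)]
    for w in workers:
        w.start()
    for data in range(100):
        request_queue.put(data)
    for _ in range(4):
        request_queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()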
This code:
import multiprocessing as mp
from threading import Thread
import subprocess
import time

class WorkerProcess(mp.Process):
    def run(self):
        # Simulate long running task
        self.subprocess = subprocess.Popen(['python', '-c', 'import time; time.sleep(1000)'])
        self.code = self.subprocess.wait()

class ControlThread(Thread):
    def run(self):
        jobs = []
        for _ in range(2):
            job = WorkerProcess()
            jobs.append(job)
            job.start()

        # wait for a while and then kill jobs
        time.sleep(2)
        for job in jobs:
            job.terminate()

if __name__ == "__main__":
    controller = ControlThread()
    controller.start()
When I terminate the spawned WorkerProcess instances, they die just fine; however, the subprocesses (python -c 'import time; time.sleep(1000)') run until completion. This is well documented in the official docs, but how do I kill the child processes of a killed process?
A possible solution might be:
Wrap the WorkerProcess.run() method in a try/except block that catches SIGTERM and terminates the subprocess. But I am not sure how to catch SIGTERM in the WorkerProcess.
I also tried setting signal.signal(signal.SIGINT, handler) in the WorkerProcess, but I am getting a ValueError, because it is allowed to be set only in the main thread.
What do I do now?
EDIT: As @svalorzen pointed out in the comments, this doesn't really work, since the reference to self.subprocess is lost when terminate() is called from the parent process.
Finally came to a clean, acceptable solution. Since mp.Process.terminate is a method, we can override it:
class WorkerProcess(mp.Process):
    def run(self):
        # Simulate long running task
        self.subprocess = subprocess.Popen(['python', '-c', 'import time; time.sleep(1000)'])
        self.code = self.subprocess.wait()

    # HERE
    def terminate(self):
        self.subprocess.terminate()
        super(WorkerProcess, self).terminate()
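On POSIX systems another option is to start the child in its own process group and signal the whole group, so any grandchildren die together with it. A sketch (start_new_session requires Python 3.2+ and is POSIX-only):

import os
import signal
import subprocess

# start_new_session=True puts the child in a new session (and process
# group), so the group can be signalled as a unit (POSIX only)
proc = subprocess.Popen(
    ['python', '-c', 'import time; time.sleep(1000)'],
    start_new_session=True,
)
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)  # kills child and grandchildren
proc.wait()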
You can use queues to send messages to your subprocesses and ask them nicely to terminate their children before exiting themselves. You can't use signals anywhere but in your main thread, so signals are not suitable for this.
Curiously, when I modify the code like this, even if I interrupt it with Ctrl+C, the subprocesses die as well. This may be an OS-related thing, though.
import multiprocessing as mp
from threading import Thread
import subprocess
import time
from Queue import Empty

class WorkerProcess(mp.Process):
    def __init__(self, que):
        super(WorkerProcess, self).__init__()
        self.queue = que

    def run(self):
        # Simulate long running task
        self.subprocess = subprocess.Popen(['python', '-c', 'import time; time.sleep(1000)'])
        while True:
            a = self.subprocess.poll()
            if a is None:
                time.sleep(1)
                try:
                    if self.queue.get(0) == "exit":
                        print "kill"
                        self.subprocess.kill()
                        self.subprocess.wait()
                        break
                    else:
                        pass
                except Empty:
                    pass
                print "run"
            else:
                print "exiting"
                break  # the child already exited on its own

class ControlThread(Thread):
    def run(self):
        jobs = []
        queues = []
        for _ in range(2):
            q = mp.Queue()
            job = WorkerProcess(q)
            queues.append(q)
            jobs.append(job)
            job.start()

        # wait for a while and then ask the jobs to exit
        time.sleep(5)
        for q in queues:
            q.put("exit")
        time.sleep(30)

if __name__ == "__main__":
    controller = ControlThread()
    controller.start()
Hope this helps.
Hannu