Pika Consumer as a Python Process (multiprocessing)

I am trying to use the example Pika Async consumer (http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html) as a multiprocessing process (by making the ExampleConsumer class subclass multiprocessing.Process). However, I'm running into some issues with gracefully shutting down everything.
Let's say for example I have defined my procs as below:
for k, v in queues_callbacks.iteritems():
    proc = ExampleConsumer(queue, k, v, rabbit_user, rabbit_pw, rabbit_host, rabbit_port)
"queues_callbacks" is basically just a dictionary of exchange : callback_function (ideally I'd like to be able to connect to several exchanges with this architecture).
Then I do the normal python way of dealing with starting processes:
try:
    for proc in self.consumers:
        proc.start()
    for proc in self.consumers:
        proc.join()
except KeyboardInterrupt:
    for proc in self.consumers:
        proc.terminate()
        proc.join(1)
The issue comes when I try to stop everything. Let's say I've overridden the "terminate" method to call the consumer's "stop" method and then continue with Process's normal terminate. With this structure, I am getting some strange attribute errors:
Traceback (most recent call last):
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 154, in <module>
    main()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 150, in main
    mybot.start()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 71, in start
    self.stop()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 53, in stop
    self.__stop_consumers__()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 130, in __stop_consumers__
    self.consumers[0].terminate()
  File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 414, in terminate
    self.stop()
  File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 399, in stop
    self._connection.ioloop.start()
AttributeError: 'NoneType' object has no attribute 'ioloop'
It's as if these attributes somehow disappear at some point. In this particular case, _connection is initialized as None, but then gets set when the consumer is started. However, by the time the "stop" method is called, it has already reverted back to None (with nothing in my code that should reset it). I'm also observing other strange behavior, such as times when it appears that things are getting called twice (even though "stop" is called once). Any ideas as to what is going on here, or is this not the proper way of architecting this?
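For reference, a much simplified sketch of the pattern (hypothetical names, not my real consumer) that reproduces the same behaviour: terminate() runs in the parent process, while run() only ever sets the attribute in the child's copy of the object.
import multiprocessing
import time

class ExampleConsumer(multiprocessing.Process):
    def __init__(self):
        super(ExampleConsumer, self).__init__()
        self._connection = None  # set for real inside run()

    def run(self):
        # In the real consumer this would be pika's connect() + ioloop.start();
        # either way, the attribute is only ever set in the *child* process.
        self._connection = object()
        time.sleep(5)

    def terminate(self):
        # Executes in the *parent* process, whose copy of the object never
        # ran run(), so self._connection is still None here.
        print("connection as seen by the parent: %r" % self._connection)
        super(ExampleConsumer, self).terminate()

if __name__ == "__main__":
    proc = ExampleConsumer()
    proc.start()
    time.sleep(1)
    proc.terminate()  # prints: connection as seen by the parent: None
    proc.join()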
Thanks!

Related

Implementation details of Python multiprocessing fork vs spawn

The multiprocessing documentation on picklability states:
Picklability
Ensure that the arguments to the methods of proxies are picklable.
More picklability
Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.
I take this to mean that whatever is passed as an argument to Process will be pickled/unpickled.
But the Better to inherit than pickle/unpickle section states:
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
I conducted an experiment, shown below, which prints Read successfully.:
import multiprocessing as mp
from pathlib import Path

import rasterio
from rasterio.windows import Window


def read_dataset(dataset, window):
    return dataset.read(window=window)


if __name__ == "__main__":
    mp.set_start_method("fork")
    with rasterio.open(Path("test.tiff").absolute()) as dataset:
        window = Window(col_off=0, row_off=0, width=100, height=100)
        p1 = mp.Process(target=read_dataset, args=(dataset, window))
        p1.start()
        p1.join()
        print("Read successfully.")
But when changing to mp.set_start_method("spawn"), it shows the error below.
Traceback (most recent call last):
  File "test.py", line 88, in <module>
    p1.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "stringsource", line 2, in rasterio._io.DatasetReaderBase.__reduce_cython__
TypeError: self._hds cannot be converted to a Python object for pickling
My question is the following: when a child process is created with fork, the variable is inherited instead of pickled/unpickled, but when a child process is created with spawn, the arguments are sent through pickling/unpickling. Where can I find this implementation detail? Thanks.
The implementation lives in CPython's multiprocessing package, in the same modules that appear in your traceback:
multiprocessing/context.py (start-method selection)
multiprocessing/popen_fork.py
multiprocessing/popen_spawn_posix.py (and popen_spawn_win32.py on Windows)
fork shares the parent's memory with the child (copy-on-write) and simply continues execution there, so values are inherited directly. spawn builds a command line that starts a fresh interpreter and passes the Process object and its arguments to it through a pipe, using pickle. Because spawn re-imports your module in the new interpreter, if you edit and save the source code just before spawning, the child can even run the modified code.
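As a rough, self-contained illustration (my own example, not from the post above): an object holding an unpicklable resource is inherited just fine under fork, but fails under spawn because the arguments must be pickled.
import multiprocessing as mp
import threading

class Holder:
    """Holds a thread lock, which cannot be pickled."""
    def __init__(self):
        self.lock = threading.Lock()

def worker(obj):
    print("child received:", type(obj).__name__)

if __name__ == "__main__":
    obj = Holder()

    # fork (POSIX only): the child gets a copy-on-write view of the parent's
    # memory, so obj is inherited and nothing needs to be pickled.
    fork_ctx = mp.get_context("fork")
    p = fork_ctx.Process(target=worker, args=(obj,))
    p.start()
    p.join()

    # spawn: the Process object and its arguments are pickled and written to
    # a pipe for a fresh interpreter, so this raises
    # "TypeError: cannot pickle '_thread.lock' object" in the parent.
    spawn_ctx = mp.get_context("spawn")
    p = spawn_ctx.Process(target=worker, args=(obj,))
    try:
        p.start()
    except TypeError as exc:
        print("spawn failed:", exc)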

Exception handling in Pool callback

Given this example scenario:
from multiprocessing import Pool


def _callback(result):
    if result == 2:
        # introduce an exception into one of the callbacks
        raise Exception("foo")
    print(result)


def _target(v):
    return v


worker_pool = Pool()

for i in range(10):
    worker_pool.apply_async(_target, args=(i,), callback=_callback)

worker_pool.close()
worker_pool.join()
I was hoping to see each value of i printed except for i=2, which would instead have yielded an exception.
Instead I see something like the following:
0
1
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 479, in _handle_results
    cache[job]._set(i, obj)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 649, in _set
    self._callback(self._value)
  File "test3.py", line 6, in _callback
    raise Exception("foo")
Exception: foo
... and then execution just hangs.
I'm aware that Pool handles callbacks on a separate thread, but why would execution hang and how can I reliably guard against errors in a task's callback?
This is happening because the exception inside the callback kills the thread that handles the Pool's results: that thread has no except block for this situation, so it dies, worker_pool.join() can never complete, and your application hangs.
I believe that is a deliberate decision by the Python maintainers, so the best way to handle this is to wrap the body of your callback in a try/except block and deal with the exception there, instead of letting it bubble up and kill the thread.
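A minimal sketch of that approach (the wrapper name is mine, not from the question): catch everything inside the callback so nothing can escape into the pool's result-handler thread.
import traceback
from multiprocessing import Pool

def _target(v):
    return v

def _callback(result):
    if result == 2:
        raise Exception("foo")
    print(result)

def safe_callback(result):
    # An exception escaping a Pool callback kills the result-handler thread
    # and makes worker_pool.join() hang, so catch and report it here.
    try:
        _callback(result)
    except Exception:
        traceback.print_exc()

if __name__ == "__main__":
    worker_pool = Pool()
    for i in range(10):
        worker_pool.apply_async(_target, args=(i,), callback=safe_callback)
    worker_pool.close()
    worker_pool.join()
With this, every value of i is printed, the traceback for i == 2 is reported, and join() returns normally.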

HTTP request concurrency and reverse proxy

Apologies if the title is not descriptive; I struggled to find a good title for this question.
My question involves Python, CouchDB (to a lesser degree), multiprocessing and networking. It started when I was trying to debug a co-worker's program that uses Python's multiprocessing module to parallelize requests to a CouchDB database via couchdb-python. I created a minimal program to exhibit the bug and eventually solved the issue, but the solution raised another question that I was not able to answer to the best of my knowledge. I'm hoping experts on SO can help me with this, so here it goes.
The premise of the problem is pretty simple. We have n resources, all of which can be retrieved concurrently. Instead of making n serial requests, my co-worker is using the multiprocessing module to fetch all n resources in parallel. Here's a program I wrote to demonstrate the issue:
The Script (bug.py)
import couchdb
import multiprocessing

server = couchdb.Server(SERVER)  # SERVER is the CouchDB base URL

try:
    database = server.create('test')
except:
    server.delete('test')
    database = server.create('test')

database.save({'_id': '1', 'type': 'dog', 'name': 'chase'})
database.save({'_id': '2', 'type': 'dog', 'name': 'rubble'})
database.save({'_id': '3', 'type': 'cat', 'name': 'kali'})


def query_id(id):
    print(dict(database[id]))


def main():
    args = [
        ['dog', 'chase'],
        ['dog', 'rubble'],
        ['cat', 'kali'],
    ]
    print('-' * 80)

    processes = []
    for id_ in ['1', '2', '3']:
        proc = multiprocessing.Process(target=query_id, args=(id_,))
        processes.append(proc)
        proc.start()

    for proc in processes:
        proc.join()


if __name__ == '__main__':
    main()
Pretty innocent code, right? Well, running it on the latest couchdb and couchdb-python gives the following error:
The output
--------------------------------------------------------------------------------
Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
File "bug.py", line 25, in query_id
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
return Document(data)
TypeError: 'ResponseBody' object is not iterable
return Document(data)
TypeError: 'ResponseBody' object is not iterable
After some digging, I finally found out that couchdb-python's implementation of ConnectionPool is not multiprocess-safe. See this PR for more details. Basically, all processes share the same ConnectionPool object and are handed the same httplib.HTTPConnection object, and when they all simultaneously try to read from the connection, the strings being returned are garbled, hence the bug. You can see evidence of this if you put print(os.getpid(), line) inside the httplib.HTTPResponse._read_status method. Here's a sample of the output after the print statement is added:
(26490, 'TP1.120 O\r\n')
(26489, 'T/ 0KServer: CouchDB/1.6.1 (Erlang OTP/17)\r\n')
Process Process-2:
Process Process-3:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
return Document(data)
TypeError: 'ResponseBody' object is not iterable
return Document(data)
TypeError: 'ResponseBody' object is not iterable
As seen here, the first lines read by the sub-processes are only partial fragments of the status line, indicating a race condition. If I further inspect the HTTPConnection object, I can see that all three processes are sharing the same connection object, the same socket to the server, and the same file descriptor used for reading.
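Just to make the shared-state picture concrete, a workaround (separate from the actual fix mentioned below) is simply to construct the client inside each child process, so nothing network-related is inherited across the fork. Roughly:
import couchdb
import multiprocessing

def query_id(id_):
    # Create the Server (and therefore its connection pool) inside the child,
    # instead of inheriting the parent's connection across the fork.
    server = couchdb.Server(SERVER)  # SERVER: same CouchDB URL as in bug.py
    database = server['test']
    print(dict(database[id_]))

if __name__ == '__main__':
    processes = []
    for id_ in ['1', '2', '3']:
        proc = multiprocessing.Process(target=query_id, args=(id_,))
        processes.append(proc)
        proc.start()
    for proc in processes:
        proc.join()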
Puzzle
So far so good. I've identified the root cause of the problem and put together a fix. However, a complication arose when I put the couchdb instance behind a reverse proxy, in this case haproxy. Here's a sample config:
global
    ...

defaults
    ...

listen couchdb
    bind *:9999
    mode http
    stats enable
    option httpclose
    option forwardfor
    server couchdb-1 127.0.0.1:5984 check
I then pointed the couchdb server URL to http://localhost:9999 in the bug script, reran it, and everything was fine! I also inspected the connection object, the socket and the file descriptor, and they were likewise shared among all processes.
This got me puzzled. I brought up mitmproxy and inspected what's going on in the two cases: with or without haproxy.
Without haproxy
When the parallel requests are made without haproxy, the mitmproxy details tab (shown for a single request, but the timing sequence is the same for all 3 concurrent requests) shows an event sequence that suggests a blocking, synchronous request.
With haproxy
With haproxy the sequence is different: the request is considered complete before the server connection is even initiated.
Question
I'm not used to working at this low a level, so I know my knowledge here is pretty lacking. I want to understand what difference putting haproxy in front of CouchDB made that masked the multiprocessing bug in couchdb-python. haproxy is event-based, so I suspect that has something to do with it, but I would really appreciate someone explaining the difference!
Thanks a bunch in advance!

Put several threads to sleep/wait without using time.sleep()

I wrote this function that handles the "rate limit error" of a Tweepy cursor in order to keep downloading from the Twitter API.
def limit_handled(cursor, user):
    over = False
    while True:
        try:
            if over:
                print "Routine resumed, serving number:", user
                over = False
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Limit reached, routine paused"
            threading.Event.wait(15*60 + 15)
            over = True
        except tweepy.TweepError:
            print "TweepError"
            threading.Event.wait(5)
Since I am using several threads to connect, I would like to stop each of them when a RateLimitError is raised and restart them after 15 minutes.
I previously used the function:
time.sleep(x)
But I understood that it doesn't work well for threads (the counter does not advance while the thread is not active), so I tried to use:
threading.Event.wait(x)
But then this error is raised:
Exception in thread Thread-15:
Traceback (most recent call last):
  File "/home/xor/anaconda/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/xor/anaconda/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/xor/spyder/algo/HW2/hw2.py", line 117, in work
    storeFollowersOnMDB(ids, api, k)
  File "/home/xor/spyder/algo/HW2/hw2.py", line 111, in storeFollowersOnMDB
    for followersPag in limit_handled(tweepy.Cursor(api.followers_ids, id = user, count=5000).pages(), user):
  File "/home/xor/spyder/algo/HW2/hw2.py", line 52, in limit_handled
    threading.Event.wait(15*60 + 15)
AttributeError: 'function' object has no attribute 'wait'
How can I "sleep/wait" my threads being sure that they will wake up at the right moment?
Try doing it like this instead:
import threading
dummy_event = threading.Event()
dummy_event.wait(timeout=1)
Also, try googling first next time: Issues with time.sleep and Multithreading in Python
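Applied to the generator from the question, that would look roughly like this (a sketch; a shared Event would also let a main thread wake all workers early by calling set()):
import threading
import tweepy

rate_limit_event = threading.Event()  # one per thread, or shared between them

def limit_handled(cursor, user):
    over = False
    while True:
        try:
            if over:
                print "Routine resumed, serving number:", user
                over = False
            yield cursor.next()
        except tweepy.RateLimitError:
            print "Limit reached, routine paused"
            # Block this thread for ~15 minutes, or until another thread
            # calls rate_limit_event.set().
            rate_limit_event.wait(15 * 60 + 15)
            over = True
        except tweepy.TweepError:
            print "TweepError"
            rate_limit_event.wait(5)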

Error with multiprocessing, atexit and global data

Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit

cleanup_stuff = multi.Manager().list([])

##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted = multi.Manager().list([])


class _interrupt(object):
    """
    Toy class to allow retrieval of the interrupt that triggered its execution
    """
    def __init__(self, interrupt):
        self.interrupt = interrupt


def interrupt():
    was_interrupted.append(1)


def interruptable(func):
    """
    decorator to allow functions to be "interruptable" by
    a keyboard interrupt when in python's multiprocessing.Pool.map
    **Note**, this won't actually cause the Map to be interrupted,
    It will merely cause the following functions to be not executed.
    """
    def newfunc(*args, **kwargs):
        try:
            if not was_interrupted:
                return func(*args, **kwargs)
            else:
                return False
        except KeyboardInterrupt as e:
            interrupt()
            return _interrupt(e)  # If we really want to know about the interrupt...
    return newfunc


@atexit.register
def cleanup():
    for i in cleanup_stuff:
        print(i)
    return


@interruptable
def func(i):
    print(i)
    cleanup_stuff.append(i)
    time.sleep(float(i)/10.)
    return i


# Must wrap func here, otherwise it won't be found in __main__'s dict
# Maybe because it was created dynamically using the decorator?
def wrapper(*args):
    return func(*args)


if __name__ == "__main__":
    # This is an attempt to use signals -- I also attempted something similar where
    # the signals were only caught in the child processes... Or only on the main process...
    #
    # import signal
    # def onSigInt(*args): interrupt()
    # signal.signal(signal.SIGINT, onSigInt)

    # Try 2 with signals (only catch signal on main process)
    # import signal
    # def onSigInt(*args): interrupt()
    # signal.signal(signal.SIGINT, onSigInt)
    # def startup(): signal.signal(signal.SIGINT, signal.SIG_IGN)
    # p = multi.Pool(processes=4, initializer=startup)

    # Try 3 with signals (only catch signal on child processes)
    # import signal
    # def onSigInt(*args): interrupt()
    # signal.signal(signal.SIGINT, signal.SIG_IGN)
    # def startup(): signal.signal(signal.SIGINT, onSigInt)
    # p = multi.Pool(processes=4, initializer=startup)

    p = multi.Pool(4)
    try:
        out = p.map(wrapper, range(30))
        # out = p.map_async(wrapper, range(30)).get()  # This doesn't work either...
        # The following lines don't work either
        # Effectively trying to roll my own p.map() with p.apply_async
        # results = [p.apply_async(wrapper, args=(i,)) for i in range(30)]
        # out = [r.get() for r in results]
    except KeyboardInterrupt:
        print("Hello!")
        out = None
    finally:
        p.terminate()
        p.join()

    print(out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "test.py", line 58, in cleanup
    for i in cleanup_stuff:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
    self._connect()
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "test.py", line 58, in cleanup
    for i in cleanup_stuff:
  File "<string>", line 2, in __getitem__
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
    self._connect()
  File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) in a try/except/finally clause, it gets rid of the first exception; the other ones are still raised in atexit, however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the list was not being updated across the multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter what I tried, because the communication sockets between processes were broken for some reason. The solution to this problem is simple: prior to using multiprocessing, set the default temporary file directory, with something like tempfile.tempdir = tempfile.mkdtemp(), and then register a function to delete that temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
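A minimal sketch of that approach (simplified from my actual code; names are illustrative):
import atexit
import multiprocessing as multi
import shutil
import tempfile

# Create one shared temporary directory *before* any worker process starts and
# make it the default location for tempfile. Children created by fork inherit it.
tempfile.tempdir = tempfile.mkdtemp()


@atexit.register
def cleanup():
    # Removing the directory cleans up every temporary file any worker wrote,
    # with no Manager-backed shared list needed.
    shutil.rmtree(tempfile.tempdir, ignore_errors=True)


def work(i):
    # Each worker writes its temporary files into the shared directory.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(i).encode())
    return i


if __name__ == "__main__":
    p = multi.Pool(4)
    try:
        out = p.map(work, range(30))
    finally:
        p.terminate()
        p.join()
    print(out)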
