I am using Python 2.6 and the multiprocessing module for multi-threading. Now I would like to have a synchronized dict (where the only atomic operation I really need is the += operator on a value).
Should I wrap the dict with a multiprocessing.sharedctypes.synchronized() call? Or is another way the way to go?
Intro
There seems to be a lot of arm-chair suggestions and no working examples. None of the answers listed here even suggest using multiprocessing and this is quite a bit disappointing and disturbing. As python lovers we should support our built-in libraries, and while parallel processing and synchronization is never a trivial matter, I believe it can be made trivial with proper design. This is becoming extremely important in modern multi-core architectures and cannot be stressed enough! That said, I am far from satisfied with the multiprocessing library, as it is still in its infancy stages with quite a few pitfalls, bugs, and being geared towards functional programming (which I detest). Currently I still prefer the Pyro module (which is way ahead of its time) over multiprocessing due to multiprocessing's severe limitation in being unable to share newly created objects while the server is running. The "register" class-method of the manager objects will only actually register an object BEFORE the manager (or its server) is started. Enough chatter, more code:
Server.py
from multiprocessing.managers import SyncManager

class MyManager(SyncManager):
    pass

syncdict = {}

def get_dict():
    return syncdict

if __name__ == "__main__":
    MyManager.register("syncdict", get_dict)
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.start()
    raw_input("Press any key to kill server".center(50, "-"))
    manager.shutdown()
In the above code example, Server.py makes use of multiprocessing's SyncManager, which can supply synchronized shared objects. This code will not work when typed into the interactive interpreter, because the multiprocessing library is quite touchy about how it finds the "callable" for each registered object. Running Server.py starts a customized SyncManager that shares the syncdict dictionary among multiple processes; clients can connect either from the same machine or, if the server is bound to an IP address other than loopback, from other machines. In this case the server runs on loopback (127.0.0.1) on port 5000. The authkey parameter authenticates connections to the manager; it does not encrypt the traffic. When Enter is pressed, the manager is shut down.
Client.py
from multiprocessing.managers import SyncManager
import sys, time

class MyManager(SyncManager):
    pass

MyManager.register("syncdict")

if __name__ == "__main__":
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.connect()
    syncdict = manager.syncdict()

    print "dict = %s" % (dir(syncdict))
    key = raw_input("Enter key to update: ")
    inc = float(raw_input("Enter increment: "))
    sleep = float(raw_input("Enter sleep time (sec): "))

    try:
        #if the key doesn't exist create it
        if not syncdict.has_key(key):
            syncdict.update([(key, 0)])
        #increment key value every sleep seconds
        #then print syncdict
        while True:
            #note: get() followed by update() is two separate calls to the
            #manager, so this increment is not atomic across clients
            syncdict.update([(key, syncdict.get(key) + inc)])
            time.sleep(sleep)
            print "%s" % (syncdict)
    except KeyboardInterrupt:
        print "Killed client"
The client must also create a customized SyncManager, registering "syncdict", this time without passing in a callable to retrieve the shared dict. It then uses the customized SyncManager to connect to the manager started in Server.py, using the loopback IP address (127.0.0.1), port 5000, and the same authkey to authenticate the connection. It retrieves the shared dict syncdict by calling the registered callable on the manager. It prompts the user for the following:
The key in syncdict to operate on
The amount to increment the value accessed by the key every cycle
The amount of time to sleep per cycle in seconds
The client then checks to see if the key exists. If it doesn't it creates the key on the syncdict. The client then enters an "endless" loop where it updates the key's value by the increment, sleeps the amount specified, and prints the syncdict only to repeat this process until a KeyboardInterrupt occurs (Ctrl+C).
Annoying problems
1. The Manager's register methods MUST be called before the manager is started; otherwise you will get exceptions, even though a dir call on the Manager will reveal that it does indeed have the method that was registered.
2. All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably, because the proxy that register creates by default only exposes the dict's public methods, not special methods such as __setitem__)
3. Using SyncManager's dict method would alleviate annoying problem #2 except that annoying problem #1 prevents the proxy returned by SyncManager.dict() being registered and shared. (SyncManager.dict() can only be called AFTER the manager is started, and register will only work BEFORE the manager is started, so SyncManager.dict() is only useful when doing functional programming and passing the proxy to Processes as an argument like the doc examples do)
4. The server AND the client both have to register, even though intuitively it would seem like the client would just be able to figure it out after connecting to the manager (Please add this to your wish-list, multiprocessing developers)
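One possible way around annoying problem #2, sketched here for Python 3 rather than the Python 2.6 used above: register accepts a proxytype argument, and passing multiprocessing.managers.DictProxy should make the returned proxy expose __getitem__, __setitem__ and the other dict special methods. The demo function is hypothetical, and it runs the server in a background thread of the same process only so that the sketch is self-contained; normally the two halves would live in Server.py and Client.py.

```python
import threading
from multiprocessing.managers import DictProxy, SyncManager


class MyManager(SyncManager):
    pass


syncdict = {}


def get_dict():
    return syncdict


# proxytype=DictProxy makes the proxy support item access and assignment.
MyManager.register("syncdict", get_dict, proxytype=DictProxy)


def demo():
    """Hypothetical helper: serve and connect within one process."""
    # Port 0 asks the OS for any free port (an assumption for this demo).
    server = MyManager(("127.0.0.1", 0), authkey=b"password").get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()

    client = MyManager(server.address, authkey=b"password")
    client.connect()
    shared = client.syncdict()

    shared["blast"] = 2    # item assignment now goes through the proxy
    shared["blast"] += 3   # still a read followed by a write: NOT atomic
    return shared["blast"], syncdict["blast"]
```

Note that += on the proxy is still two separate calls to the manager, so this does not by itself give an atomic increment.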
Closing
I hope you enjoyed this quite thorough and slightly time-consuming answer as much as I have. I was having a great deal of trouble getting straight in my mind why I was struggling so much with the multiprocessing module where Pyro makes it a breeze, and now, thanks to this answer, I have hit the nail on the head. I hope this is useful to the python community on how to improve the multiprocessing module, as I do believe it has a great deal of promise but in its infancy falls short of what is possible. Despite the annoying problems described, I think this is still quite a viable alternative and is pretty simple. You could also use SyncManager.dict() and pass it to Processes as an argument the way the docs show, and depending on your requirements that would probably be an even simpler solution; it just feels unnatural to me.
I would dedicate a separate process to maintaining the "shared dict": just use e.g. xmlrpclib to make that tiny amount of code available to the other processes, exposing via xmlrpclib e.g. a function taking key, increment to perform the increment and one taking just the key and returning the value, with semantic details (is there a default value for missing keys, etc, etc) depending on your app's needs.
Then you can use any approach you like to implement the shared-dict dedicated process: all the way from a single-threaded server with a simple dict in memory, to a simple sqlite DB, etc, etc. I suggest you start with code "as simple as you can get away with" (depending on whether you need a persistent shared dict, or persistence is not necessary to you), then measure and optimize as and if needed.
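As a rough illustration of the dedicated-process idea, here is a minimal sketch using the Python 3 standard library modules xmlrpc.server and xmlrpc.client (the successors of the Python 2 xmlrpclib and SimpleXMLRPCServer modules). The names CounterService, make_server and demo are made up for this example, and treating a missing key as 0 is just one possible choice of semantics. The demo runs the server in a thread only to stay self-contained; in the design described above it would be its own process.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class CounterService:
    """Hypothetical service owning the one and only dict."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, key, amount):
        # The whole read-modify-write happens inside the server.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount
            return self._counts[key]

    def get(self, key):
        # Assumption: missing keys read as 0.
        with self._lock:
            return self._counts.get(key, 0)


def make_server(host="127.0.0.1", port=0):
    # Port 0 lets the OS pick a free port.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(CounterService())
    return server


def demo():
    server = make_server()
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    proxy = ServerProxy("http://127.0.0.1:%d" % port)
    proxy.increment("hits", 2)
    proxy.increment("hits", 3)
    total = proxy.get("hits")

    server.shutdown()
    server.server_close()
    return total
```

Because every increment is a single request handled by the one process that owns the dict, clients never see a half-finished update.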
In response to the request for an appropriate solution to the concurrent-write issue: I did some very quick research and found that this article suggests a lock/semaphore solution (http://effbot.org/zone/thread-synchronization.htm).
While the example isn't specific to a dictionary, I'm pretty sure you could code a class-based wrapper object to help you work with dictionaries based on this idea.
If I had a requirement to implement something like this in a thread safe manner, I'd probably use the Python Semaphore solution. (Assuming my earlier merge technique wouldn't work.) I believe that semaphores generally slow down thread efficiencies due to their blocking nature.
From the site:
A semaphore is a more advanced lock mechanism. A semaphore has an internal counter rather than a lock flag, and it only blocks if more than a given number of threads have attempted to hold the semaphore. Depending on how the semaphore is initialized, this allows multiple threads to access the same code section simultaneously.
import threading

semaphore = threading.BoundedSemaphore()
semaphore.acquire() # decrements the counter
# ... access the shared resource; work with dictionary, add item or whatever.
semaphore.release() # increments the counter
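A class-based wrapper along those lines might look like the sketch below. It uses a plain threading.Lock instead of the semaphore quoted above (a BoundedSemaphore with its default value of 1 behaves much the same way), and the names LockedCounterDict and demo are invented for this example. It only protects threads inside one process; it does not share anything between separate processes.

```python
import threading


class LockedCounterDict:
    """Hypothetical wrapper: a dict whose increments are guarded by a lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def increment(self, key, amount=1):
        # Read, add and write back while holding the lock.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def get(self, key, default=0):
        with self._lock:
            return self._data.get(key, default)


def demo(n_threads=8, n_increments=1000):
    counts = LockedCounterDict()

    def work():
        for _ in range(n_increments):
            counts.increment("hits")

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts.get("hits")
```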
Is there a reason that the dictionary needs to be shared in the first place? Could you have each thread maintain their own instance of a dictionary and either merge at the end of the thread processing or periodically use a call-back to merge copies of the individual thread dictionaries together?
I don't know exactly what you are doing, so keep in mind that my written plan may not work verbatim. What I'm suggesting is more of a high-level design idea.
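To make the merge idea concrete, here is a small sketch in which each worker fills its own collections.Counter and the results are combined once the workers are done. The word-counting task and the function names count_words and merged_counts are placeholders, not something from the question.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Each worker builds a private Counter; nothing is shared while it runs."""
    local = Counter()
    for line in lines:
        local.update(line.split())
    return local


def merged_counts(chunks, max_workers=4):
    """Run one worker per chunk, then merge the private Counters."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for local in pool.map(count_words, chunks):
            total.update(local)  # merging happens in one thread only
    return total
```

Since only the merging thread ever touches the combined Counter, no lock is needed in this variant.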
Related
I'm working with a django application hosted on heroku with redistogo addon:nano pack. I'm using rq, to execute tasks in the background - the tasks are initiated by online users. I've a constraint on increasing number of connections, limited resources I'm afraid.
I'm currently having a single worker running over 'n' number of queues. Each queue uses an instance of connection from the connection pool to handle 'n' different types of task. For instance, lets say if 4 users initiate same type of task, I would like to have my main worker create child processes dynamically, to handle it. Is there a way to achieve required multiprocessing and concurrency?
I tried with multiprocessing module, initially without introducing Lock(); but that exposes and overwrites user passed data to the initiating function, with the previous request data. After applying locks, it restricts second user to initiate the requests by returning a server error - 500
github link #1: Looks like the team is working on the PR; not yet released though!
github link #2: This post helps to explain creating more workers at runtime.
This solution however also overrides the data. The new request is again processed with the previous requests data.
Let me know if you need to see some code. I'll try to post a minimal reproducible snippet.
Any thoughts/suggestions/guidelines?
Did you get a chance to try AutoWorker?
Spawn RQ Workers automatically.
from autoworker import AutoWorker
aw = AutoWorker(queue='high', max_procs=6)
aw.work()
It makes use of multiprocessing with StrictRedis from redis module and following imports from rq
from rq.contrib.legacy import cleanup_ghosts
from rq.queue import Queue
from rq.worker import Worker, WorkerStatus
After looking under the hood, I realised Worker class is already implementing multiprocessing.
The work function internally calls execute_job(job, queue) which in turn as quoted in the module
Spawns a work horse to perform the actual work and passes it a job.
The worker will wait for the work horse and make sure it executes within the given timeout bounds,
or will end the work horse with SIGALRM.
The execute_job() function makes a call to fork_work_horse(job, queue), which spawns a work horse to perform the actual work and passes it a job as per the following logic:
def fork_work_horse(self, job, queue):
    child_pid = os.fork()
    os.environ['RQ_WORKER_ID'] = self.name
    os.environ['RQ_JOB_ID'] = job.id
    if child_pid == 0:
        self.main_work_horse(job, queue)
    else:
        self._horse_pid = child_pid
        self.procline('Forked {0} at {1}'.format(child_pid, time.time()))
The main_work_horse makes an internal call to perform_job(job, queue) which makes a few other calls to actually perform the job.
All the steps of The Worker Lifecycle described on rq's official documentation page are taken care of within these calls.
It's not the multiprocessing I was expecting, but I guess they have a way of doing things. However my original post is still not answered with this, also I'm still not sure about concurrency..
The documentation there still needs to be worked upon, since it hardly covers the true essence of this library!
I'm writing a machine learning program with the following components:
A shared "Experience Pool" with a binary-tree-like data structure.
N simulator processes. Each adds an "experience object" to the pool every once in a while. The pool is responsible for balancing its tree.
M learner processes that sample a batch of "experience objects" from the pool every few moments and perform whatever learning procedure.
I don't know what's the best way to implement the above. I'm not using Tensorflow, so I cannot take advantage of its parallel capability. More concretely,
I first think of Python3's built-in multiprocessing library. Unlike multithreading, however, multiprocessing module cannot have different processes update the same global object. My hunch is that I should use the server-proxy model. Could anyone please give me a rough skeleton code to start with?
Is MPI4py a better solution?
Any other libraries that would be a better fit? I've looked at celery, disque, etc. It's not obvious to me how to adapt them to my use case.
Based on the comments, what you're really looking for is a way to update a shared object from a set of processes that are carrying out a CPU-bound task. The CPU-bounding makes multiprocessing an obvious choice - if most of your work was IO-bound, multithreading would have been a simpler choice.
Your problem follows a simpler server-client model: the clients use the server as a simple stateful store, no communication between any child processes is needed, and no process needs to be synchronised.
Thus, the simplest way to do this is to:
Start a separate process that contains a server.
Inside the server logic, provide methods to update and read from a single object.
Treat both your simulator and learner processes as separate clients that can periodically read and update the global state.
From the server's perspective, the identity of the clients doesn't matter - only their actions do.
Thus, this can be accomplished by using a customised manager in multiprocessing as so:
# server.py
from multiprocessing.managers import BaseManager
# this represents the data structure you've already implemented.
from ... import ExperienceTree

# An important note: the way proxy objects work is by shared weak reference to
# the object. If all of your workers die, it takes your proxy object with
# it. Thus, if you have an instance, the instance is garbage-collected
# once all references to it have been erased. I have chosen to sidestep
# this in my code by using class variables and objects so that instances
# are never used - you may define __init__, etc. if you so wish, but
# just be aware of what will happen to your object once all workers are gone.
class ExperiencePool(object):

    tree = ExperienceTree()

    @classmethod
    def update(cls, experience_object):
        ''' Implement methods to update the tree with an experience object. '''
        cls.tree.update(experience_object)

    @classmethod
    def sample(cls):
        ''' Implement methods to sample the tree's experience objects. '''
        return cls.tree.sample()

# subclass base manager
class Server(BaseManager):
    pass

# register the class you just created - now you can access an instance of
# ExperiencePool using Server.Shared_Experience_Pool().
Server.register('Shared_Experience_Pool', ExperiencePool)

if __name__ == '__main__':
    # run the server on port 8080 of your own machine.
    # Note: do not wrap this in a `with` block. Entering the manager as a
    # context manager calls start(), after which get_server() raises an error.
    server_process = Server(('localhost', 8080), authkey=b'none')
    server_process.get_server().serve_forever()
Now for all of your clients you can just do:
# client.py - you can always have a separate client file for a learner and a simulator.
from multiprocessing.managers import BaseManager
from server import ExperiencePool

class Server(BaseManager):
    pass

Server.register('Shared_Experience_Pool', ExperiencePool)

if __name__ == '__main__':
    # connect to the server running on port 8080 of your own machine.
    server_process = Server(('localhost', 8080), authkey=b'none')
    server_process.connect()
    experience_pool = server_process.Shared_Experience_Pool()
    # now do your own thing and call `experience_pool.sample()` or `update` whenever you want.
You may then launch one server.py and as many workers as you want.
Is This The Best Design?
Not always. You may run into race conditions in that your learners may receive stale or old data if they are forced to compete with a simulator node writing at the same time.
If you want to ensure a preference for latest writes, you may additionally use a lock whenever your simulators are trying to write something, preventing your other processes from getting a read until the write finishes.
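One place such a lock could live is inside the shared class itself. The manager's server handles each client connection in its own thread, so a threading.Lock held on the server side is enough to serialize writes and reads. The sketch below is only an illustration: LockedExperiencePool and demo are hypothetical names, a plain list stands in for the ExperienceTree, and the threads in demo merely imitate concurrent clients.

```python
import random
import threading


class LockedExperiencePool:
    """Hypothetical variant of ExperiencePool with a lock around every access."""

    _items = []               # stand-in for the real ExperienceTree
    _lock = threading.Lock()

    @classmethod
    def update(cls, experience_object):
        # Writers hold the lock, so readers never see a half-finished update.
        with cls._lock:
            cls._items.append(experience_object)

    @classmethod
    def sample(cls, batch_size=2):
        # Readers wait until any in-progress write has finished.
        with cls._lock:
            k = min(batch_size, len(cls._items))
            return random.sample(cls._items, k)


def demo(n_writers=4, n_each=250):
    def simulate(worker_id):
        for i in range(n_each):
            LockedExperiencePool.update((worker_id, i))

    threads = [threading.Thread(target=simulate, args=(w,))
               for w in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(LockedExperiencePool._items), LockedExperiencePool.sample(5)
```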
Is there a way to get in Python3 what in C is:
int msgget(key_t key, int flags);
I have to create a game that has two players and the communication is done by message passing.
For this purpose I can create an object multiprocessing.Queue() but I can't seem to find a way to pass this object from player 1 to player 2. Each player is running in its own terminal so they are not in a parent-child relation.
To solve this I would need something like the above function in C that can get a Queue object based on some key or id. Does any simple way of sharing this object between two processes running in different terminal exist?
Your two processes cannot access objects in each other's memory, so they will have to communicate over some kind of connection - most likely a socket connection. A message queue such as the one recommended as a comment, or perhaps ZeroMQ, is perfect for that job, and not hard to set up. Have a look at http://zguide.zeromq.org/py:all.
Although using a messaging protocol or messaging service that can do many things may seem like a lot of work and overhead in such cases, it's actually a really simple and intuitive way to handle this.
Basically what I'm trying to do is fetch a couple of websites using proxies and process the data. The problem is that the requests rarely fail in a convincing way, and setting socket timeouts wasn't very helpful either because they often didn't work.
So what I did is:
q = Queue()
s = ['google.com','ebay.com',] # And so on
for item in s:
    q.put(item)

def worker():
    item = q.get()
    data = fetch(item) # This is the buggy part
    # Process the data, yadayada

for i in range(workers):
    t = InterruptableThread(target=worker)
    t.start()

# Somewhere else
if WorkerHasLivedLongerThanTimeout:
    worker.terminate()
(InterruptableThread class)
The problem is that I only want to kill threads which are still stuck on the fetching part. Also, I want the item to return to the queue. Ie:
def worker():
    self.status = 0
    item = q.get()
    data = fetch(item) # This is the buggy part
    self.status = 1 # Don't kill me now, bro!
    # Process the data, yadayada

# Somewhere else
if WorkerHasLivedLongerThanTimeout and worker.status != 1:
    q.put(worker.item)
    worker.terminate()
How can this be done?
edit: breaking news; see below · · · ······
I decided recently that I wanted to do something pretty similar, and what came out of it was the pqueue_fetcher module. It ended up being mainly a learning endeavour: I learned, among other things, that it's almost certainly better to use something like twisted than to try to kill Python threads with any sort of reliability.
That being said, there's code in that module that more or less answers your question. It basically consists of a class whose objects can be set up to get locations from a priority queue and feed them into a fetch function that's supplied at object instantiation. If the location's resources get successfully received before their thread is killed, they get forwarded on to the results queue; otherwise they're returned to the locations queue with a downgraded priority. Success is determined by a passed-in function that defaults to bool.
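The requeue-on-failure behaviour can be sketched without killing any thread at all. The code below is not the pqueue_fetcher implementation; it is a simplified stand-in in which a fetch that overruns its timeout is simply abandoned (the daemon thread keeps running in the background) and its location goes back on the priority queue with a worse priority. All names (fetch_with_requeue, demo, fake_fetch, the example locations) are hypothetical.

```python
import queue
import threading
import time


def fetch_with_requeue(locations, results, fetch, timeout, max_priority=3):
    """Drain `locations`, a PriorityQueue of (priority, location) pairs."""
    while True:
        try:
            priority, location = locations.get_nowait()
        except queue.Empty:
            return
        box = {}

        def run(loc=location, out=box):
            out["data"] = fetch(loc)

        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        worker.join(timeout)  # wait at most `timeout` seconds

        if not worker.is_alive() and box.get("data"):
            results.put((location, box["data"]))
        elif priority < max_priority:
            # Too slow or empty result: retry later with downgraded priority.
            locations.put((priority + 1, location))


def demo():
    attempts = {}

    def fake_fetch(loc):
        attempts[loc] = attempts.get(loc, 0) + 1
        if loc == "slow.example" and attempts[loc] == 1:
            time.sleep(1.0)  # simulate a fetch that hangs the first time
        return "data from " + loc

    locations = queue.PriorityQueue()
    for loc in ("fast.example", "slow.example"):
        locations.put((0, loc))
    results = queue.Queue()
    fetch_with_requeue(locations, results, fake_fetch, timeout=0.2)
    return dict(results.queue), attempts
```

The trade-off is that abandoned threads still consume resources until their fetch eventually returns, which is exactly the problem that tempts people into terminating threads in the first place.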
Along the way I ended up creating the terminable_thread module, which just packages the most mature variation I could find of the code you linked to as InterruptableThread. It also adds a fix for 64-bit machines, which I needed in order to use that code on my ubuntu box. terminable_thread is a dependency of pqueue_fetcher.
Probably the biggest stumbling block I hit is that raising an asynchronous exception as do terminable_thread, and the InterruptableThread you mentioned, can have some weird results. In the test suite for pqueue_fetcher, the fetch function blocks by calling time.sleep. I found that if a thread is terminate()d while so blocking, and the sleep call is the last (or not even the last) statement in a nested try block, execution will actually bounce to the except clause of the outer try block, even if the inner one has an except matching the raised exception. I'm still sort of shaking my head in disbelief, but there's a test case in pqueue_fetcher that reenacts this. I believe "leaky abstraction" is the correct term here.
I wrote a hacky workaround that just does some random thing (in this case getting a value from a generator) to break up the "atomicity" (not sure if that's actually what it is) of that part of the code. This workaround can be overridden via the fission parameter to pqueue_fetcher.Fetcher. It (i.e. the default one) seems to work, but certainly not in any way that I would consider particularly reliable or portable.
So my call after discovering this interesting piece of data was to henceforth avoid using this technique (i.e. calling ctypes.pythonapi.PyThreadState_SetAsyncExc) altogether.
In any case, this still won't work if you need to guarantee that any request whose entire data set has been received (i.e. acknowledged to the server) gets forwarded on to results. In order to be sure of that, you have to guarantee that the bit that does that last network transaction and the forwarding is guarded from being interrupted, without guarding the entire retrieval operation from being interrupted (since this would prevent timeouts from working). And in order to do that you need to basically rewrite the retrieval operation (i.e. the socket code) to be aware of whichever exception you're going to raise with terminable_thread.Thread.raise_exc.
I've yet to learn twisted, but being the Premier Python Asynchronous Networking Framework©™®, I expect it must have some elegant or at least workable way of dealing with such details. I'm hoping it provides a parallel way to implement fetching from non-network sources (e.g. a local filestore, or a DB, or an etc.), since I'd like to build an app that can glean data from a variety of sources in a medium-agnostic way.
Anyhow, if you're still intent on trying to work out a way to manage the threads yourself, you can perhaps learn from my efforts. Hope this helps.
· · · · ······ this just in:
I've realized that the tests that I thought had stabilized have actually not, and are giving inconsistent results. This appears to be related to the issues mentioned above with exception handling and the use of the fission function. I'm not really sure what's going on with it, and don't plan to investigate in the immediate future unless I end up having a need to actually do things this way.
This question is related to others I have asked on here, mainly regarding sorting huge sets of data in memory.
Basically this is what I want / have:
Twisted XMLRPC server running. This server keeps several (32) instances of Foo class in memory. Each Foo class contains a list bar (which will contain several million records). There is a service that retrieves data from a database, and passes it to the XMLRPC server. The data is basically a dictionary, with keys corresponding to each Foo instance, and values are a list of dictionaries, like so:
data = {'foo1':[{'k1':'v1', 'k2':'v2'}, {'k1':'v1', 'k2':'v2'}], 'foo2':...}
Each Foo instance is then passed the value corresponding to its key, and the Foo.bar dictionaries are updated and sorted.
class XMLRPCController(xmlrpc.XMLRPC):

    def __init__(self):
        ...
        self.foos = {'foo1':Foo(), 'foo2':Foo(), 'foo3':Foo()}
        ...

    def update(self, data):
        # data is a dict, so iterate over its (key, value) pairs
        for k, v in data.items():
            threads.deferToThread(self.foos[k].processData, v)

    def getData(self, fookey):
        # return first 10 records of specified Foo.bar
        return self.foos[fookey].bar[0:10]

class Foo():

    def __init__(self):
        self.bar = []

    def processData(self, new_bar_data):
        for record in new_bar_data:
            # do processing, and add record, then sort
            # BUNCH OF PROCESSING CODE
            self.bar.sort(reverse=True)
The problem is that when the update function is called in the XMLRPCController with a lot of records (say 100K+), it stops responding to my getData calls until all 32 Foo instances have completed the processData method. I thought deferToThread would work, but I think I am misunderstanding where the problem is.
Any suggestions... I am open to using something else, like Cherrypy if it supports this required behavior.
EDIT
@Troy: This is how the reactor is set up
reactor.listenTCP(port_no, server.Site(XMLRPCController))
reactor.run()
As far as GIL, would it be a viable option to change
sys.setcheckinterval()
value to something smaller, so the lock on the data is released so it can be read?
The easiest way to get the app to be responsive is to break up the CPU-intensive processing into smaller chunks, while letting the twisted reactor run in between, for example by calling reactor.callLater(0, process_next_chunk) to advance to the next chunk. This effectively means implementing cooperative multitasking yourself.
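The chunking itself can be expressed as a generator, independent of Twisted. In the sketch below a plain loop stands in for the reactor so that it runs without Twisted installed; in a real Twisted app each step would instead be scheduled with reactor.callLater(0, ...) as described above. The names sort_in_chunks and run_cooperatively, and the chunk size, are assumptions made for illustration.

```python
def sort_in_chunks(existing, new_records, chunk_size=1000):
    """Merge new_records into `existing` one chunk at a time.

    Yields after every chunk so that a scheduler can run other work
    (such as answering getData calls) in between.
    """
    for start in range(0, len(new_records), chunk_size):
        existing.extend(new_records[start:start + chunk_size])
        existing.sort(reverse=True)
        yield len(existing)


def run_cooperatively(task, between_chunks):
    """Stand-in for the reactor: advance `task` step by step."""
    steps = 0
    for _ in task:
        between_chunks()  # other requests would be served here
        steps += 1
    return steps
```

A smaller chunk_size keeps the server more responsive at the cost of more scheduling overhead and more repeated sorting.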
Another way would be to use separate processes to do the work, then you will benefit from multiple cores. Take a look at Ampoule: https://launchpad.net/ampoule It provides an API similar to deferToThread.
I don't know how long your processData method runs nor how you're setting up your twisted reactor. By default, the twisted reactor has a thread pool of between 0 and 10 threads. You may be trying to defer as many as 32 long-running calculations to as many as 10 threads. This is sub-optimal.
You also need to ask what role the GIL is playing in updating all these collections.
Edit:
Before you make any serious changes to your program (like calling sys.setcheckinterval()) you should probably run it using the profiler or the python trace module. These should tell you what methods are using all your time. Without the right information, you can't make the right changes.
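For instance, a quick way to see where the time goes is to wrap one call in cProfile and print the most expensive entries. This is a minimal sketch: process_data here is a toy stand-in for the real processData method, and profile_call is a hypothetical helper.

```python
import cProfile
import io
import pstats


def process_data(records):
    """Toy stand-in: append each record, then re-sort the whole list."""
    out = []
    for r in records:
        out.append(r * 2)
        out.sort(reverse=True)
    return out


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
    return result, buf.getvalue()
```

Calling profile_call(process_data, list(range(50))) returns the function's result together with a report listing which calls accounted for the most cumulative time.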