This question is related to others I have asked on here, mainly regarding sorting huge sets of data in memory.
Basically this is what I want / have:
Twisted XMLRPC server running. This server keeps several (32) instances of the Foo class in memory. Each Foo instance contains a list bar (which will contain several million records). There is a service that retrieves data from a database and passes it to the XMLRPC server. The data is basically a dictionary, with keys corresponding to each Foo instance, and values that are a list of dictionaries, like so:
data = {'foo1':[{'k1':'v1', 'k2':'v2'}, {'k1':'v1', 'k2':'v2'}], 'foo2':...}
Each Foo instance is then passed the value corresponding to its key, and each Foo.bar list is updated and sorted.
class XMLRPCController(xmlrpc.XMLRPC):
    def __init__(self):
        ...
        self.foos = {'foo1':Foo(), 'foo2':Foo(), 'foo3':Foo()}
        ...
    def update(self, data):
        for k, v in data.items():
            threads.deferToThread(self.foos[k].processData, v)
    def getData(self, fookey):
        # return first 10 records of specified Foo.bar
        return self.foos[fookey].bar[0:10]
class Foo():
    def __init__(self):
        self.bar = []
    def processData(self, new_bar_data):
        for record in new_bar_data:
            # do processing, and add record, then sort
            # BUNCH OF PROCESSING CODE
            self.bar.append(record)
        self.bar.sort(reverse=True)
The problem is that when the update function is called in the XMLRPCController with a lot of records (say 100K+), it stops responding to my getData calls until all 32 Foo instances have completed the processData method. I thought deferToThread would work, but I think I am misunderstanding where the problem is.
Any suggestions? I am open to using something else, like CherryPy, if it supports this required behavior.
EDIT
@Troy: This is how the reactor is set up:
reactor.listenTCP(port_no, server.Site(XMLRPCController()))
reactor.run()
As far as the GIL goes, would it be a viable option to change the
sys.setcheckinterval()
value to something smaller, so that the lock on the data is released and it can be read?
The easiest way to get the app to be responsive is to break up the CPU-intensive processing into smaller chunks, while letting the twisted reactor run in between, for example by calling reactor.callLater(0, process_next_chunk) to advance to the next chunk. This effectively implements cooperative multitasking by yourself.
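A minimal sketch of that chunking idea (the chunk size and the per-record work are placeholders for your own processing code, not part of the original question):

from twisted.internet import reactor

CHUNK_SIZE = 1000  # assumed batch size; tune so each chunk takes a few milliseconds

def process_in_chunks(foo, records, start=0):
    # handle one slice, then let the reactor serve getData calls in between
    for record in records[start:start + CHUNK_SIZE]:
        foo.bar.append(record)  # stand-in for the real per-record processing
    if start + CHUNK_SIZE < len(records):
        reactor.callLater(0, process_in_chunks, foo, records, start + CHUNK_SIZE)
    else:
        foo.bar.sort(reverse=True)  # sort once, after the final chunk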
Another way would be to use separate processes to do the work; then you will benefit from multiple cores. Take a look at Ampoule: https://launchpad.net/ampoule It provides an API similar to deferToThread.
I don't know how long your processData method runs nor how you're setting up your twisted reactor. By default, the twisted reactor has a thread pool of between 0 and 10 threads. You may be trying to defer as many as 32 long-running calculations to as many as 10 threads. This is sub-optimal.
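If the pool size turns out to be the limit, you can raise it, though with CPython's GIL extra threads will not speed up pure-Python number crunching:

from twisted.internet import reactor
reactor.suggestThreadPoolSize(32)  # allow up to one worker thread per Foo instance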
You also need to ask what role the GIL is playing in updating all these collections.
Edit:
Before you make any serious changes to your program (like calling sys.setcheckinterval()) you should probably run it using the profiler or the python trace module. These should tell you what methods are using all your time. Without the right information, you can't make the right changes.
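For example, a minimal profiling session with the standard library (main() is a placeholder for whatever drives one representative update call):

import cProfile, pstats

cProfile.run('main()', 'update.prof')  # profile one representative run
pstats.Stats('update.prof').sort_stats('cumulative').print_stats(20)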
Related
I'm writing a machine learning program with the following components:
A shared "Experience Pool" with a binary-tree-like data structure.
N simulator processes. Each adds an "experience object" to the pool every once in a while. The pool is responsible for balancing its tree.
M learner processes that sample a batch of "experience objects" from the pool every few moments and perform whatever learning procedure.
I don't know the best way to implement the above. I'm not using Tensorflow, so I cannot take advantage of its parallel capability. More concretely:
I first thought of Python 3's built-in multiprocessing library. Unlike multithreading, however, the multiprocessing module cannot have different processes update the same global object. My hunch is that I should use the server-proxy model. Could anyone please give me a rough skeleton of code to start with?
Is MPI4py a better solution?
Any other libraries that would be a better fit? I've looked at celery, disque, etc. It's not obvious to me how to adapt them to my use case.
Based on the comments, what you're really looking for is a way to update a shared object from a set of processes carrying out a CPU-bound task. The CPU-bound nature of the work makes multiprocessing an obvious choice; if most of your work were IO-bound, multithreading would have been the simpler choice.
Your problem follows a simpler server-client model: the clients use the server as a simple stateful store, no communication between any child processes is needed, and no process needs to be synchronised.
Thus, the simplest way to do this is to:
Start a separate process that contains a server.
Inside the server logic, provide methods to update and read from a single object.
Treat both your simulator and learner processes as separate clients that can periodically read and update the global state.
From the server's perspective, the identity of the clients doesn't matter - only their actions do.
Thus, this can be accomplished by using a customised manager in multiprocessing as so:
# server.py
from multiprocessing.managers import BaseManager
# this represents the data structure you've already implemented.
from ... import ExperienceTree

# An important note: the way proxy objects work is by shared weak reference to
# the object. If all of your workers die, it takes your proxy object with
# it. Thus, if you have an instance, the instance is garbage-collected
# once all references to it have been erased. I have chosen to sidestep
# this in my code by using class variables and objects so that instances
# are never used - you may define __init__, etc. if you so wish, but
# just be aware of what will happen to your object once all workers are gone.
class ExperiencePool(object):

    tree = ExperienceTree()

    @classmethod
    def update(cls, experience_object):
        ''' Implement methods to update the tree with an experience object. '''
        cls.tree.update(experience_object)

    @classmethod
    def sample(cls):
        ''' Implement methods to sample the tree's experience objects. '''
        return cls.tree.sample()

# subclass base manager
class Server(BaseManager):
    pass

# register the class you just created - now you can access an instance of
# ExperiencePool using Server.Shared_Experience_Pool().
Server.register('Shared_Experience_Pool', ExperiencePool)

if __name__ == '__main__':
    # run the server on port 8080 of your own machine
    server_process = Server(('localhost', 8080), authkey=b'none')
    server_process.get_server().serve_forever()
Now for all of your clients you can just do:
# client.py - you can always have a separate client file for a learner and a simulator.
from multiprocessing.managers import BaseManager
from server import ExperiencePool

class Server(BaseManager):
    pass

Server.register('Shared_Experience_Pool', ExperiencePool)

if __name__ == '__main__':
    # connect to the server on port 8080 of your own machine.
    server_process = Server(('localhost', 8080), authkey=b'none')
    server_process.connect()
    experience_pool = server_process.Shared_Experience_Pool()
    # now do your own thing and call `experience_pool.sample()` or `update` whenever you want.
You may then launch one server.py and as many workers as you want.
Is This The Best Design?
Not always. You may run into race conditions: your learners may receive stale data if they are forced to compete with a simulator node writing at the same time.
If you want to ensure a preference for the latest writes, you may additionally use a lock whenever your simulators are trying to write something, preventing your other processes from reading until the write finishes.
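Sticking with the manager skeleton above, one rough way to sketch that (the lock placement here is my own choice, not a requirement): since every proxy call executes inside the server process, a plain threading.Lock on the class is enough to serialise writers and readers:

import threading
from ... import ExperienceTree  # same elided import as in server.py above

class ExperiencePool(object):
    tree = ExperienceTree()
    _lock = threading.Lock()  # all proxy calls run in the server process's threads

    @classmethod
    def update(cls, experience_object):
        with cls._lock:  # writers block readers until the write finishes
            cls.tree.update(experience_object)

    @classmethod
    def sample(cls):
        with cls._lock:
            return cls.tree.sample()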
A bit of background:
I am writing a function in Django to get the next invoice number, which needs to be sequential (no gaps), so the function looks like this:
def get_next_invoice_number():
    """
    Returns the max(invoice_number) + 1 from the payment records.
    Does NOT pre-allocate a number.
    """
    # TODO ensure this is thread safe
    max_num = Payment.objects.aggregate(Max('invoice_number'))['invoice_number__max']
    if max_num is not None:
        return max_num + 1
    return PaymentConfig.min_invoice_number
Now the problem is that this function only returns max() + 1. In my production environment I have multiple Django processes, so if this function is called twice for two different payments (before the first record is saved), they will both get the same invoice number.
To mitigate this problem I can override the save() method to call get_next_invoice_number(), minimising the time gap between these function calls, but there is still a tiny chance for the problem to happen.
So I want to implement a lock in the approve method, something like
from multiprocessing import Lock

lock = Lock()

class Payment(models.Model):
    def approve(self):
        lock.acquire()
        try:
            self.invoice_number = get_next_invoice_number()
            self.save()
        except:
            pass
        finally:
            lock.release()
So my questions are:
Does this look okay?
The lock is for multiple processes; does it also cover threads?
UPDATE:
As my colleague pointed out, this is not going to work when it's deployed to multiple servers; the locks will be meaningless.
Looks like DB transaction locking is the way to go.
The easiest way to do this, by far, is with your database's existing tools for creating sequences. In fact, if you don't mind the value starting from 1 you can just use Django's AutoField.
If your business requirements are such that you need to choose a starting number, you'll have to see how to do this in the database. Here are some questions that might help.
Trying to ensure this with locks or transactions will be harder to do and slower to perform.
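If your requirements force a custom starting number and you still want gap-free numbers issued from the application side, here is a rough sketch of the transaction-locking idea from the question's update (the InvoiceCounter model is hypothetical, and transaction.atomic assumes Django 1.6+):

from django.db import models, transaction

class InvoiceCounter(models.Model):
    # a single-row table that remembers the last number handed out
    last_number = models.IntegerField(default=0)

def get_next_invoice_number():
    with transaction.atomic():
        # SELECT ... FOR UPDATE serialises callers across processes and servers
        counter = InvoiceCounter.objects.select_for_update().get(pk=1)
        counter.last_number += 1
        counter.save()
        return counter.last_number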
I face a potential race condition in a web application:
# get the submissions so far from the cache
submissions = cache.get('user_data')
# add the data from this user to the local dict
submissions[user_id] = submission
# update the cached dict on server
submissions = cache.update('user_data', submissions)
if len(submissions) == some_number:
    ...
The logic is simple: we first fetch a shared dictionary stored in the cache of the web server, add the submission (delivered by each request to the server) to its local copy, and then update the cached copy by replacing it with this updated local copy. Finally, we do something else if we have received a certain number of pieces of data. Notice that
submissions = cache.update('user_data', submissions)
will return the latest copy of the dictionary from the cache, i.e. the newly updated one.
Because the server may serve multiple requests (each in its own thread) at the same time, all these threads access the shared dictionary in the cache as described above, creating potential race conditions.
I wonder, in the context of web programming, how should I efficiently handle threading to prevent race conditions in this particular case, without sacrificing too much performance. Some code examples would be much appreciated.
My preferred solution would be to have a single thread that modifies the submissions dict and a queue that feeds that thread. If you are paranoid, you can even expose a read-only view of the submissions dict. Using a queue and consumer pattern, you will not have a problem with locking.
Of course, this assumes that you have a web framework that will let you create that thread.
EDIT: multiprocess was not a good suggestion; removed.
EDIT: This sort of stuff is really simple in Python:
import threading, Queue

Stop = object()

def consumer(real_dict, queue):
    while True:
        try:
            item = queue.get(timeout=100)
            if item == Stop:
                break
            user, submission = item
            real_dict[user] = submission
        except Queue.Empty:
            continue

q = Queue.Queue()
thedict = {}
t = threading.Thread(target=consumer, args=(thedict, q))
t.start()
Then, you can try:
>>> thedict
{}
>>> q.put(('foo', 'bar'))
>>> thedict
{'foo': 'bar'}
>>> q.put(Stop)
>>> q.put(('baz', 'bar'))
>>> thedict
{'foo': 'bar'}
You appear to be transferring lots of data back and forth between your web application and your cache. That's already a problem. You're also right to be suspicious, since the following interleaving would be possible (remembering that sub is local to each thread):
Thread A           Thread B           Cache
------------------------------------------------------
                                      [A]=P, [B]=Q
sub = get()
                                      [A]=P, [B]=Q
>>>> suspend
                   sub = get()
                                      [A]=P, [B]=Q
                   sub[B] = Y
                                      [A]=P, [B]=Y
                   update(sub)
                                      [A]=P, [B]=Y
                   >>>> suspend
sub[A] = X
                                      [A]=X, [B]=Q
update(sub)
                                      [A]=X, [B]=Q   !!!!!!!!
This sort of pattern can happen for real, and it results in state getting wiped out. It's also inefficient because thread A should usually only need to know about its current user, not everything.
While you could fix this with great big gobs of locking, that would be horribly inefficient. So, you need to redesign so that you transfer much less data around, which will give a performance boost and reduce the amount of locking you need.
This is one of the more difficult questions to answer because it seems to be a bigger design problem.
One potential solution to this problem would be to have one well-defined place where this is updated. For instance, you might want to set up another service that's dedicated to updating the cache and nothing else. Alternatively, if these updates aren't time-sensitive, you may also want to consider using a task queue.
Another solution: you could give each item a separate key and store a list of the keys under a separate key. This doesn't necessarily solve the problem, but it does make it more manageable. Instead of worrying about separate threads overwriting the entire submissions cache, you just have to worry about them overwriting individual elements within it.
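A rough sketch of that per-key layout, reusing the Django-style cache API from the question's snippet (the key names are my own, and the atomicity of add/incr depends on your cache backend; memcached and Redis both provide it):

# each request writes only its own key, so no thread rewrites anyone else's entry
cache.set('submission:%s' % user_id, submission)

# count submissions with an atomic counter instead of len() over a shared dict
cache.add('submission_count', 0)   # no-op if the counter already exists
count = cache.incr('submission_count')
if count == some_number:
    ...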
If you have the time to add a new piece to your infrastructure, I'd highly recommend looking at Redis, more specifically Redis hashes[1], because Redis handles this problem out of the box, with about the same speed as you'd get with memcache (although I definitely encourage you to benchmark it for yourself).
[1] Note: I just found this link through a quick Google search, and haven't verified it. I don't vouch for its correctness.
I have a file called filterspecs.py that contains 3 things:
tls = threading.local()
class A which inherits from django.contrib.admin.views.main.ChangeList
class B which inherits from django.contrib.admin.filterspecs.FilterSpec
Goal: I want to pass a value, a list, available to an instance of A to an instance of B. Since the lifecycles of A and B are managed by the framework (Django), I cannot think of a private way to pass the data between the instances (using a Queue would be overkill).
Attempt #1 fails when using WSGI (daemon) mode.
In class A, the list is added to the threadlocal.
1. tls.list_display = ['foo', 'bar']
But in class B, the following returns False:
2. hasattr(tls, 'list_display')
Just for comparison's sake, this works outside of apache/mod_wsgi if I run it via
manage.py runserver
I looked at the logs and discovered that different threads were executing lines 1 & 2.
What other ways can I solve this problem?
It sounds like you want to share data between not just two classes, but between two completely different HTTP requests. HTTP is stateless -- Apache is not designed to accommodate stateful web applications. You must save the data to a database and read it back in the second request. (Or you can run your web application in a separate process and manage the state yourself -- which essentially adds "database programmer" to your job description. This is not recommended.)
Your two pieces of code might not just be running in different threads, they might be running in different processes.
I am using Python 2.6 and the multiprocessing module for parallel processing. Now I would like to have a synchronized dict (where the only atomic operation I really need is the += operator on a value).
Should I wrap the dict with a multiprocessing.sharedctypes.synchronized() call? Or is another way the way to go?
Intro
There seems to be a lot of arm-chair suggestions and no working examples. None of the answers listed here even suggest using multiprocessing, and this is quite a bit disappointing and disturbing. As python lovers we should support our built-in libraries, and while parallel processing and synchronization is never a trivial matter, I believe it can be made trivial with proper design. This is becoming extremely important in modern multi-core architectures and cannot be stressed enough!

That said, I am far from satisfied with the multiprocessing library, as it is still in its infancy with quite a few pitfalls and bugs, and is geared towards functional programming (which I detest). Currently I still prefer the Pyro module (which is way ahead of its time) over multiprocessing, due to multiprocessing's severe limitation of being unable to share newly created objects while the server is running: the "register" class-method of the manager objects will only actually register an object BEFORE the manager (or its server) is started. Enough chatter, more code:
Server.py
from multiprocessing.managers import SyncManager

class MyManager(SyncManager):
    pass

syncdict = {}
def get_dict():
    return syncdict

if __name__ == "__main__":
    MyManager.register("syncdict", get_dict)
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.start()
    raw_input("Press any key to kill server".center(50, "-"))
    manager.shutdown()
In the above code example, Server.py makes use of multiprocessing's SyncManager, which can supply synchronized shared objects. This code will not work running in the interpreter because the multiprocessing library is quite touchy about how to find the "callable" for each registered object. Running Server.py will start a customized SyncManager that shares the syncdict dictionary for use by multiple processes, and can be connected to by clients either on the same machine or, if run on an IP address other than loopback, from other machines. In this case the server is run on loopback (127.0.0.1) on port 5000. The authkey parameter secures the connections used when manipulating syncdict. When any key is pressed, the manager is shut down.
Client.py
from multiprocessing.managers import SyncManager
import sys, time

class MyManager(SyncManager):
    pass

MyManager.register("syncdict")

if __name__ == "__main__":
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.connect()
    syncdict = manager.syncdict()

    print "dict = %s" % (dir(syncdict))
    key = raw_input("Enter key to update: ")
    inc = float(raw_input("Enter increment: "))
    sleep = float(raw_input("Enter sleep time (sec): "))

    try:
        #if the key doesn't exist create it
        if not syncdict.has_key(key):
            syncdict.update([(key, 0)])
        #increment key value every sleep seconds
        #then print syncdict
        while True:
            syncdict.update([(key, syncdict.get(key) + inc)])
            time.sleep(sleep)
            print "%s" % (syncdict)
    except KeyboardInterrupt:
        print "Killed client"
The client must also create a customized SyncManager, registering "syncdict", this time without passing in a callable to retrieve the shared dict. It then uses the customized SyncManager to connect using the loopback IP address (127.0.0.1) on port 5000 and an authkey, establishing a secure connection to the manager started in Server.py. It retrieves the shared dict syncdict by calling the registered callable on the manager. It prompts the user for the following:
The key in syncdict to operate on
The amount to increment the value accessed by the key every cycle
The amount of time to sleep per cycle in seconds
The client then checks to see if the key exists. If it doesn't it creates the key on the syncdict. The client then enters an "endless" loop where it updates the key's value by the increment, sleeps the amount specified, and prints the syncdict only to repeat this process until a KeyboardInterrupt occurs (Ctrl+C).
Annoying problems
The Manager's register methods MUST be called before the manager is started otherwise you will get exceptions even though a dir call on the Manager will reveal that it indeed does have the method that was registered.
All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
Using SyncManager's dict method would alleviate annoying problem #2, except that annoying problem #1 prevents the proxy returned by SyncManager.dict() being registered and shared. (SyncManager.dict() can only be called AFTER the manager is started, and register will only work BEFORE the manager is started, so SyncManager.dict() is only useful when doing functional programming and passing the proxy to Processes as an argument like the doc examples do; see the sketch after this list.)
The server AND the client both have to register even though intuitively it would seem like the client would just be able to figure it out after connecting to the manager (Please add this to your wish-list multiprocessing developers)
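For completeness, a sketch of that docs-style functional approach (the worker and key name are my own; note that a get-then-set through the proxy is two round trips and not atomic, hence the manager Lock guarding the increment):

from multiprocessing import Process, Manager

def worker(d, lock, key):
    lock.acquire()
    try:
        d[key] = d.get(key, 0) + 1  # read-modify-write, serialized by the lock
    finally:
        lock.release()

if __name__ == "__main__":
    manager = Manager()
    d = manager.dict()     # only obtainable AFTER the manager has started
    lock = manager.Lock()  # shared lock living in the manager process
    procs = [Process(target=worker, args=(d, lock, "hits")) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print d  # prints {'hits': 4}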
Closing
I hope you enjoyed this quite thorough and slightly time-consuming answer as much as I have. I was having a great deal of trouble getting straight in my mind why I was struggling so much with the multiprocessing module where Pyro makes it a breeze, and now thanks to this answer I have hit the nail on the head. I hope this is useful to the python community on how to improve the multiprocessing module, as I do believe it has a great deal of promise but in its infancy falls short of what is possible. Despite the annoying problems described, I think this is still quite a viable alternative and is pretty simple. You could also use SyncManager.dict() and pass it to Processes as an argument the way the docs show; that would probably be an even simpler solution depending on your requirements, but it just feels unnatural to me.
I would dedicate a separate process to maintaining the "shared dict": just use e.g. xmlrpclib to make that tiny amount of code available to the other processes, exposing via xmlrpclib e.g. a function taking key, increment to perform the increment and one taking just the key and returning the value, with semantic details (is there a default value for missing keys, etc, etc) depending on your app's needs.
Then you can use any approach you like to implement the shared-dict dedicated process: all the way from a single-threaded server with a simple dict in memory, to a simple sqlite DB, etc, etc. I suggest you start with code "as simple as you can get away with" (depending on whether you need a persistent shared dict, or persistence is not necessary to you), then measure and optimize as and if needed.
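A minimal single-threaded sketch of that dedicated process (Python 2 module names to match the question; the two function names and the default-to-zero semantics are my own choices):

from SimpleXMLRPCServer import SimpleXMLRPCServer

shared_dict = {}

def increment(key, amount):
    # default-to-zero semantics for missing keys; adjust to your app's needs
    shared_dict[key] = shared_dict.get(key, 0) + amount
    return shared_dict[key]

def get_value(key):
    return shared_dict.get(key, 0)

server = SimpleXMLRPCServer(("127.0.0.1", 8000))
server.register_function(increment)
server.register_function(get_value)
# the server handles one request at a time, so the dict needs no locking
server.serve_forever()

A client is then just xmlrpclib.ServerProxy("http://127.0.0.1:8000").increment("hits", 1).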
In response to the concurrent-write issue: I did some very quick research and found that this article suggests a lock/semaphore solution (http://effbot.org/zone/thread-synchronization.htm).
While the example isn't specific to a dictionary, I'm pretty sure you could code a class-based wrapper object to help you work with dictionaries based on this idea.
If I had a requirement to implement something like this in a thread-safe manner, I'd probably use the Python semaphore solution (assuming my earlier merge technique wouldn't work). I believe that semaphores generally slow down thread efficiency due to their blocking nature.
From the site:
A semaphore is a more advanced lock mechanism. A semaphore has an internal counter rather than a lock flag, and it only blocks if more than a given number of threads have attempted to hold the semaphore. Depending on how the semaphore is initialized, this allows multiple threads to access the same code section simultaneously.
semaphore = threading.BoundedSemaphore()
semaphore.acquire() # decrements the counter
... access the shared resource; work with dictionary, add item or whatever.
semaphore.release() # increments the counter
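For a dictionary specifically, a rough sketch of the class-based wrapper idea mentioned above (the class and method names are mine, not from the article):

import threading

class LockedDict(object):
    """Wraps a dict so that every access goes through a semaphore."""
    def __init__(self):
        self._semaphore = threading.BoundedSemaphore()
        self._data = {}

    def increment(self, key, amount=1):
        with self._semaphore:  # acquire on entry, release on exit
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def get(self, key, default=None):
        with self._semaphore:
            return self._data.get(key, default)

A thread calling increment holds the semaphore only for the duration of the read-modify-write, which keeps the blocking window small.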
Is there a reason that the dictionary needs to be shared in the first place? Could you have each thread maintain their own instance of a dictionary and either merge at the end of the thread processing or periodically use a call-back to merge copies of the individual thread dictionaries together?
I don't know exactly what you are doing, so keep in mind that my written plan may not work verbatim. What I'm suggesting is more of a high-level design idea.
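To make the merge idea concrete, here is a sketch under the assumption that the values are counters, so merging two dicts is just summing them (the workloads are made up for illustration):

import threading

chunks = [['a', 'b', 'b'], ['b', 'c']]  # hypothetical per-thread workloads
results = []  # one private dict per thread, merged after join

def worker(items):
    local = {}
    for key in items:
        local[key] = local.get(key, 0) + 1  # thread-private, no locking needed
    results.append(local)  # list.append is atomic under the GIL

threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

merged = {}
for local in results:
    for key, value in local.items():
        merged[key] = merged.get(key, 0) + value
print merged  # {'a': 1, 'c': 1, 'b': 3}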