How to create a multiprocessing queue by key in Python3

How to create a multiprocessing queue by key in Python3 - python

Is there a way to get in Python3 what in C is:
int msgget(key_t key, int flags);
I have to create a game that has two players and the communication is done by message passing.
For this purpose I can create an object multiprocessing.Queue() but I can't seem to find a way to pass this object from player 1 to player 2. Each player is running in its own terminal so they are not in a parent-child relation.
To solve this I would need something like the above function in C that can get a Queue object based on some key or id. Does any simple way of sharing this object between two processes running in different terminal exist?

Your two processes cannot access objects in each other's memory, so they will have to communicate over some kind of connection - most likely a socket connection. A message queue such as the one recommended as a comment, or perhaps ZeroMQ, is perfect for that job, and not hard to set up. Have a look at http://zguide.zeromq.org/py:all.
Allthough using a messaging protocol or messaging service that can do many things may seem like a lot of work and overhead in such cases, it's actually a really simple and intuitive way to handle this.

Related

Python multi-processing one worker dynimc number of recievers of all worker data (1:n)

I am planing to setup a small proxy service for a remote sensor, that only accepts one connection. I have a temporary solution and I am now designing a more robust version, and therefore dived deeper into the python multiprocessing module.
I have written a couple of systems in python using a main process, which spawns subprocesses using the multiprocessing module and used multiprocessing.Queue to communicate between them. This works quite well and some of theses programs/scripts are doing their job in a production environment.
The new case is slightly different since it uses 2+n processes:
One data-collector, that reads data from the sensor (at 100Hz) and every once in a while receives short ASCII strings as command
One main-server, that binds to a socket and listens, for new connections and spawns...
n child-servers, that handle clients who want to have the sensor data
while communication from the child servers to the data collector seems pretty straight forward using a multiprocessing.Queue which manages a n:1 connection well enough, I have problems with the other way. I can't use a queue for that as well, because all child-servers need to get all the data the sensor produces, while they are active. At least I haven't found a way to configure a Queue to mimic that behaviour, as get takes the top most out of the Queue by design.
I looked into shared memory already, which massively increases the management overhead, since as far as I understand it while using it, I would basically need to implement a streaming buffer myself.
The only safe way I see right now, is using a redis server and messages queues, but I am a bit hesitant, since that would need more infrastructure than I like.
Is there a pure python internal way?

maybe You can use MQTT for that ?
You did not clearly specify, but sounds like observer pattern -
or do You want the clients to poll each time they need data ?
It depends which delays / data rate / jitter etc. You can accept.
after You provided the information :
The whole setup runs on one machine in one process space. What I would like to have, is a way without going through a third party process
I would suggest to check for observer pattern.
More informations can be found for example:
https://www.youtube.com/watch?v=_BpmfnqjgzQ&t=1882s
and
https://refactoring.guru/design-patterns/observer/python/example
and
https://www.protechtraining.com/blog/post/tutorial-the-observer-pattern-in-python-879
and
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Observer.html
Your Server should fork for each new connection and register with the observer, and will be therefore informed about every change.

How to share an object between processes?

I'm working on a program that has 2-3 different 'processes' or 'workflows' that all need to execute at once (not necessarily in parallel, they just all need to run at the same time). These processes run and function on their own independently to perform different but related tasks, but they also need to share a few objects (lists and vars).
I'm wondering what's the best way to implement this? To give some basic background info and context, this is a program to integrate an sms messaging platform with our CMS. The 3 modules are sender, receiver, and a module that reads and queues new outbound jobs called jobHandler.
I got each module working fine, but my problem lies in tying them all together. I implemented multiprocessing but found trouble when I realized I would have to share objects across both different modules and different processes, which simply did not work.
Is there a better way to do this that I'm completely missing? How would you implement something like this?
This is python 3.6.
Simplified example (all modules started up using multiprocessing),
jobHandler:
import pyodbc
q=[]
def queueJobs():
~do stuff~
x=new job
q.append(x)
sender:
from jobHandler import q
def send():
~do stuff~
print(q)
q will always show as "[]" when printed from sender. I realize this is because when the modules are all started with multiprocessing, q is empty. How can I share the current state of objects across different processes?

Python-C api concurrency issue

We are developing a small c server application. The server application does some data processing and responds back to the client. To keep the data processing part configurable and flexible we decided to go for scripting and based on the availability of various ready modules we decided to go for Python. We are using the Python-C api to send/receive the data between c and python.
The Algorithm works something like this:-
Server receives some data from client, this data is stored in a dictionary created in c. The dictionary is created using the api function PyDict_New(); from c. The input is stored as a key value pair in the dictionary using the api function PyDict_SetItemString();
Next, we execute the python script PyRun_SimpleString(); passing the script as a parameter. This script makes use of the dictionary created in c. Please note, we make the dictionary created in c, accessible to the script using the methods PyImport_AddModule(); and PyModule_AddObject();
We store the result of the data processing in the script as a key value pair in the same dictionary created above. The c code can then simply access the result variable(key-value pair) after the script has executed.
The problem
The problem we are facing is in the case of concurrent requests coming in from different clients. When multiple requests come in from different clients we tend to object reference count exceptions. Please note, that for each request which comes in for a user, we create an independent dictionary for that user alone. To overcome this problem we encompassed the call to PyRun_SimpleString(); within PyEval_AcquireLock(); and PyEval_ReleaseLock();, but doing this has resulted in the script execution being a blocking call. So if a script is taking long time to execute, all the other users are also waiting for a response.
Could you please suggest the best possible approach or give pointers to where we are going wrong. Please ping me for more information.
Any help/guidance will be appreciated.

Perhaps you are missing one of the calls mentioned in this answer.

I suggest you investigate the multiprocessing module.

You should probably read http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock Your problem is explained in the first paragraph.
When you acquire the GIL, do so around your direct manipulation of Python objects. The call to PyRun_SimpleString will handle the GIL internally and will give it up on long-running operations or just every X instructions. It WILL NOT be truly multi-threaded, however.
Edit:
You need to acquire the lock and you need to ensure that Python knows it's in a different thread state:
// acquire the lock and switch thread state
PyEval_AcquireLock();
PyThreadState_Swap(perThreadState);
// execute some python code
PyEval_SimpleString("print 123");
// clear the thread state and release the lock
PyThreadState_Swap(NULL);
PyEval_ReleaseLock();

A good persistent synchronous queue in python

I don't immediately care about fifo or filo options, but it might be nice in the future..
What I'm looking for a is a nice fast simple way to store (at most a gig of data or tens of millions of entries) on disk that can be get and put by multiple processes. The entries are just simple 40 byte strings, not python objects. Don't really need all the functionality of shelve.
I've seen this http://code.activestate.com/lists/python-list/310105/
It looks simple. It needs to be upgraded to the new Queue version.
Wondering if there's something better? I'm concerned that in the event of a power interruption, the entire pickled file becomes corrupt instead of just one record.

Try using Celery. It's not pure python, as it uses RabbitMQ as a backend, but it's reliable, persistent and distributed, and, all in all, far better then using files or database in the long run.

I think that PyBSDDB is what you want. You can choose a queue as the access type. PyBSDDB is a Python module based on Oracle Berkeley DB.
It has synchronous access and can be accessed from different processes although I don't know if that is possible from the Python bindings. About multiple processes writing to the db I found this thread.

This is a very old question, but persist-queue seems to be a nice tool for this kind of task.
persist-queue implements a file-based queue and a serial of
sqlite3-based queues. The goals is to achieve following requirements:
Disk-based: each queued item should be stored in disk in case of any crash.
Thread-safe: can be used by multi-threaded producers and multi-threaded consumers.
Recoverable: Items can be read after process restart.
Green-compatible: can be used in greenlet or eventlet environment.
By default, persist-queue use pickle object serialization module to
support object instances. Most built-in type, like int, dict, list are
able to be persisted by persist-queue directly, to support customized
objects, please refer to Pickling and unpickling extension
types(Python2) and Pickling Class Instances(Python3)

Using files is not working?...
Use a journaling file system to recover from power interruptions. That's their purpose.

How to synchronize a python dict with multiprocessing

I am using Python 2.6 and the multiprocessing module for multi-threading. Now I would like to have a synchronized dict (where the only atomic operation I really need is the += operator on a value).
Should I wrap the dict with a multiprocessing.sharedctypes.synchronized() call? Or is another way the way to go?

Intro
There seems to be a lot of arm-chair suggestions and no working examples. None of the answers listed here even suggest using multiprocessing and this is quite a bit disappointing and disturbing. As python lovers we should support our built-in libraries, and while parallel processing and synchronization is never a trivial matter, I believe it can be made trivial with proper design. This is becoming extremely important in modern multi-core architectures and cannot be stressed enough! That said, I am far from satisfied with the multiprocessing library, as it is still in its infancy stages with quite a few pitfalls, bugs, and being geared towards functional programming (which I detest). Currently I still prefer the Pyro module (which is way ahead of its time) over multiprocessing due to multiprocessing's severe limitation in being unable to share newly created objects while the server is running. The "register" class-method of the manager objects will only actually register an object BEFORE the manager (or its server) is started. Enough chatter, more code:
Server.py
from multiprocessing.managers import SyncManager
class MyManager(SyncManager):
pass
syncdict = {}
def get_dict():
return syncdict
if __name__ == "__main__":
MyManager.register("syncdict", get_dict)
manager = MyManager(("127.0.0.1", 5000), authkey="password")
manager.start()
raw_input("Press any key to kill server".center(50, "-"))
manager.shutdown()
In the above code example, Server.py makes use of multiprocessing's SyncManager which can supply synchronized shared objects. This code will not work running in the interpreter because the multiprocessing library is quite touchy on how to find the "callable" for each registered object. Running Server.py will start a customized SyncManager that shares the syncdict dictionary for use of multiple processes and can be connected to clients either on the same machine, or if run on an IP address other than loopback, other machines. In this case the server is run on loopback (127.0.0.1) on port 5000. Using the authkey parameter uses secure connections when manipulating syncdict. When any key is pressed the manager is shutdown.
Client.py
from multiprocessing.managers import SyncManager
import sys, time
class MyManager(SyncManager):
pass
MyManager.register("syncdict")
if __name__ == "__main__":
manager = MyManager(("127.0.0.1", 5000), authkey="password")
manager.connect()
syncdict = manager.syncdict()
print "dict = %s" % (dir(syncdict))
key = raw_input("Enter key to update: ")
inc = float(raw_input("Enter increment: "))
sleep = float(raw_input("Enter sleep time (sec): "))
try:
#if the key doesn't exist create it
if not syncdict.has_key(key):
syncdict.update([(key, 0)])
#increment key value every sleep seconds
#then print syncdict
while True:
syncdict.update([(key, syncdict.get(key) + inc)])
time.sleep(sleep)
print "%s" % (syncdict)
except KeyboardInterrupt:
print "Killed client"
The client must also create a customized SyncManager, registering "syncdict", this time without passing in a callable to retrieve the shared dict. It then uses the customized SycnManager to connect using the loopback IP address (127.0.0.1) on port 5000 and an authkey establishing a secure connection to the manager started in Server.py. It retrieves the shared dict syncdict by calling the registered callable on the manager. It prompts the user for the following:
The key in syncdict to operate on
The amount to increment the value accessed by the key every cycle
The amount of time to sleep per cycle in seconds
The client then checks to see if the key exists. If it doesn't it creates the key on the syncdict. The client then enters an "endless" loop where it updates the key's value by the increment, sleeps the amount specified, and prints the syncdict only to repeat this process until a KeyboardInterrupt occurs (Ctrl+C).
Annoying problems
The Manager's register methods MUST be called before the manager is started otherwise you will get exceptions even though a dir call on the Manager will reveal that it indeed does have the method that was registered.
All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
Using SyncManager's dict method would alleviate annoying problem #2 except that annoying problem #1 prevents the proxy returned by SyncManager.dict() being registered and shared. (SyncManager.dict() can only be called AFTER the manager is started, and register will only work BEFORE the manager is started so SyncManager.dict() is only useful when doing functional programming and passing the proxy to Processes as an argument like the doc examples do)
The server AND the client both have to register even though intuitively it would seem like the client would just be able to figure it out after connecting to the manager (Please add this to your wish-list multiprocessing developers)
Closing
I hope you enjoyed this quite thorough and slightly time-consuming answer as much as I have. I was having a great deal of trouble getting straight in my mind why I was struggling so much with the multiprocessing module where Pyro makes it a breeze and now thanks to this answer I have hit the nail on the head. I hope this is useful to the python community on how to improve the multiprocessing module as I do believe it has a great deal of promise but in its infancy falls short of what is possible. Despite the annoying problems described I think this is still quite a viable alternative and is pretty simple. You could also use SyncManager.dict() and pass it to Processes as an argument the way the docs show and it would probably be an even simpler solution depending on your requirements it just feels unnatural to me.

I would dedicate a separate process to maintaining the "shared dict": just use e.g. xmlrpclib to make that tiny amount of code available to the other processes, exposing via xmlrpclib e.g. a function taking key, increment to perform the increment and one taking just the key and returning the value, with semantic details (is there a default value for missing keys, etc, etc) depending on your app's needs.
Then you can use any approach you like to implement the shared-dict dedicated process: all the way from a single-threaded server with a simple dict in memory, to a simple sqlite DB, etc, etc. I suggest you start with code "as simple as you can get away with" (depending on whether you need a persistent shared dict, or persistence is not necessary to you), then measure and optimize as and if needed.

In response to an appropriate solution to the concurrent-write issue. I did very quick research and found that this article is suggesting a lock/semaphore solution. (http://effbot.org/zone/thread-synchronization.htm)
While the example isn't specificity on a dictionary, I'm pretty sure you could code a class-based wrapper object to help you work with dictionaries based on this idea.
If I had a requirement to implement something like this in a thread safe manner, I'd probably use the Python Semaphore solution. (Assuming my earlier merge technique wouldn't work.) I believe that semaphores generally slow down thread efficiencies due to their blocking nature.
From the site:
A semaphore is a more advanced lock mechanism. A semaphore has an internal counter rather than a lock flag, and it only blocks if more than a given number of threads have attempted to hold the semaphore. Depending on how the semaphore is initialized, this allows multiple threads to access the same code section simultaneously.
semaphore = threading.BoundedSemaphore()
semaphore.acquire() # decrements the counter
... access the shared resource; work with dictionary, add item or whatever.
semaphore.release() # increments the counter

Is there a reason that the dictionary needs to be shared in the first place? Could you have each thread maintain their own instance of a dictionary and either merge at the end of the thread processing or periodically use a call-back to merge copies of the individual thread dictionaries together?
I don't know exactly what you are doing, so keep in my that my written plan may not work verbatim. What I'm suggesting is more of a high-level design idea.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.