Python inter-process communication - python

I'm trying to implement a python script that can send serial data generated by two different threads. Now I was able to set the threads running, I would like to create a shared memory with semaphores between the manager process and the sub threads. The code of the manager is the following
import DetectorService
import HearingService
if __name__=='__main__':
global shm01
t1 = Thread(target = DetectorService.start)
t2 = Thread(target = HearingService.start)
t1.setDaemon(True)
t2.setDaemon(True)
t1.start()
t2.start()
while True:
# Get data from thread01
# Get data from thread02
# Send data via serial port
pass
Then there is a minimal representation of a thread:
import time
def start():
while True:
a = a + 1
time.sleep(0.5)
For instance, my goal is to access the a variable and send the value through the serial port. The serial communication is not a problem, I know how to do it.
So my question is, how can I implement the shared memory?

There are various ways to implement this. One simple way to serialize the access to a shared resource (which will make the access safe) is by using a Lock. You can read more here.
As an example, your threads might collect the data that needs to be sent in a shared list, like this:
data_to_send = []
and when you need to read from or write to that list, simply do so in the context of a global Lock object. Like this:
with lock:
data_to_send.append(some_data)

Related

Separate computation from socket work in Python

I'm serializing column data and then sending it over a socket connection.
Something like:
import array, struct, socket
## Socket setup
s = socket.create_connection((ip, addr))
## Data container setup
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)
for i in range(num_of_chunks):
## Binarize data
columns['col1'] = array.array('i', range(10000))
columns['col2'] = array.array('f', [float(num) for num in range(10000)])
.
.
.
## Send away
chunk = b''.join(columns[col_name] for col_name in ordered_col_list]
s.sendall(chunk)
s.recv(1000) #get confirmation
I wish to separate the computation from the sending, put them on separate threads or processes, so I can keep doing computations while data is sent away.
I've put the binarizing part as a generator function, then sent the generator to a separate thread, which then yielded binary chunks via a queue.
I collected the data from the main thread and sent it away. Something like:
import array, struct, socket
from time import sleep
try:
import thread
from Queue import Queue
except:
import _thread as thread
from queue import Queue
## Socket and queue setup
s = socket.create_connection((ip, addr))
chunk_queue = Queue()
def binarize(num_of_chunks):
''' Generator function that yields chunks of binary data. In reality it wouldn't be the same data'''
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)
for i in range(num_of_chunks):
columns['col1'] = array.array('i', range(10000)).tostring()
columns['col2'] = array.array('f', [float(num) for num in range(10000)]).tostring()
.
.
yield b''.join((columns[col_name] for col_name in ordered_col_list))
def chunk_yielder(queue):
''' Generate binary chunks and put them on a queue. To be used from a thread '''
while True:
try:
data_gen = queue.get_nowait()
except:
sleep(0.1)
continue
else:
for chunk in data_gen:
queue.put(chunk)
## Setup thread and data generator
thread.start_new_thread(chunk_yielder, (chunk_queue,))
num_of_chunks = 100
data_gen = binarize(num_of_chunks)
queue.put(data_gen)
## Get data back and send away
while True:
try:
binary_chunk = queue.get_nowait()
except:
sleep(0.1)
continue
else:
socket.sendall(binary_chunk)
socket.recv(1000) #Get confirmation
However, I did not see and performance imporovement - it did not work faster.
I don't understand threads/processes too well, and my question is whether it is possible (at all and in Python) to gain from this type of separation, and what would be a good way to go about it, either with threads or processess (or any other way - async etc).
EDIT:
As far as I've come to understand -
Multirpocessing requires serializing any sent data, so I'm double-sending every computed data.
Sending via socket.send() should release the GIL
Therefore I think (please correct me if I am mistaken) that a threading solution is the right way. However I'm not sure how to do it correctly.
I know cython can release the GIL off of threads, but since one of them is just socket.send/recv, my understanding is that it shouldn't be necessary.
You have two options for running things in parallel in Python, either use the multiprocessing (docs) library , or write the parallel code in cython and release the GIL. The latter is significantly more work and less applicable generally speaking.
Python threads are limited by the Global Interpreter Lock (GIL), I won't go into detail here as you will find more than enough information online on it. In short, the GIL, as the name suggests, is a global lock within the CPython interpreter that ensures multiple threads do not modify objects, that are within the confines of said interpreter, simultaneously. This is why, for instance, cython programs can run code in parallel because they can exist outside the GIL.
As to your code, one problem is that you're running both the number crunching (binarize) and the socket.send inside the GIL, this will run them strictly serially. The queue is also connected very strangely, and there is a NameError but let's leave those aside.
With the caveats already pointed out by Jeremy Friesner in mind, I suggest you re-structure the code in the following manner: you have two processes (not threads) one for binarising the data and the other for sending data. In addition to those, there is also the parent process that started both children, and a queue connecting child 1 to child 2.
Subprocess-1 does number crunching and produces crunched data into a queue
Subprocess-2 consumes data from a queue and does socket.send
in code the setup would look something like
from multiprocessing import Process, Queue
work_queue = Queue()
p1 = Process(target=binarize, args=(100, work_queue))
p2 = Process(target=send_data, args=(ip, port, work_queue))
p1.start()
p2.start()
p1.join()
p2.join()
binarize can remain as it is in your code, with the exception that instead of a yield at the end, you add elements into the queue
def binarize(num_of_chunks, q):
''' Generator function that yields chunks of binary data. In reality it wouldn't be the same data'''
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)
for i in range(num_of_chunks):
columns['col1'] = array.array('i', range(10000)).tostring()
columns['col2'] = array.array('f', [float(num) for num in range(10000)]).tostring()
data = b''.join((columns[col_name] for col_name in ordered_col_list))
q.put(data)
send_data should just be the while loop from the bottom of your code, with the connection open/close functionality
def send_data(ip, addr, q):
s = socket.create_connection((ip, addr))
while True:
try:
binary_chunk = q.get(False)
except:
sleep(0.1)
continue
else:
socket.sendall(binary_chunk)
socket.recv(1000) # Get confirmation
# maybe remember to close the socket before killing the process
Now you have two (three actually if you count the parent) processes that are processing data independently. You can force the two processes to synchronise their operations by setting the max_size of the queue to a single element. The operation of these two separate processes is also easy to monitor from the process manager on your computer top (Linux), Activity Monitor (OsX), don't remember what it's called under Windows.
Finally, Python 3 comes with the option of using co-routines which are neither processes nor threads, but something else entirely. Co-routines are pretty cool from a CS point of view, but a bit of a head scratcher at first. There is plenty of resources to learn from though, like this post on Medium and this talk by David Beazley.
Even more generally, you might want to look into the producer/consumer pattern, if you are not already familiar with it.
If you are trying to use concurrency to improve performance in CPython I would strongly recommend using multiprocessing library instead of multithreading. It is because of GIL (Global Interpreter Lock), which can have a huge impact on execution speed (in some cases, it may cause your code to run slower than single threaded version). Also, if you would like to learn more about this topic, I recommend reading this presentation by David Beazley. Multiprocessing bypasses this problem by spawning a new Python interpreter instance for each process, thus allowing you to take full advantage of multi core architecture.

Multiprocesses python with shared memory

I have an object that connects to a websocket remote server. I need to do a parallel process at the same time. However, I don't want to create a new connection to the server. Since threads are the easier way to do this, this is what I have been using so far. However, I have been getting a huge latency because of GIL. Can I achieve the same thing as threads but with multiprocesses in parallel?
This is the code that I have:
class WebSocketApp(object):
def on_open(self):
# Create another thread to make sure the commands are always been read
print "Creating thread..."
try:
thread.start_new_thread( self.read_commands,() )
except:
print "Error: Unable to start thread"
Is there an equivalent way to do this with multiprocesses?
Thanks!
The direct equivalent is
import multiprocessing
class WebSocketApp(object):
def on_open(self):
# Create another process to make sure the commands are always been read
print "Creating process..."
try:
multiprocessing.Process(target=self.read_commands,).start()
except:
print "Error: Unable to start process"
However, this doesn't address the "shared memory" aspect, which has to be handled a little differently than it is with threads, where you can just use global variables. You haven't really specified what objects you need to share between processes, so it's hard to say exactly what approach you should take. The multiprocessing documentation does cover ways to deal with shared state, however. Do note that in general it's better to avoid shared state if possible, and just explicitly pass state between the processes, either as an argument to the Process constructor or via a something like a Queue.
You sure can, use something along the lines of:
from multiprocessing import Process
class WebSocketApp(object):
def on_open(self):
# Create another thread to make sure the commands are always been read
print "Creating thread..."
try:
p = Process(target = WebSocketApp.read_commands, args = (self, )) # Add other arguments to this tuple
p.start()
except:
print "Error: Unable to start thread"
It is important to note, however, that as soon as the object is sent to the other process the two objects self and self in the different threads diverge and represent different objects. If you wish to communicate you will need to use something like the included Queue or Pipe in the multiprocessing module.
You may need to keep a reference of all the processes (p in this case) in your main thread in order to be able to communicate that your program is terminating (As a still-running child process will appear to hang the parent when it dies), but that depends on the nature of your program.
If you wish to keep the object the same, you can do one of a few things:
Make all of your object properties either single values or arrays and then do something similar to this:
from multiprocessing import Process, Value, Array
class WebSocketApp(object):
def __init__(self):
self.my_value = Value('d', 0.3)
self.my_array = Array('i', [4 10 4])
# -- Snip --
And then these values should work as shared memory. The types are very restrictive though (You must specify their types)
A different answer is to use a manager:
from multiprocessing import Process, Manager
class WebSocketApp(object):
def __init__(self):
self.my_manager = Manager()
self.my_list = self.my_manager.list()
self.my_dict = self.my_manager.dict()
# -- Snip --
And then self.my_list and self.my_dict act as a shared-memory list and dictionary respectively.
However, the types for both of these approaches can be restrictive so you may have to roll your own technique with a Queue and a Semaphore. But it depends what you're doing.
Check out the multiprocessing documentation for more information.

Maintaining log file from multiple threads in Python

I have my python baseHTTPServer server, which handles post requests.
I used ThreadingMixIn and its now opens a thread for each connection.
I wish to do several multithreaded actions, such as:
1. Monitoring successful/failed connections activities, by adding 1 to a counter for each.
I need a lock for that. My counter is in global scope of the same file. How can I do that?
2. I wish to handle some sort of queue and write it to a file, where the content of the queue is a set of strings, written from my different threads, that simply sends some information for logging issues. How can it be done? I fail to accomplish that since my threading is done "behind the scenes", as each time Im in do_POST(..) method, Im already in a different thread.
Succcessful_Logins = 0
Failed_Logins = 0
LogsFile = open(logfile)
class httpHandler(BaseHTTPRequestHandler):
def do_POST(self):
..
class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
pass
server = ThreadingHTTPServer(('localhost', PORT_NUMBER), httpHandler)
server.serve_forever()
this is a small fragile of my server.
Another thing that bothers my is the face I want to first send the post response back to the client, and only then possibly get delayed due to locking mechanism or whatever.
From your code, it looks like a new httpHandler is constructed in each thread? If that's the case you can use a class variable for the count and a mutex to protect the count like:
class httpHandler(...):
# Note that these are class variables and are therefore accessable
# to all instances
numSuccess = 0
numSuccessLock = new threading.Lock()
def do_POST(self):
self.numSuccessLock.aquire()
self.numSuccess += 1
self.numSuccessLock.release()
As for writing to a file from different threads, there are a few options:
Use the logging module, "The logging module is intended to be thread-safe without any special work needing to be done by its clients." from http://docs.python.org/2/library/logging.html#thread-safety
Use a Lock object like above to serialize writes to the file
Use a thread safe queue to queue up writes and then read from the queue and write to the file from a separate thread. See http://docs.python.org/2/library/queue.html#module-Queue for examples.

Share a variable between two threads in python

I want to share a variable between two threads using atomic operations in the interpreter, as described here http://effbot.org/zone/thread-synchronization.htm. A simple assignment (single bytecode operation) of a core data type should be thread safe, beacuse of the GIL in python < 3.2. So far the theory. The follwing code can be run in either master or slave mode (-m or -s). The master mode does constantly send data via UDP. The slave mode does create a thread to read data from a udp port and update a variable on each received packet.
The example code does pass the shared variable as an argument to the thread on creation. I've tried also by using a global variable or passing a thread local store to the thread.
The result is alwas the same. Inside the read_time_master thread the variable gets assigned. But in the main thread, the value of shared variable isn't updated.
#!/usr/bin/env python
import socket
import itertools
import multiprocessing
from optparse import OptionParser
from time import sleep
PORT = 1666
def read_time_master(sock, time_master):
while True:
time_master = float(sock.recvfrom(1024)[0])
def main():
time_master = 0.0
p = OptionParser()
p.add_option('--master', '-m', action='store_true')
p.add_option('--slave', '-s', action='store_true')
options, arguments = p.parse_args()
if options.master or options.slave:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
if options.master:
sock.connect(('127.0.0.1', PORT))
if options.slave:
sock.bind(('0.0.0.0', PORT))
recv_thread = multiprocessing.Process(target=read_time_master, args=(sock, time_master))
recv_thread.start()
for time in itertools.count():
print time
if options.slave:
print "master: %f" % time_master # -> not updated from other thread
if options.master:
try:
sock.send(str(time))
except socket.error:
pass
sleep(1)
if options.master or options.slave:
sock.close()
if __name__ == '__main__':
main()
You're using multiprocessing, not threading, which isn't helping your situation. If you were using threading.Thread to create the background worker you'd probably be able to get what you needed by simply throwing in a global time_master call within the function that's being controlled by your background operation. Because you're using multiprocessing, not threading, you will likely need to look into the multiprocessing.Queue class for containers that you can use to pass information back and forth between your processes or to synchronize them. You can also create variables that are shared between the processes as well (all of this is covered in the multiprocessing documentation / examples at the Python Homepage
You can use Shared Memory, as explained here http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes. Remember to wait for the process to finish before reading from the shared space.

"select" on multiple Python multiprocessing Queues?

What's the best way to wait (without spinning) until something is available in either one of two (multiprocessing) Queues, where both reside on the same system?
Actually you can use multiprocessing.Queue objects in select.select. i.e.
que = multiprocessing.Queue()
(input,[],[]) = select.select([que._reader],[],[])
would select que only if it is ready to be read from.
No documentation about it though. I was reading the source code of the multiprocessing.queue library (at linux it's usually sth like /usr/lib/python2.6/multiprocessing/queue.py) to find it out.
With Queue.Queue I didn't have found any smart way to do this (and I would really love to).
It doesn't look like there's an official way to handle this yet. Or at least, not based on this:
http://bugs.python.org/issue3831
You could try something like what this post is doing -- accessing the underlying pipe filehandles:
http://haltcondition.net/?p=2319
and then use select.
Not sure how well the select on a multiprocessing queue works on windows. As select on windows listens for sockets and not file handles, I suspect there could be problems.
My answer is to make a thread to listen to each queue in a blocking fashion, and to put the results all into a single queue listened to by the main thread, essentially multiplexing the individual queues into a single one.
My code for doing this is:
"""
Allow multiple queues to be waited upon.
queue,value = multiq.select(list_of_queues)
"""
import queue
import threading
class queue_reader(threading.Thread):
def __init__(self,inq,sharedq):
threading.Thread.__init__(self)
self.inq = inq
self.sharedq = sharedq
def run(self):
while True:
data = self.inq.get()
print ("thread reads data=",data)
result = (self.inq,data)
self.sharedq.put(result)
class multi_queue(queue.Queue):
def __init__(self,list_of_queues):
queue.Queue.__init__(self)
for q in list_of_queues:
qr = queue_reader(q,self)
qr.start()
def select(list_of_queues):
outq = queue.Queue()
for q in list_of_queues:
qr = queue_reader(q,outq)
qr.start()
return outq.get()
The following test routine shows how to use it:
import multiq
import queue
q1 = queue.Queue()
q2 = queue.Queue()
q3 = multiq.multi_queue([q1,q2])
q1.put(1)
q2.put(2)
q1.put(3)
q1.put(4)
res=0
while not res==4:
while not q3.empty():
res = q3.get()[1]
print ("returning result =",res)
Hope this helps.
Tony Wallace
Seems like using threads which forward incoming items to a single Queue which you then wait on is a practical choice when using multiprocessing in a platform independent manner.
Avoiding the threads requires either handling low-level pipes/FDs which is both platform specific and not easy to handle consistently with the higher-level API.
Or you would need Queues with the ability to set callbacks which i think are the proper higher level interface to go for. I.e. you would write something like:
singlequeue = Queue()
incoming_queue1.setcallback(singlequeue.put)
incoming_queue2.setcallback(singlequeue.put)
...
singlequeue.get()
Maybe the multiprocessing package could grow this API but it's not there yet. The concept works well with py.execnet which uses the term "channel" instead of "queues", see here http://tinyurl.com/nmtr4w
As of Python 3.3 you can use multiprocessing.connection.wait to wait on multiple Queue._reader objects at once.
You could use something like the Observer pattern, wherein Queue subscribers are notified of state changes.
In this case, you could have your worker thread designated as a listener on each queue, and whenever it receives a ready signal, it can work on the new item, otherwise sleep.
New version of above code...
Not sure how well the select on a multiprocessing queue works on windows. As select on windows listens for sockets and not file handles, I suspect there could be problems.
My answer is to make a thread to listen to each queue in a blocking fashion, and to put the results all into a single queue listened to by the main thread, essentially multiplexing the individual queues into a single one.
My code for doing this is:
"""
Allow multiple queues to be waited upon.
An EndOfQueueMarker marks a queue as
"all data sent on this queue".
When this marker has been accessed on
all input threads, this marker is returned
by the multi_queue.
"""
import queue
import threading
class EndOfQueueMarker:
def __str___(self):
return "End of data marker"
pass
class queue_reader(threading.Thread):
def __init__(self,inq,sharedq):
threading.Thread.__init__(self)
self.inq = inq
self.sharedq = sharedq
def run(self):
q_run = True
while q_run:
data = self.inq.get()
result = (self.inq,data)
self.sharedq.put(result)
if data is EndOfQueueMarker:
q_run = False
class multi_queue(queue.Queue):
def __init__(self,list_of_queues):
queue.Queue.__init__(self)
self.qList = list_of_queues
self.qrList = []
for q in list_of_queues:
qr = queue_reader(q,self)
qr.start()
self.qrList.append(qr)
def get(self,blocking=True,timeout=None):
res = []
while len(res)==0:
if len(self.qList)==0:
res = (self,EndOfQueueMarker)
else:
res = queue.Queue.get(self,blocking,timeout)
if res[1] is EndOfQueueMarker:
self.qList.remove(res[0])
res = []
return res
def join(self):
for qr in self.qrList:
qr.join()
def select(list_of_queues):
outq = queue.Queue()
for q in list_of_queues:
qr = queue_reader(q,outq)
qr.start()
return outq.get()
The follow code is my test routine to show how it works:
import multiq
import queue
q1 = queue.Queue()
q2 = queue.Queue()
q3 = multiq.multi_queue([q1,q2])
q1.put(1)
q2.put(2)
q1.put(3)
q1.put(4)
q1.put(multiq.EndOfQueueMarker)
q2.put(multiq.EndOfQueueMarker)
res=0
have_data = True
while have_data:
res = q3.get()[1]
print ("returning result =",res)
have_data = not(res==multiq.EndOfQueueMarker)
The one situation where I'm usually tempted to multiplex multiple queues is when each queue corresponds to a different type of message that requires a different handler. You can't just pull from one queue because if it isn't the type of message you want, you need to put it back.
However, in this case, each handler is essentially a separate consumer, which makes it an a multi-producer, multi-consumer problem. Fortunately, even in this case you still don't need to block on multiple queues. You can create different thread/process for each handler, with each handler having its own queue. Basically, you can just break it into multiple instances of a multi-producer, single-consumer problem.
The only situation I can think of where you would have to wait on multiple queues is if you were forced to put multiple handlers in the same thread/process. In that case, I would restructure it by creating a queue for my main thread, spawning a thread for each handler, and have the handlers communicate with the main thread using the main queue. Each handler could then have a separate queue for its unique type of message.
Don't do it.
Put a header on the messages and send them to a common queue. This simplifies the code and will be cleaner overall.

Categories

Resources