I was looking for a bidirectional/omnidirectional queue to send jobs back and forth between processes.
The best solution I could come up with was to use two multiprocessing queues, each filled by one process and read by the other (or a Pipe, which is apparently faster; I still haven't tried it).
I came across this answer describing the difference between a Pipe and a Queue; it states that
A Queue() can have multiple producers and consumers.
I know a queue can be shared between multiple processes (> 2 processes), but how should I organize the communication so that a message has a targeted process, or at least so that a process does not read back the jobs it put on the queue, and how do I scale this to more than 2 processes?
Example: I have 2 (or more) processes (A, B) that share the same queue. A needs to send a job to B, and B sends a job to A. If I simply use queue.put(job), the job might be read by either process depending on who calls queue.get() first, so a job that A intended for B might be read back by A, which is not the targeted process. If I added a flag saying which process should execute the job, that would destroy the sequentiality of the queue.
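For reference, here is a rough sketch of the two-queue workaround I mentioned above, with one queue per direction so neither process can consume its own jobs (the process and queue names are just illustrative):

import multiprocessing as mp

def worker_a(to_b, to_a):
    # A writes only to to_b and reads only from to_a,
    # so it can never pick up a job it queued itself.
    to_b.put("job for B")
    print("A received:", to_a.get())

def worker_b(to_b, to_a):
    print("B received:", to_b.get())
    to_a.put("job for A")

if __name__ == "__main__":
    to_b = mp.Queue()  # jobs addressed to B
    to_a = mp.Queue()  # jobs addressed to A
    p_a = mp.Process(target=worker_a, args=(to_b, to_a))
    p_b = mp.Process(target=worker_b, args=(to_b, to_a))
    p_a.start(); p_b.start()
    p_a.join(); p_b.join()

Scaling this to more than 2 processes would mean one inbound queue per process, with senders putting jobs on the target's queue, at the cost of more queues to manage.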
For those facing the same problem, I have found a solution: multiprocessing.Pipe(). It is faster than queues, but it only works between 2 processes.
Here is a simple example to help:
import multiprocessing as mp
from time import time
def process1_function(conn, events):
    for event in events:
        # send jobs to process_2
        conn.send((event, time()))
        print(f"Event Sent: {event}")
        # check if there are any messages in the pipe from process_2
        if conn.poll():
            # read the message from process_2
            print(conn.recv())
    # continue checking the messages in the pipe from process_2
    while conn.poll():
        print(conn.recv())

def process2_function(conn):
    while True:
        # check if there are any messages in the pipe from process_1
        if conn.poll():
            # read messages in the pipe from process_1
            event, sent = conn.recv()
            # send messages to process_1
            conn.send(f"{event} complete, {time() - sent}")
            if event == "eod":
                break
    conn.send("all events finished")

def run():
    events = ["get up", "brush your teeth", "shower", "work", "eod"]
    conn1, conn2 = mp.Pipe()
    process_1 = mp.Process(target=process1_function, args=(conn1, events))
    process_2 = mp.Process(target=process2_function, args=(conn2,))
    process_1.start()
    process_2.start()
    process_1.join()
    process_2.join()

if __name__ == "__main__":
    run()
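Note that mp.Pipe() returns a pair of connection objects that are duplex by default, so each end can both send() and recv(); poll() simply reports whether data is waiting without blocking, which is what lets process1_function keep sending events while opportunistically reading replies.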
Scenario:
1. A sensor continuously sends data at an interval of 100 milliseconds (the interval needs to be configurable).
2. Thread 1 reads the data continuously from the sensor and writes it to a common queue.
3. This continues until a keyboard interrupt happens.
4. Thread 2 locks the queue (which may momentarily block Thread 1).
5. It reads the full contents of the queue into a temporary structure.
6. It releases the queue.
7. It processes the data in that structure. This is a computational task, and while it runs, Thread 1 should keep filling the queue with sensor data.
I have read about threading and the GIL; step 7 cannot afford any loss of the data sent by the sensor while the computational process() is running on Thread 2.
How can this be implemented in Python?
What I started with is:
from threading import Thread
from queue import Queue

q = Queue(maxsize=10)

def fun1():
    fun2Thread = Thread(target=fun2)
    fun2Thread.start()
    while True:
        try:
            q.put(1)
        except KeyboardInterrupt:
            print("Key Interrupt")
    fun2Thread.join()

def fun2():
    print(q.get())

def read():
    fun1Thread = Thread(target=fun1)
    fun1Thread.start()
    fun1Thread.join()

read()
The issue I'm facing is that the terminal is stuck after printing 1. Can someone please guide me on how to implement this scenario?
Here's an example that may help.
We have a main program (driver), a client and a server. The main program manages queue construction and the starting and ending of the subprocesses.
The client sends a range of values via a queue to the server. When the range is exhausted, it tells the server to terminate. There's a delay (sleep) in enqueueing the data for demonstration purposes.
Try running it once without any interrupt and note how everything terminates nicely. Then run again and interrupt (Ctrl-C) and again note a clean termination.
from multiprocessing import Queue, Process
from signal import signal, SIGINT, SIG_IGN
from time import sleep

def client(q, default):
    signal(SIGINT, default)
    try:
        for i in range(10):
            sleep(0.5)
            q.put(i)
    except KeyboardInterrupt:
        pass
    finally:
        q.put(-1)

def server(q):
    while (v := q.get()) != -1:
        print(v)

def main():
    q = Queue()
    default = signal(SIGINT, SIG_IGN)
    (server_p := Process(target=server, args=(q,))).start()
    (client_p := Process(target=client, args=(q, default))).start()
    client_p.join()
    server_p.join()

if __name__ == '__main__':
    main()
EDIT:
Edited to ensure that the server process continues to drain the queue if the client is terminated by a KeyboardInterrupt (SIGINT).
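Coming back to the original thread-based scenario: a minimal sketch of the batch-drain pattern from steps 4-7 could look like the following (the sensor read and the computation are simulated with sleeps, and the 100 ms interval is just the assumed, configurable default):

import queue
import threading
import time

q = queue.Queue()
stop = threading.Event()

def reader(interval=0.1):
    # Thread 1: continuously "read" the sensor and enqueue samples.
    i = 0
    while not stop.is_set():
        q.put(i)             # simulated sensor sample
        i += 1
        time.sleep(interval)

def processor():
    # Thread 2: drain everything currently queued into a temporary list,
    # then process that batch while the reader keeps filling the queue.
    while not stop.is_set() or not q.empty():
        batch = []
        while True:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if batch:
            time.sleep(0.5)  # simulated computational work on the batch
            print("processed %d samples" % len(batch))
        else:
            time.sleep(0.05) # nothing queued yet; avoid busy-waiting

if __name__ == "__main__":
    t1 = threading.Thread(target=reader, daemon=True)
    t2 = threading.Thread(target=processor)
    t1.start(); t2.start()
    try:
        while True:
            time.sleep(0.2)
    except KeyboardInterrupt:
        stop.set()
    t2.join()

Nothing is lost while a batch is being processed, because new samples simply accumulate in the thread-safe queue; with truly CPU-bound processing, though, the GIL can still delay the reader thread, which is one argument for the multiprocessing approach above.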
I've been using Python multiprocessing for some task handling. The dev environment is Windows Server 2016 and Python 3.7.0.
Sometimes there were child processes that stayed in the task list even though they actually seemed to have completed (data were written into the database). The impact is that logging got stuck there, unable to append the latest logs.
Here is the code. The main function starts a listener process and several worker processes:
queue = multiprocessing.Queue(-1)
listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer))
listener.start()
...
workers = []
for loop:
    worker = multiprocessing.Process(target=process_start, args=(queue, worker_configurer, plist))
    workers.append(worker)
    worker.start()
for w in workers:
    w.join()
...
queue.put_nowait(None)
listener.join()
The listener process ends when it gets None, which causes the whole task to end.
def listener_process(queue, configurer):
    configurer()
    while True:
        try:
            record = queue.get()
            if record is None:
                break
            if type(record) is not int:
                Logger = logging.getLogger(record.name)
                Logger.handle(record)
        except Exception as e:
            Logger.error(str(e), exc_info=True)
The task is scheduled to run by Windows Task Scheduler.
Any idea why some of the multiprocessing processes were 'stuck' there?
It's been bothering me for some time. Thanks in advance.
Can I say for sure what your problem is? No. Can I say for sure you are doing something that can lead to a deadlock? Yes.
If you read the documentation carefully on multiprocessing.Queue, you will see the following warning:
Warning:
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
This means that to be completely safe, you must join the listener process (which issues gets from the queue) before joining the worker processes (which issue puts to the queue), to ensure that all messages put on the queue have been read off before you attempt to join the processes that did the puts.
But then how will the listener process know when to terminate, if it is currently looking for the main process to write a None sentinel message to the queue as the signal that it is quitting time, while in the new design the main process must first wait for the listener to terminate before it waits for the workers? Presumably you have control over the source of the process_start function that implements the producer of messages written to the queue, and presumably something triggers its decision to terminate. When these processes terminate, it is they that must each write a None sentinel message to the queue, signifying that they will not be producing any more messages.

Then the function listener_process must be passed an additional argument, i.e. the number of message producers, so that it knows how many of these sentinels it should expect to see. Unfortunately, I can't discern from what you have coded, i.e. for loop:, what that number of processes is, and it appears that you are instantiating each process with identical arguments. But for the sake of clarity I will modify your code to something that is more explicit:
queue = multiprocessing.Queue(-1)
listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer, len(plist)))
listener.start()
...
workers = []
# There will be len(plist) producers of messages:
for p in plist:
    worker = multiprocessing.Process(target=process_start, args=(queue, worker_configurer, p))
    workers.append(worker)
    worker.start()
listener.join()  # join the listener first
for w in workers:
    w.join()
....
def listener_process(queue, configurer, n_producers):
    configurer()
    sentinel_count = 0
    while True:
        try:
            record = queue.get()
            if record is None:
                sentinel_count += 1
                if sentinel_count == n_producers:
                    break  # we are done
                continue
            if type(record) is not int:
                Logger = logging.getLogger(record.name)
                Logger.handle(record)
        except Exception as e:
            Logger.error(str(e), exc_info=True)
Update
Here is a complete example. To avoid the complexities of configuring various loggers with handlers, I am just using a simple print statement, but as you can see, everything is "logged."
import multiprocessing

def process_start(queue, p):
    for i in range(3):
        queue.put(p)
    queue.put(None)  # Sentinel

def listener_process(queue, n_producers):
    sentinel_count = 0
    while True:
        try:
            record = queue.get()
            if record is None:
                sentinel_count += 1
                if sentinel_count == n_producers:
                    break  # we are done
                continue
            if type(record) is not int:
                print(record)
        except Exception as e:
            print(e)

class Record:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f'name={self.name}, value={self.value}'

def main():
    plist = [Record('basic', 'A'), Record('basic', 'B'), Record('basic', 'C')]
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process, args=(queue, len(plist)))
    listener.start()
    workers = []
    # There will be len(plist) producers of messages:
    for p in plist:
        worker = multiprocessing.Process(target=process_start, args=(queue, p))
        workers.append(worker)
        worker.start()
    listener.join()  # join the listener first
    for w in workers:
        w.join()

# Required for Windows:
if __name__ == '__main__':
    main()
Prints:
name=basic, value=A
name=basic, value=A
name=basic, value=A
name=basic, value=B
name=basic, value=B
name=basic, value=B
name=basic, value=C
name=basic, value=C
name=basic, value=C
I am developing a server (daemon).
The server has one "worker thread". The worker thread runs a queue of commands. When the queue is empty, the worker thread is paused (but does not exit, because it should preserve certain state in memory). To keep exactly one copy of the state in memory, I need exactly one worker thread (not several and not zero) running at all times.
Requests are added to the end of this queue when a client connects to a Unix socket and sends a command.
After a command is issued, it is added to the worker thread's queue of commands. Once it has been added to the queue, the server replies with something like "OK". There should not be a long pause between the server receiving a command and its "OK" reply. However, running the commands in the queue may take some time.
The main "work" of the worker thread is split into small (taking relatively little time) chunks. Between chunks, the worker thread inspects ("eats" and empties) the queue and continues to work based on the data extracted from the queue.
How to implement this server/daemon in Python?
This is sample code with internet sockets, easily replaced with Unix domain sockets. It takes whatever you write to the socket, passes it as a "command" to the worker, and responds OK as soon as it has queued the command. The single worker simulates a lengthy task by working in one-second chunks for about ten seconds, checking the queue between chunks. You can queue as many tasks as you want, receive OK immediately, and watch the worker print the queued commands as it gets to them.
import Queue, threading, socket
from time import sleep

class worker(threading.Thread):
    def __init__(self, q):
        super(worker, self).__init__()
        self.qu = q

    def run(self):
        while True:
            new_task = self.qu.get(True)
            print new_task
            i = 0
            while i < 10:
                print "working ..."
                sleep(1)
                i += 1
                try:
                    another_task = self.qu.get(False)
                    print another_task
                except Queue.Empty:
                    pass

task_queue = Queue.Queue()
w = worker(task_queue)
w.daemon = True
w.start()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('localhost', 4200))
sock.listen(1)
try:
    while True:
        conn, addr = sock.accept()
        data = conn.recv(32)
        task_queue.put(data)
        conn.sendall("OK")
        conn.close()
except:
    sock.close()
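Since the question asks for a Unix socket, the same accept loop can be adapted by switching the socket family. Here is a minimal Python 3 sketch of just that part, assuming an illustrative socket path of /tmp/workerd.sock and the same kind of task queue that the worker thread drains:

import os
import queue
import socket

task_queue = queue.Queue()         # in the full program, the worker thread drains this

SOCK_PATH = "/tmp/workerd.sock"    # illustrative path
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)           # remove a stale socket file from a previous run

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.bind(SOCK_PATH)
sock.listen(1)
try:
    while True:
        conn, _ = sock.accept()
        data = conn.recv(32)
        task_queue.put(data)       # hand the command to the worker thread
        conn.sendall(b"OK")        # bytes, not str, under Python 3
        conn.close()
finally:
    sock.close()
    os.unlink(SOCK_PATH)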
I am trying to implement a Python (2.6.x/2.7.x) thread pool that checks for network connectivity (ping or whatever); all of the pool's threads must be killed/terminated when the check is successful.
So I am thinking of creating a pool of, let's say, 10 worker threads. If any one of them is successful in pinging, the main thread should terminate all the rest.
How do I implement this?
This is not compilable code; it is just to give you an idea of how to make threads communicate.
Inter-process or inter-thread communication happens through queues, pipes, and a few other mechanisms; here I'm using queues.
It works like this: I send IP addresses into in_queue and add each response to out_queue; my main thread monitors out_queue, and when it gets the desired result it marks all the threads for termination.
Below is the pinger thread definition:
import threading
from Queue import Queue, Empty

# A thread that pings ip.
class Pinger(threading.Thread):
    def __init__(self, kwargs=None, name=None):
        threading.Thread.__init__(self, name=name)
        self.kwargs = kwargs
        self.stop_pinging = False

    def run(self):
        ip_queue = self.kwargs.get('in_queue')
        out_queue = self.kwargs.get('out_queue')
        while not self.stop_pinging:
            try:
                data = ip_queue.get(timeout=1)
                ping_status = ping(data)
                # This is pseudo code, you've to take care of
                # your own ping.
                if ping_status:
                    out_queue.put('success')
                    # you can even break here if you don't want to
                    # continue after one success
                else:
                    out_queue.put('failure')
                if ip_queue.empty():
                    break
            except Empty:
                pass
Here is the main thread block..
# Create the shared queue and launch both thread pools
in_queue = Queue()
out_queue = Queue()
ip_list = ['ip1', 'ip2', '....']
# This is to add all the ips to the queue or you can
# customize to add through some producer way.
for ip in ip_list:
in_queue.put(ip)
pingerer_pool = []
for i in xrange(1, 10):
pingerer_worker = Pinger(kwargs={'in_queue': in_queue, 'out_queue': out_queue}, name=str(i))
pingerer_pool.append(pinger_worker)
pingerer_worker.start()
while 1:
if out_queue.get() == 'success':
for pinger in pinger_pool:
pinger_worker.stop_pinging = True
break
Note: this is pseudocode (the ping itself is left to you); adapt it to make it workable as you like.
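As a side note, a threading.Event can replace the per-thread boolean, so the main thread can signal every worker with a single call. Here is a minimal sketch of that variant in Python 3 syntax, with the ping itself left as a clearly labelled stub:

import queue
import threading
import time

stop_event = threading.Event()
out_queue = queue.Queue()

def fake_ping(ip):
    # Illustrative stand-in for a real ping; pretend only "ip2" answers.
    time.sleep(0.2)
    return ip == "ip2"

def pinger(ip):
    # Keep trying until a ping succeeds or the main thread sets the event.
    while not stop_event.is_set():
        if fake_ping(ip):
            out_queue.put("success")
            return

ips = ["ip1", "ip2", "ip3"]
threads = [threading.Thread(target=pinger, args=(ip,), daemon=True) for ip in ips]
for t in threads:
    t.start()

out_queue.get()   # blocks until the first thread reports success
stop_event.set()  # one call tells every remaining thread to stop
for t in threads:
    t.join()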
I was reading an article on Python multithreading using queues and have a basic question.
Based on the print statements, 5 threads are started as expected. So how does the queue work?
1. The thread is started initially; when the queue is populated with an item, does it get restarted and start processing that item?
2. If we use the queue system and the threads process the items one by one from the queue, how is there an improvement in performance? Isn't it similar to serial processing, i.e. one by one?
import Queue
import threading
import urllib2
import datetime
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        print 'threads are created'
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            print 'thread startting to run'
            now = datetime.datetime.now()
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print 'host=%s ,threadname=%s' % (host, self.getName())
            print url.read(20)

            #signals to queue job is done
            self.queue.task_done()

start = time.time()

if __name__ == '__main__':
    #spawn a pool of threads, and pass them queue instance
    print 'program start'
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()

    print "Elapsed Time: %s" % (time.time() - start)
A queue is similar to a list container, but with internal locking to make it a thread-safe way to communicate data.
What happens when you start all of your threads is that they all block on the self.queue.get() call, waiting to pull an item from the queue. When an item is put into the queue from your main thread, one of the threads will become unblocked and receive the item. It can then continue to process it until it finishes and returns to a blocking state.
All of your threads can run concurrently because they are all able to receive items from the queue. This is where you see the improvement in performance: while urlopen and read in one thread are waiting on I/O, another thread can do work. The queue object's job is simply to manage the locking and hand items off to the callers.
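To make the blocking behaviour concrete, here is a small self-contained Python 3 sketch where time.sleep stands in for the network I/O; because the five simulated downloads overlap, the whole run takes roughly one second instead of five:

import queue
import threading
import time

q = queue.Queue()

def worker():
    while True:
        host = q.get()   # blocks here until an item is available
        time.sleep(1)    # stands in for urlopen/read (the I/O wait releases the GIL)
        print("%s finished %s" % (threading.current_thread().name, host))
        q.task_done()

for _ in range(5):
    threading.Thread(target=worker, daemon=True).start()

start = time.time()
for host in ["yahoo", "google", "amazon", "ibm", "apple"]:
    q.put(host)
q.join()  # wait until every item has been marked task_done()
print("Elapsed: %.1fs" % (time.time() - start))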