I am building a non-blocking chat application for my website, and I decided to implement some multiprocessing to deal with DB querying and real-time messaging.
I assume that when a user lands on a given URL to see their conversation with the other person, I will fire off the script, the multiprocessing will begin, the messages will be added to a queue and displayed on the page, new messages will be sent to a separate queue that interacts with the DB, etc. (Regular message features ensue.)
However, what happens when the user leaves this page? I assume I need to exit these various processes, but currently this does not lend itself to a "clean" exit. I would have to terminate the processes, and according to the multiprocessing docs:
Warning: If this method (terminate()) is used when the associated process is using a pipe or queue then the pipe or queue is liable to become corrupted and may become unusable by other process. Similarly, if the process has acquired a lock or semaphore etc. then terminating it is liable to cause other processes to deadlock.
I have also looked into sys.exit(); however, it doesn't fully exit the script without the use of terminate() on the various processes.
Here is my code that is simplified for the purposes of this question. If I need to change it, that's completely fine. I simply want to make sure I am going about this appropriately.
import multiprocessing
import Queue
import time
import sys

## Get all past messages
def batch_messages():
    # The messages list here will be obtained via a db query
    messages = [">> This is the message.", ">> Hello, how are you doing today?", ">> Really good!"]
    for m in messages:
        print m

## Add messages to the DB
def add_messages(q2):
    while True:
        # Retrieve from the queue
        message_to_db = q2.get()
        # For testing purposes only; perform another DB query to add the message to the DB
        print message_to_db, "(Add to DB)"

## Receive new, inputted messages
def receive_new_message(q1, q2):
    while True:
        # Get the new message from the queue:
        new_message = q1.get()
        # Print the message to the (other user's) screen
        print ">>", new_message
        # Add the q1 message to q2 for database manipulation
        q2.put(new_message)

def shutdown():
    print "Shutdown initiated"
    p_rec.terminate()
    p_batch.terminate()
    p_add.terminate()
    sys.exit()

if __name__ == "__main__":
    # Set up the queues
    q1 = multiprocessing.Queue()
    q2 = multiprocessing.Queue()

    # Set up the processes
    p_batch = multiprocessing.Process(target=batch_messages)
    p_add = multiprocessing.Process(target=add_messages, args=(q2,))
    p_rec = multiprocessing.Process(target=receive_new_message, args=(q1, q2,))

    # Start the processes
    p_batch.start()  # Perform batch get
    p_rec.start()
    p_add.start()

    time.sleep(0.1)  # Test: sleep to allow proper formatting

    while True:
        # Enter a new message
        input_message = raw_input("Type a message: ")
        # TEST PURPOSES ONLY: shutdown
        if input_message == "shutdown_now":
            shutdown()
        # Add the new message to the queue:
        q1.put(input_message)
        # Let the processes catch up before printing "Type a message: " again. (Shell purposes only)
        time.sleep(0.1)
How should I deal with this situation? Does my code need to be fundamentally revised, and if so, what should I do to fix it?
Any thoughts, comments, revisions, or resources appreciated.
Thank you!
Disclaimer: I don't actually know Python. But multithreading concepts are similar enough in all the languages I do know that I feel confident enough to try to answer anyway.
When using multiple threads/processes, each one should have a step in its loop that checks a variable (I often call the variable "active" or "keepGoing", and it's usually a boolean).
That variable is usually either shared between the threads or sent as a message to each thread, depending on your programming language and on when you want the processing to stop (finish your work first, y/n?).
Once the variable is set, all threads quit their processing loops and proceed to exit their threads.
In your case you have a "while True" loop. This never exits. Change it to exit when a variable is set, and the thread should close itself when the end of the function is reached.
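To make that concrete, here is a minimal sketch of the receive loop from the question rewritten around a multiprocessing.Event (the stop_event name and the 0.5-second timeout are my own choices, not anything from the original code):

import multiprocessing
import Queue

def receive_new_message(q1, q2, stop_event):
    # Loop until the main process sets the event
    while not stop_event.is_set():
        try:
            # Use a timeout so the loop can notice the event
            new_message = q1.get(timeout=0.5)
        except Queue.Empty:
            continue
        print ">>", new_message
        q2.put(new_message)

# In the main block:
# stop_event = multiprocessing.Event()
# p_rec = multiprocessing.Process(target=receive_new_message,
#                                 args=(q1, q2, stop_event))
# ...
# stop_event.set()  # instead of p_rec.terminate()
# p_rec.join()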
Related
I have a big dataset in a data acquisition system I wrote in Python, and it takes infinitely long to pass over a queue from the child process to the parent. I want to save the data acquired at the end of the acquisition and tried this using the queue function in Multiprocessing. Instead of doing it this way, I would prefer it if I could pass a message over the queue from the parent to the child to save my data before I kill the child process. Is this possible? An example of what I thought it might look like is:
def acquireData(self, var1, queue):
    import h5py
    # Put my acquisition code here
    queue.get()
    if queue == True:
        f = h5py.File("FileName","w")
        f.create_dataset('Data',data=data)
        f.close()

if __name__ == '__main__':
    from multiprocessing import Process, Queue
    queue = Queue()
    inter_thread = Process(target=acquireData, args=(var1,queue))
    queue.put(False)
    inter_thread.start()
    while True:
        if not args.automate:
            # Let c++ threads run for given amount of time
            # Wait for stop from OP GUI
            pass
        else:
            queue.put(True)
            break
    print("Acquisition finished, cleaning up...")
    sleep(2)
    inter_thread.terminate()
Is this allowed? If this type of interfacing between processes is allowed, do I have the right notation? For reference, I have on the order of 9e7 data points in each array I'm trying to save, and I have 7 arrays, which simply are not being passed to my parent process in a timely manner by putting them into the queue. Thank you.
First, yes, passing a queue to a child is not only legal, but the main use case for queues. See the first example in the docs, which does exactly that.
However, you've got some problems with your code:
queue.get()
if queue == True:
First, your queue is never going to be the boolean value True, it's going to be a Queue. You almost never want to check if x == True: in Python; you want to check if x:. For example, if [1, 2]: will pass, while if [1, 2] == True: will not.
Second, your queue isn't even the thing you want to check in the first place. It isn't truthy or falsey (or it isn't relevant whether it is); it's the value the main process put on the queue and you pulled off that's either truthy or falsey. Which you discarded as soon as you retrieved it.
So, do this:
flag = queue.get()
if flag:
Or, more simply:
if queue.get():
I'm not sure whether this is exactly what you want or not. That queue.get() will block forever until the main process puts something there. Is that what you wanted? If so, great; you're done with this part of your code. If not, you need to think about what you wanted instead.
As designed, the parent will always wait 2 seconds, even if the child finished long before that. A better solution is to join the child with a timeout of 2 seconds. Then you can terminate it if it times out.
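Something like this, as a sketch of the end of your main block (names taken from your example):

queue.put(True)               # soft kill request: ask the child to save and exit
inter_thread.join(timeout=2)  # wait up to 2 seconds for it to finish
if inter_thread.is_alive():
    inter_thread.terminate()  # still running: force it to stop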
Plus, are you sure the termination behavior you've designed is what you want? You're doing a "soft kill request" with the queue, then waiting 2 seconds, then doing a "medium-hard kill request" with terminate, and never doing a "hard kill" with kill. That could be a perfectly reasonable design—but if it's not your design, you've implemented the wrong thing.
I am debugging an application that gathers information from two sensors: a webcam and a microphone.
The general architecture is quite simple:
the main process sends messages (start, stop, get_data) via pipes to the child processes (one for each).
child processes gather the data and send it to the main process
Child & main processes are in infinite loops to process commands (the main process from the user, the child processes from the main process).
It works on the whole, but I have trouble stopping the child processes.
I have added logging to the code, and it seems that two things happen:
The 'stop' message is sent but doesn't get through the pipe.
The child process continue to send data and the conn.send(data) blocks.
The behavior is clearly linked to the state of the connection, as child processes that send nothing back don't have this behavior. Still, I don't see how to debug/modify the current architecture, which seems reasonable.
So, what causes this blocking behavior, and how can I avoid it?
This is the code which is executed on each iteration of the infinite loop in the child process:
def do(self):
    while self.cnx.poll():
        msg = self.cnx.recv()
        self.queue.append(msg)
    #==
    if not self.queue:
        func_name = 'default_action'
        self.queue.append([func_name, ])
    #==
    msg = self.queue.pop()
    func_name, args = msg[0], msg[1:]
    #==
    res = self.target.__getattribute__(func_name)(*args)
    #==
    running = func_name != 'stop'
    #==
    if res and self.send:
        assert running
        self.output_queue.append(res[0])
    if self.output_queue and running:
        self.cnx.send(self.output_queue.popleft())
    #==
    return running
Update: it seems that the Pipe cannot be written to simultaneously on both ends. It works if I change the last few lines of the above code to:
if self.output_queue and running:
    if not self.cnx.poll():
        self.cnx.send(self.output_queue.popleft())
The question stays open though, as Pipes are documented as full duplex by default and this behavior is not documented at all. I must have misunderstood something. Please enlighten me!
Update 2: just to be clear, no connection is closed in this situation. To describe the sequence of events:
the main process sends a message ("stop") (it empties the connection before sending the message)
the main process enters an (infinite) loop that stops when the child process is terminated.
meanwhile, the child process is blocked in the send and never gets the message.
A full duplex multiprocessing.Pipe is implemented as socketpair(). Calling .send can block for all the normal reasons when talking to a socket. Based on your description I think it's likely that the reader of your Pipe has quit reading and data has built up in the buffers in the kernel to the point where your .send blocks.
If you explicitly .close the receiving side you'll probably get some kind of error (although possibly SIGPIPE as well, not sure) when you try to .send. If your receiving connection was going out of scope this would probably happen automatically. You may be able to fix the problem by just being more careful not to store references (direct or indirect) to the receiving side so it gets deallocated when that thread goes away.
Trivial demo of blocking .send:
import multiprocessing

a, b = multiprocessing.Pipe()

while True:
    print "send!"
    a.send("hello world")
Now note that after a while it quits printing "send!"
I'm trying to create a Python 3 program that has one or more child processes.
The parent process spawns the child processes and then goes on with its own business; now and then I want to send a message to a specific child process that catches it and takes action.
Also, the child processes need to be non-blocking while waiting for messages; each will run its own loop maintaining a server connection and send any received messages on to the parent.
I'm currently looking at the multiprocessing, threading, and subprocess modules in Python but have not been able to find a solution.
What I'm trying to achieve is to have a main part of the program that interacts with the user, taking care of user input and presenting information to the user.
This will be asynchronous from the child parts that talk with different servers, receiving messages from the servers and sending the correct messages from the user to the servers.
The child processes will then send information back to the main part, where it will be presented to the user.
My questions are:
1. Am I going at this in the wrong way?
2. Which module would be the best to use?
2.1 How would I set this up?
See Doug Hellmann's (multiprocessing) "Communication Between Processes", part of his Python Module of the Week series. It is fairly simple to use a dictionary or list to communicate with a process.
import time
from multiprocessing import Process, Manager

def test_f(test_d):
    """ first process to run
        exit this process when dictionary's 'QUIT' == True
    """
    test_d['2'] = 2  ## change to test this
    while not test_d["QUIT"]:
        print "test_f", test_d["QUIT"]
        test_d["ctr"] += 1
        time.sleep(1.0)

def test_f2(name):
    """ second process to run. Runs until the for loop exits
    """
    for j in range(0, 10):
        print name, j
        time.sleep(0.5)
    print "second process finished"

if __name__ == '__main__':
    ##--- create a dictionary via Manager
    manager = Manager()
    test_d = manager.dict()
    test_d["ctr"] = 0
    test_d["QUIT"] = False

    ##--- start first process and send dictionary
    p = Process(target=test_f, args=(test_d,))
    p.start()

    ##--- start second process
    p2 = Process(target=test_f2, args=('P2',))
    p2.start()

    ##--- sleep 3 seconds and then change dictionary
    ##    to exit first process
    time.sleep(3.0)
    print "\n terminate first process"
    test_d["QUIT"] = True
    print "test_d changed"
    print "data from first process", test_d

    time.sleep(5.0)
    p.terminate()
    p2.terminate()
Sounds like you might be familiar with multiprocessing, just not with Python.
os.pipe will supply you with pipes to connect parent and child, and semaphores can be used to coordinate/signal between parent and child processes. You might want to consider queues for passing messages.
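As a minimal sketch of the os.pipe approach (Unix-only, since it relies on os.fork; in practice multiprocessing.Pipe or Queue is usually the more portable choice):

import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: close the read end, send a message to the parent, exit
    os.close(r)
    os.write(w, b"hello from child")
    os._exit(0)
else:
    # Parent: close the write end and read the child's message
    os.close(w)
    print(os.read(r, 1024).decode())
    os.waitpid(pid, 0)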
This is my 'game server'. It's nothing serious; I thought this was a nice way of learning a few things about Python and sockets.
First, the Server class initializes the server.
Then, when someone connects, we create a client thread. In this thread we continually listen on our socket.
Once a certain command comes in (I12345001001, for example), it spawns another thread.
The purpose of this last thread is to send updates to the client.
But even though I see the server is performing this code, the data isn't actually being sent.
Could anyone tell me where it's going wrong?
It's like I have to receive something before I'm able to send. So I guess I'm missing something somewhere.
#!/usr/bin/env python

"""
An echo server that uses threads to handle multiple clients at a time.
Entering any line of input at the terminal will exit the server.
"""

import select
import socket
import sys
import threading
import time
import Queue

globuser = {}
queue = Queue.Queue()

class Server:
    def __init__(self):
        self.host = ''
        self.port = 2000
        self.backlog = 5
        self.size = 1024
        self.server = None
        self.threads = []

    def open_socket(self):
        try:
            self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.server.bind((self.host, self.port))
            self.server.listen(5)
        except socket.error, (value, message):
            if self.server:
                self.server.close()
            print "Could not open socket: " + message
            sys.exit(1)

    def run(self):
        self.open_socket()
        input = [self.server, sys.stdin]
        running = 1
        while running:
            inputready, outputready, exceptready = select.select(input, [], [])
            for s in inputready:
                if s == self.server:
                    # handle the server socket
                    c = Client(self.server.accept(), queue)
                    c.start()
                    self.threads.append(c)
                elif s == sys.stdin:
                    # handle standard input
                    junk = sys.stdin.readline()
                    running = 0
        # close all threads
        self.server.close()
        for c in self.threads:
            c.join()

class Client(threading.Thread):
    initialized = 0

    def __init__(self, (client, address), queue):
        threading.Thread.__init__(self)
        self.client = client
        self.address = address
        self.size = 1024
        self.queue = queue
        print 'Client thread created!'

    def run(self):
        running = 10
        isdata2 = 0
        receivedonce = 0
        while running > 0:
            if receivedonce == 0:
                print 'Wait for initialisation message'
                data = self.client.recv(self.size)
                receivedonce = 1
            if self.queue.empty():
                print 'Queue is empty'
            else:
                print 'Queue has information'
                data2 = self.queue.get(1, 1)
                isdata2 = 1
                if data2 == 'Exit':
                    running = 0
                    print 'Client is being closed'
                    self.client.close()
            if data:
                print 'Data received through socket! First char: "' + data[0] + '"'
                if data[0] == 'I':
                    print 'Initializing user'
                    user = {'uid': data[1:6], 'x': data[6:9], 'y': data[9:12]}
                    globuser[user['uid']] = user
                    print globuser
                    initialized = 1
                    self.client.send('Beginning - Initialized' + ';')
                    m = updateClient(user['uid'], queue)
                    m.start()
                else:
                    print 'Reset receivedonce'
                    receivedonce = 0
                print 'Sending client data'
                self.client.send('Feedback: ' + data + ';')
                print 'Client Data sent: ' + data
            data = None
            if isdata2 == 1:
                print 'Data2 received: ' + data2
                self.client.sendall(data2)
                self.queue.task_done()
                isdata2 = 0
            time.sleep(1)
            running = running - 1
        print 'Client has stopped'

class updateClient(threading.Thread):
    def __init__(self, uid, queue):
        threading.Thread.__init__(self)
        self.uid = uid
        self.queue = queue
        global globuser
        print 'updateClient thread started!'

    def run(self):
        running = 20
        test = 0
        while running > 0:
            test = test + 1
            self.queue.put('Test Queue Data #' + str(test))
            running = running - 1
            time.sleep(1)
        print 'Updateclient has stopped'

if __name__ == "__main__":
    s = Server()
    s.run()
I don't understand your logic -- in particular, why you deliberately set up two threads writing at the same time on the same socket (which they both call self.client), without any synchronization or coordination, an arrangement that seems guaranteed to cause problems.
Anyway, a definite bug in your code is your use of the send method -- you appear to believe that it guarantees to send all of its argument string, but that's very definitely not the case; see the docs:
Returns the number of bytes sent. Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data.
sendall is the method that you probably want:
Unlike send(), this method continues to send data from string until either all data has been sent or an error occurs.
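To illustrate the difference, a hand-written loop with the same contract as sendall might look like this (a sketch, not code from the question):

def send_all(sock, data):
    # Keep calling send() until every byte has been handed to the kernel
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        total += sent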
Other problems include the fact that updateClient is apparently designed to never terminate (differently from the other two thread classes -- when those terminate, updateClient instances won't, and they'll just keep running and keep the process alive), redundant global statements (innocuous, just confusing), some threads trying to read a dict (via the iteritems method) while other threads are changing it, again without any locking or coordination, etc, etc -- I'm sure there may be even more bugs or problems, but, after spotting several, one's eyes tend to start to glaze over;-).
You have three major problems. The first problem is likely the answer to your question.
Blocking (Question Problem)
The socket.recv call is blocking. This means that execution is halted and the thread goes to sleep until it can read data from the socket. So your third update thread just fills the queue up, but the queue only gets emptied when you receive a message, and only one message at a time.
This is likely why it will not send data unless you send it data.
Message Protocol On Stream Protocol
You are trying to use the socket stream like a message stream. What I mean is you have:
self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
The SOCK_STREAM part says it is a stream, not a message-oriented socket such as SOCK_DGRAM, and TCP does not support messages. So what you have to do is build messages yourself, such as:
data = struct.pack('I', len(msg)) + msg
socket.sendall(data)
Then the receiving end will look for the length field and read the data into a buffer. Once enough data is in the buffer it can grab out the entire message.
Your current setup is working because your messages are small enough to all be placed into the same packet and also placed into the socket buffer together. However, once you start sending large data over multiple calls with socket.send or socket.sendall you are going to start having multiple messages and partial messages being read unless you implement a message protocol on top of the socket byte stream.
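A minimal sketch of that receiving side (recv_exact is a helper name I made up, not part of the socket module; it assumes the 4-byte 'I' length prefix from the sending code above):

import struct

def recv_exact(sock, n):
    # recv() may return fewer bytes than asked for, so loop until
    # exactly n bytes have arrived
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-message")
        buf += chunk
    return buf

def recv_message(sock):
    # Read the fixed-size length field first, then the payload
    (length,) = struct.unpack('I', recv_exact(sock, 4))
    return recv_exact(sock, length)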
Threads
Even though threads can be easier to use when starting out, they come with a lot of problems and can degrade performance if used incorrectly, especially in Python. I love threads, so do not get me wrong. Python also has the GIL (global interpreter lock), so you get bad performance when using threads for CPU-bound work. Your code is mostly I/O bound at the moment, but in the future it may become CPU bound. Also, you have to worry about locking with threading. A thread can be a quick fix but may not be the best fix. There are circumstances where threading is quite simply the easiest way to break up some time-consuming process, so do not dismiss threads as evil or terrible. In Python they are considered bad mainly because of the GIL, and in other languages (including Python) because of concurrency issues, so most people recommend using multiple processes or asynchronous code. Whether to use a thread is a complex question, as it depends on the language (the way your code is run), the system (single or multiple processors), and contention (trying to share a resource with locking), among other factors, but generally asynchronous code is faster because it utilizes more CPU with less overhead, especially if you are not CPU bound.
The solution is to use the select module in Python, or something similar. It will tell you when a socket has data to be read, and you can set a timeout parameter.
You can gain more performance by doing asynchronous work (asynchronous sockets). To turn a socket into asynchronous mode you simply call socket.settimeout(0), which will make it not block. However, you will then constantly eat CPU spinning while waiting for data. The select module and friends will prevent you from spinning.
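As a rough sketch of a single-threaded select loop (an echo server here, just to show the shape; not a drop-in replacement for your classes):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('', 2000))
server.listen(5)

inputs = [server]
while True:
    # Wake up at least once a second even if no socket is ready
    readable, _, _ = select.select(inputs, [], [], 1.0)
    for s in readable:
        if s is server:
            conn, addr = server.accept()
            inputs.append(conn)
        else:
            data = s.recv(1024)
            if data:
                s.sendall(data)   # echo the data back
            else:
                inputs.remove(s)  # client disconnected
                s.close()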
Generally for performance you want to do as much asynchronous work (in the same thread) as possible, and then expand with more threads that are also doing as much asynchronously as possible. However, as previously noted, Python is an exception to this rule because of the GIL (global interpreter lock), which from what I have read can actually degrade performance. If you are interested, you should try writing a test case and find out!
You should also check out the thread locking primitives in the threading module. They are Lock, RLock, and Condition. They can help multiple threads share data with out problems.
import threading

lock = threading.Lock()

def myfunction(arg):
    with lock:
        arg.do_something()
Some Python objects are thread safe and others are not.
Sending Updates Asynchronously (Improvement)
Instead of using a third thread only to send updates, you could use the client thread to send updates by comparing the current time with the time the last update was sent. This would remove the need for the Queue and the extra Thread. To do this you must convert your client code into asynchronous code and put a timeout on your select, so that at each interval you can check the current time and see whether an update is needed.
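A sketch of that idea (build_update is a hypothetical callable producing the update bytes; the 0.1-second select timeout and 1-second interval are arbitrary choices):

import select
import time

def serve(server_sock, build_update, interval=1.0):
    clients = []
    last_update = time.time()
    while True:
        # Short timeout so we regain control even when nothing is readable
        readable, _, _ = select.select([server_sock] + clients, [], [], 0.1)
        for s in readable:
            if s is server_sock:
                conn, addr = server_sock.accept()
                clients.append(conn)
            else:
                data = s.recv(1024)
                if not data:
                    clients.remove(s)
                    s.close()
        # Time-based updates instead of a dedicated update thread
        if time.time() - last_update >= interval:
            for c in clients:
                c.sendall(build_update())
            last_update = time.time()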
Summary
I would recommend you rewrite your code using asynchronous socket code. I would also recommend that you use a single thread for all clients and the server. This will improve performance and decrease latency. It would make debugging easier because you would have no possible concurrency issues like you have with threads. Also, fix your message protocol before it fails.
(I'm using the pyprocessing module in this example, but replacing processing with multiprocessing should probably work if you run Python 2.6 or use the multiprocessing backport.)
I currently have a program that listens to a unix socket (using a processing.connection.Listener), accepts connections and spawns a thread handling each request. At a certain point I want to quit the process gracefully, but the accept() call is blocking and I see no way of cancelling it in a nice way. I have one way that works here (OS X) at least: setting a signal handler and signalling the process from another thread, like so:
import processing
from processing.connection import Listener
import threading
import time
import os
import signal
import socket
import errno

# This is actually called by the connection handler.
def closeme():
    time.sleep(1)
    print 'Closing socket...'
    listener.close()
    os.kill(processing.currentProcess().getPid(), signal.SIGPIPE)

oldsig = signal.signal(signal.SIGPIPE, lambda s, f: None)

listener = Listener('/tmp/asdf', 'AF_UNIX')
# This is a thread that handles one already accepted connection, left out for brevity
threading.Thread(target=closeme).start()
print 'Accepting...'
try:
    listener.accept()
except socket.error, e:
    if e.args[0] != errno.EINTR:
        raise
# Cleanup here...
print 'Done...'
The only other way I've thought about is reaching deep into the connection (listener._listener._socket) and setting the non-blocking option...but that probably has some side effects and is generally really scary.
Does anyone have a more elegant (and perhaps even correct!) way of accomplishing this? It needs to be portable to OS X, Linux and BSD, but Windows portability etc is not necessary.
Clarification:
Thanks all! As usual, ambiguities in my original question are revealed :)
I need to perform cleanup after I have cancelled the listening, and I don't always want to actually exit that process.
I need to be able to access this process from other processes not spawned from the same parent, which makes Queues unwieldy
The reasons for threads are:
They access a shared state. Actually more or less a common in-memory database, so I suppose it could be done differently.
I must be able to have several connections accepted at the same time, but the actual threads are blocking for something most of the time. Each accepted connection spawns a new thread; this is in order not to block all clients on I/O ops.
Regarding threads vs. processes, I use threads for making my blocking ops non-blocking and processes to enable multiprocessing.
Isn't that what select is for?
Only call accept on the socket if select indicates it will not block...
The select call has a timeout, so you can break out occasionally to check
if it's time to shut down...
I thought I could avoid it, but it seems I have to do something like this:
from processing import connection
connection.Listener.fileno = lambda self: self._listener._socket.fileno()
import select

l = connection.Listener('/tmp/x', 'AF_UNIX')
r, w, e = select.select((l, ), (), ())
if l in r:
    print "Accepting..."
    c = l.accept()
    # ...
I am aware that this breaks the Law of Demeter and introduces some evil monkey-patching, but it seems this would be the most easy-to-port way of accomplishing this. If anyone has a more elegant solution, I would be happy to hear it :)
I'm new to the multiprocessing module, but it seems to me that mixing the processing module and the threading module is counter-intuitive; aren't they targeted at solving the same problem?
Anyway, how about wrapping your listen functions into a process itself? I'm not clear how this affects the rest of your code, but this may be a cleaner alternative.
from multiprocessing import Process
from multiprocessing.connection import Listener

class ListenForConn(Process):
    def run(self):
        listener = Listener('/tmp/asdf', 'AF_UNIX')
        listener.accept()
        # do your other handling here

listen_process = ListenForConn()
listen_process.start()
print listen_process.is_alive()

listen_process.terminate()
listen_process.join()
print listen_process.is_alive()
print 'No more listen process.'
Probably not ideal, but you can release the block by sending the socket some data from the signal handler or the thread that is terminating the process.
EDIT: Another way to implement this might be to use the Connection Queues, since they seem to support timeouts (apologies, I misread your code in my first read).
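For instance, a sketch using a multiprocessing queue with a timeout (the 2-second value is arbitrary; in Python 2 the timeout raises Queue.Empty):

import multiprocessing
import Queue

q = multiprocessing.Queue()
try:
    # Wait up to 2 seconds for a message instead of blocking forever
    item = q.get(timeout=2.0)
except Queue.Empty:
    pass  # timed out: a good place to check a "keep running" flag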
I ran into the same issue. I solved it by sending a "stop" command to the listener. In the listener's main thread (the one that processes the incoming messages), every time a new message is received, I just check to see if it's a "stop" command and exit out of the main thread.
Here's the code I'm using:
def start(self):
    """
    Start listening
    """
    # set the command being executed
    self.command = self.COMMAND_RUN

    # start the 'listener_main' method as a daemon thread
    self.listener = Listener(address=self.address, authkey=self.authkey)
    self._thread = threading.Thread(target=self.listener_main, daemon=True)
    self._thread.start()

def listener_main(self):
    """
    The main application loop
    """
    while self.command == self.COMMAND_RUN:
        # block until a client connection is received
        with self.listener.accept() as conn:
            # receive the subscription request from the client
            message = conn.recv()
            # if it's a shut down command, return to stop this thread
            if isinstance(message, str) and message == self.COMMAND_STOP:
                return
            # process the message

def stop(self):
    """
    Stops the listening thread
    """
    self.command = self.COMMAND_STOP
    client = Client(self.address, authkey=self.authkey)
    client.send(self.COMMAND_STOP)
    client.close()
    self._thread.join()
I'm using an authentication key to prevent would be hackers from shutting down my service by sending a stop command from an arbitrary client.
Mine isn't a perfect solution. It seems a better solution might be to revise the code in multiprocessing.connection.Listener and add a stop() method. But that would require sending it through the process for approval by the Python team.