I need to send strings to a separate thread asynchronously (meaning no one knows when). I am using threading because my problem is IO bound.
My current best bet is set a flag and have the receiver thread poll that. When it detects a set flag it reads from some shared memory.
This sounds horribly complicated. Can't I send messages via a mailbox? Or something even better like implicit variable semantics?
The queue.Queue class is ideal for this. Your thread can either block waiting for input, or it can check whether anything new has arrived, and it is thread-safe.
Related
I've been looking around and see some answers saying to use globals, but doesn't seem thread safe. I have also tried using queues but that is apparently blocking, at least in how I did it. Can someone help/show an example on how to launch a thread from the main thread and communicate between one thread to another in a non-blocking thread safe way? Basically, the use case is that the threads will be looping and checking fairly constantly if there's something that needs to be done/changed and act accordingly. Thanks for the help
Python Queues are thread safe according to the documentation. I don't think it should be a problem to push and pop from a shared queue within threads. https://docs.python.org/3/library/queue.html
After looking at the subprocess async documentation I'm left wondering how anyone would run something equivalent to await process.stdout.read(NUMBER_OF_BYTES_TO_READ). The usage of the previous code snippet is advised against right in the documentation, they suggest to use the communicate method, and from what I can tell there is no way of indicating the number of bytes that need to be read with communicate().
What am I missing?
How would I tell communicate to return after reading a certain number of bytes?
Edit - I'm creating my subprocess with async pipes, I am trying to use the pipe asynchronously.
Short answer: stdout.read is blocking.
When there is enough bytes to read, it will return. This is a very happy and unlikely occasion. More likely, there will be little or no bytes to return, so it will wait. Locking the process.
The pipe can be created to be non-blocking, but that behavior is system-specific and fickle in my experience.
The "right" way to use stdout.read is to either be ready in the reading process to be blocked on this operation, possibly indefinitely; or use an external thread to read and push data to a shared buffer. Main thread can then decide to either pull or await on the buffer, retaining control.
In practical terms, and I wrote code like this several times, there will be a listening thread attached to a pipe, reading it until close or a signal from main thread to die. Reader and Main thread will use Queue.Queue to communicate, which is trivial to use in this scenario -- it's thread safe.
So, stdout.read comes with so many caveats, that nobody in the right mind would advise anyone to use it.
I've tried lately to write my own Socket-Server in python.
While i was writing a thread to handle server commands (sort of command line in the server), I've tried to implement a code that will restart the server when the raw_input() receives specific command.
Basically, i want to restart the server as soon as the "Running" variable changes its state from True to False, and when it does, i would like to stop the function (The function that called the thread) from running (get back to main function) and then run it again. Is there a way to do it?
Thank you very much, and i hope i was clear about my problem,
Idan :)
Communication between threads can be done with Events, Queues, Semaphores, etc. Check them out and choose the one, that fits your problem best.
You can't abort a thread, or raise an exception into it asynchronously, in Python.
The standard Unix solution to this problem is to use a non-blocking socket, create a pipe with pipe, replace all your blocking sock.recv calls with a blocking r, _, _ = select.select([sock, pipe], [], []), and then the other thread can write to the pipe to wake up the other thread.
To make this portable to Windows you'll need to create a UDP localhost socket instead of a pipe, which makes things slightly more complicated, but it's still not hard.
Or, of course, you can use a higher-level framework, like asyncio in 3.4+, or twisted or another third-party lib, which will wrap this up for you. (Most of them are already running the equivalent of a loop around select to service lots of clients in one thread or a small thread pool, so it's trivial to toss in a stop pipe.)
Are there other alternatives? Yes, but all less portable and less good in a variety of other ways.
Most platforms have a way to asynchronously kill or signal another thread, which you can access via, e.g., ctypes. But this is a bad idea, because it will prevent Python from doing any normal cleanup. Even if you don't get a segfault, this could mean files never get flushed and end up with incomplete/garbage data, locks are left acquired to deadlock your program somewhere completely unrelated a short time later, memory gets leaked, etc.
If you're specifically trying to interrupt the main thread, and you only care about CPython on Unix, you can use a signal handler and the kill function. The signal will take effect on the next Python bytecode, and if the interpreter is blocked on any kind of I/O (or most other syscalls, e.g., inside a sleep), the system will return to the interpreter with an EINTR, allowing it to interrupt immediately. If the interpreter is blocked on something else, like a call to a C library that blocks signals or just does nothing but CPU work for 30 seconds, then you'll have to wait 30 seconds (although that doesn't come up that often, and you should know if it will in your case). Also, threads and signals don't play nice on some older *nix platforms. And signals don't work the same way on Windows, or in some other Python implementations like Jython.
On some platforms (including Windows--but not most modern *nix plafforms), you can wake up a blocking socket call just by closing the socket out from under the waiting thread. On other platforms, this will not unblock the thread, or will do it sometimes but not other times (and theoretically it could even segfault your program or leave the socket library in an unusable state, although I don't think either of those will happen on any modern platform).
As far as I understand the documentation, and some experiments I've over the last weeks, there is no way to really force another thread to 'stop' or 'abort'. Unless the function is aware of the possibility of being stopped and has a foolproof method of avoiding getting stuck in some of the I/O functions. Then you can use some communication method such as semaphores. The only exception is the specialized Timer function, which has a Cancel method.
So, if you really want to stop the server thread forcefully, you might want to think about running it in a separate process, not a thread.
EDIT: I'm not sure why you want to restart the server - I just thought it was in case of a failure. Normal procedure in a server is to loop waiting for connections on the socket, and when a connection appears, attend it and return to that loop.
A better way, is to use the GIO library (part of glib), and connect methods to the connection event, to attend the connection even asynchronously. This avoids the loop completely. I don't have any real code for this in Python, but here's an example of a client in Python (which uses GIO for reception events) and a server in C, which uses GIO for connections.
Use of GIO makes life so much easier...
I'm using Python http.client.HTTPResponse.read() to read data from a stream. That is, the server keeps the connection open forever and sends data periodically as it becomes available. There is no expected length of response. In particular, I'm getting Tweets through the Twitter Streaming API.
To accomplish this, I repeatedly call http.client.HTTPResponse.read(1) to get the response, one byte at a time. The problem is that the program will hang on that line if there is no data to read, which there isn't for large periods of time (when no Tweets are coming in).
I'm looking for a method that will get a single byte of the HTTP response, if available, but that will fail instantly if there is no data to read.
I've read that you can set a timeout when the connection is created, but setting a timeout on the connection defeats the whole purpose of leaving it open for a long time waiting for data to come in. I don't want to set a timeout, I want to read data if there is data to be read, or fail if there is not, without waiting at all.
I'd like to do this with what I have now (using http.client), but if it's absolutely necessary that I use a different library to do this, then so be it. I'm trying to write this entirely myself, so suggesting that I use someone else's already-written Twitter API for Python is not what I'm looking for.
This code gets the response, it runs in a separate thread from the main one:
while True:
try:
readByte = dc.request.read(1)
except:
readByte = []
if len(byte) != 0:
dc.responseLock.acquire()
dc.response = dc.response + chr(byte[0])
dc.responseLock.release()
Note that the request is stored in dc.request and the response in dc.response, these are created elsewhere. dc.responseLock is a Lock that prevents dc.response from being accessed by multiple threads at once.
With this running on a separate thread, the main thread can then get dc.response, which contains the entire response received so far. New data is added to dc.response as it comes in without blocking the main thread.
This works perfectly when it's running, but I run into a problem when I want it to stop. I changed my while statement to while not dc.twitterAbort, so that when I want to abort this thread I just set dc.twitterAbort to True, and the thread will stop.
But it doesn't. This thread remains for a very long time afterward, stuck on the dc.request.read(1) part. There must be some sort of timeout, because it does eventually get back to the while statement and stop the thread, but it takes around 10 seconds for that to happen.
How can I get my thread to stop immediately when I want it to, if it's stuck on the call to read()?
Again, this method is working to get Tweets, the problem is only in getting it to stop. If I'm going about this entirely the wrong way, feel free to point me in the right direction. I'm new to Python, so I may be overlooking some easier way of going about this.
Your idea is not new, there are OS mechanisms(*) for making sure that an application is only calling I/O-related system calls when they are guaranteed to be not blocking . These mechanisms are usually used by async I/O frameworks, such as tornado or gevent. Use one of those, and you will find it very easy to run code "while" your application is waiting for an I/O event, such as waiting for incoming data on a socket.
If you use gevent's monkey-patching method, you can proceed using http.client, as requested. You just need to get used to the cooperative scheduling paradigm introduced by gevent/greenlets, in which your execution flow "jumps" between sub-routines.
Of course you can also perform blocking I/O in another thread (like you did), so that it does not affect the responsiveness of your main thread. Regarding your "How can I get my thread to stop immediately" problem:
Forcing a thread that's blocking in a system call to stop is usually not a clean or even valid process (also see Is there any way to kill a Thread in Python?). Either -- if your application has finished its jobs -- you take down the entire process, which also affects all contained threads, or you just leave the thread be and give it as much time to terminate as required (these 10 seconds you were referring to are not a problem -- are they?)
If you do not want to have such long-blocking system calls anywhere in your application (be it in the main thread or not), then use above-mentioned techniques to prevent blocking system calls.
(*) see e.g. O_NONBLOCK option in http://man7.org/linux/man-pages/man2/open.2.html
I'm quite new to python threading/network programming, but have an assignment involving both of the above.
One of the requirements of the assignment is that for each new request, I spawn a new thread, but I need to both send and receive at the same time to the browser.
I'm currently using the asyncore library in Python to catch each request, but as I said, I need to spawn a thread for each request, and I was wondering if using both the thread and the asynchronous is overkill, or the correct way to do it?
Any advice would be appreciated.
Thanks
EDIT:
I'm writing a Proxy Server, and not sure if my client is persistent. My client is my browser (using firefox for simplicity)
It seems to reconnect for each request. My problem is that if I open a tab with http://www.google.com in it, and http://www.stackoverflow.com in it, I only get one request at a time from each tab, instead of multiple requests from google, and from SO.
I answered a question that sounds amazingly similar to your, where someone had a homework assignment to create a client server setup, with each connection being handled in a new thread: https://stackoverflow.com/a/9522339/496445
The general idea is that you have a main server loop constantly looking for a new connection to come in. When it does, you hand it off to a thread which will then do its own monitoring for new communication.
An extra bit about asyncore vs threading
From the asyncore docs:
There are only two ways to have a program on a single processor do
“more than one thing at a time.” Multi-threaded programming is the
simplest and most popular way to do it, but there is another very
different technique, that lets you have nearly all the advantages of
multi-threading, without actually using multiple threads. It’s really
only practical if your program is largely I/O bound. If your program
is processor bound, then pre-emptive scheduled threads are probably
what you really need. Network servers are rarely processor bound,
however.
As this quote suggests, using asyncore and threading should be for the most part mutually exclusive options. My link above is an example of the threading approach, where the server loop (either in a separate thread or the main one) does a blocking call to accept a new client. And when it gets one, it spawns a thread which will then continue to handle the communication, and the server goes back into a blocking call again.
In the pattern of using asyncore, you would instead use its async loop which will in turn call your own registered callbacks for various activity that occurs. There is no threading here, but rather a polling of all the open file handles for activity. You get the sense of doing things all concurrently, but under the hood it is scheduling everything serially.