How to get around a WebSocket call blocking other WebSocket calls - Python

I am attempting to send a constant stream of messages from the WebSocket client to the server (Python/Flask/SocketIO).
I have a submit button on a simple page that starts off a long-running job.
$('form#emit').submit(function(event) {
    socket.emit('submit', {...});
    return false;
});
In the Python code I kick off the long-running job like so:
@socketio.on('submit', namespace='/namespace')
def long_running_function(message):
    long_running_job_code(message)
What I would expect to happen is for Python to kick off long_running_job_code and then go back to servicing the loop driven by setInterval:
On the client, the 'loop':
setInterval(function() { pinger() }, 1000);

function pinger() {
    socket.emit('ping', 'test');
}
On the server:
@socketio.on('ping', namespace='/namespace')
def ping(message):
    emit('my response', {'data': '.'})
Before the submit button is hit, the 'ping' handler is placing .... on the screen, but it does not continue to run while long_running_job_code is executing.
I believe the issue is blocking on the server side, but I am not sure. The long-running job has emits that are still getting to the client, but the ping emit stops while the long-running job is going.
Anyone have an idea on how to get around this?
Thanks!

You do not mention this, but my guess is that you are using eventlet or gevent as the web server of your application, because that is what makes the most sense when working with WebSocket and Flask-SocketIO in particular.
Eventlet and gevent are coroutine servers. They can handle multi-tasking, but this is done cooperatively. That means that for a context switch from one task to another to occur, the first task must release the CPU. The CPU is automatically released transparently when certain I/O calls are made, like when reading or writing from a socket. You can also explicitly release the CPU by calling the sleep function. If a task goes off to do some long calculation without doing any I/O or explicitly releasing the CPU, then the whole thing is going to block.
You basically have two ways to keep the machinery going while you run your long function. One way is to regularly issue sleep calls. When a sleep call occurs, the scheduler will give the CPU to other task(s) before returning from the sleep. If your function has a loop, for example, you can add this on each iteration:
eventlet.sleep(0)
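For illustration, a rough sketch of that, assuming long_running_job_code loops over chunks of work (work_items and process are placeholder names, not part of the question):
import eventlet

def long_running_job_code(message):
    for item in work_items(message):  # placeholder: whatever the job iterates over
        process(item)                 # placeholder: one chunk of the heavy computation
        eventlet.sleep(0)             # yield the CPU so queued 'ping' events get handled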
The other way to not block is to put the long function in a subprocess, which will probably require more changes than just adding sleeps here and there.
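A rough sketch of that second approach, using the standard multiprocessing module (one assumption on my part: emits from inside the child process would not reach the client without extra plumbing):
from multiprocessing import Process

@socketio.on('submit', namespace='/namespace')
def long_running_function(message):
    worker = Process(target=long_running_job_code, args=(message,))
    worker.start()  # the heavy work runs in a separate process, so this
                    # eventlet worker keeps handling 'ping' events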
Hope this helps!

Related

Continue with for loop after certain amount of time

How would you be able to move to the next iteration of a for loop if a given iteration takes more than a certain amount of time? The code should look something like this.
for i in range(0, max_iterations):
    # timer function
    # call to api
The timer function will serve the purpose of forcing the for loop to continue on to the next iteration if the API call has not finished within 120 seconds for that iteration. How would the timer function be written? Thank you in advance!
This is only truly possible with a non-blocking API call or an API call with a timeout. For example, if you are using the socket library, you could use socket.setblocking(0) to make the socket API calls non-blocking.
In your case, you have said you are using the Yandex API. This appears to be JSON over https, so you may wish to try urllib2.urlopen(). This method accepts a timeout. This is even easier than using a non-blocking call as urlopen() will simply give up and return an error after the timeout has expired.
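For example, a rough sketch of a loop that gives up on an iteration after 120 seconds (the URL is a placeholder for the actual Yandex endpoint):
import socket
import urllib2

max_iterations = 10  # whatever your loop actually uses

for i in range(0, max_iterations):
    try:
        # timeout=120 makes urlopen (and subsequent reads) give up after 120 seconds
        response = urllib2.urlopen("https://api.example.com/endpoint", timeout=120)
        result = response.read()
    except (urllib2.URLError, socket.timeout):
        continue  # timed out or failed; move on to the next iteration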
Using threads as suggested in some of the comments will give you a partial solution. Since there is no ability to stop a thread started with the threading module, all of the API calls you initiate that do not complete will stay in a blocked state for the life of the Python interpreter, and those threads will never exit.
If you do use the threading module to solve this problem, you should make all of the threads that run API calls daemon threads (thread.setDaemon(True)) so that the interpreter stops when your main thread exits. Otherwise the interpreter will not exit until all of the API calls have completed and returned.
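A rough sketch of that, assuming the API call can be wrapped in a function (call_api below is a placeholder):
import threading

def call_api(i):
    pass  # placeholder for the blocking API call

max_iterations = 10  # whatever your loop actually uses

for i in range(0, max_iterations):
    worker = threading.Thread(target=call_api, args=(i,))
    worker.setDaemon(True)  # daemon threads will not keep the interpreter alive on exit
    worker.start()
    worker.join(120)        # wait at most 120 seconds, then move on to the next iteration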

Is it possible to prevent Python's http.client.HTTPResponse.read() from hanging when there is no data?

I'm using Python http.client.HTTPResponse.read() to read data from a stream. That is, the server keeps the connection open forever and sends data periodically as it becomes available. There is no expected length of response. In particular, I'm getting Tweets through the Twitter Streaming API.
To accomplish this, I repeatedly call http.client.HTTPResponse.read(1) to get the response, one byte at a time. The problem is that the program will hang on that line if there is no data to read, which there isn't for large periods of time (when no Tweets are coming in).
I'm looking for a method that will get a single byte of the HTTP response, if available, but that will fail instantly if there is no data to read.
I've read that you can set a timeout when the connection is created, but setting a timeout on the connection defeats the whole purpose of leaving it open for a long time waiting for data to come in. I don't want to set a timeout, I want to read data if there is data to be read, or fail if there is not, without waiting at all.
I'd like to do this with what I have now (using http.client), but if it's absolutely necessary that I use a different library to do this, then so be it. I'm trying to write this entirely myself, so suggesting that I use someone else's already-written Twitter API for Python is not what I'm looking for.
This code gets the response; it runs in a separate thread from the main one:
while True:
    try:
        byte = dc.request.read(1)
    except:
        byte = b''
    if len(byte) != 0:
        dc.responseLock.acquire()
        dc.response = dc.response + chr(byte[0])
        dc.responseLock.release()
Note that the request is stored in dc.request and the response in dc.response, these are created elsewhere. dc.responseLock is a Lock that prevents dc.response from being accessed by multiple threads at once.
With this running on a separate thread, the main thread can then get dc.response, which contains the entire response received so far. New data is added to dc.response as it comes in without blocking the main thread.
This works perfectly when it's running, but I run into a problem when I want it to stop. I changed my while statement to while not dc.twitterAbort, so that when I want to abort this thread I just set dc.twitterAbort to True, and the thread will stop.
But it doesn't. This thread remains for a very long time afterward, stuck on the dc.request.read(1) part. There must be some sort of timeout, because it does eventually get back to the while statement and stop the thread, but it takes around 10 seconds for that to happen.
How can I get my thread to stop immediately when I want it to, if it's stuck on the call to read()?
Again, this method is working to get Tweets, the problem is only in getting it to stop. If I'm going about this entirely the wrong way, feel free to point me in the right direction. I'm new to Python, so I may be overlooking some easier way of going about this.
Your idea is not new; there are OS mechanisms(*) for making sure that an application only makes I/O-related system calls when they are guaranteed not to block. These mechanisms are usually used by async I/O frameworks, such as tornado or gevent. Use one of those, and you will find it very easy to run code "while" your application is waiting for an I/O event, such as waiting for incoming data on a socket.
If you use gevent's monkey-patching method, you can proceed using http.client, as requested. You just need to get used to the cooperative scheduling paradigm introduced by gevent/greenlets, in which your execution flow "jumps" between sub-routines.
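A rough sketch of the monkey-patching route, assuming gevent is installed (the patching has to happen before http.client, or anything else that imports socket, is imported):
from gevent import monkey
monkey.patch_all()  # must run before the socket-using modules are imported

import gevent

def reader(dc):
    # same loop as in the question; read(1) now cooperatively yields to other
    # greenlets while waiting for data, instead of blocking the whole process
    while not dc.twitterAbort:
        byte = dc.request.read(1)
        if len(byte) != 0:
            dc.responseLock.acquire()
            dc.response = dc.response + chr(byte[0])
            dc.responseLock.release()

# elsewhere: gevent.spawn(reader, dc) starts the reader as a greenlet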
Of course you can also perform blocking I/O in another thread (like you did), so that it does not affect the responsiveness of your main thread. Regarding your "How can I get my thread to stop immediately" problem:
Forcing a thread that's blocked in a system call to stop is usually not a clean or even valid approach (also see Is there any way to kill a Thread in Python?). Either -- if your application has finished its jobs -- you take down the entire process, which also affects all contained threads, or you just leave the thread be and give it as much time to terminate as required (the 10 seconds you were referring to are not a problem -- are they?)
If you do not want to have such long-blocking system calls anywhere in your application (be it in the main thread or not), then use above-mentioned techniques to prevent blocking system calls.
(*) see e.g. O_NONBLOCK option in http://man7.org/linux/man-pages/man2/open.2.html

What happens when you have an infinite loop in Django view code?

Something that I just thought about:
Say I'm writing view code for my Django site, and I make a mistake and create an infinite loop.
Whenever someone would try to access the view, the worker assigned to the request (be it a Gevent worker or a Python thread) would stay in a loop indefinitely.
If I understand correctly, the server would send a timeout error to the client after 30 seconds. But what will happen with the Python worker? Will it keep on working indefinitely? That sounds dangerous!
Imagine I've got a server in which I've allocated 10 workers. I let it run and at some point, a client tries to access the view with the infinite loop. A worker will be assigned to it, and will be effectively dead until the next server restart. The dangerous thing is that at first I wouldn't notice it, because the site would just be imperceptibly slower, having 9 workers instead of 10. But then it might happen again and again throughout a long span of time, maybe months. The site would just get progressively slower, until eventually it would be really slow with just one worker.
A server restart would solve the problem, but I'd hate to have my site's functionality depend on server restarts.
Is this a real problem that happens? Is there a way to avoid it?
Update: I'd also really appreciate a way to take a stacktrace of the thread/worker that's stuck in an infinite loop, so I could have that emailed to me so I'll be aware of the problem. (I don't know how to do this because there is no exception being raised.)
Update to people saying things to the effect of "Avoid writing code that has infinite loops": In case it wasn't obvious, I do not spend my free time intentionally putting infinite loops into my code. When these things happen, they are mistakes, and mistakes can be minimized but never completely avoided. I want to know that even when I make a mistake, there'll be a safety net that will notify me and allow me to fix the problem.
It is a real problem. In the case of gevent, because context switching is cooperative, an infinite loop that never yields can even immediately stop your whole website from responding.
Everything depends on your environment. For example, when running Django in production through uWSGI you can set harakiri - the time in seconds after which the worker handling a request will be killed if it hasn't finished handling the response. It is strongly recommended to set such a value in order to deal with faulty requests or bad code. Such an event is reported in the uWSGI log. I believe other solutions for running Django in production have similar options.
Otherwise, due to the network architecture, a client disconnecting will not stop the infinite loop, and by default there will be no response at all - just infinite loading. Various timeout options (harakiri being one of them) may end up producing a connection timeout - for example, PHP has (as far as I remember) a default timeout of 30 seconds, after which it returns a 504 Gateway Timeout. The socket disconnection timeout depends on the HTTP server settings, and it will not stop the application thread; it will only close the client socket.
If you are not using gevent (or any other green threads), an infinite loop will tend to take up 100% of the available CPU power (limited to one core), possibly eating up more and more memory, so your website will work pretty slowly and/or time out really quickly. Django itself is not aware of request time, so - as mentioned before - your production environment stack is the way to prevent this from happening. In the case of uWSGI, http://uwsgi-docs.readthedocs.org/en/latest/Options.html#harakiri-verbose is the way to go.
Harakiri does print a stack trace of the killed process (https://uwsgi-docs.readthedocs.org/en/latest/Tracebacker.html?highlight=harakiri) straight to the uWSGI log, and thanks to the alarm subsystem you can get notified through e-mail (http://uwsgi-docs.readthedocs.org/en/latest/AlarmSubsystem.html).
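For example, a minimal uWSGI configuration sketch (the module name and values are placeholders, not a recommendation for any specific setup):
[uwsgi]
; placeholder for your project's WSGI entry point
module = mysite.wsgi:application
; kill any worker that spends more than 30 seconds on a single request
harakiri = 30
; log extra details about the worker/request that was killed
harakiri-verbose = true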
I just tested this on Django's development server.
Results:
Does not give a timeout after 30 seconds (this might be because it's not a production server, though).
Stays loading until I close the page.
I guess one way to avoid it, without actually just avoiding code like that, would be to use threading to have control over timeouts and be able to stop the thread.
Maybe something like:
import threading
from django.http import HttpResponse

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        print("your possible infinite loop code here")

def possible_loop_view(request):
    thread = MyThread()
    thread.start()
    return HttpResponse("html response")
Yes, your analysis is correct. The worker thread/process will keep running. Moreover, if there is no wait/sleep in the loop, it will hog the CPU. Other threads/processes will get very little CPU, resulting in slow responses across your entire site.
Also, I don't think the server will send any timeout error to the client explicitly. If a TCP timeout is set, the TCP connection will be closed.
The client may also have a timeout setting for getting the response, which may come into the picture.
Avoiding such code is the best way to avoid this problem. You can also have a monitoring tool on the server watching CPU/memory usage that notifies you of abnormal activity, so that you can take action.

How to create a system of events through interrupts in Python

Because of my zero knowledge about Python GUIs, I need some help making a mechanism for sending requests through HTML, CSS, or Ajax (a node.js, Apache, or nginx server) to a Python program in order to execute certain functions.
For example, I have a Python program running a while True: loop, but at a given moment I want to raise an interrupt signal and send data along with it to execute a function - a kind of event system.
First, I bind an event to the program:
#program.bind(EVENT_NAME, EVENT_HANDLER)
program.bind(miaowcat, miaowfunc)
The program runs, and any time an interrupt is performed, it executes the function miaowfunc, passing the data of the event to *args:
def miaowfunc(*args):
    ...
It's a prototype, so args can be numeric signals or other elements.
I don't know how to do this.
This kind of problem is what messaging systems are designed to solve.
You write some code that needs to be executed at a trigger (this is called a consumer).
Your code that needs to trigger the function (called a producer) creates a message and sends it to a broker.
The broker takes your message and puts it on a queue.
The consumer is listening on this queue for messages, when it sees one, it will "wake up", run itself, and then go back to sleep.
For Python, the following are typically used:
Messaging Broker = RabbitMQ
Task Queue = Celery
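As a rough sketch of how that maps onto the question, assuming Celery is installed and a RabbitMQ broker is running locally (the broker URL and names are placeholders):
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def miaowfunc(*args):
    # the consumer: runs whenever a matching message arrives on the queue
    print(args)

# the producer (your web layer, or any other code) triggers it without blocking:
# miaowfunc.delay('some', 'event', 'data')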

Is calling QCoreApplication.processEvents() on a set interval safe?

I have a Qt application written in PySide (the Qt Python binding). This application has a GUI thread and many different QThreads that are in charge of performing some heavy lifting - some rather long tasks. As such a long task sometimes gets stuck (usually because it is waiting for a server response), the application sometimes freezes.
I was therefore wondering if it is safe to call QCoreApplication.processEvents() "manually" every second or so, so that the GUI event queue is cleared (processed)? Is that a good idea at all?
It's safe to call QCoreApplication.processEvents() whenever you like. The docs explicitly state your use case:
You can call this function occasionally when your program is busy
performing a long operation (e.g. copying a file).
There is no good reason, though, why threads would block the event loop in the main thread. (Unless your system really can't keep up.) So that's worth looking into anyway.
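For illustration, the documented use case looks roughly like this (copy_chunk is a placeholder for one step of whatever long operation runs in the GUI thread):
from PySide.QtCore import QCoreApplication

def long_operation(chunks):
    for chunk in chunks:
        copy_chunk(chunk)                 # placeholder for one step of the long operation
        QCoreApplication.processEvents()  # let pending GUI events be processed between steps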
A couple of hints people might find useful:
A. You need to beware of the following:
Every so often the threads want to send stuff back to the main thread, so they post an event and call processEvents.
If the code run from that event also calls processEvents, then instead of returning to the next statement, Python can instead dispatch a worker thread again, and that can then repeat this process.
The net result of this can be hundreds or thousands of nested processEvents calls, which can then result in a "recursion level exceeded" error message.
Moral - if you are running a multi-threaded application do NOT call processEvents in any code initiated by a thread which runs in the main thread.
B. You need to be aware that CPython has a Global Interpreter Lock (GIL) that limits threads so that only one can run at any one time, and the way that Python decides which thread to run is counter-intuitive. Running processEvents from a worker thread does not seem to do what it says on the can, and CPU time is not allocated to the main thread or to Python internal threads. I am still experimenting, but it seems that putting worker threads to sleep for a few milliseconds allows other threads to get a look in.
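A minimal sketch of that workaround (do_chunk_of_work is a placeholder for the thread's real job):
import threading
import time

def worker():
    while True:
        do_chunk_of_work()  # placeholder for the heavy lifting done by the thread
        time.sleep(0.005)   # sleeping a few milliseconds lets the main (GUI) thread run

threading.Thread(target=worker).start()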
