I am quite new to Python.
My problem:
I am starting a number of threads in parallel, each one trying to create a session with one of several foreign hosts (to perform a few activities in those sessions later). Some of these hosts may be in an awkward state, in which case the session creation eventually fails, but only after about 60 seconds; if it succeeds, the session is created immediately. Hence I want to terminate the respective session-creation threads after a reasonable time (a few seconds). However, I have learned that the only way to stop a thread is to signal it via an event status that it observes, which is no use if the thread is stuck in a blocking action (here: establishing the session). I use join() with a timeout; that speeds up execution for all sessions created successfully, but of course main() won't exit until all the temporarily stuck threads have returned. Is there really no way to cut this short?
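For reference, a minimal sketch of the setup described above (create_session is only a placeholder for the real session setup):

import threading

def create_session(host):
    # placeholder for the real session setup;
    # for a host in a bad state this call blocks for ~60 seconds
    ...

hosts = ["host-a", "host-b", "host-c"]
threads = [threading.Thread(target=create_session, args=(h,)) for h in hosts]
for t in threads:
    t.start()

# join() with a timeout hands control back after a few seconds,
# but the process still waits at exit for the threads that are stuck
for t in threads:
    t.join(timeout=5)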
Thanks for any useful advice.
Uwe
I have a process in which I need to assign long-running tasks amongst a pool of workers, in Python. So far I have been using RabbitMQ to queue the tasks (the input comes from a Node.js frontend); a Python worker subscribes to the queue, obtains a task, and executes it. Each task takes several minutes minimum.
After an update this process started breaking, and I eventually discovered this was due to RabbitMQ version 3.6.10 having changed the way it handles timeouts. I now believe I need to rethink my method of assigning tasks, but I want to make sure I do it the right way.
Until now I only had one worker (the task is to control a sequence of actions in a VM - I couldn't afford a new Windows license for a while, so until recently I had no practical way of testing parallel task execution); I suspect if I'd had two before I would have noticed this sooner. The worker attaches to a VM using libvirt to control it. The way my code is written currently implies that I would run one instance of the script per VM that I wish to control.
I suspect that part of my problem is the use of BlockingConnection - I think I need a way for the worker to disconnect from the queue when it has received and validated a task (this part takes less than 1 sec), then reconnect once it has completed the actions, but I haven't figured out how to do this yet. Is this correct? If so, how should I do this, and if not, what should I do instead?
One other idea I've had is that instead of running a script per VM I could have a global control script that on receiving a task would spin off a thread which would handle the task. This would solve the problem of the connection timing out during task execution, but the timeout would just have moved to a different stage: I would potentially receive tasks while there were no idle VMs, and I would have to come up with a way to make the script await an available VM without breaking the RabbitMQ connection.
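For illustration, a rough sketch of that idea, assuming pika 1.x (the queue name and run_vm_actions are placeholders): the main thread keeps servicing the BlockingConnection, and with it the heartbeats, while the task runs in a worker thread; the ack is handed back to the connection's thread when the task completes.

import threading
import pika

def handle_task(channel, connection, delivery_tag, body):
    run_vm_actions(body)  # placeholder for the actual long-running VM work
    # acks must happen on the connection's own thread
    connection.add_callback_threadsafe(
        lambda: channel.basic_ack(delivery_tag=delivery_tag))

def on_message(channel, method, properties, body, connection):
    worker = threading.Thread(
        target=handle_task,
        args=(channel, connection, method.delivery_tag, body))
    worker.start()

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)
channel.basic_qos(prefetch_count=1)  # one task per worker at a time
channel.basic_consume(
    queue="tasks",
    on_message_callback=lambda ch, m, p, b: on_message(ch, m, p, b, connection))
channel.start_consuming()  # keeps heartbeats flowing while the thread works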
My current code can be seen here:
https://github.com/scherma/antfarm/blob/master/src/runmanager/runmanager.py#L342
Any thoughts folks?
I just watched a YouTube video in which the presenter mentioned that one should design one's Celery tasks to be short, and that tasks running for several minutes are bad.
Is this correct? What I do see is that I have some long-running tasks which take, say, 10 minutes to finish. When these kinds of tasks are scheduled frequently, the queue is swamped and no other tasks get scheduled. Is this the reason?
If so, what should be used for long-running tasks?
Long-running tasks aren't great, but it's by no means fair to say they are bad. The best way to handle long-running tasks is to create a queue for just those tasks and have them run on a separate worker from the short tasks.
The problem with long-running tasks is that you have to wait for them when you're pushing a new software version to your server. If you don't wait, your task may run possibly incompatible code, especially if you pickled some complex object as a parameter (which is strongly discouraged).
As #user2097159 said, it's good practice to keep long-running tasks in a dedicated queue. You should do that by routing them using "settings.CELERY_ROUTES"; more info here.
If you can estimate how long a task may run, I recommend using soft_time_limit per task, so you will be able to handle it.
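For illustration, a minimal sketch of both suggestions (the queue name, module paths, and time limits are only examples):

# settings.py: route the long task to its own queue
CELERY_ROUTES = {
    "tasks.long_running": {"queue": "long_tasks"},
}

# tasks.py
from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded

@shared_task(soft_time_limit=600, time_limit=660)
def long_running(job_id):
    try:
        do_the_work(job_id)  # placeholder for the real work
    except SoftTimeLimitExceeded:
        clean_up(job_id)     # placeholder cleanup hook

# run a dedicated worker for that queue, e.g.:
#   celery -A proj worker -Q long_tasks --concurrency=1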
There is a gist from a talk I gave here
Augment the basic Task definition to optionally treat the task instantiation as a generator, and check for TERM or soft timeout on every iteration through the generator. Generically inject a "state" dict kwarg into tasks that support it. If it is the first time the task is run, allocate a new one in the results cache; otherwise look up the existing one from the results cache.
In your task, figure out a good place to yield that results in short execution times, and update the state parameter as necessary.
When control returns to the master task class, check for TERM or soft timeout, and if there is one, save off the state object and respond to the signal.
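For illustration, a rough sketch of how such a task base class might look (this is not the gist from the talk; the should_stop check, save_state hook, and process() are placeholders):

from celery import Task, shared_task
from celery.exceptions import SoftTimeLimitExceeded

class ResumableTask(Task):
    """Runs a generator-style task, checking for a stop condition at each yield."""

    def should_stop(self):
        # placeholder for a TERM check; in a real worker this could read a flag
        # set by a signal handler or by the worker_shutting_down signal
        return False

    def __call__(self, *args, **kwargs):
        state = kwargs.setdefault("state", {})   # injected "state" dict kwarg
        gen = super().__call__(*args, **kwargs)  # run() is a generator function
        try:
            for _ in gen:
                if self.should_stop():
                    self.save_state(state)
                    return state
        except SoftTimeLimitExceeded:
            self.save_state(state)
            return state
        return state

    def save_state(self, state):
        # placeholder: persist the partial state to a results cache
        pass

@shared_task(base=ResumableTask, soft_time_limit=300)
def long_job(items, state=None):
    done = state.setdefault("done", 0)
    for i, item in enumerate(items[done:], start=done):
        process(item)        # placeholder unit of work, kept short
        state["done"] = i + 1
        yield                # each yield is a safe point to stop and resume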
I'm using Python http.client.HTTPResponse.read() to read data from a stream. That is, the server keeps the connection open forever and sends data periodically as it becomes available. There is no expected length of response. In particular, I'm getting Tweets through the Twitter Streaming API.
To accomplish this, I repeatedly call http.client.HTTPResponse.read(1) to get the response, one byte at a time. The problem is that the program will hang on that line if there is no data to read, which there isn't for large periods of time (when no Tweets are coming in).
I'm looking for a method that will get a single byte of the HTTP response, if available, but that will fail instantly if there is no data to read.
I've read that you can set a timeout when the connection is created, but setting a timeout on the connection defeats the whole purpose of leaving it open for a long time waiting for data to come in. I don't want to set a timeout, I want to read data if there is data to be read, or fail if there is not, without waiting at all.
I'd like to do this with what I have now (using http.client), but if it's absolutely necessary that I use a different library to do this, then so be it. I'm trying to write this entirely myself, so suggesting that I use someone else's already-written Twitter API for Python is not what I'm looking for.
This code gets the response; it runs in a separate thread from the main one:
while True:
    try:
        # blocks until a byte arrives (or until the connection times out)
        readByte = dc.request.read(1)
    except:
        readByte = []
    if len(readByte) != 0:
        # append the new byte to the shared response under the lock
        dc.responseLock.acquire()
        dc.response = dc.response + chr(readByte[0])
        dc.responseLock.release()
Note that the request is stored in dc.request and the response in dc.response; these are created elsewhere. dc.responseLock is a Lock that prevents dc.response from being accessed by multiple threads at once.
With this running on a separate thread, the main thread can then get dc.response, which contains the entire response received so far. New data is added to dc.response as it comes in without blocking the main thread.
This works perfectly when it's running, but I run into a problem when I want it to stop. I changed my while statement to while not dc.twitterAbort, so that when I want to abort this thread I just set dc.twitterAbort to True, and the thread will stop.
But it doesn't. This thread remains for a very long time afterward, stuck on the dc.request.read(1) part. There must be some sort of timeout, because it does eventually get back to the while statement and stop the thread, but it takes around 10 seconds for that to happen.
How can I get my thread to stop immediately when I want it to, if it's stuck on the call to read()?
Again, this method is working to get Tweets, the problem is only in getting it to stop. If I'm going about this entirely the wrong way, feel free to point me in the right direction. I'm new to Python, so I may be overlooking some easier way of going about this.
Your idea is not new; there are OS mechanisms(*) for making sure that an application only issues I/O-related system calls when they are guaranteed not to block. These mechanisms are usually used by async I/O frameworks, such as tornado or gevent. Use one of those, and you will find it very easy to run code "while" your application is waiting for an I/O event, such as incoming data on a socket.
If you use gevent's monkey-patching method, you can proceed using http.client, as requested. You just need to get used to the cooperative scheduling paradigm introduced by gevent/greenlets, in which your execution flow "jumps" between sub-routines.
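For illustration, a minimal sketch of that approach (the host, path, and timing are placeholders, and a real Twitter stream also needs authentication headers):

from gevent import monkey
monkey.patch_all()           # patches socket so http.client reads become cooperative

import http.client
import gevent
from gevent.event import Event

stop = Event()
response_chunks = []

def reader():
    conn = http.client.HTTPSConnection("stream.example.com")  # placeholder host
    conn.request("GET", "/stream")
    resp = conn.getresponse()
    while not stop.is_set():
        byte = resp.read(1)  # yields to other greenlets instead of blocking the process
        if byte:
            response_chunks.append(byte)

g = gevent.spawn(reader)
gevent.sleep(30)    # the main flow keeps running; here we just wait a while
stop.set()
g.join(timeout=1)
g.kill()            # unlike an OS thread, a greenlet blocked on I/O can be killed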
Of course you can also perform blocking I/O in another thread (like you did), so that it does not affect the responsiveness of your main thread. Regarding your "How can I get my thread to stop immediately" problem:
Forcing a thread that is blocked in a system call to stop is usually not a clean (or even valid) operation (also see Is there any way to kill a Thread in Python?). Either you take down the entire process once your application has finished its jobs, which also terminates all contained threads, or you simply leave the thread alone and give it as much time to terminate as it needs (are the 10 seconds you were referring to really a problem?).
If you do not want such long-blocking system calls anywhere in your application (be it in the main thread or not), then use the above-mentioned techniques to prevent blocking system calls.
(*) see e.g. O_NONBLOCK option in http://man7.org/linux/man-pages/man2/open.2.html
Let's say I have 100 servers, each running a daemon - let's call it server - which is responsible for spawning a thread for each user of this particular service (say 1000 threads per server). Every N seconds each thread does something and fetches information for that particular user (this request/response model cannot be changed). The problem I have is that sometimes a thread hangs and stops doing its work. I need some way to know that that user's data is stale and needs to be refreshed.
The only idea I have is to make each thread update a MySQL record associated with its user every 5N seconds (a last_scanned column in the users table), and to have another process check that table every 15N seconds; if a last_scanned value is not current, restart the corresponding thread.
The general way to handle this is to have the threads report their status back to the server daemon. If you haven't seen a status update within the last 5N seconds, then you kill the thread and start another.
You can keep track of the current active threads that you've spun up in a list, then just loop through them occasionally to determine state.
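For illustration, a minimal sketch of that bookkeeping (names and intervals are placeholders; note that a hung Python thread cannot truly be killed, so the monitor only detects the stall and starts a replacement):

import threading
import time

heartbeats = {}                 # user_id -> timestamp of last successful update
heartbeats_lock = threading.Lock()

def user_worker(user_id, n_seconds):
    while True:
        fetch_user_data(user_id)        # placeholder for the request/response call
        with heartbeats_lock:
            heartbeats[user_id] = time.monotonic()
        time.sleep(n_seconds)

def monitor(n_seconds):
    while True:
        time.sleep(5 * n_seconds)
        now = time.monotonic()
        with heartbeats_lock:
            stale = [u for u, t in heartbeats.items() if now - t > 5 * n_seconds]
            for user_id in stale:
                heartbeats[user_id] = now   # avoid re-spawning on every check
        for user_id in stale:
            # the hung thread can't be killed; mark the data stale elsewhere
            # if needed and start a replacement worker
            threading.Thread(target=user_worker, args=(user_id, n_seconds),
                             daemon=True).start()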
You of course should also fix the errors in your program that are causing threads to exit prematurely.
Premature exits and killing a thread could also leave your program in an unexpected, non-atomic state. You should probably also have the server daemon run a cleanup process that makes sure any items in your queue, or whatever you're using to determine the workload, get reset after a certain period of inactivity.
I would like to hold running threads in my Django application. Since I cannot do so in the model or in the session, I thought of holding them in a singleton. I've been checking this out for a while and haven't really found a good how-to for this.
Does anyone know how to create a thread-safe singleton in Python?
EDIT:
More specifically, what I want to do is implement some kind of "anytime algorithm": when a user presses a button, a response is returned and a new computation begins (in a new thread). I want this thread to run until the user presses the button again, and then my app will return the best solution it has managed to find. To do that, I need to save the thread object somewhere; I thought of storing it in the session, which apparently I cannot do.
The bottom line is: I have a FAT computation I want to do on the server side, in different threads, while the user is using my site.
Unless you have a very good reason, you should execute long-running threads in a different process altogether, and use Celery to execute them:
Celery is an open source asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker nodes using multiprocessing, Eventlet or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
Celery guide for djangonauts: http://django-celery.readthedocs.org/en/latest/getting-started/first-steps-with-django.html
For singletons and sharing data between tasks/threads: again, unless you have a good reason, you should use the db layer (i.e. models), with some caution regarding db locks and refreshing stale data.
Update: regarding your use case, define a Computation model with a status field. When a user starts a computation, an instance is created and a task starts running. The task monitors the status field (checking the db once in a while). When the user clicks the button again, a view changes the status to "user requested stop", causing the task to terminate.
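For illustration, a minimal sketch of that idea (field names, the polling interval, and improve_solution are placeholders):

# models.py
from django.db import models

class Computation(models.Model):
    STATUS_CHOICES = [("running", "running"),
                      ("stop_requested", "stop requested"),
                      ("finished", "finished")]
    status = models.CharField(max_length=20, choices=STATUS_CHOICES, default="running")
    best_solution = models.TextField(blank=True)

# tasks.py
import time
from celery import shared_task
from .models import Computation

@shared_task
def run_computation(computation_id):
    computation = Computation.objects.get(pk=computation_id)
    while True:
        improve_solution(computation)          # placeholder: one step of the anytime algorithm
        computation.refresh_from_db(fields=["status"])
        if computation.status == "stop_requested":
            break
        time.sleep(1)                          # illustrative db-poll interval
    computation.status = "finished"
    computation.save(update_fields=["status", "best_solution"])

# views.py: start it from the button handler
#   comp = Computation.objects.create()
#   run_computation.delay(comp.pk)
# and stop it on the second click:
#   Computation.objects.filter(pk=comp.pk).update(status="stop_requested")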
If you want asynchronous code in a web application then you're taking the wrong approach. You should run background tasks with a specialist task queue like Celery: http://celeryproject.org/
The biggest problem you have is web server architecture. Unless you go against the recommended Django web server configuration and use a worker-thread MPM, you will have no way to track your thread handles between requests, as each request typically occupies its own process. This is how Apache normally works: http://httpd.apache.org/docs/2.0/mod/prefork.html
EDIT:
In light of your edit I think you might learn more by creating a custom solution that does this:
- Maintain start/stop state in the database
- Run a new program as a daemon
- Periodically check the start/stop state and begin or end work from there
There's no need for multithreading here unless you need to create a new process for each user. If so, things get more complicated and using Celery will make your life much easier.