Let's say I have 100 servers, each running a daemon - let's call it "server" - that is responsible for spawning a thread for each user of this particular service (let's say 1000 threads per server). Every N seconds each thread does something and gets information for that particular user (this request/response model cannot be changed). The problem I have is that sometimes a thread hangs and stops doing its work. I need some way to know that the user's data is stale and needs to be refreshed.
The only idea I have is to have each thread update a MySQL record associated with its user every 5N seconds (a last_scanned column in the users table), and to have another process check that table every 15N seconds; if a last_scanned value is not current, restart that user's thread.
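A rough sketch of that checker (assuming mysql-connector-python-style placeholders, a DATETIME last_scanned column, and a hypothetical restart_thread() helper):

    # Hypothetical checker process, run every 15N seconds.
    N = 30  # example value for the scan interval, in seconds

    cur = conn.cursor()  # conn: an open MySQL connection (assumed)
    # stale = not updated within the 5N update period (plus slack)
    cur.execute(
        "SELECT id FROM users WHERE last_scanned < NOW() - INTERVAL %s SECOND",
        (5 * N,),
    )
    for (user_id,) in cur.fetchall():
        restart_thread(user_id)  # hypothetical: kill and respawn that user's thread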
The general way to handle this is to have the threads report their status back to the server daemon. If you haven't seen a status update within the last 5N seconds, then you kill the thread and start another.
You can keep track of the currently active threads that you've spun up in a list, then just loop through it occasionally to determine each thread's state.
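A minimal sketch of that bookkeeping, with do_work() and restart_thread() as placeholders for your per-user request and your restart logic:

    import threading
    import time

    N = 30                      # example value for the scan interval, in seconds
    heartbeat = {}              # user_id -> time of last successful iteration
    lock = threading.Lock()

    def worker(user_id):
        while True:
            do_work(user_id)    # placeholder for the per-user request/response
            with lock:
                heartbeat[user_id] = time.monotonic()
            time.sleep(N)

    def watchdog():
        while True:
            now = time.monotonic()
            with lock:
                stale = [u for u, t in heartbeat.items() if now - t > 5 * N]
            for user_id in stale:
                restart_thread(user_id)   # placeholder: replace the hung thread
            time.sleep(N)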
You of course should also fix the errors in your program that are causing threads to exit prematurely.
Premature exits and killing a thread could also leave your program in an unexpected, non-atomic state. You should probably also have the server daemon run a cleanup process that makes sure any items in your queue, or whatever you're using to determine the workload, get reset after a certain period of inactivity.
Related
I am quite new to Python.
My problem:
I am starting a bunch of threads in parallel, each one trying to create a session with one of a number of foreign hosts (to perform a few activities in those sessions later). Some of these hosts may be in an awkward state, in which case the session creation eventually fails; however, that takes about 60 seconds. If successful, the session is created immediately. Hence I want to terminate the respective session-creation threads after a reasonable time (a few seconds). However, I learned that the only way to stop a thread is to signal it via an event that the thread itself has to observe - which doesn't help if it is stuck in a blocking action (here: establishing the session). I use join() with a timeout - that speeds up the execution for all sessions successfully created; however, main() of course won't exit until all the temporarily stuck threads have returned. Is there really no way to cut this short?
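For reference, the pattern just described looks roughly like this (create_session(host) and hosts are hypothetical stand-ins):

    import threading

    def attempt(host, sessions):
        sessions[host] = create_session(host)   # hypothetical; may block ~60 s

    sessions = {}
    threads = []
    for host in hosts:                          # hosts: the foreign hosts (assumed)
        t = threading.Thread(target=attempt, args=(host, sessions))
        t.start()
        threads.append(t)

    for t in threads:
        t.join(timeout=5)   # returns after a few seconds even if t is still stuck
    # ...but the interpreter still waits for the non-daemon threads at exit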
Thanks for any useful advice.
Uwe
I have a system that uses MySQL as its database and RabbitMQ for asynchronous data processing.
There are two processes (in two different containers) that work with the same record. The first one updates the record's status in a DB transaction and sends a message to a RabbitMQ queue. The second one fetches the record from the DB and does some work on it. The problem is that the second process can read the message from the queue before the first process has finished updating the record.
Currently, to avoid this problem, the second process checks the status of the record; if it does not match the target value, the process waits for the update by re-publishing the message to the same queue.
This happens because the publish to the queue is performed within the transaction context. If I move the publish outside the transaction, it is possible that an error occurs or the process is interrupted after the DB transaction has committed: the status in the database will have changed, but the message will never be sent to the queue, and the second process will never process this record.
What can you suggest to solve this architectural problem?
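For context, the workaround described above looks roughly like this (a sketch assuming pika, a queue named "records", and hypothetical fetch_record()/process() helpers):

    def on_message(ch, method, properties, body):
        record = fetch_record(body)             # hypothetical DB lookup
        if record.status != TARGET_STATUS:      # first process hasn't committed yet
            # put the message back on the queue and try again later
            ch.basic_publish(exchange="", routing_key="records", body=body)
        else:
            process(record)                     # hypothetical real work
        ch.basic_ack(delivery_tag=method.delivery_tag)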
Issues
I currently have a simple Python multithreaded server program, which will run forever without manual interruption. I want it to be able to terminate gracefully at some point, and once it is terminated, I want the server to output some stats.
Solutions I have tried
Terminate the program with kill. The issue is that the server cannot output the stats, because of the hard termination.
Create a control thread in the program which listens for key input, and if a key is pressed, terminate the program and get the stats. The issue with this approach is that I need to do every step manually, e.g. SSH to the device, start the program, and press a key at some point.
Question
Is there a way to run some bash script or other program to stop the server gracefully and still get the stats output?
Have you tried using signal.signal() to register a handler for e.g. SIGTERM? There you could put the part of the code that writes out the statistics and then just terminate the program.
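A minimal sketch, with print_stats() standing in for whatever produces your output:

    import signal
    import sys

    def handle_sigterm(signum, frame):
        print_stats()   # hypothetical: write out the collected stats
        sys.exit(0)     # raises SystemExit, so normal cleanup still runs

    signal.signal(signal.SIGTERM, handle_sigterm)

A plain kill <pid> (which sends SIGTERM by default) from a shell or script then triggers the graceful shutdown.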
The standard approach is to either

- make threads sufficiently short-lived, and
- at the stop signal, stop spawning new ones and .join() the active ones,

or

- make threads periodically (e.g. after serving each request) check some shared stop flag and quit when it's set (see the sketch after this list), and
- at the stop signal, set the stop flag, then .join() the threads.
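A rough sketch of the second approach, with serve_one_request() as a stand-in for one unit of work:

    import threading

    stop_flag = threading.Event()

    def worker():
        while not stop_flag.is_set():
            serve_one_request()        # placeholder for one unit of work

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()

    # at the stop signal:
    stop_flag.set()
    for t in threads:
        t.join()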
Some threads can be made daemon threads (thread.daemon = True, formerly .setDaemon(True)), but only if they can be safely killed off (there's no exception or anything raised in the thread; it's just stopped where it is).
If a thread is in a blocking call, it may be possible to unblock it by shutting down the facility that it is waiting on (close the socket or the stream).
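For example, a thread blocked in recv() can be released from another thread by shutting the socket down; a self-contained sketch using a socketpair:

    import socket
    import threading

    a, b = socket.socketpair()   # b: the peer end (unused here)

    def reader(sock):
        while True:
            data = sock.recv(4096)   # blocks here while the peer is quiet
            if not data:             # shutdown below makes recv() return b""
                return

    t = threading.Thread(target=reader, args=(a,))
    t.start()

    # from the main thread, to unblock and stop the reader:
    a.shutdown(socket.SHUT_RDWR)
    t.join()
    a.close()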
I have a Django application (an API) running in production, served by uWSGI with 8 processes (workers). To monitor them I use uwsgitop. Every day, from time to time, one worker falls into the BUSY state, stays there for about five minutes, consumes all of the memory, and kills the whole instance. The problem is that I do not know how to debug what the worker is doing at that particular moment, or which function it is executing. Is there a fast and proper way to find out the function and the request that it is handling?
One can send signal SIGUSR2 to a uwsgi worker, and the current request is printed into the log file, along with a native (sadly not Python) backtrace.
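For example (the PID being whatever uwsgitop reports for the busy worker; the shell equivalent is kill -USR2 <pid>):

    import os
    import signal

    worker_pid = 12345                   # assumed: the busy worker's PID from uwsgitop
    os.kill(worker_pid, signal.SIGUSR2)  # per the above, uwsgi logs the current request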
This is a two-part question.
After I cancel my script it still continues to run. What I'm doing is querying an exchange API and saving the data for various assets.
My parent script can be seen here; you can see I'm testing it out with just 3 assets. A sample of one of the child scripts can be seen here.
After I cancel the script, the script for BTC seems to still be running, and new .json files are still being generated in its respective folder. The only way to stop it is to delete the folder and create it again.
This is really a bonus: my code was working with two assets, but with the addition of another it seems to only take in data for BTC and not the other two.
Your first problem is that you are not really creating worker threads.
t1 = Thread(target=BTC.main()) executes BTC.main() and uses its return value as the thread's target. Since main loops forever, you never get the chance to start any other threads.
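In other words, pass the function object itself and let the thread call it:

    from threading import Thread

    # wrong: runs BTC.main() right here and passes its return value to Thread
    # t1 = Thread(target=BTC.main())

    # right: no parentheses; the new thread will call BTC.main for you
    t1 = Thread(target=BTC.main)
    t1.start()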
Once you fix that, you'll still have a problem.
In Python, only the main thread sees signals such as Ctrl-C. Other threads will continue executing no matter how hard you press the key. When Python exits, it tries to join non-daemon threads, and that can cause the program to hang: the main thread is waiting for a thread to terminate, but the thread is happily continuing with its execution.
You seem to be depending on this in your code. Your parent starts a bunch of threads (or will, when you fix the first bug) and then exits. Really, it's waiting for the threads to exit. If you solve that with daemon threads (below), you'll also need to add code for your main thread to wait rather than exit.
Back to the thread problem...
One solution is to mark threads as "daemon" (do mythread.daemon = True before starting the thread). Python won't wait for those threads, and they will be killed when the main thread exits. This is great if you don't care what state the thread is in while terminating, but it can do bad things like leave partially written files lying around.
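That is just (worker being a placeholder for the thread's function):

    from threading import Thread

    t = Thread(target=worker)   # worker: placeholder
    t.daemon = True             # must be set before start(); Python won't wait for t
    t.start()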
Another solution is to figure out some way for the main thread to interrupt the thread. Suppose the thread waits on socket traffic. You could close the socket and the thread would be woken by that event.
Another solution is to only run threads for short-lived tasks that you want to complete. Your Ctrl-C gets delayed a bit, but you eventually exit. You could even set the threads up to run off of a queue and send a special "kill" message to them when done (sketched below). In fact, Python thread pools are a good way to go.
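A sketch of the queue variant, using None as the special "kill" message and process() as a placeholder:

    import queue
    import threading

    q = queue.Queue()

    def worker():
        while True:
            task = q.get()
            if task is None:     # the special "kill" message
                return
            process(task)        # placeholder for the real work

    t = threading.Thread(target=worker)
    t.start()

    # ... enqueue work with q.put(item) ...
    q.put(None)                  # tell the worker to finish up
    t.join()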
Another solution is to have the thread check an Event to see if it's time to exit.