Terminate a Python multithreaded program with log output

Issues
I currently have a simple Python multithreaded server program which will run forever without manual interruption. I want it to be able to terminate gracefully at some point, and once it is terminated, I want the server to output some stats.
Solutions I have tried
Terminate the program with kill. The issue is that the server cannot output the stats because of the hard termination.
Create a control thread in the program which listens for key input, and if a key is pressed, terminate the program and collect the stats. The issue with this approach is that I need to do every step manually, e.g. SSH to the device, start the program, and press a key at some point.
Question
Is there a way I can run a bash script or some other program to stop the server gracefully and still get the stats output?

Have you tried using signal.signal() to register a handler for e.g. SIGTERM? In that handler you could put the code that writes out the statistics and then terminates the program.
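A minimal sketch of that approach; the stats dictionary and its contents here are just placeholders for whatever the server actually collects:

import signal
import sys

stats = {"requests_served": 0}  # placeholder for whatever the server collects

def handle_sigterm(signum, frame):
    # Dump the statistics, then exit; SystemExit unwinds the main thread.
    print(stats, file=sys.stderr)
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
# ... start the worker threads and serve forever ...

The server can then be stopped from a shell (or another script) with kill -TERM <pid>, which covers the "run some bash program to stop it" part of the question. If the worker threads are not daemonic, you will still need to signal them to stop (see the flag-and-join pattern below) before the process can actually exit.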

The standard approach is to either
make the threads sufficiently short-lived, and at the stop signal, stop spawning new ones and .join() the active ones,
or
make the threads periodically (e.g. after serving each request) check some shared stop flag and quit when it's set, and at the stop signal, set the stop flag, then .join() the threads (a sketch of this follows below).
Some threads can be made daemonic (.setDaemon(True), or daemon=True in current Python), but only if they can be safely killed off: nothing is raised in the thread, it is simply stopped wherever it happens to be.
If a thread is in a blocking call, it may be possible to unblock it by shutting down the facility that it is waiting on (close the socket or the stream).
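A rough sketch of the stop-flag variant using threading.Event; serve_one_request and print_stats are placeholders for the real per-request work and the stats output:

import threading

stop_flag = threading.Event()

def worker():
    while not stop_flag.is_set():
        serve_one_request()  # placeholder: handle one request, then re-check the flag

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

# At the stop signal (e.g. inside a SIGTERM handler or after a control command):
stop_flag.set()
for t in threads:
    t.join()
print_stats()  # placeholder: dump the collected statistics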

Related

How in Python can I handle a SIGTERM only after the program has exited a critical section?

A Python 2.7 program called 'eventcollector' runs continuously and polls a webservice for events. It then appends each event as a JSON object to the end of a file - /var/log/eventsexample.json. An Agent follows the file and sends the events up to cloud based software called 'anycloud' that processes the events.
I need to make eventcollector a well-behaved UNIX daemon and then make that daemon a service in systemd. The systemd .service unit I will create for this purpose will let systemd know that when stopping this service it must wait 15 seconds after sending SIGTERM before sending SIGKILL. This will give eventcollector time to save state and close the files it is writing (its own log file and the event file). I must now make this program more resilient: it must be able to save its state so that when it is terminated and restarted, it knows where it left off.
Eventcollector has no visibility into anycloud; it can only see events in the source service. If eventcollector dies because of a restart, it must reliably know what its new start_time is for querying the source service for events. Finishing the critical business of writing events to the file and saving state before exiting is therefore critical.
My question is specifically about how to handle the SIGTERM such that the program has time to finish what it is doing and then save its state.
My concern, however, is that unless I save state after every message I write to the file (which would consume more resources than seems necessary), I cannot be sure my program won't be terminated before the state has been saved. The impact of this would be duplicate messages, and duplicate messages are not acceptable.
If I must take the performance hit, I will, but I would prefer a way to handle SIGTERM gracefully so that the program can, for example, do the following (simplified pseudocode excerpt):
while True:
    response = <query the webservice; returns a list of 100 dictionaries (events)>
    for i in response.data:
        event = json.dumps(i)
        outputfile.write(event)  # <- SIGTERM received during the 2nd event, but do not
                                 #    exit until the for loop is done (how?)

signal handler:
    pickle an object with the current state.
The idea is that even if the SIGTERM were received while the 2nd event is being written, the program would wait until it had written the 100th event before deciding it is safe to handle the SIGTERM.
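One common way to get that behaviour is sketched below, with query_webservice, compute_next_start_time and save_state as stand-ins for the program's real functions and outputfile/start_time taken from the question's setup: the handler only records that SIGTERM arrived, and the flag is checked once per batch, after the for loop has completed.

import json
import signal

terminate = False

def handle_sigterm(signum, frame):
    # Only note the request; the main loop decides when it is safe to act on it.
    global terminate
    terminate = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not terminate:
    response = query_webservice(start_time)          # placeholder for the real poll
    for i in response.data:
        outputfile.write(json.dumps(i) + "\n")       # always finish the whole batch
    start_time = compute_next_start_time(response)   # placeholder
    save_state(start_time)                           # e.g. pickle the checkpoint

# The flag was set: the last completed batch and its state are already on disk.
outputfile.close()

As long as the handler only sets a flag and does not raise, the for loop runs to completion before the shutdown path is taken, so the 100-event batch is never abandoned halfway.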
I read in https://docs.python.org/2/library/signal.html:
There is no way to “block” signals temporarily from critical sections (since this is not supported by all Unix flavors).
One idea I had seemed too complex, and it seemed to me that there must be an easier way. The idea was:
A main thread has a signal handler responsible for handling SIGTERM.
The main thread can communicate with a worker thread through a novel protocol so that the worker thread tells the main thread when it is entering a critical section.
When the main thread receives the SIGTERM, it waits until the worker thread tells the main thread it is out of its critical section. The main thread then tells it to save state and shut down.
When the worker thread finishes, it tells the main thread it is done. The main thread then exits cleanly and returns zero status.
Supplemental
I'm considering using python-daemon, which I understand to be Ben Finney's reference implementation of the PEP he wrote, [PEP 3143](https://www.python.org/dev/peps/pep-3143/). I understand, based on what he has written and on my experience with UNIX and UNIX-like OSes, that what constitutes "good behavior" on the part of a daemon is not universally agreed upon. I mention this because I do agree with PEP 3143 and would like to implement it; however, it does not answer my current question about how to deal with signals the way I would like to.
Your daemon is in Python 2.7, and Python is not so convenient for making raw syscalls, so options like /dev/shm and semaphores are awkward; I am also not sure about the side effects and caveats of using global variables in Python; file locks are fragile; and filesystem I/O is a bad idea inside a signal handler. So I do not have a perfect answer, only ideas.
Here is what I did when implementing a small daemon in C (a Python sketch of the same shape follows this list):
The main thread sets up a synchronization point. For a C program, /dev/shm, a semaphore, a global variable and a file lock were the options I considered, and I chose /dev/shm in the end.
Set up the signal handler: on receiving SIGTERM, raise the synchronization flag by changing the value stored in /dev/shm.
In every worker thread, check /dev/shm for the synchronization flag after each portion of work, and exit if the flag has been raised.
In the main thread, set up a harvesting thread that tries to harvest every other worker thread; once it succeeds, go on and exit the daemon itself.
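In Python the same shape is simpler, because a threading.Event can play the role of the /dev/shm flag; a rough sketch, with do_one_portion_of_work and save_worker_state as placeholders for the real job:

import signal
import threading

stop_flag = threading.Event()

def handle_sigterm(signum, frame):
    stop_flag.set()                    # raise the synchronization flag

signal.signal(signal.SIGTERM, handle_sigterm)

def worker():
    while not stop_flag.is_set():
        do_one_portion_of_work()       # placeholder for one chunk of the real job
    save_worker_state()                # placeholder: checkpoint before exiting

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

# The main thread plays the role of the harvesting thread.
while not stop_flag.wait(timeout=1.0):
    pass                               # wake periodically so the signal gets handled
for w in workers:
    w.join()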

Clean up a thread without .join() and without blocking the main thread

I am in a situation where I have two endpoints I can ask for a value, and one may be faster than the other. The calls to the endpoints are blocking. I want to wait for one to complete and take that result without waiting for the other to complete.
My solution was to issue the requests in separate threads and have each thread set a flag to true when it completes. In the main thread, I continuously check the flags (I know it is a busy wait, but that is not my primary concern right now), and when one completes, the main thread takes that value and returns it as the result.
The issue I have is that I never clean up the other thread. I can't find any way to do it without using .join(), which would just block and defeat the purpose of this whole thing. So, how can I clean up that other, slower thread that is blocking without joining it from the main thread?
What you want is to make your threads daemons, so that when you get the result and your main thread finishes, the other running thread is forced to finish too. You do that by setting the daemon keyword argument to True:
tr = threading.Thread(daemon=True)
From the threading docs:
The significance of this flag is that the entire Python program exits when only daemon threads are left.
Although:
Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
I don't have any particular experience with Events so can't elaborate on that. Feel free to click the link and read on.
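A sketch of the two-endpoint case with daemon threads; fetch_a and fetch_b are placeholders for the two blocking calls. Using a Queue instead of flags also removes the busy wait:

import queue
import threading

results = queue.Queue()

def call(endpoint_fn):
    results.put(endpoint_fn())      # blocking call; whichever finishes first wins

for fn in (fetch_a, fetch_b):       # placeholders for the two blocking endpoint calls
    threading.Thread(target=call, args=(fn,), daemon=True).start()

first = results.get()               # blocks only until the faster endpoint answers
# The slower daemon thread is simply abandoned; it is killed when the program exits.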
One bad and dirty solution is to implement a method for the threads that closes the socket they are blocking on; you then have to catch the resulting exception where the blocking call was made.

Python3 Non-blocking input or killing threads

Reading through posts on similar questions, I strongly suspect there is no way to do what I'm trying to do, but I figured I'd ask. I have a Python 3 program that is designed to run headless, receiving commands from remote users that have logged in. One of the commands, of course, is a shutdown so that the program can be ended cleanly. This section is working correctly.
However, while working on this I realized an option to enter commands directly, without a remote connection, would be useful in the event something unusual happened to prevent remote access. I added a local_control function that runs in its own thread so that it doesn't interfere with the main loop. This works great for all commands except for the shutdown command.
I have a variable that both loops monitor so that they can end when the shutdown command is sent. Sending the shutdown command from within local_control works fine because the loop ends before getting back to input(). However, when sending the shutdown command remotely, the program doesn't end until someone presses the Enter key locally, because that loop remains stuck at input(). As soon as Enter is pressed the program continues, successfully breaks the loop and continues with the shutdown as normal. Below is an example of my code.
import sys
import threading

self.runserver = True

def local_control():  # system to control the server without remote access
    while self.runserver:
        raw_input = input()
        if raw_input == "shutdown":
            self.runserver = False

mythread = threading.Thread(target=local_control)
mythread.start()

while self.runserver:
    some_input = get_remote_input()  # getting a command from a remote user
    if some_input == "shutdown":
        self.runserver = False

sys.exit(0)  # server is shut down cleanly
Because the program runs primarily headless, GUI options such as pygame aren't an option. Other solutions I've found online involve libraries that are not cross-platform, such as msvcrt, termios, and curses. Although it's not as clean an option, I'd settle for simply killing the thread to end it if I could, but there is no way to do that either. So is there a cross-platform, non-GUI option for non-blocking input? Or is there another way to break a blocked loop from another thread?
Your network-IO thread is blocking the processing of commands while waiting for remote commands, so it will only evaluate the state of runserver after get_remote_input() returns (and its command is processed).
You will need three threads:
One which loops in local_control(), sending commands to the processing thread.
One which loops on get_remote_input(), also sending commands to the processing thread.
A processing thread (possibly the main thread).
A queue will probably be helpful here, since it avoids the race condition caused by the currently unsynchronized access to runserver.
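A rough sketch of that layout, with both input threads pushing commands onto a queue.Queue that the main (processing) thread consumes; get_remote_input is the question's existing function, and marking the input threads as daemons is one way to let the process exit even while input() is still blocking:

import queue
import threading

commands = queue.Queue()

def local_control():                      # console input thread
    while True:
        commands.put(input())

def remote_control():                     # network input thread
    while True:
        commands.put(get_remote_input())  # the question's existing function

for target in (local_control, remote_control):
    threading.Thread(target=target, daemon=True).start()

while True:                               # processing thread (the main thread)
    cmd = commands.get()
    if cmd == "shutdown":
        break                             # daemon input threads die with the process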
Not a portable solution, but on *nix you might be able to send yourself an interrupt signal from the local_control function to break the blocking input(). You'll need the pthread ID (pthread_self, saved somewhere readable from local_control) of the network control thread so you can call pthread_kill.

Runaway multithreaded script that continues to run after being cancelled (Python)

This is a two-part question.
After I cancel my script it still continues to run. What I'm doing is querying an exchange API and saving the data for various assets.
My parent script can be seen here; you can see I'm testing it out with just 3 assets. A sample of one of the child scripts can be seen here.
After I cancel the script, the script for BTC still seems to be running and new .json files are still being generated in its respective folder. The only way to stop it is to delete the folder and create it again.
This is really a bonus: my code was working with two assets, but with the addition of another it now seems to only take in data for BTC and not the other two.
Your first problem is that you are not really creating worker threads.
t1 = Thread(target=BTC.main()) executes BTC.main() immediately and would use its return value as the thread target. Since main loops forever, you never get past that call and don't start any other threads.
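The fix (assuming BTC.main takes no arguments) is to pass the function object itself and let the thread call it:

t1 = Thread(target=BTC.main)   # no parentheses: the thread calls BTC.main, not you
t1.start()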
Once you fix that, you'll still have a problem.
In Python, only the main thread sees signals such as Ctrl-C. Other threads will continue executing no matter how hard you press the key. When Python exits, it tries to join non-daemon threads, and that can cause the program to hang: the main thread is waiting for a thread to terminate, but the thread is happily continuing with its execution.
You seem to be depending on this in your code. Your parent starts a bunch of threads (or will, when you fix the first bug) and then exits. Really, it's waiting for the threads to exit. If you solve the problem with daemon threads (below), you'll also need to add code for your main thread to wait rather than exit.
Back to the thread problem...
One solution is to mark threads as "daemon" (do mythread.daemon = True before starting the thread). Python won't wait for those threads, and they will be killed when the main thread exits. This is great if you don't care about what state the thread is in when it is terminated, but it can do bad things like leave partially written files lying around.
Another solution is to figure out some way for the main thread to interrupt the thread. Suppose the thread waits on socket traffic: you could close the socket and the thread would be woken by that event.
Another solution is to only run threads for short-lived tasks that you want to complete. Your Ctrl-C gets delayed a bit, but you eventually exit. You could even set them up to run off a queue and send a special "kill" message to them when done (a sketch follows below). In fact, Python thread pools are a good way to go.
Another solution is to have the thread check an Event to see whether it's time to exit.
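A sketch of the queue-with-"kill"-message idea, using None as the sentinel; process and produce_tasks are placeholders for the real work and task source:

import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        task = tasks.get()
        if task is None:          # the special "kill" message
            break
        process(task)             # placeholder for the real short-lived work

t = threading.Thread(target=worker)
t.start()

for item in produce_tasks():      # placeholder task source
    tasks.put(item)
tasks.put(None)                   # tell the worker to finish up
t.join()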

Procedure to exit upon keyboard interrupt for python script running multiple threads

I have a script which runs 2 threads infinitely. (Each thread is an infinite while loop) Whenever I run it normally, I use ctrl + Z or ctrl + C to stop its execution (depending on the OS). But ever since I added it to the /etc/rc.local file in Linux, for automatic startup upon boot, I am unable to use these commands to forcefully exit.
This has forced me to include something in the python script itself to cleanly exit when I type a certain key. How do I do so?
The problem is that I'm running a multithreaded application, which runs continuously and does not wait for any user inputs.
I added this to the start of a loop in my thread:
ip = raw_input()
if ip == 'quit':
    quit()
But this will NOT work since it blocks for a user input, and stops the script. I don't want the script to be affected at all by this. I just want it to respond when I want to stop it. My question is not what command to use (which is explained here- Python exit commands - why so many and when should each be used?), but how I should use it without affecting the flow of my program.
Keep the code that handles the KeyboardInterrupt and send it an INT signal to stop the program: kill -INT $pid from the shell, where $pid is the process ID (PID) of the program. That's essentially the same as pressing CTRL+C in a shell where the program runs in the foreground.
Writing the program's PID into a file right after it started, either from within the program itself or from the code which started it asynchronously, makes it easier to send a signal later, without the need to search for the process in the process list.
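A sketch of that combination; the PID file path is only an example, and do_work / shut_down_threads stand in for the script's real loop driver and cleanup:

import os

with open("/tmp/myscript.pid", "w") as f:   # example path, written right at startup
    f.write(str(os.getpid()))

try:
    while True:
        do_work()            # placeholder for driving the two worker threads
except KeyboardInterrupt:
    shut_down_threads()      # placeholder: set the shared flag and join the threads

Then, from the shell: kill -INT $(cat /tmp/myscript.pid).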
One way is to have the threads examine a global variable as a part of their loop, and terminate (break out of the loop and terminate, that is) when the variable is set.
The main thread can then simply set the variable and join() all existing threads before terminating. You should be aware that if the individual threads are blocked waiting for some event to occur before they next check whether the global variable has been set, then they will hang anyway until that event occurs.
