Python's time.sleep - never waking up - python

I think this is going to be one of those simple-when-you-see-it problems, but it has got me baffled.
[STOP PRESS: I was right. Solution was found. See the answers.]
I am using Python's unittest framework to test a multi-threaded app. Nice and straightforward: I have 5 or so worker threads monitoring a common queue, and a single producer thread making work-items for them. The producer thread is being triggered by a test-case.
In this test, only one task is being put on the queue. In the test, the processing is just a stub for the real processing, so the worker thread does a 5-second sleep to simulate the time elapsed before the task is really done and the thread is ready to get another task.
So the snippet of code is:
logging.info("Sleep starting")
time.sleep(5)
logging.info("Waking up")
Now the weird part. I see the "Sleep starting" log message, but not the Waking up message. The program locks up and doesn't respond to Keyboard Interrupt (CTRL+C). CPU load is very low.
I see the same problem in Windows and Ubuntu (Python 2.6.2).
I wondered whether an exception was occurring and being hidden, so I added "print 1/0" between the first and second lines, and I saw the Division By Zero error being raised. When I moved it to after the sleep, I never saw the message.
I figured "Okay, maybe the other thread is trying to log something very very large at the same time, and it is still buffering. What is it doing?"
Well, by this time, the test has returned to the unittest, where it is pausing waiting for the thread to get going before testing the system's state.
logging.info("Test sleep starting")
time.sleep(0.25)
logging.info("Test waking up")
Wow, that looks familiar. It is freezing in exactly the same way! The first log message is appearing, the second isn't.
I have recently done a significant rewrite of the unit so I can't claim "I didn't touch anything", but I can't see anything untoward in my changes.
Suspicious areas:
I am using threading.Lock (because I don't know how to reason about the GIL's safety, I stick to what I know). I see nothing "deadlocky" about my code.
I am new to Python's unittest framework. Is there something it does with redirecting logging or similar that might simulate these symptoms?
No, I haven't substituted a non-standard time module!
What would prevent a thread from waking up? What else have I missed?

Sigh.
Worker Thread #1 is sleeping, and waking up afterwards. It is then going to log the wake message, and is blocked. Only one thread can be logging at a time.
UnitTest Thread is sleeping, and waking up afterwards. It is then going to log the wake message, and is blocked. Only one thread can be logging at a time.
Worker-Thread-Not-Previously-Mentioned-In-The-Question #2 was quietly finishing processing the PREVIOUS item in the queue while the first Worker Thread was sleeping. It got to a log statement. One of the parameters was an object, and str() was implicitly called on it. The str() function on that object had a bug: it deadlocked when it accessed some of its data members. The deadlock occurred while the call was being processed by the logging function, so it kept holding the logging lock, making it appear as if the other threads never woke up.
The division by zero test didn't make a difference, because the result of it was an attempt to log.
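A minimal sketch of the mechanism described above, assuming CPython's standard logging module, where Handler.handle() holds the handler's lock while emit() formats the record, and formatting is where str() gets called on the arguments. Probe is a hypothetical class: instead of deadlocking, its __str__ starts a second thread that tries the same lock with a timeout, which shows that no other thread could log through that handler while __str__ was running.

```python
import io
import logging
import threading

# Assumption: CPython's logging, where Handler.handle() acquires handler.lock
# and then calls emit(), which formats the record (calling str() on the args)
# while that lock is still held.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("str_under_lock_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

observed = {}


class Probe:
    """Hypothetical object whose __str__ checks whether another thread could log right now."""

    def __str__(self):
        outcome = []

        def other_thread():
            # Stand-in for any other thread trying to log through the same handler.
            got_it = handler.lock.acquire(timeout=0.2)
            if got_it:
                handler.lock.release()
            outcome.append(got_it)

        t = threading.Thread(target=other_thread)
        t.start()
        t.join()
        observed["other_thread_could_log"] = outcome[0]
        return "Probe()"


logger.info("item: %s", Probe())
print(observed)  # {'other_thread_could_log': False}: __str__ ran under the handler lock
```

One defensive habit that follows from this: keep __str__ and __repr__ free of locks, or call str() yourself before handing the value to logging, so that a hang shows up at your own call site instead of inside the logging lock, where it freezes every other thread that logs.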

On Linux, try changing the I/O scheduler to Completely Fair Queuing (CFQ):
echo cfq > /sys/block/sda/queue/scheduler

Related

What is the best way to debug a python multiprocess script which fails to terminate?

I am writing a Python script which uses multiprocessing, multithreading and zeromq for interprocess communication. It all works fine until the program finishes: at that point the child processes terminate properly (sigwait is intercepted and the child processes terminate, which I have confirmed with the ps command), but the main process often does not shut down. Occasionally it does, but most of the time it does not. I have confirmed that all remaining threads of the main process are daemonic and that the last line of the script is executed properly (it is a logging.info call). I am using fork for forking processes and can see that a ForkProcess still runs in addition to the main process.
What is the best way to debug this, considering that the script has actually finished? Maybe add a pdb or breakpoint() right at the end?
Thanks in advance.
Here is the output; after its last line, the script usually does not terminate:
INFO root::remaining active child processes: [<ForkProcess name='SyncManager-1' pid=6362 parent=6361 started>]
INFO root::non-daemonic threads which are still running, preventing orderly shutdown: [].
INFO root::======== PID: 6361 main() end: shut down completed.=========
EDIT:
I refactored the code and noticed that it now misbehaves very rarely. I am 99.9% certain that it is due to an open zeromq REQ/REP 'socket' at the time of shutdown. The refactoring made sure that these sockets are held open only for a very short time, but it is not predictable which sockets are open at shutdown, so occasionally it still hangs.
I will write a simple test harness with two processes communicating via REQ/REP sockets, then shut down the child process followed by the main process. I expect the same result, i.e., the interpreter not shutting down. Let's see; I'll keep you posted.
I think you could try viztracer. The good thing about viztracer is that it can display all the processes on the same timeline. Maybe you can catch what's stopping your main process/forked process from shutting down. If it's a deadlock it should be noticeable. However, without the code, I really can't tell if it would help for sure.
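A different, standard-library-only technique (not viztracer) that may help locate where the main process is stuck is to dump the current stack of every live thread in that process. The helper below is a hypothetical sketch built on sys._current_frames(), threading.enumerate() and traceback.format_stack(); it only sees threads of the process it runs in, not the forked children.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return the current stack of every live thread in *this* process as text."""
    names = {t.ident: (t.name, t.daemon) for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        name, daemon = names.get(ident, ("<unknown>", None))
        parts.append(f"--- thread {name} (ident={ident}, daemon={daemon}) ---")
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)
```

You could log the result right after the final logging.info call. For a process that is already hung, the standard faulthandler module can do something similar from outside: faulthandler.register(signal.SIGUSR1) (POSIX only) makes kill -USR1 <pid> print every thread's traceback to stderr, although whether that still fires during interpreter shutdown is something to verify on your setup.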

Python - Stopping a long running taskq's thread

I have a fairly simple program in which each task added to the taskq executes and computes something for, say, 30 seconds. The task itself does 'not' run in any kind of while or for loop.
def run(self):
    while not self.stopper.is_set():
        DO_MY_30_SECONDS_WORK(self)  # blocking call that runs for ~30 seconds
        self.task_done()
Now, assume I have a threading.Event that is checked before and after each task is done. Is there a way to tell an already running thread to stop or exit its execution?
There's no way to stop your running thread if DO_MY_30_SECONDS_WORK(self) is blocking. Arguably you could make it a daemon thread so that it is abruptly killed when your main program finishes, but this causes problems if the thread is actually holding resources (e.g. writing to a file), and killing a thread this way is generally not a good idea.
What you could do is redesign DO_MY_30_SECONDS_WORK(self) to be non-blocking, which means cutting the work into small pieces and making it check for the stop signal at a reasonable interval, so that your thread is responsive enough to finish itself when you tell it to.
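A minimal sketch of that redesign, assuming the 30-second job can be split into independent slices. ChunkedWorker and do_one_slice are hypothetical names, and time.sleep stands in for one piece of the real work; the stopper Event is the same idea as in the question.

```python
import threading
import time


class ChunkedWorker(threading.Thread):
    """Sketch: the 30-second job is split into short slices, with a stop check between slices."""

    def __init__(self, slices=3000, slice_seconds=0.01):
        super().__init__()
        self.stopper = threading.Event()
        self.slices = slices                # 3000 x 0.01 s is roughly 30 seconds of work
        self.slice_seconds = slice_seconds
        self.completed = 0

    def do_one_slice(self):
        # Hypothetical stand-in for one small piece of DO_MY_30_SECONDS_WORK.
        time.sleep(self.slice_seconds)

    def run(self):
        for _ in range(self.slices):
            if self.stopper.is_set():
                return  # stop requested: leave between slices, never in the middle of one
            self.do_one_slice()
            self.completed += 1
```

Usage: start it with w.start(), then w.stopper.set() followed by w.join() returns after at most about one slice instead of the full 30 seconds. The trade-off is that the slice length bounds how quickly the thread reacts.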

Runaway multi-threading script that continues to run after being canceled (Python)

This is a two part question,
After I cancel my script, it still continues to run. What I'm doing is querying an exchange API and saving the data for various assets.
My parent script can be seen here; you can see I'm testing it out with just 3 assets. A sample of one of the child scripts can be seen here.
After I cancel the script, the script for BTC seems to still be running, and new .json files are still being generated in its respective folder. The only way to stop it is to delete the folder and create it again.
This is really a bonus question: my code was working with two assets, but now, with the addition of another, it seems to only take in data for BTC and not the other 2.
Your first problem is that you are not really creating worker threads.
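t1 = Thread(target=BTC.main()) executes BTC.main() right away, in the calling thread, and uses its return value as the thread's target. Since main loops forever, it never returns, so you never start any other threads. Pass the function itself instead: Thread(target=BTC.main).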
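A small sketch of the difference, assuming each asset script exposes a main function. Here main, seen and the ETH/LTC asset names are hypothetical stand-ins, and the stand-in main returns immediately instead of looping forever.

```python
import threading


def main(asset, seen):
    # Hypothetical stand-in for BTC.main(); the real one loops forever.
    seen.append(asset)


seen = []
# Wrong: Thread(target=main("BTC", seen)) would run main() right here, in the
# main thread, and hand its return value (None) to Thread as the target.
# Right: pass the function object itself, with its arguments given separately.
threads = [threading.Thread(target=main, args=(asset, seen)) for asset in ("BTC", "ETH", "LTC")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(seen))  # ['BTC', 'ETH', 'LTC']
```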
Once you fix that, you'll still have a problem.
In Python, only the main thread sees signals such as Ctrl-C. Other threads will continue executing no matter how hard you press the key. When Python exits, it tries to join non-daemon threads, and that can cause the program to hang: the main thread is waiting for a thread to terminate, but the thread is happily continuing with its execution.
You seem to be depending on this in your code. Your parent starts a bunch of threads (or will, when you fix the first bug) and then exits. Really, it's waiting for the threads to exit. If you solve the problem with daemon threads (below), you'll also need to add code to make your main thread wait instead of exiting.
Back to the thread problem...
One solution is to mark threads as "daemon" (do mythread.daemon = True before starting the thread). Python won't wait for those threads, and they will be killed when the main thread exits. This is great if you don't care what state the thread is in while terminating, but it can do bad things like leave partially written files lying around.
Another solution is to figure out some way for the main thread to interrupt the thread. Suppose the thread waits on socket traffic: you could close the socket, and the thread would be woken by that event.
Another solution is to only run threads for short-lived tasks that you want to complete. Your ctrl-c gets delayed a bit but you eventually exit. You could even set them up to run off of a queue and send a special "kill" message to them when done. In fact, python thread pools are a good way to go.
Another solution is to have the thread check an Event to see if it's time to exit.
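A sketch of the Event approach, assuming each asset poller can run as a loop in a thread. poll_asset, start_pollers, stop_pollers and run_until_interrupted are hypothetical names, and appending to a list stands in for querying the exchange API and saving a .json file. The key detail is stop.wait(interval) instead of time.sleep(interval), so a sleeping worker wakes as soon as the Event is set.

```python
import threading
import time


def poll_asset(asset, stop, interval, fetched):
    """Hypothetical worker: 'query' one asset until asked to stop."""
    while not stop.is_set():
        fetched.append(asset)  # stand-in for querying the API and saving a .json file
        stop.wait(interval)    # sleeps up to `interval`, but wakes at once when stop is set


def start_pollers(assets, interval):
    stop = threading.Event()
    fetched = []
    threads = [
        threading.Thread(target=poll_asset, args=(a, stop, interval, fetched), name=a)
        for a in assets
    ]
    for t in threads:
        t.start()
    return stop, threads, fetched


def stop_pollers(stop, threads):
    stop.set()
    for t in threads:
        t.join()


def run_until_interrupted(assets, interval=5.0):
    stop, threads, _ = start_pollers(assets, interval)
    try:
        while True:
            time.sleep(0.5)  # the main thread stays alive and is the one that receives Ctrl+C
    except KeyboardInterrupt:
        pass
    finally:
        stop_pollers(stop, threads)
```

Calling run_until_interrupted(["BTC", "ETH", "LTC"]) from the parent script keeps the main thread in an interruptible sleep; on Ctrl+C it sets the Event and joins the workers, so no thread keeps writing files after you cancel.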

Setting up idle thread/signalling thread

I'm using Python with wxPython for writing an app.
The method I'm considering to accomplish this may not be the best - if that's the case, let me know because I'm open to refactoring.
Right now, I have one GUI form. The main program start point instantiates an instance of the GUI form then runs wx.mainLoop(), which causes the app's main initial thread to block for the lifetime of the app.
We of course know that when events happen in the UI, the UI thread runs the code for them.
Now, I have another thread - a worker thread. This thread needs to sit idle, and then when something happens in the UI thread, e.g. a button is clicked, I want the worker thread to stop idling and do something else - run a function, say.
I can't envision this right now but I could see as the app gets more complex also having to signal the worker thread while it's actually busy doing something.
I have two questions about this setup:
How can I make my worker thread idle without using up CPU time? Doing something like while True: pass will suck CPU time, while something like while True: time.sleep(0.1) will not allow instantaneous reaction to events.
What's the best way to signal into the worker thread to do something? I don't want the UI thread to execute something, I want the worker thread to be signaled, by the UI thread, that it should change what it's doing. Ideally, I'd have some way for the worker thread to register a callback with the UI itself, so that when a button is clicked or any other UI Event happens, the worker thread is signalled to change what it's doing.
So, is this the best way to accomplish this? And what's the best way to do it?
Thanks!
First: Do you actually need a background thread to sit around idle in the first place?
On most platforms, starting a new thread is cheap. (Except on Windows and Linux, where it's supercheap.) So, why not just kick off a thread whenever you need it? (It's just as easy to keep around a list of threads as a single thread, right?)
Alternatively, why not just create a ThreadPoolExecutor, and just submit jobs to it, and let the executor worry about when they get run and on which thread. Any time you can just think in terms of "tasks that need to get run without blocking the main thread" instead of "worker threads that need to wait on work", you're making your life easier. Under the covers, there's still one or more worker threads waiting on a queue, or something equivalent, but that part's all been written (and debugged and optimized) for you. All you have to write are the tasks, which are just regular functions.
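A minimal sketch of the executor approach using concurrent.futures.ThreadPoolExecutor; slow_task is a hypothetical stand-in for whatever a button click should trigger.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    # Hypothetical task; in the app this would be the work a UI event triggers.
    return n * n


with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(slow_task, n) for n in range(5)]
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16]
```

In a GUI event handler you would not call f.result(), since that blocks the UI thread. A common pattern is f.add_done_callback(...), with the callback handing the result back to the UI thread (in wxPython, for example, via wx.CallAfter); treat that wiring as an assumption to check against your app.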
But, if you want to write explicit background threads, you can, so I'll explain that.
How can I make my worker thread idle without using up CPU time? … What's the best way to signal into the worker thread to do something?
The way to idle a thread until a value is ready is to wait on a synchronization object. On any modern OS, waiting on a synchronization object means the operating system stops giving you any CPU time until the object is ready for you.*
There are a variety of different options you can see in the threading module docs, but the obvious one to use in most cases like this is a Condition. The way to signal the worker thread is then to notify the Condition.
However, often a Queue is a lot simpler. To wait on a Queue, just call its get method with block=True. To signal another thread to wake up, just put something on the Queue. (Under the covers, a Queue wraps up a list or deque or other collection, a Lock, and a Condition, so you just tell it what you want to do—check for a value, block until there's a value, add a value—instead of dealing with waiting and signaling and protecting the collection.)
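A minimal sketch of the Queue approach: the worker blocks in get() without using CPU, and anything that can call put() can wake it. The None sentinel for shutdown and the on_button_click name are hypothetical choices for illustration.

```python
import queue
import threading


def worker(tasks, results):
    while True:
        item = tasks.get()  # blocks, using no CPU, until something is put on the queue
        if item is None:    # sentinel value: time to shut down
            break
        results.append(item * 2)  # stand-in for "do something"


tasks = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(tasks, results))
t.start()

# The UI side only needs tasks.put; it could be registered directly as a callback.
on_button_click = tasks.put  # hypothetical callback name
for n in (1, 2, 3):
    on_button_click(n)
tasks.put(None)
t.join()
print(results)  # [2, 4, 6]
```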
See the answer to controlling UI elements in wxPython using threading for how to signal in both directions, from a worker thread to a UI thread and vice-versa.
I'd have some way for the worker thread to register a callback with the UI itself, so that when a button is clicked or any other UI Event happens, the worker thread is signalled to change what it's doing.
You can do it this way if you want. Just pass self.queue.put, or a callback that sets self.value and then calls self.condition.notify() while holding the condition's lock (notify() raises RuntimeError if the lock isn't held), or whatever, as a callback, and the GUI thread doesn't even have to know that the callback is triggering another thread.
In fact, that's a pretty nice design that may make you very happy later, when you decide to move some code back and forth between inline and background-threaded, or move it off to a child process instead of a background thread, or whatever.
I can't envision this right now but I could see as the app gets more complex also having to signal the worker thread while it's actually busy doing something.
But what do you want to happen if it's busy?
If you just want to say "If you're idle, wake up and do this task; otherwise, hold onto it and do it whenever you're ready", that's exactly what a Queue, or an Executor, will do for you automatically.
If you want to say, "If you're idle, wake up, otherwise, don't worry about it", that's what a Condition or Event will do.
If you want to say, "If you're idle, wake up and do this, otherwise, cancel what you're doing and do this instead", that's a bit more complicated. You pretty much need to have the background thread periodically check an "interrupt_me" variable while it's busy (and put a Lock around it), and then you'll set that flag as well as notifying the Condition… although in some cases, you can merge the idle and busy cases into a single Condition or Event (by calling an infinite wait() when idle, and a quick-check wait(timeout=0) when busy).
* In some cases—e.g., a Linux futex or a Windows CriticalSection—it may actually spin off a little bit of CPU time in some cases, because that happens to be a good optimization. But the point is, you're not asking for any CPU time until you're ready to use it.
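A sketch of the "cancel what you're doing and do this instead" case using the flag-plus-Condition design described above: the worker idles in Condition.wait(), and while busy it checks an interrupt_me flag under the Condition's own lock between small steps. InterruptibleWorker, submit, work_on and the "quit" task are hypothetical names, and time.sleep stands in for real work.

```python
import threading
import time


class InterruptibleWorker(threading.Thread):
    """Sketch: idle on a Condition; while busy, poll an interrupt_me flag under the same lock."""

    def __init__(self, steps=50, step_seconds=0.01):
        super().__init__(daemon=True)
        self.cond = threading.Condition()
        self.pending = None        # most recent task handed over by the UI thread
        self.interrupt_me = False  # set when a new task should pre-empt the current one
        self.steps = steps
        self.step_seconds = step_seconds
        self.log = []

    def submit(self, task):
        """Called from the UI thread: start `task`, cancelling whatever is running."""
        with self.cond:
            self.pending = task
            self.interrupt_me = True
            self.cond.notify()

    def run(self):
        while True:
            with self.cond:
                while self.pending is None:
                    self.cond.wait()  # idle: no CPU time until notify()
                task, self.pending = self.pending, None
                self.interrupt_me = False
            if task == "quit":
                return
            self.log.append((task, "started"))
            self.work_on(task)

    def work_on(self, task):
        for _ in range(self.steps):
            with self.cond:
                if self.interrupt_me:
                    self.log.append((task, "cancelled"))
                    return
            time.sleep(self.step_seconds)  # stand-in for one slice of real work
        self.log.append((task, "finished"))
```

Because submit() both sets the flag and notifies, the same call wakes an idle worker and pre-empts a busy one; the cost is that cancellation is only as prompt as the step size in work_on.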

How to identify the cause in Python of code that is not interruptible with a CTRL +C

I am using requests to pull some files. I have noticed that the program seems to hang after some large number of iterations that varies from 5K to 20K. I can tell it is hanging because the folder where the results are stored has not changed in several hours. I have been trying to interrupt the process (I am using IDLE) by hitting CTRL + C to no avail. I would like to interrupt instead of killing the process because restart is easier. I have finally had to kill the process. I restart and it runs fine again until I have the same symptoms. I would like to figure out how to diagnose the problem but since I am having to kill everything I have no idea where to start.
Is there an alternate way to view what is going on or to more robustly interrupt the process?
I have been assuming that if I can interrupt without killing I can look at globals and or do some other mucking around to figure out where my code is hanging.
In case it's not too late: I've just faced the same problems and have some tips
First thing: in Python, most waiting APIs are not interruptible (i.e. Thread.join(), Lock.acquire(), ...).
Have a look at these pages for more information:
http://snakesthatbite.blogspot.fr/2010/09/cpython-threading-interrupting.html
http://docs.python.org/2/library/thread.html
Then if a thread is waiting on such a call, it cannot be stopped.
There is another thing to know: if a normal (non-daemon) thread is running (or hung), the program will keep running indefinitely until all such threads have stopped or the process is killed.
To avoid that, you can make the thread a daemon thread: Thread.daemon=True before calling Thread.start().
Second thing: to find where your program is hung, you can launch it with a debugger, but I prefer logging, because logs are always there in case it's too late to debug.
Try logging before and after each waiting call to see how long your threads have been hung. To get high-quality logs, use Python logging configured with a file handler, an HTTP handler, or even better a syslog handler.
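A small sketch of "log before and after each waiting call", assuming the standard logging module. logged_wait and setup_file_logging are hypothetical helpers, and hang_debug.log is a hypothetical file name; including %(threadName)s in the format shows which thread is stuck.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("waits")


def setup_file_logging(path="hang_debug.log"):
    # Hypothetical file name. %(threadName)s shows which thread is stuck,
    # %(asctime)s shows how long it has been since the last message.
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(threadName)s %(levelname)s %(message)s",
    )


@contextmanager
def logged_wait(what):
    """Log before and after a potentially blocking call, with the elapsed time."""
    log.info("waiting on %s", what)
    start = time.monotonic()
    try:
        yield
    finally:
        log.info("done waiting on %s after %.3fs", what, time.monotonic() - start)
```

Wrapping a call as `with logged_wait("requests.get"): response = requests.get(url)` means that a "waiting on ..." line with no matching "done waiting" line at the end of the log points at the call that hung.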
