At first I thought I had some kind of memory leak, but what I'm actually getting is an exception I don't fully understand. At least I've narrowed it down now.
I'm using a while True loop to keep a thread running and retrieving data. If it runs into a problem it logs it and keeps running. It seems to work fine at first - at least for the first iteration - and then it constantly logs a threading exception.
I narrowed it down to this section:
while True:
    # yada yada yada...
    # Works fine up to this part
    pool = ThreadPool(processes=1)
    async_result = pool.apply_async(SpawnPhantomJS, (dcap, service_args))
    Driver = async_result.get(10)
    Driver.set_window_size(1024, 768)  # optional
    Driver.set_page_load_timeout(30)
I do this because there's an issue spawning a lot of Selenium webdrivers: eventually one just hangs (no exception, it simply never returns). Running the spawn through the pool gives it a timeout, so if the browser couldn't spawn within 10 seconds the exception would be caught and we'd go around again. That seemed like a great fix, but I think it's causing problems inside a loop.
It works fine to start with, but then throws the same exception on every iteration.
I don't understand the thread pooling well enough - maybe I shouldn't be creating the pool over and over. The exception is hard to catch in the act, so testing is a bit of a pain, but I'm thinking something like this might fix it?
pool = ThreadPool(processes=1)
async_result = pool.apply_async(SpawnPhantomJS, (dcap, service_args))

while True:
    Driver = async_result.get(10)
That looks neater to me but I don't understand the problem well enough to say for sure it would fix it.
I'd really appreciate any suggestions.
Update:
I've tracked the problem to this section of code with 100% certainty: I set a variable bugcounter = 1 before it and bugcounter = 2 after it, and logged the value when an exception occurred.
But when I try to reproduce it with just this code in a loop, it runs fine and keeps spawning webdrivers. So I have no idea.
Further update:
I can run this locally for hours. Sometimes it'll run on the (Windows) server for hours. But after a while it fails somewhere here and I can't figure out why.
An exception can legitimately be thrown here when the timeout hits because the browser didn't spawn in time. That happens rarely, which is why we loop back and try again.
My assumption is that I'm creating too many threads and the OS won't allow any more. I've just spotted that the thread pool has a .terminate() method - maybe I should terminate the pool once it has been used to spawn a browser?
The idea I arrived at in the final update solved it.
I was using a thread pool to put a timeout on the browser spawn as a workaround for the bug in the library, but I was never terminating that pool, so after enough iterations the OS wouldn't let me create another one.
Adding a .terminate() once the browser had been spawned and the pool was no longer needed solved the problem.
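Roughly what the fixed loop looks like - a sketch only, keeping the SpawnPhantomJS, dcap and service_args names from the code above (the logging call is a placeholder for however the real code logs):

from multiprocessing.pool import ThreadPool
import logging

while True:
    pool = ThreadPool(processes=1)
    try:
        async_result = pool.apply_async(SpawnPhantomJS, (dcap, service_args))
        Driver = async_result.get(10)          # raises if the browser doesn't spawn within 10s
        Driver.set_page_load_timeout(30)
        # ... use the driver ...
    except Exception:
        logging.exception("spawn failed")      # placeholder logging; the real code logs and retries
    finally:
        pool.terminate()                       # release the pool so the OS doesn't run out of resources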
Related
I have a Python script which starts multiple subprocesses using these lines:
for elm in elements:
    t = multiprocessing.Process(target=sub_process, args=[elm])
    threads.append(t)
    t.start()

for t in threads:
    t.join()
Sometimes, for some reason, one of them halts and the script never finishes.
I'm trying to use the VSCode debugger to find the problem and see where exactly it gets stuck, but I'm having trouble pausing these subprocesses: when I click pause in the debugger window, it pauses the main thread and some other threads that are running properly, but it won't pause the stuck subprocess.
Even when I try to pause the threads manually, one by one, from the Call Stack window, I can still only pause the working threads and not the stuck one.
Please help me figure this out. Whatever makes the process get stuck doesn't happen every run, which makes it very hard to debug.
First, those are subprocesses, not threads. It's important to understand
the difference, although it doesn't answer your question.
Second, a pause (manual break) in the Python debugger will break in Python code. It won't break in the machine code below it that executes the Python, or in the machine code below that which performs the OS services the Python code is asking for. If you execute a pause, the pause will occur in the Python code above the machine code when (and if) the machine code returns to the Python interpreter loop.
Given a complete example:
import multiprocessing
import time

elements = ["one", "two", "three"]

def sub_process(gs, elm):
    gs.acquire()
    print("sleep", elm)
    time.sleep(60)
    print("awake", elm)
    gs.release()

def test():
    gs = multiprocessing.Semaphore()
    subprocs = []
    for elm in elements:
        p = multiprocessing.Process(target=sub_process, args=[gs, elm])
        subprocs.append(p)
        p.start()
    for p in subprocs:
        p.join()

if __name__ == '__main__':
    test()
The first subprocess will grab the semaphore and sleep for a minute,
and the second and third subprocesses will wait inside gs.acquire() until they
can move forward. A pause will not break into the debugger until the
subprocess returns from the acquire, because acquire is below the Python code.
It sounds like you have an idea where the process is getting stuck,
but you don't know why. You need to determine what questions
you are trying to answer. For example:
(Assuming) one of the processes is stuck in acquire. That means one of the other processes didn't release the semaphore. What code in which process is acquiring a semaphore and not releasing it?
Looking at the semaphore object itself might tell you which subprocess is holding it, but this is a tangent: can you use the debugger to inspect the semaphore and determine who is holding it? For example, using a machine-level debugger on Windows, if these were threads and a critical section, it's possible to look at the critical section and see which thread is still holding it. I don't know whether the same can be done with processes and semaphores on your chosen platform.
Which debuggers you have access to depends on the platform you're running on.
In summary:
You can't break the Python debugger in machine code
You can run the Python interpreter in a machine code debugger, but this
won't show you the Python code at all, which makes life interesting.
This can be helpful if you have an idea what you're looking for -
for example, you might be able to tell that you're stuck waiting for a semaphore.
Running a machine code debugger becomes more difficult when you're running
sub-processes, because you need to know which sub-process you're interested
in, and attach to that one. This becomes simpler if you're using a single
process and multiple threads instead, since there's only one process to deal with.
"You can't get there from here, you have to go someplace else first."
You'll need to take a closer look at your code and figure out how
to answer the questions you need to answer using other means.
Just an idea: why not set a timeout on your subprocesses and then terminate them?
TIMEOUT = 60

for elm in elements:
    t = multiprocessing.Process(target=sub_process, args=[elm])
    t.daemon = True
    threads.append(t)
    t.start()
    t.join(TIMEOUT)

for t in threads:
    t.join()
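The snippet above only puts a timeout on the join; if the idea is to actually kill a stuck subprocess, an explicit terminate step is needed. A possible follow-up (this terminate step is my addition, not part of the snippet above):

for t in threads:
    t.join(TIMEOUT)
    if t.is_alive():       # still running after the timeout
        t.terminate()      # forcibly stop the stuck subprocess
        t.join()           # reap it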
Good evening. After several hours looking for a solution, I still can't find any way to solve this:
I'm currently working on a Selenium script that creates X threads; each thread runs a Firefox instance that performs tests on a website. The problem is that when I hit Ctrl+C or close the executable with the cross at the top right, every Firefox instance that was created keeps running.
I assume this is because the sub-threads created by the main thread are never stopped, since those browser processes are still running. So I wrote a function that takes a list of drivers and closes EVERY driver in the list; drivers are added to the list as they are created.
The issue only happens when I run it as an executable; the "Stop" button in my IDE (PyCharm) handles it fine.
What I've tried:
Using the atexit module to shut down every driver (Firefox instance) on exit with a clean_threads function -> It doesn't work, because atexit apparently only runs once every thread has shut down, so in my case the function was never called.
Running my main function in a "try - finally" structure with the clean_threads function called in the finally (a sketch of that structure follows the code below) -> doesn't work either; I may have used it the wrong way, but it didn't work.
Running my main function in a "try - except (KeyboardInterrupt, SystemExit)" -> didn't manage to make it work either; for some unknown reason it just made Ctrl+C unable to interrupt the program at all.
I'd love some advice on how to proceed - I admit I'm going in circles and not finding a solution to the problem.
Any help will be appreciated, thanks in advance :) And if more clarification, snippets or anything else would help, please do not hesitate to ask.
Code of my main function :
def main():
    global THREADS
    load_settings()
    try:
        # TODO clean firefox instances
        # TODO proper switch
        THREADS = [Thread(target=automation, args=(i, HEADLESS)) for i in
                   range(FIRST_ACC, ACCOUNT_NUMBER + FIRST_ACC, 1)]
        for thread in THREADS:
            thread.start()
            time.sleep(30)
        for thread in THREADS:
            thread.join()
    except (KeyboardInterrupt, SystemExit):
        print("Exception catchée")
        clean_threads(DRIVER_LIST)
The clean_threads function:
def clean_threads(driver_list):
    discard_list = []
    print("Test")
    for each in driver_list:
        each.exit()
        discard_list.append(each)
    print(len(discard_list))
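For reference, a minimal sketch of the "try - finally" structure mentioned in the second bullet above - this is only an illustration, reusing the main(), clean_threads and DRIVER_LIST names from the question, not a confirmed fix:

def main():
    global THREADS
    load_settings()
    try:
        THREADS = [Thread(target=automation, args=(i, HEADLESS)) for i in
                   range(FIRST_ACC, ACCOUNT_NUMBER + FIRST_ACC, 1)]
        for thread in THREADS:
            thread.start()
            time.sleep(30)
        for thread in THREADS:
            thread.join()
    finally:
        # runs on a normal return, on KeyboardInterrupt and on SystemExit,
        # as long as the interpreter is still alive to execute it
        clean_threads(DRIVER_LIST)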
I am facing a pretty odd issue. I have multiprocess Python code that processes some data in parallel. I split the data into 8 parts and work on each split individually using a Process class, then join each Process.
I just noticed that when I process a large amount of data, one of the processes... disappears. It doesn't error out or raise an exception; it just goes missing. What is even more interesting is that the join() on that process seems to complete successfully even though I know for a fact it did not finish.
tn1_processes = []
for i in range(8):
    tn1_processes.append(
        MyCustomProcess(logger=self.logger, i=i,
                        shared_queue=shared_queue))
    tn1_processes[-1].start()

for tn1_processor in tn1_processes:
    tn1_processor.join()

print('Done')
What I know for sure:
All processes start, process data, and get about halfway through; I know this because I have logs showing all of them doing their work.
Then Process 1 disappears from the logs towards the end of its job, while all the others keep working and complete fine. After the joins my code moves on, believing all the processes are complete (I demonstrate this with a print), yet I know for a fact that one of them did not finish. It didn't error out, and for some strange reason it still passed the join().
The only thing I can think of is that the process runs out of memory, but I would expect it to error out or throw an exception if that happened. In fact it has happened to me before with the same code: I saw the exception in my logs and the code was able to handle it and see that the process had failed. But this time there is no error at all, which is strange.
Can anyone shed some light?
Using Python 3.4.
If I remember correctly, when a process is abruptly terminated it doesn't throw an error; you need a separate queue for storing the raised exceptions and handle them elsewhere.
When a process ends, however, it does get an exit code: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.exitcode
A rudimentary check would be making sure all of them exited safely (an exit code of 0, where a negative value indicates a termination signal and None means still running).
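A minimal sketch of that check, reusing the tn1_processes list from the question:

for p in tn1_processes:
    p.join()
    if p.exitcode is None:
        print(p.name, 'is still running')
    elif p.exitcode < 0:
        # a negative exitcode -N means the process was killed by signal N
        # (e.g. -9 for SIGKILL, which is what the OOM killer sends)
        print(p.name, 'was killed by signal', -p.exitcode)
    elif p.exitcode != 0:
        print(p.name, 'exited with error code', p.exitcode)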
The issue was that Python was running out of memory. I only found this out by monitoring the machine's memory usage while the code was running: it needed more memory than was available, so one of the processes was simply killed, with no errors or exceptions. #j4hangir's answer on how to detect this is good: I need to check the exit code. I haven't tested this yet, but I will and then update.
I am using requests to pull some files. I have noticed that the program seems to hang after some large number of iterations, anywhere from 5K to 20K. I can tell it is hanging because the folder where the results are stored has not changed in several hours. I have been trying to interrupt the process (I am running it in IDLE) by hitting Ctrl+C, to no avail. I would like to interrupt rather than kill the process because restarting would then be easier. So far I have had to kill the process; I restart it and it runs fine again until the same symptoms appear. I would like to figure out how to diagnose the problem, but since I have to kill everything I have no idea where to start.
Is there an alternative way to see what is going on, or a more robust way to interrupt the process?
I have been assuming that if I can interrupt without killing, I can look at globals and/or do some other mucking around to figure out where my code is hanging.
In case it's not too late: I've just faced the same problems and have some tips.
First thing: in Python, most waiting APIs are not interruptible (e.g. Thread.join(), Lock.acquire()...).
Have a look at these pages for more information:
http://snakesthatbite.blogspot.fr/2010/09/cpython-threading-interrupting.html
http://docs.python.org/2/library/thread.html
So if a thread is waiting on such a call, it cannot be stopped.
There is another thing to know: if a normal (non-daemon) thread is running (or hung), the main program will not exit until all threads are stopped or the process is killed.
To avoid that, you can make the thread a daemon thread: set Thread.daemon = True before calling Thread.start(), as sketched below.
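A minimal sketch of that (the worker function here is just a stand-in for your own download loop):

import threading
import time

def worker():
    while True:
        time.sleep(1)      # stands in for a blocking call that Ctrl+C can't interrupt

t = threading.Thread(target=worker)
t.daemon = True            # must be set before start(); the process can now exit even if this thread hangs
t.start()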
Second thing: to find where your program is hung, you can launch it under a debugger, but I prefer logging, because logs are always there in case it's too late to debug.
Try logging before and after each waiting call to see how long your threads have been hung. For high-quality logs, use the Python logging module configured with a file handler, an HTML handler, or even better a syslog handler; a minimal example follows.
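For example, a sketch of logging around a blocking call (the URL is a placeholder, and the request timeout is my addition rather than part of the advice above):

import logging
import requests

logging.basicConfig(filename='worker.log', level=logging.INFO,
                    format='%(asctime)s %(threadName)s %(message)s')

url = 'http://example.com/file'                  # placeholder URL
logging.info('fetching %s', url)                 # if the program hangs, the last line in the log shows where it stopped
response = requests.get(url, timeout=30)         # an explicit timeout also keeps requests from waiting forever
logging.info('fetched %s (%d bytes)', url, len(response.content))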
I am doing some gnarly stuff with Python threads including daemons.
I am getting an intermittent error on some tests:
Exception in thread myconsumerthread (most likely raised during interpreter shutdown):
Note that there are no stack trace/exception details provided.
Scrutinising my own code hasn't helped, but I am at a bit of a loss about the next step in debugging. What debugging techniques can I use to find out more about what exception might be bringing down the runtime during shutdown?
Fine print:
Windows, CPython 2.7.2 - not reproducible on Ubuntu.
The problem occurs about 3% of the time - so reproducible, just not reliably.
The code in myconsumerthread has a catch-all exception handler, which tries to write the name of the exception to sys.stderr. (Could sys already be shut down?)
I suspect the problem is related to shutting down daemon threads very quickly; before they have completely initialised. Something in this area, but I have little evidence - certainly insufficient to be pointing at a Python bug.
Ha, I have discovered a new symptom that marks a turning point in my descent into insanity!
If I import time in my test harness (not the live code), and never use it, the frequency drops to about 0.5%.
If I import turtle in my test harness (I swear on my life, there are no turtle graphics in my code; I chose this as the most irrelevant library I could quickly find) the exception starts to be caught in a different thread, and it occurs in about a third of the runs.
I have encountered the same error on a few occasions. I'm trying to locate / generate an example that displays the exact message.
Until then, if my memory serves me well, these were the areas that I focused on.
Looking for ports, files, queues, etc. that are removed or closed outside the daemon threads.
Scrutinizing blocking calls in the daemon threads, e.g. a Queue.get(block=True) or a pyserial read() with timeout=None.
After digging a little more I see the same types of errors popping up in relation to Queues; see the comments here.
I find it odd that it doesn't display the traceback. You might try commenting out the catch-all except and letting Python send the exception to stderr. Hopefully then you'll be able to see what's dying on you.
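For illustration only - the real myconsumerthread code isn't shown, so this is a guess at the shape of the handler being described, with a note on why commenting it out helps:

import sys

def do_work():
    pass               # stand-in for the real consumer loop

def consumer():
    try:
        do_work()
    except Exception as exc:
        # comment this handler out while debugging: during interpreter shutdown,
        # sys.stderr may itself be half torn down, which can hide the original error
        sys.stderr.write(type(exc).__name__ + "\n")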
Update
I knew I had seen this issue before... Below you'll find an example that generates that error (many of them, actually). Note that there is no other traceback message either... For the sake of completeness, after you see the error messages, uncomment the queue.get lines and comment out the time.sleeps. The errors should go away. After re-running it, the errors do not appear... This is in line with the sporadic failure rates you have been seeing... You may need to run it a few times to see the errors.
I normally use time.sleep(x) to throttle threads when blocking I/O such as get() and read() does not provide a timeout, or when there is no blocking call to wait on (user interface refreshes, for example).
That being said, I believe there is a problem with a thread being shut down while it is waiting on a time.sleep() call. I believe that this call is what has gotten me every time, but I do not know what actually causes it inside the sleep method. For all I know there are other blocking calls that exhibit this same behavior.
import time
import Queue
from threading import Thread

SLAVE_CNT = 50
OWNER_CNT = 10
MASTER_CNT = 2

class ThreadHungry(object):
    def __init__(self):
        self.rx_queue = Queue.Queue()

    def start(self):
        print "Adding Masters..."
        for x in range(MASTER_CNT):
            self.owners = []
            print "Starting slave owners..."
            for y in range(OWNER_CNT):
                owner = Thread(target=self.__owner_action)
                owner.daemon = True
                owner.start()
                self.owners.append(owner)

    def __owner_action(self):
        self.slaves = []
        print "\tStarting slaves..."
        for x in range(SLAVE_CNT):
            slave = Thread(target=self.__slave_action)
            slave.daemon = True
            slave.start()
            self.slaves.append(slave)
        while(1):
            time.sleep(1)
            #self.rx_queue.get(block=True)

    def __slave_action(self):
        while(1):
            time.sleep(1)
            #self.rx_queue.get(block=True)

if __name__ == "__main__":
    c = ThreadHungry()
    c.start()

    # Stop the threads abruptly after 5 seconds
    time.sleep(5)