I'm working on a computer-check program that runs several checks.
Once the checks are completed, the results go into a database.
So far so good.
Since the check functions were freezing the (wx-based) application, I introduced threading in the code, which works fine and fast.
The threading looked like this:
check1 = thread1()
check2 = thread2()
check3 = thread3()
check4 = thread4()
check5 = thread5()
check1.start()
check2.start()
check3.start()
check4.start()
check5.start()
The above lives in a def and is triggered by a button-press event.
This all works well. Now I have to upload the results into a database. But when I simply call e.g. uploadDB(args) after that code, the function starts while the threads are still busy.
That means I have to wait until they are finished, so I now structure the code a bit differently:
threads = []
c1 = check1()
threads.append(c1)
c2 = check2()
threads.append(c2)
...
for x in threads:
    x.start()

# wait for all threads to finish
for x in threads:
    x.join()

uploadDB(args)
This works as well, but during the join the interface freezes again, because everything waits until the threads are finished. The freezing is exactly what I don't want, but if I don't use the join, I don't know when the threads are finished before uploading.
There should be an easier way to do this, I suppose?
Thanks again for the help!
/Jasper
An interim solution I have now is a thread listener. Every thread posts a "run" to the listener, which keeps an integer count of running threads. At the end of a thread (via wx.CallAfter) a "done" is posted and the integer is decreased again. When the integer reaches 0, all the threads are done and I can continue with e.g. the database work.
However, this does not seem like a very efficient way of checking whether the threads are done.
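A minimal sketch of that listener pattern, assuming each check is a callable run by a small Thread subclass (CheckThread, MainFrame, startChecks, and onThreadDone are hypothetical names; uploadDB(args) is the upload call from above):

import threading
import wx

class CheckThread(threading.Thread):
    # Runs one check, then reports back to the GUI thread.
    def __init__(self, frame, work):
        threading.Thread.__init__(self)
        self.frame = frame
        self.work = work

    def run(self):
        self.work()
        # wx.CallAfter marshals the call onto the main (GUI) thread.
        wx.CallAfter(self.frame.onThreadDone)

class MainFrame(wx.Frame):
    def startChecks(self, works):
        self.pending = len(works)        # the integer the listener keeps
        for work in works:
            CheckThread(self, work).start()

    def onThreadDone(self):
        self.pending -= 1                # safe: always runs on the GUI thread
        if self.pending == 0:
            uploadDB(args)               # all checks done; now upload

Because every decrement happens via wx.CallAfter, the counter is only ever touched on the main thread, so no lock is needed and the GUI never blocks.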
But the problem remains: join() freezes the interface, and if I poll in a while loop instead, I basically create the same situation as with join(): the application freezes.
So any suggestion on how to do this more efficiently is welcome!
Thanks!
Related
I have a python script which starts multiple sub processes using these lines:

threads = []
for elm in elements:
    t = multiprocessing.Process(target=sub_process, args=[elm])
    threads.append(t)
    t.start()
for t in threads:
    t.join()
Sometimes, for some reason, the thread halts and the script never finishes.
I'm trying to use the VS Code debugger to find the problem and check where in the thread it gets stuck, but I'm having trouble pausing these sub processes: when I click pause in the debugger window, it pauses the main thread and some other threads that are running properly, but it won't pause the stuck sub process.
Even when I try to pause the threads manually, one by one, using the Call Stack window, I can still pause only the working threads and not the stuck one.
Please help me figure this out. It's hard because whatever makes the process get stuck doesn't always happen, which makes it very hard to debug.
First, those are subprocesses, not threads. It's important to understand the difference, although that alone doesn't answer your question.
Second, a pause (manual break) in the Python debugger breaks in Python code. It won't break in the machine code below that executes the Python, or in the machine code below that which performs the OS services the Python code is asking for. If you execute a pause, it will take effect in the Python code above the machine code when (and if) the machine code returns to the Python interpreter loop.
Given a complete example:
import multiprocessing
import time

elements = ["one", "two", "three"]

def sub_process(gs, elm):
    gs.acquire()
    print("sleep", elm)
    time.sleep(60)
    print("awake", elm)
    gs.release()

def test():
    gs = multiprocessing.Semaphore()
    subprocs = []
    for elm in elements:
        p = multiprocessing.Process(target=sub_process, args=[gs, elm])
        subprocs.append(p)
        p.start()
    for p in subprocs:
        p.join()

if __name__ == '__main__':
    test()
The first subprocess will grab the semaphore and sleep for a minute,
and the second and third subprocesses will wait inside gs.acquire() until they
can move forward. A pause will not break into the debugger until the
subprocess returns from the acquire, because acquire is below the Python code.
It sounds like you have an idea where the process is getting stuck, but you don't know why. You need to determine what questions you are trying to answer. For example:
(Assuming) one of the processes is stuck in acquire. That means one of the other processes didn't release the semaphore. What code in which process is acquiring a semaphore and not releasing it?
Looking at the semaphore object itself might tell you which subprocess is holding it, but this is a tangent: can you use the debugger to inspect the semaphore and determine who is holding it? For example, using a machine-level debugger on Windows, if these were threads and a critical section, it would be possible to look at the critical section and see which thread is still holding it. I don't know whether this can be done with processes and semaphores on your chosen platform.
Which debuggers you have access to depends on the platform you're running on.
In summary:
You can't break into machine code from the Python debugger.
You can run the Python interpreter in a machine-code debugger, but this won't show you the Python code at all, which makes life interesting. It can be helpful if you have an idea what you're looking for; for example, you might be able to tell that you're stuck waiting for a semaphore.
Running a machine-code debugger becomes more difficult when you're running subprocesses, because you need to know which subprocess you're interested in and attach to that one. This becomes simpler with a single process and multiple threads, since there's only one process to deal with.
"You can't get there from here, you have to go someplace else first."
You'll need to take a closer look at your code and figure out how
to answer the questions you need to answer using other means.
It's just an idea, but why not set a timeout on your subprocesses and terminate the ones that exceed it?

TIMEOUT = 60

for elm in elements:
    t = multiprocessing.Process(target=sub_process, args=[elm])
    t.daemon = True
    threads.append(t)
    t.start()

for t in threads:
    t.join(TIMEOUT)        # wait at most TIMEOUT seconds per process
    if t.is_alive():
        t.terminate()      # kill a process that ran past the timeout
This is a two-part question.
After I cancel my script it still continues to run. What I'm doing is querying an exchange API and saving the data for various assets.
My parent script can be seen here; you can see I'm testing it out with just 3 assets. A sample of one of the child scripts can be seen here.
After I cancel the script, the script for BTC seems to still be running, and new .json files are still being generated in its respective folder. The only way to stop it is to delete the folder and create it again.
This one is really a bonus: my code was working with two assets, but with the addition of another it seems to only take in data for BTC and not the other 2.
Your first problem is that you are not really creating worker threads.
t1 = Thread(target=BTC.main()) executes BTC.main() immediately and tries to use its return value as the thread target. Since main loops forever, you never get to starting any other threads.
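A small sketch of the difference (BTC.main is the function from your code):

from threading import Thread

# Wrong: this CALLS BTC.main() immediately in the parent thread, and
# only its return value would be handed to the Thread as a target.
# t1 = Thread(target=BTC.main())

# Right: pass the function object itself; the new thread will call it.
t1 = Thread(target=BTC.main)
t1.start()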
Once you fix that, you'll still have a problem.
In Python, only the main thread sees signals such as Ctrl-C. Other threads will continue executing no matter how hard you press the key. When Python exits, it tries to join non-daemon threads, and that can cause the program to hang: the main thread is waiting for a thread to terminate, but the thread is happily continuing with its execution.
You seem to be depending on this in your code. Your parent starts a bunch of threads (or will, once you fix the first bug) and then exits. Really, it's waiting for the threads to exit. If you solve that with daemon threads (below), you'll also need to add code for your main thread to wait and not exit.
Back to the thread problem...
One solution is to mark threads as "daemon" (set mythread.daemon = True before starting the thread). Python won't wait for those threads, and they will be killed when the main thread exits. This is great if you don't care what state the thread is in when it terminates, but it can do bad things like leave partially written files lying around.
Another solution is to figure out some way for the main thread to interrupt the worker. Suppose the thread waits on socket traffic: you could close the socket, and the thread would be woken by that event.
Another solution is to only run threads for short-lived tasks that you want to complete. Your Ctrl-C gets delayed a bit, but you eventually exit. You could even have the workers run off a queue and send them a special "kill" message when done. In fact, Python thread pools are a good way to go.
Another solution is to have the thread check an Event to see whether it's time to exit, as in the sketch below.
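A minimal sketch of the Event approach, with poll_exchange_and_save() as a hypothetical stand-in for one asset's fetch-and-save loop:

import threading

stop_event = threading.Event()

def worker():
    while not stop_event.is_set():
        poll_exchange_and_save()   # hypothetical: query the API, write .json
        stop_event.wait(1)         # interruptible sleep between polls

t = threading.Thread(target=worker)
t.start()
try:
    while t.is_alive():
        t.join(0.5)                # keep the main thread alive to catch Ctrl-C
except KeyboardInterrupt:
    stop_event.set()               # tell the worker it's time to exit
    t.join()                       # now the worker finishes cleanly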
I have this python threading code.
import threading

def sum(value):
    total = 0
    for i in range(value + 1):
        total += i
    print("I'm done with %d - %d\n" % (value, total))
    return total

r = range(500001, 500000 * 2, 100)
ts = []
for u in r:
    t = threading.Thread(target=sum, args=(u,))
    ts.append(t)
    t.start()
for t in ts:
    t.join()
Executing this, I have hundreds of threads working.
However, when I move the t.join() right after t.start(), I have only two threads working:
for u in r:
    t = threading.Thread(target=sum, args=(u,))
    ts.append(t)
    t.start()
    t.join()
I also tested the code without invoking t.join() at all, and it seems to work fine.
So when, why, and how should I use thread.join()?
You seem to misunderstand what Thread.join does. When you call join, the current thread blocks until that thread has finished. So you are waiting for each thread to finish, which prevents you from starting any other thread.
The idea behind join is to wait for other threads before continuing. In your case, you want to wait for all threads to finish at the end of the main program. Otherwise, if you didn't do that and the main program ended, all the threads it created would be killed. So usually, you should have a loop at the end that joins all created threads, to prevent the main thread from exiting too early.
Short answer: this one:
for t in ts:
    t.join()
is generally the idiomatic way to handle a small number of threads. Calling .join means that your main thread waits until the given thread finishes before proceeding, and you generally do this after you've started all of the threads.
Longer answer:
len(list(range(500001, 500000*2, 100)))
Out[1]: 5000
You're trying to start 5000 threads at once. It's miraculous your computer is still in one piece!
Your method of .join-ing in the loop that dispatches workers will never have more than 2 threads (i.e. only one worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer meltdown, but your code is going to be WAY slower than if you'd never used threading in the first place!
At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need to limit your thread creation to a reasonable number (i.e. more than one, fewer than 5000) is a ThreadPool. There are various ways to get one. You could roll your own; this is fairly simple with a threading.Semaphore. You could use the concurrent.futures package (3.2+), as sketched below. You could use some 3rd-party solution. It's up to you; each has a different API, so I can't discuss them all further here.
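For example, a sketch with concurrent.futures; the worker is the sum from the question (renamed sum_to here to avoid shadowing the builtin), and the pool size of 8 is an arbitrary choice:

from concurrent.futures import ThreadPoolExecutor

def sum_to(value):
    total = 0
    for i in range(value + 1):
        total += i
    return total

# At most 8 threads exist at any moment; the remaining tasks queue up.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(sum_to, range(500001, 500000 * 2, 100)))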
Obligatory GIL Discussion
CPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing Python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O-bound tasks, such as retrieving a bunch of URLs.
multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes; a sketch follows below. This isn't free: data transfer between processes is expensive, so care needs to be taken not to write workers that depend on shared state.
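A sketch of the same workload with multiprocessing, which can actually use multiple cores; note the worker must be a top-level (picklable) function:

from multiprocessing import Pool

def sum_to(value):
    return sum(range(value + 1))   # CPU-bound work for one task

if __name__ == '__main__':         # required where processes are spawned
    with Pool() as pool:           # defaults to one process per CPU
        results = pool.map(sum_to, range(500001, 500000 * 2, 100))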
join() waits for your thread to finish, so the first use starts all the threads and then waits for all of them to finish. The second use waits for each thread to end before it launches another one, which rather defeats the purpose of threading.
The first use makes the most sense: you run the threads (all of them) to do some parallel computation, then wait until all of them finish before you move on and use the results, to make sure the work is actually done (i.e. the results are actually there).
I am new to daemons and I was wondering how I can make my main script a daemon.
I have my main script which I wish to make a daemon and run in the background:
main.py

def requestData(information):
    return currently_crunched_data()

while True:
    crunchData()

I would like to be able to call the requestData function on this daemon while the loop is running. I am not too familiar with daemons or how to convert my script into one.
However, I am guessing I would have to make two threads: one for my crunchData loop and one for the daemon's request receiver, since the daemon has its own loop (daemon.requestLoop()).
I am currently looking into Pyro to do this. Does anyone know how I can ultimately give a background-running while loop the ability to receive requests from other processes (like a daemon, I suppose)?
There are already a number of questions on creating a daemon in Python, like this one, which answer that part nicely.
So, how do you have your daemon do background work?
As you suspected, threads are an obvious answer. But there are three possible complexities.
First, there's shutdown. If you're lucky, your crunchData function can be summarily killed at any time with no corrupted data or (too-significant) lost work. In that case:
def worker():
    while True:
        crunchData()

# ... somewhere in the daemon startup code ...
t = threading.Thread(target=worker)
t.daemon = True
t.start()
Notice the t.daemon = True. A "daemon thread" has nothing to do with your program being a daemon; it means that you can just quit the main process, and the thread will be summarily killed.
But what if crunchData can't be killed? Then you'll need to do something like this:
quitflag = False
quitlock = threading.Lock()

def worker():
    while True:
        with quitlock:
            if quitflag:
                return
        crunchData()

# ... somewhere in the daemon startup code ...
t = threading.Thread(target=worker)
t.start()

# ... somewhere in the daemon shutdown code ...
with quitlock:
    quitflag = True
t.join()
I'm assuming each iteration of crunchData doesn't take that long. If it does, you may need to check quitflag periodically within the function itself.
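A sketch of such a periodic check, where data_chunks and process() are hypothetical stand-ins for however crunchData structures its work:

def crunchData():
    for chunk in data_chunks:      # hypothetical: the work, split into pieces
        with quitlock:
            if quitflag:
                return             # bail out between chunks
        process(chunk)             # hypothetical per-chunk computation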
Meanwhile, you want your request handler to access some data that the background thread is producing. You'll need some kind of synchronization there as well.
The obvious thing is to just use another Lock. But there's a good chance that crunchData is writing to its data frequently. If it holds the lock for 10 seconds at a time, the request handler may block for 10 seconds. But if it grabs and releases the lock a million times, that could take longer than the actual work.
One alternative is to double-buffer your data: Have crunchData write into a new copy, then, when it's done, briefly grab the lock and set currentData = newData.
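A minimal sketch of that double-buffering, reusing the question's requestData name and the currentData/newData names from above (buildNewData is a hypothetical producer):

import threading

datalock = threading.Lock()
currentData = None

def worker():
    global currentData
    while True:
        newData = buildNewData()   # hypothetical: the slow crunching,
                                   # done entirely outside the lock
        with datalock:             # the lock is held only for the swap
            currentData = newData

def requestData(information):
    with datalock:
        return currentData         # cheap: just hands back the reference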
Depending on your use case, a Queue, a file, or something else might be even simpler.
Finally, crunchData is presumably doing a lot of CPU work. You need to make sure that the request handler does very little CPU work, or each request will slow things down quite a bit as the two threads fight over the GIL. Usually this is no problem. If it is, use a multiprocessing.Process instead of a Thread (which makes sharing or passing the data between the two processes slightly more complicated, but still not too bad).
Basically I am using ToasterBox, and I want code to run, let's say, every 30 seconds: every 30 seconds the toasterbox should pop up. The code looks like this:
event = threading.Event()
#############################################
bWidth = 200
bHeight = 100
tb = TB.ToasterBox(self, TB.TB_COMPLEX, TB.DEFAULT_TB_STYLE, TB.TB_ONTIME)
tb.SetPopupSize((bWidth, bHeight))
tb.SetPopupPosition((1600 - bWidth, 900 - bHeight))
tb.SetPopupPauseTime(4000)
tb.SetPopupScrollSpeed(8)
##############################################
while True:
    showPopup(tb, name, amount, progress, link)
    tb.Play()
    event.wait(30)
That should give you an idea. Anyway, the problem is that the toasterbox pops up, but tb.Play() isn't blocking: it spawns a timer to handle the animation, so the thread immediately continues to the wait call and blocks there, and the toasterbox never closes. Is there a way to change Play to make it blocking? Or is there a better way to do this? I tried creating a new thread to run tb.Play() in, but it threw an error about only being able to run from the main thread. More info about ToasterBox, including the source, can be found here: Toasterbox
What is this toasterbox you're using? There's one included with wxPython. See here http://xoomer.virgilio.it/infinity77/AGW_Docs/toasterbox_module.html#toasterbox or here http://www.wxpython.org/docs/api/wx.lib.agw.toasterbox.ToasterBox-class.html
I don't think the AGW one supports blocking, and I'm guessing the one you're using doesn't either. You can ask on the wxPython mailing list to see if they can patch their version or have a better suggestion. Personally, I would use a wx.Timer to pop up the toaster; that should take care of your problem without threads. There's an example of how to use timers here: http://www.blog.pythonlibrary.org/2009/08/25/wxpython-using-wx-timers/
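A minimal sketch of that wx.Timer approach, reusing showPopup and the tb setup from the question (the frame scaffolding is an assumption):

import wx

class ToasterFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="Popup demo")
        self.timer = wx.Timer(self)
        self.Bind(wx.EVT_TIMER, self.onTimer, self.timer)
        self.timer.Start(30000)    # fire every 30 seconds

    def onTimer(self, event):
        # This runs on the main thread, so the GUI calls are safe here.
        showPopup(tb, name, amount, progress, link)
        tb.Play()

Because the timer event arrives on the main thread, there is no cross-thread GUI call and no need for the threading.Event at all.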