I was studying Python threading and came across join().
The author said that if a thread is in daemon mode, I need to use join() so that the thread can finish before the main thread terminates.
But I have also seen him use t.join() even though t was not a daemon thread.
Here is the example code:
import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.daemon = True  # same effect as the older (now deprecated) d.setDaemon(True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

d.join()
t.join()
I don't know what the use of t.join() is, since t is not a daemon thread, and I see no change even if I remove it.
A somewhat clumsy ASCII-art diagram to demonstrate the mechanism:
The join() is presumably called by the main thread. It could also be called by another thread, but that would needlessly complicate the diagram.
The join() call belongs on the main thread's track, but to express the thread relationship and keep the diagram as simple as possible, I chose to place it on the child thread's track instead.
without join:
+---+---+------------------                   main-thread
    |   |
    |   +...........                          child-thread(short)
    +..................................       child-thread(long)

with join:
+---+---+------------------***********+###    main-thread
    |   |                             |
    |   +...........join()            |       child-thread(short)
    +......................join()......       child-thread(long)

with join and daemon thread:
+-+--+---+------------------***********+###   parent-thread
  |  |   |                             |
  |  |   +...........join()            |      child-thread(short)
  |  +......................join()......      child-thread(long)
  +,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,   child-thread(long + daemonized)

'-' main-thread/parent-thread/main-program execution
'.' child-thread execution
'#' optional parent-thread execution after the join()-blocked parent-thread can
    continue
'*' main-thread 'sleeping' in the join() method, waiting for the child-thread to finish
',' daemonized thread: 'ignores' the lifetime of other threads;
    terminates when the main program exits; normally meant for
    join-independent tasks
So the reason you don't see any change is that your main thread does nothing after your join() calls.
You could say that join() is (only) relevant for the execution flow of the main thread.
If, for example, you want to download a bunch of pages concurrently and concatenate them into a single large page, you may start the downloads in threads, but you need to wait until the last page/thread has finished before you start assembling a single page out of many. That's when you use join().
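A minimal sketch of that download-then-assemble pattern. The `fake_download()` helper is hypothetical: it just sleeps for a varying time and stores a placeholder string, standing in for a real HTTP request; `download_and_assemble()` is likewise an illustrative name.

```python
import threading
import time

def fake_download(page_id, results):
    """Hypothetical stand-in for a real download: sleep, then store the page text."""
    time.sleep(0.1 * (3 - page_id % 3))  # vary the duration so pages finish out of order
    results[page_id] = f"<page {page_id}>"

def download_and_assemble(num_pages):
    results = [None] * num_pages  # one slot per page, so no two threads write the same slot
    threads = [threading.Thread(target=fake_download, args=(i, results))
               for i in range(num_pages)]
    for t in threads:
        t.start()  # all downloads run concurrently
    for t in threads:
        t.join()   # wait for the slowest download before assembling
    return "".join(results)  # every slot is filled once all joins have returned

if __name__ == "__main__":
    print(download_and_assemble(5))
```

Without the join() loop, `"".join(results)` could run while some slots still hold `None`, which raises a `TypeError`.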
Straight from the docs
join([timeout])
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
This means that the main thread, which spawns t and d, blocks in t.join() until t has finished (and likewise in d.join() until d has finished).
Depending on the logic your program employs, you may want to wait until a thread finishes before your main thread continues.
Also from the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.
A simple example, say we have this:
def non_daemon():
    time.sleep(5)
    print 'Test non-daemon'

t = threading.Thread(name='non-daemon', target=non_daemon)
t.start()
Which finishes with:
print 'Test one'
t.join()
print 'Test two'
This will output:
Test one
Test non-daemon
Test two
Here the main thread explicitly waits for thread t to finish before it calls print the second time.
Alternatively if we had this:
print 'Test one'
print 'Test two'
t.join()
We'll get this output:
Test one
Test two
Test non-daemon
Here we do our work in the main thread and then wait for thread t to finish. In this case we could even remove the explicit t.join(), and the program would still implicitly wait for t to finish, because t is a non-daemon thread.
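To see that implicit wait (and the daemon flag from the docs quote above) in action, the sketch below runs a tiny child program in a fresh interpreter via `subprocess`, once with a non-daemon worker and once with a daemon worker; the main thread never calls join() in either case. `CHILD` and `run_child()` are illustrative names, and the 0.5 s sleep is an arbitrary stand-in for real work.

```python
import subprocess
import sys
import textwrap

# Child program: start one worker thread and let the main thread fall off the end
# without calling join(). The {daemon} placeholder is filled in by run_child().
CHILD = textwrap.dedent("""
    import threading, time

    def worker():
        time.sleep(0.5)
        print("worker finished", flush=True)

    threading.Thread(target=worker, daemon={daemon}).start()
    print("main finished", flush=True)
""")

def run_child(daemon):
    """Run the child program in a new interpreter and return what it printed."""
    result = subprocess.run([sys.executable, "-c", CHILD.format(daemon=daemon)],
                            capture_output=True, text=True, timeout=10)
    return result.stdout

if __name__ == "__main__":
    print(repr(run_child(False)))  # interpreter waits for the non-daemon worker
    print(repr(run_child(True)))   # interpreter exits; the daemon worker is killed mid-sleep
```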
Thanks for this thread -- it helped me a lot too.
I learned something about .join() today.
These threads run in parallel:
d.start()
t.start()
d.join()
t.join()
and these run sequentially (not what I wanted):
d.start()
d.join()
t.start()
t.join()
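A rough timing sketch of the two orderings, assuming `time.sleep()` as a stand-in for I/O-bound work (sleeping releases the GIL, so the two threads really can overlap). `work()` and `run()` are hypothetical names:

```python
import threading
import time

def work():
    time.sleep(0.5)  # stand-in for I/O-bound work

def run(sequential):
    d = threading.Thread(target=work)
    t = threading.Thread(target=work)
    start = time.monotonic()
    if sequential:
        d.start()
        d.join()   # d must finish before t is even started
        t.start()
        t.join()
    else:
        d.start()
        t.start()  # both threads are running at the same time
        d.join()
        t.join()
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"parallel:   {run(sequential=False):.2f} s")
    print(f"sequential: {run(sequential=True):.2f} s")
```

On a typical machine the parallel run takes about 0.5 s and the sequential one about 1 s.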
In particular, I was trying to be clever and tidy:
class Kiki(threading.Thread):
    def __init__(self, time):
        super(Kiki, self).__init__()
        self.time = time
        self.start()
        self.join()
This works! But it runs sequentially. I can put self.start() in __init__, but not self.join(); that has to be done after every thread has been started.
join() is what causes the main thread to wait for your thread to finish. Otherwise, your thread runs all by itself.
So one way to think of join() is as a "hold" on the main thread: it sort of de-threads your thread, making the main thread wait for it as if it had run sequentially, before the main thread can continue. It ensures that your thread is complete before the main thread moves forward. Note that this means it's OK if your thread has already finished before you call join(); the main thread is simply released immediately when join() is called.
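A small check of that last point: join() on a thread that has already finished returns immediately, and joining the same thread more than once is allowed (per the threading docs, what raises RuntimeError is calling join() on a thread that was never started, or on the current thread). The no-op lambda target is just a placeholder:

```python
import threading
import time

t = threading.Thread(target=lambda: None)  # placeholder thread that finishes instantly
t.start()
t.join()                 # first join: waits (briefly) for the thread to end
print(t.is_alive())      # the thread has terminated

start = time.monotonic()
t.join()                 # thread already finished: returns right away
elapsed = time.monotonic() - start
print(f"second join took {elapsed:.4f} s")
```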
In fact, it just now occurs to me that the main thread waits at d.join() until thread d finishes before it moves on to t.join().
To be very clear, consider this code:
import threading
import time

class Kiki(threading.Thread):
    def __init__(self, time):
        super(Kiki, self).__init__()
        self.time = time
        self.start()

    def run(self):
        print self.time, " seconds start!"
        for i in range(0,self.time):
            time.sleep(1)
            print "1 sec of ", self.time
        print self.time, " seconds finished!"

t1 = Kiki(3)
t2 = Kiki(2)
t3 = Kiki(1)
t1.join()
print "t1.join() finished"
t2.join()
print "t2.join() finished"
t3.join()
print "t3.join() finished"
It produces this output (note how the print statements are interleaved with each other):
$ python test_thread.py
32 seconds start! seconds start!1
seconds start!
1 sec of 1
1 sec of 1 seconds finished!
21 sec of
3
1 sec of 3
1 sec of 2
2 seconds finished!
1 sec of 3
3 seconds finished!
t1.join() finished
t2.join() finished
t3.join() finished
$
The t1.join() is holding up the main thread. Because t1 is the longest-running thread, all three threads complete before t1.join() returns; the main thread then moves on to execute the print, then t2.join() (which returns immediately, since t2 is already done), then print, then t3.join(), then print.
Corrections welcome. I'm also new to threading.
(Note: in case you're interested, I'm writing code for a DrinkBot, and I need threading to run the ingredient pumps concurrently rather than sequentially -- less time to wait for each drink.)
The method join()
blocks the calling thread until the thread whose join() method is called is terminated.
Source : http://docs.python.org/2/library/threading.html
With join(), the interpreter will wait until your thread has completed or terminated:
>>> from threading import Thread
>>> import time
>>> def sam():
...     print 'started'
...     time.sleep(10)
...     print 'waiting for 10sec'
...
>>> t = Thread(target=sam)
>>> t.start()
started
>>> t.join() # with join, the interpreter waits until the thread has completed or terminated
done? # typed while join() was blocking; it only showed up after the thread finished, i.e. after 10 sec
waiting for 10sec
>>> done?
Without join(), the interpreter won't wait for the thread to finish:
>>> t = Thread(target=sam)
>>> t.start()
started
>>> print 'yes done' # without join, the interpreter doesn't wait for the thread to finish
yes done
>>> waiting for 10sec
In Python 3.x, join() is used to wait for a thread: when join() is called on a particular thread, the calling thread (here, the main thread) stops executing until the joined thread has finished.
#1 - Without Join():
import threading
import time
def loiter():
    print('You are loitering!')
    time.sleep(5)
    print('You are not loitering anymore!')

t1 = threading.Thread(target=loiter)
t1.start()
print('Hey, I do not want to loiter!')
'''
Output without join()-->
You are loitering!
Hey, I do not want to loiter!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
'''
#2 - With Join():
import threading
import time
def loiter():
    print('You are loitering!')
    time.sleep(5)
    print('You are not loitering anymore!')

t1 = threading.Thread(target=loiter)
t1.start()
t1.join()
print('Hey, I do not want to loiter!')
'''
Output with join() -->
You are loitering!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
Hey, I do not want to loiter!
'''
This example demonstrates the .join() action:
import threading
import time
def threaded_worker():
    for r in range(10):
        print('Other: ', r)
        time.sleep(2)

thread_ = threading.Timer(1, threaded_worker)
thread_.daemon = True  # If the main program exits, this thread is killed as well.
thread_.start()

flag = True

for i in range(10):
    print('Main: ', i)
    time.sleep(2)
    if flag and i > 4:
        print(
'''
Threaded_worker() joined to the main thread.
Now we have a sequential behavior instead of concurrency.
''')
        thread_.join()
        flag = False
Out:
Main: 0
Other: 0
Main: 1
Other: 1
Main: 2
Other: 2
Main: 3
Other: 3
Main: 4
Other: 4
Main: 5
Other: 5
Threaded_worker() joined to the main thread.
Now we have a sequential behavior instead of concurrency.
Other: 6
Other: 7
Other: 8
Other: 9
Main: 6
Main: 7
Main: 8
Main: 9
When calling join(t), that is, join() with a timeout of t seconds, on both a non-daemon thread and a daemon thread, the main thread (or main process) waits at most t seconds per call (less if the thread finishes sooner), and then goes on with its own work. During that waiting time, both child threads do whatever they can, such as printing out some text. If the non-daemon thread still hasn't finished its job after the t seconds, it can still finish after the main thread finishes its own work, because the interpreter waits for non-daemon threads before exiting. The daemon thread, however, has simply missed its window: it is killed when the Python program exits. Please correct me if anything here is wrong.
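A minimal sketch of join() with a timeout, assuming a hypothetical `slow_task()` that sleeps for 1 s. join(timeout) waits *at most* the given time and always returns None, so is_alive() is how you find out whether it timed out:

```python
import threading
import time

def slow_task():
    time.sleep(1.0)  # hypothetical long-running work

t = threading.Thread(target=slow_task, daemon=True)
t.start()

t.join(timeout=0.2)  # wait at most 0.2 s; returns None either way
if t.is_alive():     # join() doesn't report a timeout, so check explicitly
    print("timed out; thread still running")
else:
    print("thread finished")
```

Because the thread is a daemon, it would simply be killed if the program exited at this point.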
There are a few reasons for the main thread (or any other thread) to join other threads:
A thread may have created or be holding (locking) some resources. The join-calling thread may be able to clean up those resources on its behalf.
join() is a natural blocking call, so the join-calling thread continues only after the called thread has terminated.
If a Python program does not join other threads, the Python interpreter will still join non-daemon threads on its behalf.
join() waits for both non-daemon and daemon threads to complete.
Without join(), non-daemon threads run concurrently with the main thread and run to completion even if the main thread finishes first.
Without join(), daemon threads run concurrently with the main thread, and when the main thread finishes (and no non-daemon threads are left), any daemon threads that are still running are killed without completing.
So, with join() and daemon=False (non-daemon threads) below (daemon is False by default):
import time
from threading import Thread
def test1():
    for _ in range(3):
        print("Test1 is running...")
        time.sleep(1)
    print("Test1 is completed")

def test2():
    for _ in range(3):
        print("Test2 is running...")
        time.sleep(1)
    print("Test2 is completed")
# Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
# Here
thread1.start()
thread2.start()
thread1.join() # Here
thread2.join() # Here
print("Main is completed")
Or, with join() and daemon=True (daemon threads) below:
# ...
# Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
# Here
# ...
thread1.join() # Here
thread2.join() # Here
print("Main is completed")
join() waits for the Test1 and Test2 threads to complete, whether they are daemon or non-daemon threads. So, "Main is completed" is printed after the Test1 and Test2 threads have completed, as shown below:
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
Main is completed
And, without join() and with daemon=False (non-daemon threads) below:
# ...
# Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
# Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")
The Test1 and Test2 non-daemon threads run concurrently with the main thread and run to completion. So, "Main is completed" is printed before the Test1 and Test2 threads complete, as shown below:
Test1 is running...
Test2 is running...
Main is completed
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
And, without join() and with daemon=True (daemon threads) below:
# ...
# Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
# Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")
The Test1 and Test2 daemon threads run concurrently with the main thread. So, "Main is completed" is printed before they complete, and when the main thread finishes, the Test1 and Test2 daemon threads are killed without completing, as shown below:
Test1 is running...
Test2 is running...
Main is completed
It looks like the difference between synchronous and asynchronous processing is misunderstood here.
A thread is meant to execute a sub-procedure, most of the time in a "parallel" or "concurrent" fashion (depending on whether the device has multiple processors or not). But what's the point of concurrency? For the most part, it's about improving the performance of a process by applying the idea of "divide and conquer": have several threads (sub-processes) each execute a "portion" of the whole process simultaneously, and then have a "final" step where all the sub-processes' results are combined (joined; hence the "join" method).
Of course, to achieve such a gain in efficiency, the portions that are divided into threads must be "mutually exclusive" (i.e., they don't share values that get updated; code that does is known in parallel computing as a "critical section"). If at least one value is updated by two or more threads, then one has to wait for the other to "finish" its update; otherwise you get inconsistent results. For example, two people who share a bank account each try to withdraw money at an ATM: if there is no proper mechanism that "locks" or "protects" the "balance" variable across both ATMs, the withdrawals can completely corrupt the final balance, causing obvious, serious financial problems for the account owners.
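The ATM scenario can be sketched with a threading.Lock guarding the shared balance. Everything here (`balance`, `withdraw()`, the amounts) is illustrative: two threads each try 600 withdrawals of 1 from a balance of 1000, and because the check-and-subtract happens under the lock, the account is never overdrawn.

```python
import threading

balance = 1000                   # illustrative shared account balance
balance_lock = threading.Lock()  # protects the read-check-write on balance

def withdraw(amount, times):
    global balance
    for _ in range(times):
        with balance_lock:       # only one "ATM" may check and update at a time
            if balance >= amount:
                balance -= amount

# Two "ATMs" withdrawing from the same account concurrently.
atms = [threading.Thread(target=withdraw, args=(1, 600)) for _ in range(2)]
for atm in atms:
    atm.start()
for atm in atms:
    atm.join()                   # read the final balance only after both ATMs are done
print(balance)                   # 1200 attempts, but only 1000 can succeed
```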
So, coming back to the purpose of a thread in parallel computing: have all threads do their individual part, and use "join" to make them "come back" to the main process so that each individual result is then "consolidated" into a global one.
Examples? There are many, but let's just enumerate a few that are easy to explain:
Matrix multiplication: have each thread multiply one row of matrix A by the whole matrix B to obtain one row of matrix C. At the end, put all the resulting rows together to "display" (show) the result: matrix C. In this example, although matrix B is used by all threads, none of its values is ever updated or modified (it is read-only).
Summation or product of a massive array of numbers (thousands of values, whether integer or float). Make threads compute partial sums/products (say, if you have to sum 10K values, create 5 threads, each with 2K values); then, with "join", make them return to the main process and sum the individual results of all 5 threads. (Note that in CPython the Global Interpreter Lock prevents pure-Python, CPU-bound code from running on several cores at once, so a real speedup for this kind of work usually requires processes, e.g. the multiprocessing module, rather than threads.)
Theoretically, the process takes 2000 + 5 steps (2000 done simultaneously in each of the 5 threads, plus summing the final 5 sub-totals in the main process). In practice, though, how long the 5 threads take to do their own 2000-number summations varies, because different factors come into play (processor speed, electrical flow, or, if it is a web service, network latency, and so on). However, in the "worst case" the total time is the time the "slowest" thread takes plus the final step of summing the 5 results. Also, in practice, a thread that is meant to do 20% of the whole job is unlikely to take longer than a single sequential process that does 100% of the job (of course, this also depends on the size of the data being processed: the advantage won't be the same for a summation of 10K values as for a summation of just 10 values with the same 5 threads, where it's impractical and not worth it).
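A sketch of the 10K-value / 5-thread summation described above, with hypothetical `partial_sum()` and `threaded_sum()` helpers. It shows the split / join / consolidate structure only: in CPython the GIL keeps these pure-Python sums from running on several cores at once, so don't expect a speedup from threads here (processes, e.g. concurrent.futures.ProcessPoolExecutor, are the usual route for that).

```python
import threading

def partial_sum(values, results, index):
    results[index] = sum(values)  # each thread writes only its own slot

def threaded_sum(values, num_threads=5):
    chunk = (len(values) + num_threads - 1) // num_threads  # ceiling division
    results = [0] * num_threads
    threads = [
        threading.Thread(target=partial_sum,
                         args=(values[i * chunk:(i + 1) * chunk], results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # the "join" step: every sub-total is ready after this
    return sum(results)   # consolidate the sub-totals in the main thread

if __name__ == "__main__":
    print(threaded_sum(list(range(10_000))))
```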
Quick sort: We all know in general how quick sort works. However, there's a chance to improve it if, say, we execute it in TWO threads: one that handles the odd numbers and one that handles the even ones. Each then executes recursively, and at some point the results of both threads are joined and a final quick sort is done that will not require as many repetitions, since the numbers will be sufficiently ordered after the two threads did their initial job. That's a serious performance gain with a large, unordered set of items. Three threads could perhaps be used with some rearrangement of the logic behind it, but the gain is really minimal and not worth programming. Two threads, however, give a decent performance (time) gain.
So, using "join" in Python (or its equivalent in other "concurrent" languages) is important; but it depends a lot on the programmer understanding what they want to "parallelize" and how skilled they are at splitting the algorithm into the right steps to be parallelized vs. the steps that need to be kept in the main process. It's more a problem of "logical" thinking than of a programming "anti-pattern".
"What's the use of using join()?" you say. Really, it's the same answer as "what's the use of closing files, since python and the OS will close my file for me when my program exits?".
It's simply a matter of good programming. You should join() your threads at the point in the code where the thread should no longer be running, either because you positively have to ensure the thread is no longer running and interfering with your own code, or because you want to behave correctly in a larger system.
You might say "I don't want my code to delay giving an answer" just because of the additional time that the join() might require. This may be perfectly valid in some scenarios, but you now need to take into account that your code is "leaving cruft around for python and the OS to clean up". If you do this for performance reasons, I strongly encourage you to document that behavior. This is especially true if you're building a library/package that others are expected to utilize.
There's no reason to not join(), other than performance reasons, and I would argue that your code does not need to perform that well.
Related
So basically, I have this function th() which counts up to a certain number and then prints "done".
I want to start n such threads at the same time, running simultaneously.
So I wrote:
thread_num = 3  # here n is 3, but I'd normally want something way higher
thrds = []
i = 0
while i < thread_num:
    thr = Thread(target=th, args=())
    thrds.append(thr)
    i += 1
    print("thread", str(i), "added")
for t in thrds:
    t.start()
    t.join()
I want all the threads to print "done" at the same time, but there is a noticeable lag between them. They print "thread i added" at seemingly the same time, but print "done" with quite a bit of lag.
Why is this happening?
Edit: Since someone asked me to add the th() function as well, here it is:
def th():
    v = 0
    num = 10**7
    while v < num:
        v += 1
    print("done")
This is happening because of the t.join() call that you make on each thread before starting the next one. t.join() blocks the execution of the current thread until thread t has completed. So each thread starts only after the previous one has finished.
You first have to start all the threads, then join all the threads in separate for loops; otherwise, each thread starts but runs to completion due to join before starting another thread.
for t in thrds:  # start all the threads
    t.start()

for t in thrds:  # wait for all threads to finish
    t.join()
If you only have a simple counting thread, you may need to add a short sleep to actually see the threads' output intermingle, as they may still run fast enough to complete before another thread starts.
Because you start and join each thread sequentially, one thread runs to completion before the next one even starts. You'd be better off using a thread pool, which is a more comprehensive implementation that handles multiple issues in multithreading.
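A minimal thread-pool version using the standard library's concurrent.futures.ThreadPoolExecutor. It is a sketch that assumes th() is changed to take the count as a parameter `n` (an illustrative change) and to return "done" instead of printing it; leaving the `with` block waits for all submitted work, much like a join() loop.

```python
from concurrent.futures import ThreadPoolExecutor

def th(n):
    """The question's counting loop, with the limit passed in as a (hypothetical) parameter n."""
    v = 0
    while v < n:
        v += 1
    return "done"

thread_num = 3
with ThreadPoolExecutor(max_workers=thread_num) as pool:
    futures = [pool.submit(th, 10**5) for _ in range(thread_num)]
# Leaving the with-block calls pool.shutdown(wait=True), which waits for every worker.
results = [f.result() for f in futures]
print(results)
```

Because of the GIL, a CPU-bound counting loop like this still won't finish faster with more threads; the pool only manages starting and waiting for them.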
Because of memory management and object reference-counting issues, CPython only lets a single thread execute bytecode at a time. Periodically, each thread releases and reacquires the Global Interpreter Lock (GIL) to let other threads run. Exactly which thread runs at any given time is up to the operating system, and you may find that one gets more time slices than another, causing staggered results.
To get them all to print "done" at the same time, you could use a synchronization primitive such as a barrier, so that the threads wait until all of them are done. With a barrier, all threads must call wait() before any of them can continue.
import threading
from threading import Thread

thread_num = 3  # here n is 3, but I'd normally want something way higher
wait_done = threading.Barrier(thread_num)

def th(waiter):
    v = 0
    while v < 10**7:  # count up to whatever you want
        v += 1
    waiter.wait()     # block until all thread_num threads have reached this point
    print("done")

thrds = []
i = 0
while i < thread_num:
    thr = Thread(target=th, args=(wait_done,))
    thrds.append(thr)
    i += 1
    print("thread", str(i), "added")
for t in thrds:
    t.start()
for t in thrds:
    t.join()
I have the following script (don't refer to the contents):
import _thread

def func1(arg1, arg2):
    print("Write to CLI")

def verify_result():
    func1()

for _ in range (4):
    _thread.start_new_thread(func1, (DUT1_CLI, '0'))

verify_result()
I want to execute func1() concurrently (say, in 4 threads); in my case it includes a function call that can take a while to execute. Then, only after the last thread has finished its work, I want to execute verify_result().
Currently, all the threads do finish their job, but verify_result() is executed before they have all finished.
I have even tried using the following code (of course I imported threading) inside the for loop, but it didn't work (don't refer to the arguments):
t = threading.Thread(target = Enable_WatchDog, args = (URL_List[x], 180, Terminal_List[x], '0'))
t.start()
t.join()
Your last threading example is close, but you have to collect the threads in a list, start them all at once, then wait for them to complete all at once. Here's a simplified example:
import threading
import time

# Lock to serialize console output
output = threading.Lock()

def threadfunc(a,b):
    for i in range(a,b):
        time.sleep(.01)  # sleep to make the "work" take longer
        with output:
            print(i)

# Collect the threads
threads = []
for i in range(10,100,10):
    # Create 9 threads counting 10-19, 20-29, ... 90-99.
    thread = threading.Thread(target=threadfunc,args=(i,i+10))
    threads.append(thread)

# Start them all
for thread in threads:
    thread.start()

# Wait for all to complete
for thread in threads:
    thread.join()
Say you have a list of threads.
You loop over them (each_thread):
for each_thread in thread_pool:
    each_thread.start()
within the loop, to start executing the run() method of each thread.
In the same way, after you have started all the threads, you write another loop:
for each_thread in thread_pool:
    each_thread.join()
What join() does is make the calling thread wait for thread i to finish before it moves on to join thread i+1 (which may well have finished already); it does not delay thread i+1 itself.
The threads run concurrently; join() just synchronizes the calling thread with the moment each thread finishes, so that its results are available.
In your case specifically, you can run the join() loop and then call verify_result().
I am just a beginner in Python. What I tried to achieve is creating two threads and calling a different function in each thread. I made the function in thread 1 run for 60 seconds while thread 2 runs simultaneously, and made the main thread wait for 70 seconds. When thread 1 exits, thread 2 should exit too; control should then come back to the main thread, and thread 1 and thread 2 should be started again, repeating the same procedure.
I tried to achieve this with the code below, but I think I was not able to.
I have written a script in which I start two threads, named thread 1 and thread 2.
Thread 1 runs a function named func1, and thread 2 runs a function named func2.
Thread 1 will execute a command and wait for 60 seconds.
Thread 2 will run only while thread 1 is running.
After that, the same process repeats in the while loop after a break of 80 seconds.
I am a beginner in Python.
Please point out what I have done wrong and how to correct it.
#!/usr/bin/python
import threading
import time
import subprocess
import datetime
import os
import thread

thread.start_new_thread( print_time, (None, None))
thread.start_new_thread( print_time1, (None, None))
command= "strace -o /root/Desktop/a.txt -c ./server"
final_dir = "/root/Desktop"
exitflag = 0
# Define a function for the thread
def print_time(*args):
    os.chdir(final_dir)
    print "IN first thread"
    proc = subprocess.Popen(command,shell=True,stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.wait(70)
    exitflag=1

def print_time1(*args):
    print "In second thread"
    global exitflag
    while exitflag:
        thread.exit()
    #proc = subprocess.Popen(command1,shell=True,stdout=subprocess.PIPE, sterr=subprocess.PIPE)

# Create two threads as follows
try:
    while (1):
        t1=threading.Thread(target=print_time)
        t1.start()
        t2=threading.Thread(target=print_time1)
        t2=start()
        time.sleep(80)
        z = t1.isAlive()
        z1 = t2.isAlive()
        if z:
            z.exit()
        if z1:
            z1.exit()
        threading.Thread(target=print_time1).start()
        threading.Thread(target=print_time1).start()
        print "In try"
except:
    print "Error: unable to start thread"
I can't get the example to run; I needed to change the function definitions to
def print_time(*args)
and the thread call to
thread.start_new_thread( print_time, (None, None))
Then you have a number of problems:
1. You are currently not waiting for exitflag to be set in the second thread; it just runs to completion.
2. To share variables between threads, you need to declare them global in the thread function; otherwise you get a local variable (print_time assigns exitflag without declaring it global).
3. thread.exit() in the print_time1 function generates an error.
4. Your timings in the problem description and in the code do not match.
So, to solve issues 1-3, declare print_time1 like this (removing the exit at the end), and also add global exitflag to print_time:
def print_time1(*args):
    global exitflag
    while exitflag == 0:  # busy-wait until print_time sets exitflag
        pass
    # Do stuff when thread is finalizing
But, check the doc for the thread module (https://docs.python.org/2/library/thread.html), "[...] however, you should consider using the high-level threading module instead."
import threading
...
while(1):
    threading.Thread(target=print_time).start()
    threading.Thread(target=print_time1).start()
    time.sleep(80)
One final thought about the code: you should check that the threads have actually finished before starting new ones. Right now two new threads are started every 80 seconds, regardless of whether the old threads have run to completion or not. Unless this is the desired behaviour, I would add a check for that in the while loop. Also, while you are at it, move the try clause as close as possible to where the exception might be raised, i.e. where the threads are created. Having the try wrap the whole while loop, as you do now, is not very common and, in my opinion, not very Pythonic (it increases the complexity of the code).
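Putting those suggestions together, here is a sketch using the threading module, with a threading.Event in place of the global exitflag busy-wait and a join() on both threads before each new round. The strace subprocess is replaced by a plain sleep, and the 60/70/80-second timings by short hypothetical values so it finishes quickly; `first()`, `second()` and `run_rounds()` are illustrative names.

```python
import threading
import time

def first(done, duration):
    """Stand-in for the strace subprocess call: just sleeps for `duration` seconds."""
    try:
        time.sleep(duration)
    finally:
        done.set()  # tell the second thread we are finished, even if an error occurs

def second(done, ticks):
    """Runs only while first() is running, checking every 0.05 s."""
    while not done.wait(timeout=0.05):  # returns True as soon as `done` is set
        ticks.append(time.monotonic())

def run_rounds(rounds, duration, pause):
    ticks_per_round = []
    for _ in range(rounds):
        done = threading.Event()  # fresh event for each round
        ticks = []
        t1 = threading.Thread(target=first, args=(done, duration))
        t2 = threading.Thread(target=second, args=(done, ticks))
        t1.start()
        t2.start()
        t1.join()  # both threads have finished before the next round starts
        t2.join()
        ticks_per_round.append(len(ticks))
        time.sleep(pause)
    return ticks_per_round

if __name__ == "__main__":
    print(run_rounds(rounds=2, duration=0.3, pause=0.1))
```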
I'm new to threads in Python. I have a question: suppose I start 3 threads like below, each taking care of a different task:
def start( taskName, delay):
    # do something with each taskName
    pass

# Create three threads as follows
try:
    thread.start_new_thread( start, ("task1", ) )
    thread.start_new_thread( start, ("task2", ) )
    thread.start_new_thread( start, ("task3", ) )
except:
    print "Error: unable to start thread"
Suppose that each "start" takes around 10-15 seconds to finish, depending on the task name. My question is: if task 1 finishes in 12 seconds, task 2 in 10 seconds and task 3 in 15 seconds, will task 2 finish and close, leaving tasks 1 and 3 to run until they finish, or will task 2 force tasks 1 and 3 to close after it finishes?
Are there any arguments that we can pass to the start_new_thread method in order to achieve the two behaviours I have mentioned above:
1. First to finish forces the rest to close.
2. Each one finish individually.
Thank you
As Max Noel already mentioned, it is advised to use the Thread class instead of using start_new_thread.
Now, as for your two questions:
1. First to finish forces the rest to close
You will need two important things: a shared queue that the threads can put their ID in once they are done. And a shared Event that will signal all threads to stop working when it is triggered. The main thread will wait for the first thread to put something in the queue and will then trigger the event to stop all threads.
import threading
import random
import time
import Queue

def work(worker_queue, id, stop_event):
    while not stop_event.is_set():
        print "This is worker", id

        # do stuff
        time.sleep(random.random() * 5)

        # put worker ID in queue
        if not stop_event.is_set():
            worker_queue.put(id)

        break

# queue for workers
worker_queue = Queue.Queue()

# indicator for other threads to stop
stop_event = threading.Event()

# run workers
threads = []
threads.append(threading.Thread(target=work, args=(worker_queue, 0, stop_event)))
threads.append(threading.Thread(target=work, args=(worker_queue, 1, stop_event)))
threads.append(threading.Thread(target=work, args=(worker_queue, 2, stop_event)))

for thread in threads:
    thread.start()

# this will block until the first element is in the queue
first_finished = worker_queue.get()

print first_finished, 'was first!'

# signal the rest to stop working
stop_event.set()
2. Each finish individually
Now this is much easier. Just call the join method on all Thread objects. This will wait for each thread to finish.
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
Btw, the above code is for Python 2.7. Let me know if you need Python 3
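A Python 3 sketch of the same first-to-finish idea, using queue.Queue and a threading.Event. It is restructured slightly (an assumption, not the code above verbatim): each worker does its simulated work in short sleep steps so it notices the stop request quickly, and the main thread joins all workers after setting the event.

```python
import queue
import random
import threading
import time

def work(worker_queue, worker_id, stop_event):
    # Simulated work done in small steps, so a stop request is noticed quickly.
    for _ in range(random.randint(3, 10)):
        if stop_event.is_set():
            return                   # another worker finished first
        time.sleep(0.05)
    worker_queue.put(worker_id)      # report that this worker completed

worker_queue = queue.Queue()
stop_event = threading.Event()
threads = [threading.Thread(target=work, args=(worker_queue, i, stop_event))
           for i in range(3)]
for thread in threads:
    thread.start()

first_finished = worker_queue.get()  # blocks until the first worker reports
print(first_finished, "was first!")
stop_event.set()                     # ask the remaining workers to stop
for thread in threads:
    thread.join()                    # wait until they actually have
```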
First off, don't use start_new_thread, it's a low-level primitive. Use the Thread class in the threading module instead.
Once you have that, Thread instances have a .join() method, which you can call from another thread (your program's main thread) to wait for them to terminate.
t1 = Thread(target=my_func)
t1.start()
# Waits for t1 to finish.
t1.join()
All threads will terminate when the process terminates.
Thus, if your main program ends after the try..except, then all three threads may get terminated prematurely. For example:
import thread
import logging
import time

logger = logging.getLogger(__name__)

def start(taskname, n):
    for i in range(n):
        logger.info('{}'.format(i))
        time.sleep(0.1)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    try:
        thread.start_new_thread( start, ("task1", 10) )
        thread.start_new_thread( start, ("task2", 5) )
        thread.start_new_thread( start, ("task3", 8) )
    except Exception as err:
        logger.exception(err)
may print something like
[14:15:16 Dummy-3] 0
[14:15:16 Dummy-1] 0
In contrast, if you place
time.sleep(5)
at the end of the script, then you see the full expected output from all three
threads.
Note also that the thread module is a low-level module; unless you have a
particular reason for using it, most often people use the threading module which
implements more useful features for dealing with threads, such as a join
method which blocks until the thread has finished. See below for an example.
The docs state:
When the function returns, the thread silently exits.
When the function terminates with an unhandled exception, a stack trace is
printed and then the thread exits (but other threads continue to run).
Thus, by default, when one thread finishes, the others continue to run.
The example above also demonstrates this.
Making all the threads exit when one function finishes is more difficult. One thread cannot cleanly kill another thread (i.e., without killing the entire process).
Using threading, you could arrange for the threads to set a variable (e.g. flag) to True when finished, and have each thread check the state of flag periodically and quit if it is True. But note that the other threads will not necessarily terminate immediately; they will only terminate when they next check the state of flag. If a thread is blocked, waiting for I/O for instance, then it may not check the flag for a considerable amount of time (if ever!).
However, if the thread spends most of its time in a quick loop, you could check the state of flag once per iteration:
import threading
import logging
import time

logger = logging.getLogger(__name__)

def start(taskname, n):
    global flag
    for i in range(n):
        if flag:
            break
        logger.info('{}'.format(i))
        time.sleep(0.1)
    else:
        # get here if loop finishes without breaking
        logger.info('FINISHED')
        flag = True

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    threads = list()
    flag = False
    try:
        threads.append(threading.Thread(target=start, args=("task1", 10)))
        threads.append(threading.Thread(target=start, args=("task2", 5)))
        threads.append(threading.Thread(target=start, args=("task3", 8)))
    except Exception as err:
        logger.exception(err)
    for t in threads:
        t.start()
    for t in threads:
        # make the main process wait until all threads have finished.
        t.join()
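As a variation (a different technique from the global flag above, sketched in Python 3 with the same three tasks assumed), a threading.Event can play the role of flag. Event.wait(timeout) doubles as the sleep but returns True as soon as the event is set, so a thread that is pausing notices the stop request right away instead of after its next 0.1 s nap. It still does not help a thread that is blocked on I/O:

```python
import logging
import threading

logger = logging.getLogger(__name__)

# Replaces the global `flag`; no `global` statement is needed to set it.
stop_event = threading.Event()

def start(taskname, n):
    for i in range(n):
        if stop_event.is_set():
            break
        logger.info('%s %d', taskname, i)
        # Sleep up to 0.1 s, but wake up early (True) if another thread set the event.
        if stop_event.wait(0.1):
            break
    else:
        # get here if loop finishes without breaking
        logger.info('%s FINISHED', taskname)
        stop_event.set()

def main():
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    threads = [threading.Thread(target=start, args=(name, n))
               for name, n in (("task1", 10), ("task2", 5), ("task3", 8))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return threads

if __name__ == '__main__':
    main()
```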
I am running into problems when I attempt to terminate a long-running process running on a separate thread.
Below is the program. WorkOne creates a subprocess and runs a long-running process, "adb logcat", that generates log lines. I start WorkOne in main(), wait for 5 seconds, and attempt to stop it. Multiple runs give different outputs:
import threading
import time
import subprocess
import sys

class WorkOne(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.event = threading.Event()
        self.process = subprocess.Popen(['adb','logcat'], stdout=subprocess.PIPE, stderr=sys.stdout.fileno())

    def run(self):
        for line in iter(self.process.stdout.readline,''):
            #print line
            if self.event.is_set():
                self.process.terminate()
                self.process.kill()
                break;
        print 'exited For'

    def stop(self):
        self.event.set()

def main():
    print 'starting worker1'
    worker1 = WorkOne()
    worker1.start()
    print 'number of threads: ' + str(threading.active_count())
    time.sleep(5)
    worker1.stop()
    worker1.join(5)
    print 'number of threads: ' + str(threading.active_count())

if __name__ == '__main__':
    main()
Sometimes I get [A]:
starting worker1
number of threads: 2
number of threads: 2
exited For
Sometimes I get [B]:
starting worker1
number of threads: 2
number of threads: 1
exited For
Sometimes I get [C]:
starting worker1
number of threads: 2
number of threads: 2
I think I should expect to get [B] all the time. What is going wrong here?
I think [B] is only possible if the subprocess takes less than 10 seconds: the main thread sleeps 5 seconds, and after that the worker finishes within the 5-second timeout of join().
If it takes 10 seconds or more, the worker can still be alive after join() returns, because join() was given a timeout; whether that happens varies from run to run. Then you can get [A] (the subprocess finishes a few seconds later) or [C] (the subprocess finishes much later).
To always get [B], remove the timeout argument of join() so the main thread waits until the worker finishes (or make sure you kill the process within 10 seconds by placing the kill call outside of the loop).
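To see why the timeout matters, here is a minimal Python 3 sketch (slow_worker is a hypothetical stand-in for the logcat worker, and the timings are arbitrary). join(timeout) simply returns once the timeout expires, so is_alive() is the only way to tell whether the thread actually finished:

```python
import threading
import time

def slow_worker():
    # Hypothetical stand-in for a worker that needs ~1 s to shut down.
    time.sleep(1.0)

worker = threading.Thread(target=slow_worker)
worker.start()

worker.join(0.1)  # returns after ~0.1 s whether or not the worker is done
alive_after_timeout = worker.is_alive()
print("alive after join(0.1):", alive_after_timeout)  # True here: the worker needs ~1 s

worker.join()     # no timeout: blocks until the worker really has finished
print("alive after join():", worker.is_alive())       # False
# active_count() only counts threads still alive, so the worker is no longer included.
print("threads:", threading.active_count())
```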
Change
if self.event.is_set():
    self.process.terminate()
    self.process.kill()
    break;
to
if self.event.is_set():
    self.process.terminate()
    self.process.wait()
    break
The semicolon is a dead giveaway that there is a problem here.
I am guessing that without the wait(), the thread sometimes unblocks worker1.join(5) too soon. In those cases, threading.active_count() returns 2.
And, as @A.Rodas says, worker1.join(5) should be worker1.join() to ensure the join does not unblock until worker1 is done.
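Putting both fixes together (terminate() followed by wait(), and join() without a timeout), a Python 3 version of the question's program could look roughly like this. It is a sketch under several assumptions: adb logcat is replaced by a hypothetical child Python process that prints a line every 0.1 s, the sleep is shortened to 1 s, and text=True requires Python 3.7+. The event is still only checked when a new line arrives, so a child that goes quiet would leave the thread blocked in readline():

```python
import subprocess
import sys
import threading
import time

# Hypothetical stand-in for `adb logcat`: a child that prints a line every 0.1 s forever.
CHILD = [sys.executable, "-c",
         "import time\nwhile True:\n    print('log line', flush=True)\n    time.sleep(0.1)"]

class WorkOne(threading.Thread):
    def __init__(self):
        super().__init__()
        self.event = threading.Event()
        self.process = subprocess.Popen(CHILD, stdout=subprocess.PIPE, text=True)

    def run(self):
        # In text mode readline() returns '' only at EOF, which ends iter().
        for line in iter(self.process.stdout.readline, ''):
            if self.event.is_set():
                self.process.terminate()
                self.process.wait()  # reap the child before leaving the loop
                break
        self.process.stdout.close()
        print('exited For')

    def stop(self):
        self.event.set()

def main():
    print('starting worker1')
    worker1 = WorkOne()
    worker1.start()
    print('number of threads: ' + str(threading.active_count()))
    time.sleep(1)
    worker1.stop()
    worker1.join()  # no timeout: wait until run() has returned
    print('number of threads: ' + str(threading.active_count()))
    return worker1

if __name__ == '__main__':
    main()
```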
By the way, I am not sure why you'd ever want to call terminate then kill in succession. On Unix, terminate sends SIGTERM, which the program can catch, while kill sends SIGKILL, which cannot be caught, so kill is the more severe of the two. On Windows, they are identical.
Since you know the program called by subprocess, you should also know if terminate suffices to stop it.
Therefore, you should only need one: either self.process.terminate() or self.process.kill().
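A quick POSIX-only check of the difference between the two calls (the sleeping child is a hypothetical example). A negative Popen.returncode of -N means the child was ended by signal N. On Windows both calls end the process the same way, so the values below do not apply there:

```python
import signal
import subprocess
import sys

def stop_with(use_kill):
    # Start a child that would otherwise sleep for 30 s, then stop it immediately.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    if use_kill:
        proc.kill()       # SIGKILL on POSIX
    else:
        proc.terminate()  # SIGTERM on POSIX
    return proc.wait()    # reap the child and return its exit status

rc_term = stop_with(use_kill=False)
rc_kill = stop_with(use_kill=True)
print(rc_term, rc_term == -signal.SIGTERM)  # on POSIX: -15 True
print(rc_kill, rc_kill == -signal.SIGKILL)  # on POSIX: -9 True
```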