Starting n number of threads from a loop

Starting n number of threads from a loop - python

So basically, I've this function th() which counts till certain number and then prints "done".
I'd want to start n number of such threads at the same time, running simultaneously.
So I wrote:
thread_num = 3 #here n is 3, but I'd normally want something way higher
thrds = []
i = 0
while i < thread_num:
thr = Thread(target=th, args=())
thrds.append(thr)
i += 1
print("thread", str(i), "added")
for t in thrds:
t.start()
t.join()
I'd want all the threads to print "done" at the same time, but they have a noticeable lag in between of them. They print "thread i started" at seemingly the same time, but print "done" with quite a bit of time lag.
Why is this happening?
Edit: Since someone asked me to add th() function as well, here it is:
def th():
v = 0
num = 10**7
while v < num:
v += 1
print("done")

This is happening because of the t.join() method that you are calling on each thread before start the next one. t.join() blocks the execution of the current thread until the thread t has completed execution. So, each thread is starting after the previous one has finished.

You first have to start all the threads, then join all the threads in separate for loops; otherwise, each thread starts but runs to completion due to join before starting another thread.
for t in thrds: # start all the threads
t.start()
for t in thrds: # wait for all threads to finish
t.join()
If you only have a simple counting thread, you may need to add some short sleep to actually see the threads output intermingle as they may still run fast enough to complete before another thread starts.

Because you start and join each thread sequentially, one thread will run to completion before the next even starts. You'd be better off running a thread pool which is a more comprehensive implementation that handles multiple issues in multithreading.
Because of memory management and object reference count issues, python only lets a single thread execute byte code at a time. Periodically, each thread will release and reacquire the Global Interpreter Lock (GIL) to let other threads run. Exactly which thread runs at any given time is up to the operating system and you may find one gets more slices than another, causing staggered results.
To get them all to print "done" at the same time, you could use a control structure like a barrier for threads to wait until all are done. With a barrier, all threads must call wait before any can continue.
thread_num = 3 #here n is 3, but I'd normally want something way higher
wait_done = threading.Barrier(thread_num)
def th(waiter):
x = 1 # to what you want
waiter.wait()
print("done")
thrds = []
i = 0
while i < thread_num:
thr = Thread(target=th, args=(wait_done,))
thrds.append(thr)
i += 1
print("thread", str(i), "added")
for t in thrds:
t.start()
for t in thrds:
t.join()

Related

Threads are not doing their job when there is a print instruction [duplicate]

I have the following script (don't refer to the contents):
import _thread
def func1(arg1, arg2):
print("Write to CLI")
def verify_result():
func1()
for _ in range (4):
_thread.start_new_thread(func1, (DUT1_CLI, '0'))
verify_result()
I want to concurrently execute (say 4 threads) func1() which in my case includes a function call that can take time to execute. Then, only after the last thread finished its work I want to execute verify_result().
Currently, the result I get is that all threads finish their job, but verify_result() is executed before all threads finish their job.
I have even tried to use the following code (of course I imported threading) under the for loop but that didn't do the work (don't refer to the arguments)
t = threading.Thread(target = Enable_WatchDog, args = (URL_List[x], 180, Terminal_List[x], '0'))
t.start()
t.join()

Your last threading example is close, but you have to collect the threads in a list, start them all at once, then wait for them to complete all at once. Here's a simplified example:
import threading
import time
# Lock to serialize console output
output = threading.Lock()
def threadfunc(a,b):
for i in range(a,b):
time.sleep(.01) # sleep to make the "work" take longer
with output:
print(i)
# Collect the threads
threads = []
for i in range(10,100,10):
# Create 9 threads counting 10-19, 20-29, ... 90-99.
thread = threading.Thread(target=threadfunc,args=(i,i+10))
threads.append(thread)
# Start them all
for thread in threads:
thread.start()
# Wait for all to complete
for thread in threads:
thread.join()

Say you have a list of threads.
You loop(each_thread) over them -
for each_thread in thread_pool:
each_thread.start()
within the loop to start execution of the run function within each thread.
The same way, you write another loop after you start all threads and have
for each_thread in thread_pool:
each_thread.join()
what join does is that it will wait for thread i to finish execution before letting i+1th thread to finish execution.
The threads would run concurrently, join() would just synchronize the way each thread returns its results.
In your case specifically, you can the join() loop and the run verify_result() function.

Python - Why doesn't multithreading increase the speed of my code?

I tried improving my code by running this with and without using two threads:
from threading import Lock
from threading import Thread
import time
start_time = time.clock()
arr_lock = Lock()
arr = range(5000)
def do_print():
# Disable arr access to other threads; they will have to wait if they need to read
a = 0
while True:
arr_lock.acquire()
if len(arr) > 0:
item = arr.pop(0)
print item
arr_lock.release()
b = 0
for a in range(30000):
b = b + 1
else:
arr_lock.release()
break
thread1 = Thread(target=do_print)
thread1.start()
thread1.join()
print time.clock() - start_time, "seconds"
When running 2 threads my code's run time increased. Does anyone know why this happened, or perhaps know a different way to increase the performance of my code?

The primary reason you aren't seeing any performance improvements with multiple threads is because your program only enables one thread to do anything useful at a time. The other thread is always blocked.
Two things:
Remove the print statement that's invoked inside the lock. print statements drastically impact performance and timing. Also, the I/O channel to stdout is essentially single threaded, so you've built another implicit lock into your code. So let's just remove the print statement.
Use a proper sleep technique instead of "spin locking" and counting up from 0 to 30000. That's just going to burn a core needlessly.
Try this as your main loop
while True:
arr_lock.acquire()
if len(arr) > 0:
item = arr.pop(0)
arr_lock.release()
time.sleep(0)
else:
arr_lock.release()
break
This should run slightly better... I would even advocate getting the sleep statement out altogether so you can just let each thread have a full quantum.
However, because each thread is either doing "nothing" (sleeping or blocked on acquire) or just doing a single pop call on the array while in the lock, the majority of the time spent is going to be in the acquire/release calls instead of actually operating on the array. Hence, multiple threads aren't going to make your program run faster.

How to start a new thread when old one finishes?

I have a large dataset in a list that I need to do some work on.
I want to start x amounts of threads to work on the list at any given time, until everything in that list has been popped.
I know how to start x amounts of threads (lets say 20) at a given time (by using thread1....thread20.start())
but how do I make it start a new thread when one of the first 20 threads finish? so at any given time there are 20 threads running, until the list is empty.
what I have so far:
class queryData(threading.Thread):
def __init__(self,threadID):
threading.Thread.__init__(self)
self.threadID = threadID
def run(self):
global lst
#Get trade from list
trade = lst.pop()
tradeId=trade[0][1][:6]
print tradeId
thread1 = queryData(1)
thread1.start()
Update
I have something going with the following code:
for i in range(20):
threads.append(queryData(i))
for thread in threads:
thread.start()
while len(lst)>0:
for iter,thread in enumerate(threads):
thread.join()
lock.acquire()
threads[iter] = queryData(i)
threads[iter].start()
lock.release()
Now it starts 20 threads in the beginning...and then keeps starting a new thread when one finishes.
However, it is not efficient, as it waits for the first one in the list to finish, and then the second..and so on.
Is there a better way of doing this?
Basically I need:
-Start 20 threads:
-While list is not empty:
-wait for 1 of the 20 threads to finish
-reuse or start a new thread

As I suggested in a comment, I think using a multiprocessing.pool.ThreadPool would be appropriate — because it would handle much of the thread management you're manually doing in your code automatically. Once all the threads are queued-up for processing via ThreadPool's apply_async() method calls, the only thing that needs to be done is wait until they've all finished execution (unless there's something else your code could be doing, of course).
I've translated the code in my linked answer to another related question so it's more similar to what you appear to be doing to make it easier to understand in the current context.
from multiprocessing.pool import ThreadPool
from random import randint
import threading
import time
MAX_THREADS = 5
print_lock = threading.Lock() # Prevent overlapped printing from threads.
def query_data(trade):
trade_id = trade[0][1][:6]
time.sleep(randint(1, 3)) # Simulate variable working time for testing.
with print_lock:
print(trade_id)
def process_trades(trade_list):
pool = ThreadPool(processes=MAX_THREADS)
results = []
while(trade_list):
trade = trade_list.pop()
results.append(pool.apply_async(query_data, (trade,)))
pool.close() # Done adding tasks.
pool.join() # Wait for all tasks to complete.
def test():
trade_list = [[['abc', ('%06d' % id) + 'defghi']] for id in range(1, 101)]
process_trades(trade_list)
if __name__ == "__main__":
test()

You can wait for a thread to complete with : thread.join(). This call will block until that thread completes, at which point you can create a new one.
However, instead of respawning a Thread each time, why not recycle your existing threads ?
This can be done by the use of tasks for example. You keep a list of tasks in a shared collection, and when one of your threads finishes a task, it retrieves another one from that collection.

Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another working on the next task.
I have tried to use the multiprocessing module which offers a method to run of pool of workers asynchronously (e.g. using map_async), but there I can only set a "global" timeout after which all processes would be killed.
Is it possible to have an individual timeout after which only a single process that takes too long is killed and a new worker is added to the pool again instead (processing the next task and skipping the one that timed out)?
Here's a simple example to illustrate my problem:
def Check(n):
import time
if n % 2 == 0: # select some (arbitrary) subset of processes
print "%d timeout" % n
while 1:
# loop forever to simulate some process getting stuck
pass
print "%d done" % n
return 0
from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(Check, range(10))
print result.get(timeout=1)
After the timeout all workers are killed and the program exits. I would like instead that it continues with the next subtask. Do I have to implement this behavior myself or are there existing solutions?
Update
It is possible to kill the hanging workers and they are automatically replaced. So I came up with this code:
jobs = pool.map_async(Check, range(10))
while 1:
try:
print "Waiting for result"
result = jobs.get(timeout=1)
break # all clear
except multiprocessing.TimeoutError:
# kill all processes
for c in multiprocessing.active_children():
c.terminate()
print result
The problem now is that the loop never exits; even after all tasks have been processed, calling get yields a timeout exception.

The pebble Pool module has been built for solving these types of issue. It supports timeout on given tasks allowing to detect them and easily recover.
from pebble import ProcessPool
from concurrent.futures import TimeoutError
with ProcessPool() as pool:
future = pool.schedule(function, args=[1,2], timeout=5)
try:
result = future.result()
except TimeoutError:
print "Function took longer than %d seconds" % error.args[1]
For your specific example:
from pebble import ProcessPool
from concurrent.futures import TimeoutError
results = []
with ProcessPool(max_workers=4) as pool:
future = pool.map(Check, range(10), timeout=5)
iterator = future.result()
# iterate over all results, if a computation timed out
# print it and continue to the next result
while True:
try:
result = next(iterator)
results.append(result)
except StopIteration:
break
except TimeoutError as error:
print "function took longer than %d seconds" % error.args[1]
print results

Currently the Python does not provide native means to the control execution time of each distinct task in the pool outside the worker itself.
So the easy way is to use wait_procs in the psutil module and implement the tasks as subprocesses.
If nonstandard libraries are not desirable, then you have to implement own Pool on base of subprocess module having the working cycle in the main process, poll() - ing the execution of each worker and performing required actions.
As for the updated problem, the pool becomes corrupted if you directly terminate one of the workers (it is the bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool becomes nonjoinable.
You have to terminate all the pool and then recreate it again for another tasks:
from multiprocessing import Pool
while True:
pool = Pool(processes=4)
jobs = pool.map_async(Check, range(10))
print "Waiting for result"
try:
result = jobs.get(timeout=1)
break # all clear
except multiprocessing.TimeoutError:
# kill all processes
pool.terminate()
pool.join()
print result
UPDATE
Pebble is an excellent and handy library, which solves the issue. Pebble is designed for the asynchronous execution of Python functions, where is PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.
One more aspect is when 3dparty dependencies are not desirable, then PyExPool can be a good choice, which is a single-file lightweight implementation of Multi-process Execution Pool with per-Job and global timeouts, opportunity to group Jobs into Tasks and other features.
PyExPool can be embedded into your sources and customized, having permissive Apache 2.0 license and production quality, being used in the core of one high-loaded scientific benchmarking framework.

Try the construction where each process is being joined with a timeout on a separate thread. So the main program never gets stuck and as well the processes which if gets stuck, would be killed due to timeout. This technique is a combination of threading and multiprocessing modules.
Here is my way to maintain the minimum x number of threads in the memory. Its an combination of threading and multiprocessing modules. It may be unusual to other techniques like respected fellow members have explained above BUT may be worth considerable. For the sake of explanation, I am taking a scenario of crawling a minimum of 5 websites at a time.
so here it is:-
#importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading
# Crawler function
def crawler(domain):
# define crawler technique here.
output.write(scrapeddata + "\n")
pass
Next is threadController function. This function will control the flow of threads to the main memory. It will keep activating the threads to maintain the threadNum "minimum" limit ie. 5. Also it won't exit until, all Active threads(acitveCount) are finished up.
It will maintain a minimum of threadNum(5) startProcess function threads (these threads will eventually start the Processes from the processList while joining them with a time out of 60 seconds). After staring threadController, there would be 2 threads which are not included in the above limit of 5 ie. the Main thread and the threadController thread itself. thats why threading.activeCount() != 2 has been used.
def threadController():
print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
# staring first thread. This will make the activeCount=3
Thread(target = startProcess).start()
# loop while thread List is not empty OR active threads have not finished up.
while len(processList) != 0 or threading.activeCount() != 2:
if (threading.activeCount() < (threadNum + 2) and # if count of active threads are less than the Minimum AND
len(processList) != 0): # processList is not empty
Thread(target = startProcess).start() # This line would start startThreads function as a seperate thread **
startProcess function, as a separate thread, would start Processes from the processlist. The purpose of this function (**started as a different thread) is that It would become a parent thread for Processes. So when It will join them with a timeout of 60 seconds, this would stop the startProcess thread to move ahead but this won't stop threadController to perform. So this way, threadController will work as required.
def startProcess():
pr = processList.pop(0)
pr.start()
pr.join(60.00) # joining the thread with time out of 60 seconds as a float.
if __name__ == '__main__':
# a file holding a list of domains
domains = open("Domains.txt", "r").read().split("\n")
output = open("test.txt", "a")
processList = [] # thread list
threadNum = 5 # number of thread initiated processes to be run at one time
# making process List
for r in range(0, len(domains), 1):
domain = domains[r].strip()
p = Process(target = crawler, args = (domain,))
processList.append(p) # making a list of performer threads.
# starting the threadController as a seperate thread.
mt = Thread(target = threadController)
mt.start()
mt.join() # won't let go next until threadController thread finishes.
output.close()
print "Done"
Besides maintaining a minimum number of threads in the memory, my aim was to also have something which could avoid stuck threads or processes in the memory. I did this using the time out function. My apologies for any typing mistake.
I hope this construction would help anyone in this world.
Regards,
Vikas Gautam

What is the use of join() in threading?

I was studying the python threading and came across join().
The author told that if thread is in daemon mode then i need to use join() so that thread can finish itself before main thread terminates.
but I have also seen him using t.join() even though t was not daemon
example code is this
import threading
import time
import logging
logging.basicConfig(level=logging.DEBUG,
format='(%(threadName)-10s) %(message)s',
)
def daemon():
logging.debug('Starting')
time.sleep(2)
logging.debug('Exiting')
d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)
def non_daemon():
logging.debug('Starting')
logging.debug('Exiting')
t = threading.Thread(name='non-daemon', target=non_daemon)
d.start()
t.start()
d.join()
t.join()
i don't know what is use of t.join() as it is not daemon and i can see no change even if i remove it

A somewhat clumsy ascii-art to demonstrate the mechanism:
The join() is presumably called by the main-thread. It could also be called by another thread, but would needlessly complicate the diagram.
join-calling should be placed in the track of the main-thread, but to express thread-relation and keep it as simple as possible, I choose to place it in the child-thread instead.
without join:
+---+---+------------------ main-thread
| |
| +........... child-thread(short)
+.................................. child-thread(long)
with join
+---+---+------------------***********+### main-thread
| | |
| +...........join() | child-thread(short)
+......................join()...... child-thread(long)
with join and daemon thread
+-+--+---+------------------***********+### parent-thread
| | | |
| | +...........join() | child-thread(short)
| +......................join()...... child-thread(long)
+,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, child-thread(long + daemonized)
'-' main-thread/parent-thread/main-program execution
'.' child-thread execution
'#' optional parent-thread execution after join()-blocked parent-thread could
continue
'*' main-thread 'sleeping' in join-method, waiting for child-thread to finish
',' daemonized thread - 'ignores' lifetime of other threads;
terminates when main-programs exits; is normally meant for
join-independent tasks
So the reason you don't see any changes is because your main-thread does nothing after your join.
You could say join is (only) relevant for the execution-flow of the main-thread.
If, for example, you want to concurrently download a bunch of pages to concatenate them into a single large page, you may start concurrent downloads using threads, but need to wait until the last page/thread is finished before you start assembling a single page out of many. That's when you use join().

Straight from the docs
join([timeout])
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
This means that the main thread which spawns t and d, waits for t to finish until it finishes.
Depending on the logic your program employs, you may want to wait until a thread finishes before your main thread continues.
Also from the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.
A simple example, say we have this:
def non_daemon():
time.sleep(5)
print 'Test non-daemon'
t = threading.Thread(name='non-daemon', target=non_daemon)
t.start()
Which finishes with:
print 'Test one'
t.join()
print 'Test two'
This will output:
Test one
Test non-daemon
Test two
Here the master thread explicitly waits for the t thread to finish until it calls print the second time.
Alternatively if we had this:
print 'Test one'
print 'Test two'
t.join()
We'll get this output:
Test one
Test two
Test non-daemon
Here we do our job in the main thread and then we wait for the t thread to finish. In this case we might even remove the explicit joining t.join() and the program will implicitly wait for t to finish.

Thanks for this thread -- it helped me a lot too.
I learned something about .join() today.
These threads run in parallel:
d.start()
t.start()
d.join()
t.join()
and these run sequentially (not what I wanted):
d.start()
d.join()
t.start()
t.join()
In particular, I was trying to clever and tidy:
class Kiki(threading.Thread):
def __init__(self, time):
super(Kiki, self).__init__()
self.time = time
self.start()
self.join()
This works! But it runs sequentially. I can put the self.start() in __ init __, but not the self.join(). That has to be done after every thread has been started.
join() is what causes the main thread to wait for your thread to finish. Otherwise, your thread runs all by itself.
So one way to think of join() as a "hold" on the main thread -- it sort of de-threads your thread and executes sequentially in the main thread, before the main thread can continue. It assures that your thread is complete before the main thread moves forward. Note that this means it's ok if your thread is already finished before you call the join() -- the main thread is simply released immediately when join() is called.
In fact, it just now occurs to me that the main thread waits at d.join() until thread d finishes before it moves on to t.join().
In fact, to be very clear, consider this code:
import threading
import time
class Kiki(threading.Thread):
def __init__(self, time):
super(Kiki, self).__init__()
self.time = time
self.start()
def run(self):
print self.time, " seconds start!"
for i in range(0,self.time):
time.sleep(1)
print "1 sec of ", self.time
print self.time, " seconds finished!"
t1 = Kiki(3)
t2 = Kiki(2)
t3 = Kiki(1)
t1.join()
print "t1.join() finished"
t2.join()
print "t2.join() finished"
t3.join()
print "t3.join() finished"
It produces this output (note how the print statements are threaded into each other.)
$ python test_thread.py
32 seconds start! seconds start!1
seconds start!
1 sec of 1
1 sec of 1 seconds finished!
21 sec of
3
1 sec of 3
1 sec of 2
2 seconds finished!
1 sec of 3
3 seconds finished!
t1.join() finished
t2.join() finished
t3.join() finished
$
The t1.join() is holding up the main thread. All three threads complete before the t1.join() finishes and the main thread moves on to execute the print then t2.join() then print then t3.join() then print.
Corrections welcome. I'm also new to threading.
(Note: in case you're interested, I'm writing code for a DrinkBot, and I need threading to run the ingredient pumps concurrently rather than sequentially -- less time to wait for each drink.)

The method join()
blocks the calling thread until the thread whose join() method is called is terminated.
Source : http://docs.python.org/2/library/threading.html

With join - interpreter will wait until your process get completed or terminated
>>> from threading import Thread
>>> import time
>>> def sam():
... print 'started'
... time.sleep(10)
... print 'waiting for 10sec'
...
>>> t = Thread(target=sam)
>>> t.start()
started
>>> t.join() # with join interpreter will wait until your process get completed or terminated
done? # this line printed after thread execution stopped i.e after 10sec
waiting for 10sec
>>> done?
without join - interpreter wont wait until process get terminated,
>>> t = Thread(target=sam)
>>> t.start()
started
>>> print 'yes done' #without join interpreter wont wait until process get terminated
yes done
>>> waiting for 10sec

In python 3.x join() is used to join a thread with the main thread i.e. when join() is used for a particular thread the main thread will stop executing until the execution of joined thread is complete.
#1 - Without Join():
import threading
import time
def loiter():
print('You are loitering!')
time.sleep(5)
print('You are not loitering anymore!')
t1 = threading.Thread(target = loiter)
t1.start()
print('Hey, I do not want to loiter!')
'''
Output without join()-->
You are loitering!
Hey, I do not want to loiter!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
'''
#2 - With Join():
import threading
import time
def loiter():
print('You are loitering!')
time.sleep(5)
print('You are not loitering anymore!')
t1 = threading.Thread(target = loiter)
t1.start()
t1.join()
print('Hey, I do not want to loiter!')
'''
Output with join() -->
You are loitering!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
Hey, I do not want to loiter!
'''

This example demonstrate the .join() action:
import threading
import time
def threaded_worker():
for r in range(10):
print('Other: ', r)
time.sleep(2)
thread_ = threading.Timer(1, threaded_worker)
thread_.daemon = True # If the main thread is killed, this thread will be killed as well.
thread_.start()
flag = True
for i in range(10):
print('Main: ', i)
time.sleep(2)
if flag and i > 4:
print(
'''
Threaded_worker() joined to the main thread.
Now we have a sequential behavior instead of concurrency.
''')
thread_.join()
flag = False
Out:
Main: 0
Other: 0
Main: 1
Other: 1
Main: 2
Other: 2
Main: 3
Other: 3
Main: 4
Other: 4
Main: 5
Other: 5
Threaded_worker() joined to the main thread.
Now we have a sequential behavior instead of concurrency.
Other: 6
Other: 7
Other: 8
Other: 9
Main: 6
Main: 7
Main: 8
Main: 9

When making join(t) function for both non-daemon thread and daemon thread, the main thread (or main process) should wait t seconds, then can go further to work on its own process. During the t seconds waiting time, both of the children threads should do what they can do, such as printing out some text. After the t seconds, if non-daemon thread still didn't finish its job, and it still can finish it after the main process finishes its job, but for daemon thread, it just missed its opportunity window. However, it will eventually die after the python program exits. Please correct me if there is something wrong.

There are a few reasons for the main thread (or any other thread) to join other threads
A thread may have created or holding (locking) some resources. The join-calling thread may be able to clear the resources on its behalf
join() is a natural blocking call for the join-calling thread to continue after the called thread has terminated.
If a python program does not join other threads, the python interpreter will still join non-daemon threads on its behalf.

join() waits for both non-daemon and daemon threads to be completed.
Without join(), non-daemon threads are running and are completed with the main thread concurrently.
Without join(), daemon threads are running with the main thread concurrently and when the main thread is completed, the daemon threads are exited without completed if the daemon threads are still running.
So, with join() and daemon=False(daemon threads) below (daemon is False by default):
import time
from threading import Thread
def test1():
for _ in range(3):
print("Test1 is running...")
time.sleep(1)
print("Test1 is completed")
def test2():
for _ in range(3):
print("Test2 is running...")
time.sleep(1)
print("Test2 is completed")
# Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
# Here
thread1.start()
thread2.start()
thread1.join() # Here
thread2.join() # Here
print("Main is completed")
Or, with join() and daemon=True(non-daemon threads) below:
# ...
# Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
# Here
# ...
thread1.join() # Here
thread2.join() # Here
print("Main is completed")
join() waits for Test1 and Test2 non-daemon or daemon threads to be completed. So, Main is completed is printed after Test1 and Test2 threads are completed as shown below:
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
Main is completed
And, if not using join() and if daemon=False(non-daemon threads) below:
# ...
# Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
# Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")
Test1 and Test2 non-daemon threads are running and completed with the main thread concurrently. So, Main is completed is printed before Test1 and Test2 threads are completed as shown below:
Test1 is running...
Test2 is running...
Main is completed
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
And, if not using join() and if daemon=True(daemon threads) below:
# ...
# Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
# Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")
Test1 and Test2 daemon threads are running with the main thread concurrently. So, Main is completed is printed before Test1 and Test2 daemon threads are completed and when the main thread is completed, Test1 and Test2 daemon threads are exited without completed as shown below:
Test1 is running...
Test2 is running...
Main is completed

Looks like difference between synchronous and asynchronous processing is missunderstood here.
A thread is meant to execute a sub-procedure, most of the times on a "parallel" or "concurrent" fashion (depends on whether the device has multi-processors or not). But, what's the point on concurrency? For the most part, it's about improving performance of a process, by applying the idea of "divide and conquer". Have several threads (sub-processes) executing a "portion" of the whole process simultaneously, and then have a "final" step where all sub-processes results are combined (joined; hence the "join" method).
Of course, in order to achieve such gain on efficiency, the portions that are divided into threads, must be "mutually exclusive" (i.e., they don't share values to be updated... -- known in parallel computing as "critical section" -- ). If there is at least one value that is updated by two or more threads, then one has to wait for the other to "finish" its update, otherwise obtaining inconsistent results (i.e., two persons owning a bank account intend to withdraw certain amount of money in an ATM... if there won't be a proper mechanism that "locks" or "protects" the variable "balance" in both of the ATM devices, withdraws will completely screw-up the final value of the balance, causing obvious serious financial problem to the account owners).
So, coming back to the purpose of a thread in parallel computing: have all threads doing their individual part, and use "join" to make them "come back" to the main process so that each individual result is then "consolidated" into a global one.
Examples? A bunch of them, but let's just enumarate a few ones clearly explained:
Matrix multiplication: have each thread multiplying a vector of matrix A by the whole second matrix B, to obtain a vector of matrix C. At the end, have all resulting vestors put together to "display" (show) result: matrix C. In this example, although matrix B is used by all threads, no value of it is ever updated or modified (read-only).
Summation, product of an array of massive numbers (an array of thousand of values, whether integer or float). Make threads to execute partial sums/products (say, if you have to sum 10K values, create 5 threads, each with 2K values); then with "join" make them return to the main process and sum individual results of all 5 threads.
Theoretically, the process will do 2000 + 5 steps (2000 simultaneously in 5 threads, plus summation of final 5 sub-totals in the main process). In practice, though, how long do the 5 threads take to do its own 2000 numbers summation is completely variable as different factors get involved here (processor speed, electrical flow, or if it is a web service, network latency, and so on). However, the amount ot time invested would be in the "worst case", the amount of time the "slowest" thread takes, plus the final summation of 5 results step. Also, in practice, a thread that is meant to do 20% of the whole job, unlikely will take much longer than a single sequential process that would do 100% of the job (of course, it also depends on the size of the sample to be processed... the advantage won't be the same on a summation of 10K values, than summation of just 10 values with the same 5 threads... it's non-practicall, not worth it).
Quick sort: We all know in general how quick sort works. However, there's a chance to improve it, if, say, we execute it in TWO threads: one that does the odd numbers and one that does the even ones. Then executes recursively and at some point it joins results of both threads and does a final quick sort in a fashion that will not require so many repetitions as numbers will be sufficiently ordered after the two threads did its initial job. That's a serios gain on performance with a quite big and unordered number of items. Chances are three threads can be used by doing some arrangement to the logic behind it, but its gain is really minimum and not worth to be programmed. However, two threads have a decent performance (time) gain.
So, usage of "join" in python (or it's equivalent in other "concurrency" languages) has an important significance; but depends a lot on the programming understanding what does s/he want to "paralellize" and how skilled s/he is in splitting the algorithm in the right steps to be parallellized vs. what steps need to be kept in the main process. It's more a problem of "logic" thinking than a programming "anti-pattern".

"What's the use of using join()?" you say. Really, it's the same answer as "what's the use of closing files, since python and the OS will close my file for me when my program exits?".
It's simply a matter of good programming. You should join() your threads at the point in the code that the thread should not be running anymore, either because you positively have to ensure the thread is not running to interfere with your own code, or that you want to behave correctly in a larger system.
You might say "I don't want my code to delay giving an answer" just because of the additional time that the join() might require. This may be perfectly valid in some scenarios, but you now need to take into account that your code is "leaving cruft around for python and the OS to clean up". If you do this for performance reasons, I strongly encourage you to document that behavior. This is especially true if you're building a library/package that others are expected to utilize.
There's no reason to not join(), other than performance reasons, and I would argue that your code does not need to perform that well.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.