As the sample code shows:
for temp in range(0, 10):
    thread = threading.Thread(target=current_post.post)
    thread.start()
    threads.append(thread)
for current in range(10):
    threads[current].join()
The code is only part of a Python file, but it stands for the common pattern: call join() after start() in multithreaded code. This has confused me for a few days. As we all know, when we execute thread.start(), a new thread starts and Python switches between the threads automatically; that's all we need. If so, why should I add thread.join() after start()? join() means waiting until the current thread finishes, as I understand it. But doesn't that make it a kind of single-threading? If I have to wait for each thread to finish its task, that's not multithreading! It seems to me that join() just means executing the target functions one by one. Can't start() handle the multithreading perfectly on its own? Why should I add join() to make them finish one by one? Thanks for any help :)
You do it in order to be sure that your threads have actually finished their work before your main thread exits.
However, you don't have to do it right after starting the threads. You can do it at the very end of your process.
Join will block the current thread until the thread upon which join is called has finished.
Essentially your code is starting a load of threads and then waiting for them all to complete.
If you didn't, the chances are the process would exit before any of your threads had done anything.
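Concretely, the pattern the question shows can be sketched as a complete, runnable example; post here is a stand-in for the asker's current_post.post, simulating a bit of I/O-bound work:

```python
import threading
import time

results = []

def post(n):
    # Stand-in for the real work; sleeps briefly to simulate I/O.
    time.sleep(0.01)
    results.append(n)

threads = []
for n in range(10):
    t = threading.Thread(target=post, args=(n,))
    t.start()          # all ten threads run concurrently
    threads.append(t)

for t in threads:
    t.join()           # block until every worker has finished

# Only after the joins is it safe to rely on the workers' results.
print(len(results))    # 10
```

The two loops are the key: every thread is started before any thread is joined, so the workers genuinely overlap.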
This is done in Python 2.7.12. serialHelper is a class module around the Python serial library, and this code does work nicely:
#!/usr/bin/env python
import threading
from time import sleep
import serialHelper

sh = serialHelper.SerialHelper()

def serialGetter():
    h = 0
    while True:
        h = h + 1
        s_resp = sh.getResponse()
        print('response ' + s_resp)
        sleep(3)

if __name__ == '__main__':
    try:
        t = threading.Thread(target=sh.serialReader)
        t.setDaemon(True)
        t.start()
        serialGetter()
        #tSR = threading.Thread(target=serialGetter)
        #tSR.setDaemon(True)
        #tSR.start()
    except Exception as e:
        print(e)
However, the attempt to run serialGetter as a thread, as in the commented-out lines, just dies.
Any reason why that function cannot run as a thread?
Quoting from the Python documentation:
The entire Python program exits when no alive non-daemon threads are left.
So if you call setDaemon(True) on every new thread and then exit the main thread (by falling off the end of the script), the whole program will exit immediately, killing all of the threads. Either don't use setDaemon(True), or don't exit the main thread without first calling join() on all of the threads you want to wait for.
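A minimal sketch of the second option, joining a non-daemon thread before the main thread falls off the end; reader is a stand-in for the asker's worker, looping a few times instead of forever:

```python
import threading
import time

def reader():
    # Stand-in for the asker's worker; loops briefly instead of forever.
    for _ in range(3):
        time.sleep(0.01)

t = threading.Thread(target=reader)   # no setDaemon(True)
t.start()
t.join()   # the main thread explicitly waits; it cannot exit early
print(t.is_alive())   # False
```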
Stepping back for a moment, it may help to think about the intended use case of a daemon thread. In Unix, a daemon is a process that runs in the background and (typically) serves requests or performs operations, either on behalf of remote clients over the network or local processes. The same basic idea applies to daemon threads:
You launch the daemon thread with some kind of work queue.
When you need some work done on the thread, you hand it a work object.
When you want the result of that work, you use an event or a future to wait for it to complete.
After requesting some work, you always eventually wait for it to complete, or perhaps cancel it (if your worker protocol supports cancellation).
You don't have to clean up the daemon thread at program termination. It just quietly goes away when there are no other threads left.
The problem is step (4). If you forget about some work object, and exit the app without waiting for it to complete, the work may get interrupted. Daemon threads don't gracefully shut down, so you could leave the outside world in an inconsistent state (e.g. an incomplete database transaction, a file that never got closed, etc.). It's often better to use a regular thread, and replace step (5) with an explicit "Finish up your work and shut down" work object that the main thread hands to the worker thread before exiting. The worker thread then recognizes this object, stops waiting on the work queue, and terminates itself once it's no longer doing anything else. This is slightly more up-front work, but is much safer in the event that a work object is inadvertently abandoned.
Because of all of the above, I recommend not using daemon threads unless you have a strong reason for them.
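For illustration, the "finish up and shut down" work object described above might look like the sketch below; the queue protocol and the sentinel object are assumptions for this example, not a standard API:

```python
import threading
import queue

SHUTDOWN = object()          # sentinel "finish up and stop" work object
work_queue = queue.Queue()
done = []

def worker():
    while True:
        item = work_queue.get()
        if item is SHUTDOWN:  # recognize the sentinel and terminate
            break
        done.append(item * 2)

t = threading.Thread(target=worker)   # regular (non-daemon) thread
t.start()

for n in range(5):
    work_queue.put(n)

work_queue.put(SHUTDOWN)     # ask the worker to finish and exit
t.join()                     # safe: the worker is guaranteed to stop
print(done)                  # [0, 2, 4, 6, 8]
```

Because the worker drains the queue before it ever sees the sentinel, no abandoned work object can be interrupted mid-flight.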
I have a reading thread in my application that listens on stdin. It blocks until some input is available. When input arrives, it reads the lines, checks whether they are valid commands, and puts them in a queue.
from queue import Queue
from sys import stdin
from threading import Thread

def ReadCommands(queue):
    for cmd in stdin:
        if cmd == "":
            break
        # Check if cmd is valid, then add it to the queue
        queue.put(cmd)

queue = Queue()
thread = Thread(target=ReadCommands, args=(queue,))
thread.start()
Now, when the main program wants to exit, it first has to join this reading thread. The problem is that the thread sits in a loop I have no control over; even stdin.close() does not work.
How can I break the for loop in the reading thread from the main thread?
Alternatively, how can I rewrite the for loop (with a while?) so that I can add my own boolean flag to break the loop? Beware that I don't want a busy-waiting loop!
If you have threads that you just want to shut down on exiting your program, starting them in daemon mode is often the best way to go. If all non-daemon threads exit, your application will end, taking all daemon threads with it.
Note that you should only do this for threads that do not have to perform cleanup; your example seems to be fine for this.
Also, if a daemon thread is blocked in a C-level operation, it may still block until control returns to actual Python scope; in that case there is no Python-level option to break the block in the first place. Reading from a broken socket can be such an issue, for example.
If you need to explicitly kill a blocking thread before stopping your program, you will probably have to use Python's C API. It is not the cleanest approach, but it works in principle.
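For the simple shutdown-on-exit case, a daemon reader can be sketched as below; io.StringIO stands in for sys.stdin so the example is self-contained, and the join(timeout=...) exists only so the sketch can inspect the result (in the real program you would simply let the main thread exit):

```python
import threading
import io
import queue

def read_commands(stream, q):
    # Same shape as the question's loop; blocks on the stream, not a flag.
    for cmd in stream:
        cmd = cmd.strip()
        if cmd == "":
            break
        q.put(cmd)

q = queue.Queue()
stream = io.StringIO("start\nstop\n")   # stands in for sys.stdin
t = threading.Thread(target=read_commands, args=(stream, q))
t.daemon = True    # the interpreter will not wait for this thread at exit
t.start()
t.join(timeout=1)  # only so this sketch can collect the parsed commands
commands = [q.get(), q.get()]
print(commands)    # ['start', 'stop']
```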
I have a function like this:
def check_urls(res):
    pool = Pool(10)
    print(pool.free_count())
    for row in res:
        pool.spawn(fetch, row[0], row[1])
    pool.join()
pool.free_count() outputs 10.
I used pdb to trace it; the program works fine until the pool.spawn() loop.
But it waits forever at the pool.join() line.
Can someone tell me what's wrong?
But it waits forever at the pool.join() line. Can someone tell me what's wrong?
Nothing!
Though I first wrote what is below the line, the join() function in gevent still behaves pretty much the same way as in subprocess/threading: it blocks until all the greenlets are done.
If you want to only test whether all the greenlets in the pool are over or not, you might want to check for the ready() on each greenlet of the pool:
is_over = all(gl.ready() for gl in pool.greenlets)
Basically, .join() is not waiting forever; it is waiting until your greenlets are over. If one of them never ends, then join() will block forever. So make sure every greenlet terminates, and join() will return once all the jobs are done.
edit: The following applies only to subprocess or threading modules standard API. The GEvent's greenlet pools is not matching the "standard" API.
The join() method on a Thread/Process exists to make the main process/thread wait until the child processes/threads are over.
You can use the timeout parameter to make it get back to execution after some time, or you can use the is_alive() method to check if it's running or not without blocking.
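For the standard threading module, a quick illustration of both the timeout parameter and is_alive():

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.5,))
t.start()

t.join(timeout=0.1)           # give up waiting after 0.1 s
still_running = t.is_alive()  # True: the thread is still sleeping

t.join()                      # wait for real this time
finished = not t.is_alive()   # True: the thread has finished
```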
In the context of a process/thread pool, join() also needs to be preceded by a call to either close() or terminate() (here apply_async stands in for gevent's spawn, being the closest multiprocessing equivalent), so you may want to:

for row in res:
    pool.apply_async(fetch, (row[0], row[1]))
pool.close()
pool.join()
I have this Python threading code:
import threading

def sum(value):
    sum = 0
    for i in range(value+1):
        sum += i
    print("I'm done with %d - %d\n" % (value, sum))
    return sum

r = range(500001, 500000*2, 100)
ts = []
for u in r:
    t = threading.Thread(target=sum, args=(u,))
    ts.append(t)
    t.start()
for t in ts:
    t.join()
Executing this, I have hundreds of threads working.
However, when I move the t.join() right after the t.start(), I have only two threads working.
for u in r:
    t = threading.Thread(target=sum, args=(u,))
    ts.append(t)
    t.start()
    t.join()
I also tested the code without invoking t.join() at all, and it seems to work fine.
So when, why, and how should I use thread.join()?
You seem to misunderstand what Thread.join does. When calling join, the current thread blocks until the joined thread has finished. So you are waiting for each thread to finish, which prevents you from starting any other thread in the meantime.
The idea behind join is to wait for other threads before continuing. In your case, you want to wait for all threads to finish at the end of the main program; otherwise, if the main program ended first, the threads it created might be killed. So usually you have a loop at the end that joins all created threads, to prevent the main thread from exiting too early.
Short answer: yes, this:

for t in ts:
    t.join()

is the idiomatic way to handle a small number of threads: start them all first, then join them all. Calling .join means that your main thread waits until the given thread finishes before proceeding in execution. You generally do this after you've started all of the threads.
Longer answer:
>>> len(list(range(500001, 500000*2, 100)))
5000
You're trying to start 5000 threads at once. It's miraculous your computer is still in one piece!
Your method of .join-ing in the loop that dispatches workers is never going to be able to have more than 2 threads (i.e. only one worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer-meltdown, but your code is going to be WAY slower than if you'd just never used threading in the first place!
At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need to limit your thread creation to a reasonable number (i.e. more than one, fewer than 5000) is a thread pool. There are various ways to get one. You could roll your own; this is fairly simple with a threading.Semaphore. You could use the concurrent.futures package (3.2+). You could use some third-party solution. It's up to you; each has a different API, so I can't discuss them all here.
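As one option, a sketch with concurrent.futures; the worker function and the max_workers value are arbitrary choices, and the range is scaled down from the question's so the example finishes quickly:

```python
import concurrent.futures

def add_up(value):
    return sum(range(value + 1))

# Scaled down from the question's range(500001, 500000*2, 100);
# the pool never creates more than max_workers threads at once.
values = range(5001, 10000, 100)   # 50 tasks

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(add_up, values))

print(len(results))   # 50
```

Exiting the with block implicitly joins the pool's threads, so the start-all-then-join-all bookkeeping disappears.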
Obligatory GIL Discussion
CPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing Python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O-bound tasks, such as retrieving a bunch of URLs.
multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes. This isn't free: data transfer between processes is expensive, so care needs to be taken not to write workers that depend on shared state.
join() waits for your thread to finish, so the first use starts all the threads and then waits for all of them to finish. The second use waits for each thread to end before launching the next one, which rather defeats the purpose of threading.
The first use makes most sense. You run the threads (all of them) to do some parallel computation, and then wait until all of them finish, before you move on and use the results, to make sure the work is done (i.e. the results are actually there).
Is there a way to "pause" the main Python thread of an application permanently?
I have some code that fires off two threads
class start():
    def __init__(self):
        Thread1 = functions.threads.Thread1()
        Thread1.setDaemon(True)
        Thread1.start()
        Thread2 = functions.threads.Thread2()
        Thread2.setDaemon(True)
        Thread2.start()
        #Stop thread here
At the moment, when the program gets to the end of that function it exits (there is nothing else for the main thread to do after that), killing the threads, which loop infinitely. How do I stop the main process from exiting? I can do it with a while True: None loop, but that uses a lot of CPU; there's probably a better way.
If you don't call setDaemon(True) on the threads, the process will keep running as long as the threads run.
The daemon flag indicates that the interpreter needn't wait for a thread; it will exit when only daemon threads are left.
Use join:
Thread1.join()
Thread2.join()
Also note that setDaemon is the old API.
Thread1.daemon = True
is the preferred way now.
The whole point of daemon threads is to not prevent the application from being terminated if any of them is still running. You obviously want your threads to keep the application process alive, so don't make them daemons.