Basic python multi-threading issue

I'm new to Python and trying to understand multi-threading. Here's an example from the Python documentation on Queue.
For the life of me, I don't understand how this example works. In the worker() function there's an infinite loop. How does the worker know when to get out of the loop? There seems to be no breaking condition.
And what exactly is the join doing at the end? Shouldn't I be joining the threads instead?
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()    # block until all tasks are done
Another question: when should multithreading be used, and when should multiprocessing be used?

Yup, you're right: worker will run forever. However, since the Queue only ever holds a finite number of items, worker will eventually block permanently on q.get() (there will be no more items in the queue). At that point it's inconsequential that worker is still running. q.join() blocks until the Queue's unfinished-task count drops to 0 (each time a worker thread calls q.task_done(), the count drops by 1). After that, the program ends, and the infinitely blocking daemon thread dies with its creator.
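A minimal sketch (using the Python 3 spellings queue.Queue and daemon=True) makes this concrete: after q.join() returns, the daemon worker is still alive, parked inside q.get(), and it simply disappears when the interpreter exits.
from queue import Queue
from threading import Thread

q = Queue()

def worker():
    while True:
        item = q.get()            # blocks forever once the queue is drained
        print("processing", item)
        q.task_done()

t = Thread(target=worker, daemon=True)
t.start()

for item in range(3):
    q.put(item)

q.join()   # returns as soon as task_done() has been called once per put()
print("join() returned; worker still alive:", t.is_alive())   # True: blocked in q.get()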

Regarding your second question, the biggest difference between threads and processes in Python is that the mainstream implementations use a global interpreter lock (GIL) to ensure that multiple threads can't mess up Python's internal data structures. This means that for programs that spend most of their time doing computation in pure Python, even with multiple CPUs you're not going to speed the program up much because only one thread at a time can hold the GIL. On the other hand, multiple threads can trivially share data in a Python program, and in some (but by no means all) cases, you don't have to worry too much about thread safety.
Where multithreading can speed up a Python program is when the program spends most of its time waiting on I/O -- disk access or, particularly these days, network operations. The GIL is not held while doing I/O, so many Python threads can run concurrently in I/O bound applications.
On the other hand, with multiprocessing, each process has its own GIL, so your performance can scale to the number of CPU cores you have available. The down side is that all communication between the processes will have to be done through a multiprocessing.Queue (which acts on the surface very like a Queue.Queue, but has very different underlying mechanics, since it has to communicate across process boundaries).
Since working through a thread-safe or interprocess queue avoids a lot of potential threading problems, and since Python makes it so easy, the multiprocessing module is very attractive.
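To make the contrast concrete, here is a rough sketch (mine, not from the answer above) of CPU-bound work farmed out to several processes through a multiprocessing.Queue; because each process has its own interpreter and its own GIL, the loop in cpu_bound can run on several cores at once.
import multiprocessing

def cpu_bound(n):
    # deliberately wasteful pure-Python loop, just to keep a core busy
    return sum(i * i for i in range(n))

def worker(in_q, out_q):
    for n in iter(in_q.get, None):      # None is the shutdown sentinel
        out_q.put((n, cpu_bound(n)))

if __name__ == "__main__":
    in_q, out_q = multiprocessing.Queue(), multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(in_q, out_q))
             for _ in range(multiprocessing.cpu_count())]
    for p in procs:
        p.start()

    jobs = [200000, 300000, 400000, 500000]
    for n in jobs:
        in_q.put(n)
    for _ in procs:
        in_q.put(None)                  # one sentinel per worker process

    results = [out_q.get() for _ in jobs]   # drain the results before joining
    for p in procs:
        p.join()
    print(results)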

I mostly agree with joel-cornett. I tried to run the following snippet in Python 2.7:
from threading import Thread
from Queue import Queue

def worker():
    def do_work(item):
        print(item)
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(4):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in range(10):
    q.put(item)

q.join()
The output is:
0
1
2
3
4
5
6
7
8
9
Exception in thread Thread-3 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 504, in run
  File "abc.py", line 9, in worker
  File "/usr/lib/python2.7/Queue.py", line 168, in get
  File "/usr/lib/python2.7/threading.py", line 236, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
The most probable explanation, I think:
As the queue becomes empty after the tasks are exhausted, the parent thread returns from q.join(), quits, and the interpreter starts shutting down, tearing down the objects the queue relies on. The child threads are terminated upon the first TypeError raised inside item = q.get(), because those objects no longer exist.

Related

What is the safest way to queue multiple threads originating in a loop?

My script loops through each line of an input file and performs some actions using the string in each line. Since the tasks performed on each line are independent of each other, I decided to separate the task into threads so that the script doesn't have to wait for the task to complete to continue with the loop. The code is given below.
import threading

threads = []

def myFunction(line, param):
    # Doing something with line and param
    # Sends multiple HTTP requests, parses the responses and produces outputs
    # Returns nothing
    pass

param = arg[1]
with open(targets, "r") as listfile:
    for line in listfile:
        print("Starting a thread for: ", line)
        t = threading.Thread(target=myFunction, args=(line, param,))
        threads.append(t)
        t.start()
I realized that this is a bad idea as the number of lines in the input file grew large. With this code, there would be as many threads as the number of lines. Researched a bit and figured that queues would be the way.
I want to understand the optimal way of using queues for this scenario and if there are any alternatives which I can use.
To get around this problem you can use the concept of a thread pool: you define a fixed number of threads/workers, for example 5, and whenever a thread finishes executing, another submitted future takes its place automatically.
Example:
import concurrent.futures

def myFunction(line, param):
    print("Done with :", line, param)

param = "param_example"

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = []
    with open("targets", "r") as listfile:
        for line in listfile:
            print("Starting a thread for: ", line)
            futures.append(executor.submit(myFunction, line=line, param=param))
    # waiting for the threads to finish and maybe print a result:
    for future in concurrent.futures.as_completed(futures):
        print(future.result())  # an Exception should be handled here!
Queues are one way to do it. The way to use them is to put function parameters on a queue, and use threads to get them and do the processing.
The queue size doesn't matter much in this case because reading the next line is fast. In other cases, a more optimized solution is to set the queue size to at least twice the number of threads; that way, if all threads finish processing an item from the queue at the same time, they will all have the next item already waiting in the queue.
To keep the code simple, the threads can be made daemonic so that they don't stop the program from finishing after the processing is done. They will be terminated when the main process finishes.
The alternative is to put a special item on the queue (like None) for each thread, have the threads exit after getting it from the queue, and then join the threads.
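A sketch of that sentinel variant (the names myFunction, workers and the input lines here are stand-ins, not part of the answer's code): each worker exits when it pulls None off the queue, so the threads can stay non-daemonic and be joined explicitly.
from queue import Queue
from threading import Thread

workers = 5                            # example value

def myFunction(line, param):           # hypothetical stand-in for the real work
    print("processed", line.strip(), param)

def work():
    while True:
        item = q.get()
        if item is None:               # sentinel received: time to exit
            break
        myFunction(*item)

q = Queue(workers * 2)
threads = [Thread(target=work) for _ in range(workers)]
for t in threads:
    t.start()

for line in ["alpha\n", "beta\n", "gamma\n"]:   # stand-in for reading the targets file
    q.put((line, "param_example"))

for _ in threads:
    q.put(None)                        # one sentinel per worker thread
for t in threads:
    t.join()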
For the examples below, the number of worker threads is set using the workers variable.
Here is an example of a solution using a queue.
from queue import Queue
from threading import Thread

queue = Queue(workers * 2)

def work():
    while True:
        myFunction(*queue.get())
        queue.task_done()

for _ in range(workers):
    Thread(target=work, daemon=True).start()

with open(targets, 'r') as listfile:
    for line in listfile:
        queue.put((line, param))

queue.join()
A simpler solution might be using ThreadPoolExecutor. It is especially simple in this case because the function being called doesn't return anything that needs to be used in the main thread.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=workers) as executor:
    with open(targets, 'r') as listfile:
        for line in listfile:
            executor.submit(myFunction, line, param)
Also, if it's not a problem to have all lines stored in memory, there is a solution which doesn't use anything other than threads. The work is split in such a way that the threads read some lines from a list and ignore other lines. A simple example with two threads is where one thread reads odd lines and the other reads even lines.
from threading import Thread

with open(targets, 'r') as listfile:
    lines = listfile.readlines()

def work_split(n):
    for line in lines[n::workers]:
        myFunction(line, param)

threads = []
for n in range(workers):
    t = Thread(target=work_split, args=(n,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()
I have done a quick benchmark and the Queue is slightly faster than the ThreadPoolExecutor, but the solution with the split work is faster than both.
For the code you have posted, using threads makes no sense.
That's because there are no I/O operations, so the threads effectively run one after another rather than in parallel. The GIL (Global Interpreter Lock) is never released by a thread in this case, so the application only appears to be multithreaded; in reality the interpreter uses only one CPU and runs one thread at a time.
So you gain nothing from using threads here; on the contrary, you can see a performance degradation in this scenario, due to context switching and the overhead of starting each thread.
The only way to get better performance in this scenario, if it applies to your case, is a multiprocess program. But pay attention to the number of processes you start, and remember that every process has its own interpreter.
GitFront's answer was good. This answer just adds one more option, using the multiprocessing package.
Whether to use concurrent.futures or multiprocessing depends on your particular requirements. multiprocessing has a lot more options, but for the given question the results should be nearly identical in the simplest case.
from multiprocessing import cpu_count, Pool

PROCESSES = cpu_count()  # Warning: uses all cores

def pool_method(listfile, param):
    p = Pool(processes=PROCESSES)
    checker = [p.apply_async(myFunction, (line, param)) for line in listfile]
...
There are various methods other than apply_async, but this one should work well for your needs.
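As a sketch of how the elided part might collect the results (an assumption on my part, not shown in the answer): each apply_async call returns an AsyncResult, and its get() blocks until that task has finished.
from multiprocessing import cpu_count, Pool

def myFunction(line, param):                 # hypothetical stand-in
    return line.strip(), param

def pool_method(lines, param):
    with Pool(processes=cpu_count()) as p:   # Warning: uses all cores
        checker = [p.apply_async(myFunction, (line, param)) for line in lines]
        return [c.get() for c in checker]    # blocks until every task is done

if __name__ == "__main__":
    print(pool_method(["a\n", "b\n"], "param_example"))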

The workers in ThreadPoolExecutor are not really daemon threads

The thing I cannot figure out is that although ThreadPoolExecutor uses daemon workers, they keep running even after the main thread exits.
Here is a minimal example in Python 3.6.4:
import concurrent.futures
import time

def fn():
    while True:
        time.sleep(5)
        print("Hello")

thread_pool = concurrent.futures.ThreadPoolExecutor()
thread_pool.submit(fn)

while True:
    time.sleep(1)
    print("Wow")
Both the main thread and the worker thread run infinite loops. So if I use KeyboardInterrupt to terminate the main thread, I expect the whole program to terminate too. But actually the worker thread keeps running even though it is a daemon thread.
The source code of ThreadPoolExecutor confirms that worker threads are daemon threads:
t = threading.Thread(target=_worker,
args=(weakref.ref(self, weakref_cb),
self._work_queue))
t.daemon = True
t.start()
self._threads.add(t)
Further, if I manually create a daemon thread, it works like a charm:
from threading import Thread
import time

def fn():
    while True:
        time.sleep(5)
        print("Hello")

thread = Thread(target=fn)
thread.daemon = True
thread.start()

while True:
    time.sleep(1)
    print("Wow")
So I really cannot figure out this strange behavior.
Suddenly... I found out why. Reading more of the source code of ThreadPoolExecutor:
# Workers are created as daemon threads. This is done to allow the interpreter
# to exit when there are still idle threads in a ThreadPoolExecutor's thread
# pool (i.e. shutdown() was not called). However, allowing workers to die with
# the interpreter has two undesirable properties:
# - The workers would still be running during interpreter shutdown,
# meaning that they would fail in unpredictable ways.
# - The workers could be killed while evaluating a work item, which could
# be bad if the callable being evaluated has external side-effects e.g.
# writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads finish.
_threads_queues = weakref.WeakKeyDictionary()
_shutdown = False

def _python_exit():
    global _shutdown
    _shutdown = True
    items = list(_threads_queues.items())
    for t, q in items:
        q.put(None)
    for t, q in items:
        t.join()

atexit.register(_python_exit)
There is an exit handler which joins all the unfinished workers...
Here's a way to avoid this problem. Bad design can be beaten by another bad design. People write daemon=True only if they really know that the worker won't damage any objects or files.
In my case, I created a ThreadPoolExecutor with a single worker, and after a single submit I deleted the newly created thread from the executor's queue so the interpreter won't wait for this thread to stop on its own. Notice that worker threads are created after submit, not when the ThreadPoolExecutor is initialized.
import concurrent.futures.thread
from concurrent.futures import ThreadPoolExecutor
...
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(lambda: self._exec_file(args))
del concurrent.futures.thread._threads_queues[list(executor._threads)[0]]
It works in Python 3.8 but may not work in 3.9+ since this code is accessing private variables.
See the working piece of code on github

python multiprocessing .join() deadlock depends on worker function

I am using the multiprocessing python library to spawn 4 Process() objects to parallelize a cpu intensive task. The task (inspiration and code from this great article) is to compute the prime factors for every integer in a list.
main.py:
import random
import multiprocessing
import sys

num_inputs = 4000
num_procs = 4
proc_inputs = num_inputs/num_procs
input_list = [int(1000*random.random()) for i in xrange(num_inputs)]

output_queue = multiprocessing.Queue()
procs = []

for p_i in xrange(num_procs):
    print "Process [%d]"%p_i
    proc_list = input_list[proc_inputs * p_i:proc_inputs * (p_i + 1)]
    print " - num inputs: [%d]"%len(proc_list)

    # Using target=worker1 HANGS on join
    p = multiprocessing.Process(target=worker1, args=(p_i, proc_list, output_queue))
    # Using target=worker2 RETURNS with success
    #p = multiprocessing.Process(target=worker2, args=(p_i, proc_list, output_queue))

    procs.append(p)
    p.start()

for p in procs:
    print "joining ", p, output_queue.qsize(), output_queue.full()
    p.join()
    print "joined ", p, output_queue.qsize(), output_queue.full()

print "Processing complete."
ret_vals = []
while output_queue.empty() == False:
    ret_vals.append(output_queue.get())

print len(ret_vals)
print sys.getsizeof(ret_vals)
Observation:
If the target for each process is the function worker1, for an input list larger than 4000 elements the main thread gets stuck on .join(), waiting for the spawned processes to terminate and never returns.
If the target for each process is the function worker2, for the same input list the code works just fine and the main thread returns.
This is very confusing to me, as the only difference between worker1 and worker2 (see below) is that the former inserts individual lists in the Queue whereas the latter inserts a single list of lists for each process.
Why is there a deadlock when using worker1 but not when using worker2 as the target?
Shouldn't both (or neither) exceed the multiprocessing Queue maxsize limit of 32767?
worker1 vs worker2:
def worker1(proc_num, proc_list, output_queue):
    '''worker function which deadlocks'''
    for num in proc_list:
        output_queue.put(factorize_naive(num))

def worker2(proc_num, proc_list, output_queue):
    '''worker function that works'''
    workers_stuff = []
    for num in proc_list:
        workers_stuff.append(factorize_naive(num))
    output_queue.put(workers_stuff)
There are a lot of similar questions on SO, but I believe the core of this question is clearly distinct from all of them.
Related Links:
https://sopython.com/canon/82/programs-using-multiprocessing-hang-deadlock-and-never-complete/
python multiprocessing issues
python multiprocessing - process hangs on join for large queue
Process.join() and queue don't work with large numbers
Python 3 Multiprocessing queue deadlock when calling join before the queue is empty
Script using multiprocessing module does not terminate
Why does multiprocessing.Process.join() hang?
When to call .join() on a process?
What exactly is Python multiprocessing Module's .join() Method Doing?
The docs warn about this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
While a Queue appears to be unbounded, under the covers queued items are buffered in memory to avoid overloading inter-process pipes. A process cannot end normally before those memory buffers are flushed. Your worker1() puts a lot more items on the queue than your worker2(), and that's all there is to it. Note that the number of items that can be queued before the implementation resorts to buffering in memory isn't defined: it can vary across OS and Python release.
As the docs suggest, the normal way to avoid this is to .get() all the items off the queue before you attempt to .join() the processes. As you've discovered, whether it's necessary to do so depends in an undefined way on how many items have been put on the queue by each worker process.
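Applied to the code in the question, that reordering looks roughly like this (a sketch, keeping the question's Python 2 syntax and assuming num_inputs divides evenly over the processes, as it does above):
# drain the queue first: worker1 puts exactly one item per input
ret_vals = [output_queue.get() for _ in xrange(num_inputs)]

# with nothing left buffered, the feeder threads can flush and join() returns
for p in procs:
    p.join()

print len(ret_vals)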

Python threading with queue: how to avoid using join?

I have a scenario with 2 threads:
a thread waiting for messages from a socket (embedded in a C library - blocking call is "Barra.ricevi") then putting an element on a queue
a thread waiting to get element from the queue and do something
Sample code
import Barra
import Queue
import threading

posQu = Queue.Queue(maxsize=0)

def threadCAN():
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print (canMsg)
        else:
            print ("Enqueued message"), canMsg
            posQu.put(canMsg)

thCan = threading.Thread(target = threadCAN)
thCan.daemon = True
thCan.start()

while True:
    posMsg = posQu.get()
    print ("Messagge from the queue"), posMsg
The result is that every time a new message comes in from the socket a new element is added to the queue, BUT the main thread, which should get items from the queue, never wakes up.
The output is as follow:
Enqueued message
Enqueued message
Enqueued message
Enqueued message
I expected to have:
Enqueued message
Messagge from the queue
Enqueued message
Messagge from the queue
The only way to solve this issue seems to be to add the line:
posQu.join()
at the end of the thread waiting for messages from the socket, and the line:
posQu.task_done()
at the end of the main thread.
In this case, after a new message has been received from the socket, the producer thread blocks, waiting for the main thread to process the enqueued item.
Unfortunately this isn't the desired behaviour, since I would like a thread that is always ready to get messages from the socket, not one waiting for a job to be completed by another thread.
What am I doing wrong?
Thanks
Andrew
(Italy)
This is likely because your Barra module does not release the global interpreter lock (GIL) while Barra.ricevi is blocking. You may want to check this, though.
The GIL ensures that only one thread can run at any one time (limiting the usefulness of threads in a multi-processor system). The GIL switches threads every 100 "ticks" -- a tick loosely mapping to bytecode instructions. See here for more details.
In your producer thread, not much happens outside of the C-library call. This means the producer thread will get to call Barra.ricevi a great many times before the GIL switches to another thread.
Solutions to this, in order of increasing complexity, are:
Call time.sleep(0) after adding an item to the queue. This yields the thread so that another thread can run (see the sketch after this list).
Use sys.setcheckinterval() to lower the number of "ticks" executed before switching threads. This will come at the cost of making the program more computationally expensive.
Use multiprocessing rather than threading. This includes using multiprocessing.Queue instead of Queue.Queue.
Modify Barra so that it does release the GIL when its functions are called.
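The first option is the smallest change; a sketch of the producer loop with the explicit yield added (everything else as in the question):
import time

def threadCAN():
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print (canMsg)
        else:
            print ("Enqueued message"), canMsg
            posQu.put(canMsg)
            time.sleep(0)   # give up the GIL so the consumer thread can run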
Example using multiprocessing. Be aware that when using multiprocessing, your processes no longer have an implied shared state. You will need to have a look at multiprocessing to see how to pass information between processes.
import Barra
import multiprocessing

def threadCAN(posQu):
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print(canMsg)
        else:
            print("Enqueued message", canMsg)
            posQu.put(canMsg)

if __name__ == "__main__":
    posQu = multiprocessing.Queue(maxsize=0)

    procCan = multiprocessing.Process(target=threadCAN, args=(posQu,))
    procCan.daemon = True
    procCan.start()

    while True:
        posMsg = posQu.get()
        print("Messagge from the queue", posMsg)

python multiprocessing pool, wait for processes and restart custom processes

I use python multiprocessing and wait for all the processes with this code:
...
results = []
for i in range(num_extract):
    url = queue.get(timeout=5)
    try:
        print "START PROCESS!"
        result = pool.apply_async(process, [host, url], callback=callback)
        results.append(result)
    except Exception, e:
        continue

for r in results:
    r.get(timeout=7)
...
I tried to use pool.join but got this error:
Traceback (most recent call last):
  File "C:\workspace\sdl\lxchg\walker4.py", line 163, in <module>
    pool.join()
  File "C:\Python25\Lib\site-packages\multiprocessing\pool.py", line 338, in join
    assert self._state in (CLOSE, TERMINATE)
AssertionError
Why doesn't join work? And what is a good way to wait for all the processes?
My second question is: how can I restart a certain process in the pool? I need this because of a memory leak. At the moment I in fact rebuild the whole pool after all processes have finished their tasks (I create a new Pool object to restart the processes).
What I need: for example, I have 4 processes in the pool. A process gets its task, and after the task is done I need to kill that process and start a new one (to clear the leaked memory).
You are getting the error because you need to call pool.close() before calling pool.join()
I don't know of a good way to shut down a process started with apply_async but see if properly shutting down the pool doesn't make your memory leak go away.
The reason I think this is that the Pool class has a bunch of attributes that are threads running in daemon mode. All of these threads get cleaned up by the join method. The code you have now won't clean them up so if you create a new Pool, you'll still have all those threads running from the last one.
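A sketch of that shutdown order (the process function and its arguments here are placeholders, not the asker's real ones): collect the results, then close() the pool so no more tasks can be submitted, and only then call join().
from multiprocessing import Pool

def process(host, url):                  # hypothetical stand-in for the real worker
    return host, url

if __name__ == "__main__":
    pool = Pool(processes=4)
    results = [pool.apply_async(process, ("example.com", u)) for u in ("a", "b", "c")]
    for r in results:
        print(r.get(timeout=7))          # collect results while the pool is open
    pool.close()                         # tell the pool no more work is coming
    pool.join()                          # now join() passes the state assertion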
