Join to later event - python

Consider a Python script which spawns multiple processes or threads (I have not decided which yet; I'm still experimenting). I would like to join all of the spawned processes/threads at a time later than when they are created. For that purpose I am using a list as a thread pool:
pool = []
for job in jobs:
    j = Process(target=some_Function, args=(some_Arg, ))
    j.start()
    pool.append(j)

# Lots of code

for p in pool:
    p.join()

do_Right_Before_Exiting()
sys.exit()
The problem with this approach is that there are many, many processes and the pool is acting like a memory leak. Eliminating the pool saves considerable memory, but I have no way to ensure that the processes finish. I could 'hack' a sleep(30) to give them time to complete, but I find this ugly and unreliable.
How might I join a process / thread to a point other than where the code is run (maybe join to something like a C# label, or join to another event)? Alternatively, how might I check that all processes / threads have completed and only then proceed?

If a list of your threads takes up too much memory, I think you have too many threads. Try dividing the jobs among only 8 threads and see if that improves things.
Most people don't have more than 8 cores.
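A minimal sketch of that approach using concurrent.futures (reusing the question's placeholder names jobs, some_Function, some_Arg and do_Right_Before_Exiting; a ThreadPoolExecutor would work the same way if threads are chosen instead of processes):
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(some_Function, some_Arg) for job in jobs]
    # Lots of code can run here while the 8 workers churn through the jobs.
    concurrent.futures.wait(futures)  # returns only once every job has finished

do_Right_Before_Exiting()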

Related

Real difference between thread and threadpool?

In this example, is there any real difference, or is it just syntactic sugar?
threads = []
for job in jobs:
    t = threading.Thread(target=job, args=[exchange])
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()
And
with concurrent.futures.ThreadPoolExecutor(max_workers=len(jobs)) as executor:
    for job in jobs:
        executor.submit(job, exchange)
The main point of a ThreadPool should be to reuse threads, but in this example all the threads exit after the "with" statement, am I right?
How do I achieve reuse? Keep an instance of the ThreadPool alive somewhere, without the with statement?
You can keep the ThreadPool alive somewhere else for as long as you need. But in this particular case you probably want to utilize the result of .submit like this:
with concurrent.futures.ThreadPoolExecutor(max_workers=len(jobs)) as executor:
    futures = []
    for job in jobs:
        future = executor.submit(job, exchange)
        futures.append(future)
    for future in futures:
        future.result()
which is very similar to raw threads, except that the threads are reused and, with future.result(), we can retrieve the return value (if any) and catch exceptions (you may want to wrap the future.result() call in a try/except).
By the way, I wouldn't use max_workers=len(jobs); it defeats the point of a thread pool. I also encourage you to have a look at the async APIs instead; threads are of limited use in Python anyway.
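As a minimal sketch of the reuse the question asks about (reusing the question's job/exchange names; the handle_batch helper is hypothetical), create the executor once, submit to it repeatedly, and shut it down only when the program is done:
import concurrent.futures

# A long-lived executor, created once and reused across many batches of work.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def handle_batch(jobs, exchange):
    # Each call reuses the same 4 worker threads.
    futures = [executor.submit(job, exchange) for job in jobs]
    return [f.result() for f in futures]

# ... call handle_batch() as often as needed ...

executor.shutdown(wait=True)  # waits for pending work, then releases the threads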
What you're asking is like asking whether there is any real difference between owning a truck, and renting a truck just on the days when you need it.
A thread is like the truck. A thread pool is like the truck rental company. Any time you create a thread pool, you are indirectly creating threads—probably more than one.
Creating and destroying threads is a costly operation. Thread pools are useful in programs that continually create many small tasks that need to be performed in different threads. Instead of creating and destroying a new thread for each task, the program submits the task to a thread pool, and the thread pool assigns the tasks to one of its worker threads. The worker threads can live a long time. They don't need to be continually created and destroyed because each one can perform any number of tasks.
If the "tasks" that your program creates need to run for almost as long as the whole program itself, Then it might make more sense just to create raw threads for that. But, if your program creates many short-lived tasks, then the thread pool probably is the better choice.

Python: What is the difference between a Pool of worker processes and just running multiple Processes?

I am not sure when to use a pool of workers versus multiple processes.
processes = []
for m in range(1, 5):
    p = Process(target=some_function)
    p.start()
    processes.append(p)

for p in processes:
    p.join()
vs
if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:
        pool_outputs = pool.map(another_function, inputs)
As it says on PyMOTW:
The Pool class can be used to manage a fixed number of workers for
simple cases where the work to be done can be broken up and
distributed between workers independently.
The return values from the jobs are collected and returned as a list.
The pool arguments include the number of processes and a function to
run when starting the task process (invoked once per child).
Please have a look at the examples given there to better understand its application, functionalities and parameters.
Basically the Pool is a helper, easing the management of the processes (workers) in those cases where all they need to do is consume common input data, process it in parallel and produce a joint output.
The Pool does quite a few things that you would otherwise have to code yourself (not too hard, but it is still convenient to find a pre-cooked solution), namely:
splitting the input data across the workers
simplifying the target function: it can be written to expect a single input element only, and the Pool calls it with each element from the subset allocated to that worker
waiting for the workers to finish their job (i.e. joining the processes)
...
merging the output of each worker to produce the final output
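A rough sketch of what that buys you (the square function and the inputs here are hypothetical placeholders, not from the question); with raw Process objects you would have to chunk the inputs, pass results back through a Queue, and join the workers yourself:
from multiprocessing import Pool

def square(x):           # hypothetical worker: expects one input element
    return x * x

if __name__ == '__main__':
    inputs = range(100)

    # The Pool splits `inputs`, dispatches pieces to 4 workers, joins them,
    # and merges the per-worker results back into one list, in order.
    with Pool(processes=4) as pool:
        results = pool.map(square, inputs)

    print(results[:5])   # [0, 1, 4, 9, 16]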
The information below might also help you understand the difference between Pool and Process in Python's multiprocessing module:
Pool:
Use the Pool class when you have a large chunk of data that can be split up.
Only the processes currently under execution are kept in memory.
I/O operations: the Pool waits until an I/O operation is completed and does not schedule another process in the meantime, which might increase the execution time.
Uses a FIFO scheduler.
Process:
Use the Process class when you have a small amount of data or functions and less repetitive tasks to do.
It puts all the processes in memory, so for larger tasks it might cause loss of memory.
I/O operations: the Process class suspends the process executing the I/O operation and schedules another process in parallel.
Uses a FIFO scheduler.

Python multiprocessing - Why does pool.close() take so long to return?

Sometimes a call to the function pool.close() takes a lot of time to return, and I want to understand why. Typically, I have each process return a big set or a big dict, and the main process merges them. It looks like this:
import multiprocessing

def worker(n):
    s = set()
    # add millions of elements to s
    return s

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=20)
    fullSet = set.union(*pool.imap_unordered(worker, xrange(100)))
    pool.close()   # This takes a LOT OF TIME!
    pool.join()
As I said, the pool.close() call might take 5 or 10 minutes, or more, to return. The same problem occurs when using dictionaries instead of sets. This is what the documentation says about close:
Prevents any more tasks from being submitted to the pool. Once all the
tasks have been completed the worker processes will exit.
I guess I don't understand what's going on. After the line fullSet = ..., all the work is done and I don't need the workers anymore. What are they doing that is taking so much time?
It is very unlikely that Pool.close itself is taking that long, simply because this is the source of close:
def close(self):
    debug('closing pool')
    if self._state == RUN:
        self._state = CLOSE
        self._worker_handler._state = CLOSE
So all that's happening is that some state variables are changed. This has no measurable impact on the runtime of that method and will not cause it to return later. You can assume that close returns practically instantaneously.
Now instead, what’s way more likely is that your pool.join() line is the “culprit” of this delay. But it’s just doing its job:
Wait for the worker processes to exit.
It essentially calls join on every process in the pool. And if you are joining a process or thread, you are actively waiting for it to complete its work and terminate.
So in your case, you have 20 processes running that add a million elements to a set. That takes a while. To make your main process not quit early (causing child processes to die btw.), you are waiting for the worker processes to finish their work; by joining on them. So what you’re experiencing is likely what should happen for the amount of work you do.
On a side note: if you do heavy CPU work in your worker functions, you shouldn't spawn more processes than your CPU has hardware threads available, as you will only introduce additional overhead from managing and switching processes. For example, for a consumer Core i7, this number would be 8.
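A small sketch of that sizing advice (illustrative only, not from the original answer):
import multiprocessing

if __name__ == '__main__':
    # Size the pool from the machine instead of hard-coding 20 workers.
    n_workers = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=n_workers)
    # Pool() with no argument uses this value by default.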
It is probably the iteration over the result of pool.imap_unordered and the subsequent set.union that take a long time.
After each worker has finished building a set, it has to be pickled, sent back to the original process and unpickled. This takes time and memory. And then the * has to unpack all the sets for union to process.
You might get better results with map_async. Have the callback append the returned set to a list, and loop over that list using union on each set.
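A rough sketch of that idea, assuming Python 3 (note that map_async invokes its callback once with the full list of results, so the callback extends a list rather than appending one set at a time; the worker below is the question's, with a parameter added):
import multiprocessing

def worker(n):
    s = set()
    # add millions of elements to s
    return s

if __name__ == '__main__':
    results = []
    pool = multiprocessing.Pool(processes=20)
    # The callback runs in the main process once all the sets have arrived.
    pool.map_async(worker, range(100), callback=results.extend)
    pool.close()
    pool.join()

    fullSet = set()
    for s in results:
        fullSet |= s   # merge incrementally instead of set.union(*all_sets)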

When, why, and how to call thread.join() in Python?

I have this python threading code.
import threading

def sum(value):
    sum = 0
    for i in range(value+1):
        sum += i
    print "I'm done with %d - %d\n" % (value, sum)
    return sum

r = range(500001, 500000*2, 100)
ts = []
for u in r:
    t = threading.Thread(target=sum, args=(u,))
    ts.append(t)
    t.start()

for t in ts:
    t.join()
Executing this, I have hundreds of threads working.
However, when I move the t.join() right after the t.start(), I have only two threads working.
for u in r:
    t = threading.Thread(target=sum, args=(u,))
    ts.append(t)
    t.start()
    t.join()
I also tested the code without invoking t.join() at all, and it seems to work fine.
So when, why, and how should I use thread.join()?
You seem to not understand what Thread.join does. When calling join, the current thread will block until that thread has finished. So you are waiting for the thread to finish, which prevents you from starting any other thread.
The idea behind join is to wait for other threads before continuing. In your case, you want to wait for all threads to finish at the end of the main program. Otherwise, if you didn't do that and the main program ended, all threads it created would be killed. So usually, you should have a loop at the end that joins all created threads, to prevent the main thread from exiting too early.
Short answer: this one:
for t in ts:
    t.join()
is generally the idiomatic way to start a small number of threads. Doing .join means that your main thread waits until the given thread finishes before proceeding in execution. You generally do this after you've started all of the threads.
Longer answer:
len(list(range(500001, 500000*2, 100)))
Out[1]: 5000
You're trying to start 5000 threads at once. It's miraculous your computer is still in one piece!
Your method of .join-ing in the loop that dispatches workers is never going to be able to have more than 2 threads (i.e. only one worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer-meltdown, but your code is going to be WAY slower than if you'd just never used threading in the first place!
At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need in order to limit your thread creation to a reasonable number (i.e. more than one, fewer than 5000) is a ThreadPool. There are various ways to do this. You could roll your own - this is fairly simple with a threading.Semaphore. You could use 3.2+'s concurrent.futures package. You could use some 3rd party solution. Up to you; each is going to have a different API, so I can't really discuss that further here.
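A rough sketch of the roll-your-own option with a threading.Semaphore (the cap of 8 is arbitrary, the worker just reproduces the question's summation, and Python 3 is assumed):
import threading

MAX_CONCURRENT = 8
slots = threading.Semaphore(MAX_CONCURRENT)

def worker(value):
    try:
        total = sum(range(value + 1))
        print("done with %d - %d" % (value, total))
    finally:
        slots.release()   # free a slot even if the work raises

threads = []
for u in range(500001, 500000 * 2, 100):
    slots.acquire()       # blocks once MAX_CONCURRENT threads are running
    t = threading.Thread(target=worker, args=(u,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()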
Obligatory GIL Discussion
cPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O bound tasks, such as retrieving a bunch of URLs.
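For instance, a hedged sketch (the URLs are just examples, Python 3 assumed) of the kind of I/O-bound work where a thread pool does pay off:
import concurrent.futures
import urllib.request

urls = [
    "https://www.python.org",
    "https://docs.python.org/3/",
]

def fetch(url):
    # The GIL is released while the thread waits on the network.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for url, size in executor.map(fetch, urls):
        print(url, size, "bytes")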
multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes. This isn't free - data transfer between processes is expensive, so a lot of care needs to be taken not to write workers that depend on shared state.
join() waits for your thread to finish, so the first use starts hundreds of threads and then waits for all of them to finish. The second use waits for each thread to end before it launches another one, which kind of defeats the purpose of threading.
The first use makes the most sense. You run the threads (all of them) to do some parallel computation, and then wait until all of them finish before you move on and use the results, to make sure the work is done (i.e. the results are actually there).

Clarification regarding python Pool.map function used for python parallelism

I have a couple of questions regarding the functioning of the following code fragment.
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=10)              # start 10 worker processes
    result = pool.apply_async(f, [10])     # evaluate "f(10)" asynchronously
    print result.get(timeout=1)
    print pool.map(f, range(10))           # prints "[0, 1, 4,..., 81]"
In the line pool = Pool(processes=10), does it even make a difference if I'm running on a 4-processor (quad-core) architecture and instantiate more than 4 worker processes, since only up to 4 processes can execute at any point in time?
In the pool.map(f, range(10)) call, if I instantiate 10 worker processes and have maybe 50 mappers, does Python take care of assigning mappers to processes as they complete execution, or am I supposed to figure out how many mappers are created and instantiate that many processes in the line pool = Pool(processes=number_of_mappers)?
This is my first attempt at parallelizing anything and I am thoroughly confused, so any help would be much appreciated.
Thanks in advance!
If you create more worker processes than you have available CPUs, that's fine, but the processes will compete with each other for cycles. That is, you'll waste more cycles, in the sense that cycles devoted to switching among processes do nothing to get you closer to finishing. For CPU-bound tasks, it's just wasteful. For I/O-bound tasks, though, it may be just what you want, since in that case processes will spend lots of their time idle, waiting for blocking I/O to complete.
The map functions automatically slice up their iterable argument and send pieces of it to all worker processes. I really don't know what you mean by mappers, though. How many mappers do you think you created in your example? 10? 1? Something else? In what you wrote, pool.map() blocks until all work is completed.
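To make that concrete, a hedged sketch (my own illustration, assuming Python 3) showing 10 workers being shared across 50 inputs without any manual assignment:
from multiprocessing import Pool
import os

def f(x):
    # Tag each result with the worker that produced it, just to show reuse.
    return x * x, os.getpid()

if __name__ == '__main__':
    with Pool(processes=10) as pool:
        results = pool.map(f, range(50))   # 50 tasks shared by 10 workers

    # At most 10 distinct PIDs for 50 tasks: workers pick up new pieces as they free up.
    print(len({pid for _, pid in results}), "worker processes handled 50 tasks")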
You can create more workers than the number of threads your CPU can execute. This is required in real-time applications, like a web server, where you must ensure that each client is able to communicate with you without having to wait for others. If it's not a real-time application and you just want to finish all the jobs as soon as possible, it would be wiser to create as many threads as your CPU can handle simultaneously.
Python takes care of assigning jobs to workers no matter how many jobs you have.
