I've run two variants of code that, to me, should run exactly identically - so I'm very surprised to see different output from each...
First up:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

executor = ThreadPoolExecutor(max_workers=2)

def func(x):
    print(f"In func {x}")
    sleep(1)
    return True

foo = executor.map(func, range(0, 10))
for f in foo:
    print(f"blah {f}")
    if f:
        break
print("Shutting down")
executor.shutdown(wait=False)
print("Shut down")
This outputs the following, showing the remaining futures being run to completion. While that surprised me at first, I believe it's consistent with the docs (in the absence of cancel_futures being set to True), as per https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.shutdown: "Regardless of the value of wait, the entire Python program will not exit until all pending futures are done executing."
In func 0
In func 1
In func 2
In func 3
blah True
Shutting down
Shut down
In func 4
In func 5
In func 6
In func 7
In func 8
In func 9
So that's fine. But here's the odd thing: if I refactor to call that within a function, it behaves differently. See this minor tweak:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def run_test():
    executor = ThreadPoolExecutor(max_workers=2)

    def func(x):
        print(f"In func {x}")
        sleep(1)
        return True

    foo = executor.map(func, range(0, 10))
    for f in foo:
        print(f"blah {f}")
        if f:
            break
    print("Shutting down")
    executor.shutdown(wait=False)
    print("Shut down")
run_test()
This outputs the following, suggesting the futures are cancelled in this case:
In func 0
In func 1
In func 2
blah True
Shutting down
In func 3
Shut down
So I guess something is happening as the executor falls out of scope at the end of run_test()? But this seems to contradict the docs (which don't mention this), and surely the executor similarly falls out of scope at the end of the first script?
Seen on both Python 3.8 and 3.9.
I expected the same output in the two cases, but they mismatched.
This surprised me too. This code also reproduces your behaviour
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def run_test():
    executor = ThreadPoolExecutor(max_workers=2)

    def func(x):
        print(f"In func {x}")
        sleep(1)

    foo = executor.map(func, range(0, 10))
    # a
    x = next(foo)
    # b
    print("Shutting down")
    executor.shutdown(wait=False)
    print("Shut down")

run_test()
If you run it as-is, it will run func for the first couple of integers in range(0, 10) and then exit. If you comment out the line between # a and # b, then it runs all 10.
The reason, as far as I can tell, is that if you loop over the generator object (foo) at all (or call next() on it), the code ends up in the result_iterator generator that Executor.map defines in CPython's concurrent/futures/_base.py.
When the run_test() function exits and foo goes out of scope, the generator is garbage-collected and its finally block runs, which cancels all pending futures.
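For reference, here is an abridged sketch of what Executor.map looks like in CPython, paraphrased from memory (I have dropped the timeout branch; check your interpreter's concurrent/futures/_base.py for the exact code):

def map(self, fn, *iterables, timeout=None, chunksize=1):
    fs = [self.submit(fn, *args) for args in zip(*iterables)]

    def result_iterator():
        try:
            fs.reverse()
            while fs:
                yield fs.pop().result()
        finally:
            # Runs when the generator is exhausted OR garbage-collected,
            # cancelling any futures that were never consumed.
            for future in fs:
                future.cancel()

    return result_iterator()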
In your example without a function, I believe your guess is correct that it is related to the order in which objects go out of scope. You can see this by commenting/uncommenting the line between # a and # b below:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

executor = ThreadPoolExecutor(max_workers=2)

def func(x):
    print(f"In func {x}")
    sleep(1)
    return True

foo = executor.map(func, range(0, 10))
next(foo)
# a
# del foo
# b
print("Shutting down")
executor.shutdown(wait=False)
print("Shut down")
Related
I'm using Python 3.6 and trying to use asyncio to run tasks concurrently. I thought asyncio.gather and ensure_future would be the tools to use, but they do not seem to be working as I thought they would. Could someone give me pointers?
Here is my code:
import time
import asyncio

async def f1():
    print('Running func 1')
    time.sleep(4)
    print('Returning from func 1')
    return 1

async def f2():
    print('Running func 2')
    time.sleep(6)
    print('Returning from func 2')
    return 2

async def f3():
    print('Running func 3')
    time.sleep(1)
    print('Returning from func 3')
    return 3

async def foo():
    calls = [
        asyncio.ensure_future(f())
        for f in [f1, f2, f3]
    ]
    res = await asyncio.gather(*calls)
    print(res)

loop = asyncio.get_event_loop()
start = time.time()
loop.run_until_complete(foo())
end = time.time()
print(f'Took {end - start} seconds')
print('done')
I would expect the 3 functions to run independently of each other, but each one seems to be blocked behind the other. This is the output I get:
Running func 1
Returning from func 1
Running func 2
Returning from func 2
Running func 3
Returning from func 3
[1, 2, 3]
Took 11.009816884994507 seconds
done
I would have expected it to take 6 seconds, with the bottleneck being f2.
First off, welcome to StackOverflow.
When you run code inside the event loop, that code MUST use async libraries, or be run in an executor, if you don't want to block the entire process.
That way the event loop can send the task to the background, to be executed in a worker (or in the event loop itself if you use async libraries), and can meanwhile attend to other functions or portions of code, repeating the same process.
Once any background task has finished, the event loop catches it and returns its value.
In your case, if you use the async version of sleep you should get the expected results. For example:
async def f1():
    print('Running func 1')
    await asyncio.sleep(4)
    print('Returning from func 1')
    return 1
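If you can't replace the blocking call with an async one (here time.sleep stands in for any blocking library call), the other option mentioned above is to hand it off to an executor. A minimal sketch using only stdlib APIs available on your Python 3.6:

import asyncio
import time

def blocking_work(n):
    time.sleep(n)  # a genuinely blocking call, run in a worker thread
    return n

async def foo():
    loop = asyncio.get_event_loop()
    tasks = [loop.run_in_executor(None, blocking_work, n) for n in (4, 6, 1)]
    print(await asyncio.gather(*tasks))  # ~6 seconds total, not 11

asyncio.get_event_loop().run_until_complete(foo())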
I haven't read this tutorial myself, but I hope it's helpful for finding your solution.
I want to limit the number of active threads. What I have seen is that a finished thread stays alive and does not exit by itself, so the number of active threads keeps growing until an error occurs.
The following code starts only 8 threads at a time, but they stay alive even when they have finished, so the number keeps growing:
import threading
import time

class ThreadEx(threading.Thread):
    __thread_limiter = None
    __max_threads = 2

    @classmethod
    def max_threads(cls, thread_max):
        ThreadEx.__max_threads = thread_max
        ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)

    def __init__(self, target=None, args: tuple = ()):
        super().__init__(target=target, args=args)
        if not ThreadEx.__thread_limiter:
            ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)

    def run(self):
        ThreadEx.__thread_limiter.acquire()
        try:
            #success = self._target(*self._args)
            #if success: return True
            super().run()
        except:
            pass
        finally:
            ThreadEx.__thread_limiter.release()

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)

ThreadEx.max_threads(8)
for i in range(0, 99):
    t = ThreadEx(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.start()
Due to the for loop, the number of threads keeps growing to 99.
I know that a thread has done its work because call_me has been executed and threading.active_count() was printed.
Does somebody know how I can make sure a finished thread does not stay alive?
This may be a silly answer, but to me it looks like you are trying to reinvent ThreadPool.
from multiprocessing.pool import ThreadPool
from time import sleep

p = ThreadPool(8)

def call_me(test1):
    print(test1)
    sleep(1)

for i in range(0, 99):
    p.apply_async(call_me, args=(i,))
p.close()
p.join()
This will ensure only 8 concurrent threads are running your function at any point in time. And if you want a bit more performance, you can import Pool from multiprocessing and use that. The interface is exactly the same, but your pool will now be subprocesses instead of threads, which usually gives a performance boost as the GIL does not get in the way.
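That swap is nearly a one-line change. A sketch of the process-based version (note the __main__ guard, which multiprocessing needs on platforms that spawn rather than fork):

from multiprocessing import Pool
from time import sleep

def call_me(test1):
    print(test1)
    sleep(1)

if __name__ == '__main__':
    p = Pool(8)
    for i in range(0, 99):
        p.apply_async(call_me, args=(i,))
    p.close()
    p.join()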
I have changed the class following Hannu's help.
I post it for reference; maybe it's useful for others who come across this post:
import threading
from multiprocessing.pool import ThreadPool
import time

class MultiThread():
    __thread_pool = None

    @classmethod
    def begin(cls, max_threads):
        MultiThread.__thread_pool = ThreadPool(max_threads)

    @classmethod
    def end(cls):
        MultiThread.__thread_pool.close()
        MultiThread.__thread_pool.join()

    def __init__(self, target=None, args: tuple = ()):
        self.__target = target
        self.__args = args

    def run(self):
        try:
            result = MultiThread.__thread_pool.apply_async(self.__target, args=self.__args)
            # Note: result.get() blocks until this task finishes, so tasks
            # submitted via run() execute one at a time; return the
            # AsyncResult instead of calling get() if you need concurrency.
            return result.get()
        except:
            pass

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)
    return 0

MultiThread.begin(8)
for i in range(0, 99):
    t = MultiThread(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.run()
MultiThread.end()
The maximum number of threads is 8 at any given time, determined by the begin method.
The run method also returns the result of the function you passed in, if it returns something. (See the caveat in the code comment: because run() calls result.get(), it blocks until the task finishes, so tasks submitted this way actually execute one at a time.)
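For comparison, the standard library's concurrent.futures gives you the same bounded pool without a wrapper class, and collecting results only after submitting everything keeps the tasks genuinely concurrent. A sketch, not a drop-in replacement for the class above:

import threading
import time
from concurrent.futures import ThreadPoolExecutor

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)
    return 0

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(call_me, "Thread count: ", str(threading.active_count()))
               for _ in range(99)]
    results = [f.result() for f in futures]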
Hope that helps.
Say I have a long-running Python function that looks something like this:
import random
import time
from rx import Observable

def intns(x):
    y = random.randint(5, 10)
    print(y)
    print('begin')
    time.sleep(y)
    print('end')
    return x
I want to be able to set a timeout of 1000 ms.
So I'm doing something like creating an observable and mapping it through the above intense calculation:
a = Observable.repeat(1).map(lambda x: intns(x))
Now, for each value emitted, if it takes more than 1000 ms I want to end the observable as soon as I reach 1000 ms, using on_error or on_completed:
a.timeout(1000).subscribe(lambda x: print(x), lambda x: print(x))
The above statement does time out and calls on_error, but it goes on to finish the intense calculation and only then returns to the next statements. Is there a better way of doing this?
The last statement prints the following
8 # no of seconds to sleep
begin # begins sleeping, trying to emit the first value
Timeout # operation times out, and calls on_error
end # thread waits till the function ends
The idea is that if a particular function times out, I want to be able to continue with my program and ignore the result.
I was wondering: if the intns function was run on a separate thread, I guess the main thread continues execution after the timeout, but I still want to stop computing the intns function on that thread, or kill it somehow.
The following is a class that can be used via with timeout():.
If the block under it runs for longer than the specified time, a TimeoutError is raised.
import signal

class timeout:
    # Default value is 1 second (1000 ms)
    def __init__(self, seconds=1, error_message='Timeout'):
        self.seconds = seconds
        self.error_message = error_message

    def handle_timeout(self, signum, frame):
        raise TimeoutError(self.error_message)

    def __enter__(self):
        # SIGALRM is Unix-only and must be set from the main thread
        signal.signal(signal.SIGALRM, self.handle_timeout)
        signal.alarm(self.seconds)

    def __exit__(self, type, value, traceback):
        signal.alarm(0)

# example usage
with timeout():
    # infinite while loop so the timeout is reached
    while True:
        pass
If I'm understanding your function, here's what your implementation would look like:
def intns(x):
    y = random.randint(5, 10)
    print(y)
    print('begin')
    with timeout():
        time.sleep(y)
    print('end')
    return x
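Two caveats with this approach: SIGALRM exists only on Unix and must be set from the main thread, and signal.alarm only accepts whole seconds. For the 1000 ms granularity in the question, signal.setitimer accepts floats. A minimal sketch, where long_running_call is a hypothetical stand-in for your intns:

import signal

def handle_timeout(signum, frame):
    raise TimeoutError('Timeout')

signal.signal(signal.SIGALRM, handle_timeout)
signal.setitimer(signal.ITIMER_REAL, 1.0)  # deliver SIGALRM after 1.0 s
try:
    long_running_call()  # hypothetical blocking function
except TimeoutError:
    print('gave up after 1 second')
finally:
    signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer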
You can do this partially using threading.
Although there's no specific way to kill a thread in Python, you can implement a method to flag the thread to end.
This won't work if the thread is waiting on other resources (in your case, you simulated "long-running" code with a random wait).
See also
Is there any way to kill a Thread in Python?
This way it works:
import random
import time
import threading
import os

def intns(x):
    y = random.randint(5, 10)
    print(y)
    print('begin')
    time.sleep(y)
    print('end')
    return x

thr = threading.Thread(target=intns, args=(10,), kwargs={})
thr.start()
st = time.monotonic()  # time.clock() was removed in Python 3.8
while thr.is_alive():
    if time.monotonic() - st > 9:
        os._exit(0)  # hard-exits the whole process, skipping cleanup
Here's an example with a timeout:
import random
import time
import threading

_timeout = 0

def intns(loops=1):
    print('begin')
    processing = 0
    for i in range(loops):
        y = random.randint(5, 10)
        time.sleep(y)
        if _timeout == 1:
            print('timed-out end')
            return
        print('keep processing')
    return

# this will time out
timeout_seconds = 10
loops = 10
# this will complete
#timeout_seconds = 30.0
#loops = 1

thr = threading.Thread(target=intns, args=(loops,), kwargs={})
thr.start()
st = time.monotonic()  # time.clock() was removed in Python 3.8
while thr.is_alive():
    if time.monotonic() - st > timeout_seconds:
        _timeout = 1
thr.join()
if _timeout == 0:
    print("completed")
else:
    print("timed-out")
The idea is simply to poll the elapsed time in a while loop while the worker thread runs, and set the flag once the deadline passes. An Event-based variant is sketched below.
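Here is the promised variant: threading.Event replaces the global integer flag, and Thread.join(timeout) replaces the polling loop. A sketch under the same assumption as above (the worker only notices the flag between sleeps):

import random
import threading
import time

stop = threading.Event()

def intns(loops=1):
    for _ in range(loops):
        time.sleep(random.randint(5, 10))
        if stop.is_set():
            print('timed-out end')
            return
        print('keep processing')

thr = threading.Thread(target=intns, args=(10,))
thr.start()
thr.join(timeout=10)  # block at most 10 seconds
if thr.is_alive():
    stop.set()   # ask the worker to stop at its next check
    thr.join()   # wait for it to acknowledge
    print('timed-out')
else:
    print('completed')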
I've started programming in Python a few weeks ago and was trying to use Semaphores to synchronize two simple threads, for learning purposes. Here is what I've got:
import threading

sem = threading.Semaphore()

def fun1():
    while True:
        sem.acquire()
        print(1)
        sem.release()

def fun2():
    while True:
        sem.acquire()
        print(2)
        sem.release()

t = threading.Thread(target=fun1)
t.start()
t2 = threading.Thread(target=fun2)
t2.start()
But it keeps printing just 1's. How can I interleave the prints?
It is working fine, it's just printing too fast for you to see. Try putting a time.sleep() in both functions (a small amount) to sleep the thread for that much time, so you can actually see both 1 and 2.
Example -
import threading
import time

sem = threading.Semaphore()

def fun1():
    while True:
        sem.acquire()
        print(1)
        sem.release()
        time.sleep(0.25)

def fun2():
    while True:
        sem.acquire()
        print(2)
        sem.release()
        time.sleep(0.25)

t = threading.Thread(target=fun1)
t.start()
t2 = threading.Thread(target=fun2)
t2.start()
Also, you can use Lock/mutex method as follows:
import threading
import time

mutex = threading.Lock()  # is equal to threading.Semaphore(1)

def fun1():
    while True:
        mutex.acquire()
        print(1)
        mutex.release()
        time.sleep(.5)

def fun2():
    while True:
        mutex.acquire()
        print(2)
        mutex.release()
        time.sleep(.5)

t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
Simpler style using "with":
import threading
import time

mutex = threading.Lock()  # is equal to threading.Semaphore(1)

def fun1():
    while True:
        with mutex:
            print(1)
        time.sleep(.5)

def fun2():
    while True:
        with mutex:
            print(2)
        time.sleep(.5)

t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
[NOTE]:
The difference between mutex, semaphore, and lock
In fact, I was looking for asyncio.Semaphore, not threading.Semaphore, and I believe someone else may want it too.
So I decided to share an asyncio.Semaphore example; hope you don't mind.
from asyncio import (
    Task,
    Semaphore,
)
import asyncio
from typing import List

async def shopping(sem: Semaphore):
    while True:
        async with sem:
            print(shopping.__name__)
            await asyncio.sleep(0.25)  # Transfer control to the loop, and it will assign the other (idle) job to run.

async def coding(sem: Semaphore):
    while True:
        async with sem:
            print(coding.__name__)
            await asyncio.sleep(0.25)

async def main():
    sem = Semaphore(value=1)
    list_task: List[Task] = [asyncio.create_task(_coroutine(sem)) for _coroutine in (shopping, coding)]
    """
    # Normally, we would wait until all the tasks are done, but that is impossible in your case.
    for task in list_task:
        await task
    """
    await asyncio.sleep(2)  # So, I let the main loop wait for 2 seconds, then close the program.

asyncio.run(main())
Output:
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
16 prints × 0.25 s = 2 s, which is why it stops there.
I used this code to demonstrate how one thread can use a Semaphore while the other thread waits (non-blocking) until the Semaphore is available.
This was written using Python 3.6; not tested on any other version.
This will only work if the synchronization is being done from the same process; IPC between separate processes will fail using this mechanism.
import threading
from time import sleep

sem = threading.Semaphore()

def fun1():
    print("fun1 starting")
    sem.acquire()
    for loop in range(1, 5):
        print("Fun1 Working {}".format(loop))
        sleep(1)
    sem.release()
    print("fun1 finished")

def fun2():
    print("fun2 starting")
    while not sem.acquire(blocking=False):
        print("Fun2 No Semaphore available")
        sleep(1)
    else:
        print("Got Semaphore")
        for loop in range(1, 5):
            print("Fun2 Working {}".format(loop))
            sleep(1)
        sem.release()

t1 = threading.Thread(target=fun1)
t2 = threading.Thread(target=fun2)
t1.start()
t2.start()
t1.join()
t2.join()
print("All Threads done Exiting")
When I run this, I get the following output:
fun1 starting
Fun1 Working 1
fun2 starting
Fun2 No Semaphore available
Fun1 Working 2
Fun2 No Semaphore available
Fun1 Working 3
Fun2 No Semaphore available
Fun1 Working 4
Fun2 No Semaphore available
fun1 finished
Got Semaphore
Fun2 Working 1
Fun2 Working 2
Fun2 Working 3
Fun2 Working 4
All Threads done Exiting
Existing answers are wastefully sleeping
I noticed that almost all answers use some form of time.sleep or asyncio.sleep, which blocks the thread. This should be avoided in real software, because blocking your thread for 0.25, 0.5 or 1 second is unnecessary and wasteful: you could be doing more processing, especially if your application is IO-bound, where it already blocks on IO AND you are introducing arbitrary delays (latency) into your processing time. If all your threads are sleeping, your app isn't doing anything. Also, these sleep values are quite arbitrary, which is why each answer sleeps (blocks the thread) for a different amount.
The answers use it as a way to get Python's bytecode interpreter to pre-empt the thread after each print line, so that it alternates deterministically between running the 2 threads. By default, the interpreter pre-empts a thread every 5 ms (sys.getswitchinterval() returns 0.005), and remember that these threads never run in parallel, because of Python's GIL.
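You can check (or tune) that interval yourself:

import sys

print(sys.getswitchinterval())  # 0.005 by default
sys.setswitchinterval(0.001)    # request more frequent thread switching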
Solution to problem
How can I interleave the prints?
So my answer is: you do not want to use semaphores to print (or process) something in a certain order reliably, because you cannot rely on thread prioritization in Python. See Controlling scheduling priority of python threads? for more. time.sleep(arbitrarilyLargeEnoughNumber) doesn't really work when you have more than 2 concurrent pieces of code, since you don't know which one will run next - see * below. If the order matters, use a queue and worker threads:
from threading import Thread
import queue

q = queue.Queue()  # pass maxsize=... to bound memory if the producer outpaces the readers

def enqueue():
    while True:
        q.put(1)
        q.put(2)

def reader():
    while True:
        value = q.get()
        print(value)

enqueuer_thread = Thread(target=enqueue)
reader_thread_1 = Thread(target=reader)
reader_thread_2 = Thread(target=reader)
reader_thread_3 = Thread(target=reader)
enqueuer_thread.start()
reader_thread_1.start()
reader_thread_2.start()
reader_thread_3.start()
...
Unfortunately in this problem, you don't get to use Semaphore.
*An extra check for you
If you try a modification of the top-voted answer with an extra function/thread to print(3), you'll get:
1
2
3
1
3
2
1
3
...
Within a few prints, the ordering is broken - it's 1-3-2.
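For reference, the modified check looks something like this (a sketch of the change described above, using daemon threads so the demo exits on its own):

import threading
import time

sem = threading.Semaphore()

def make_fun(n):
    def fun():
        while True:
            with sem:
                print(n)
            time.sleep(0.25)
    return fun

for n in (1, 2, 3):
    threading.Thread(target=make_fun(n), daemon=True).start()
time.sleep(2)  # watch the ordering drift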
You need to use 2 semaphores to do what you want to do, and you need to initialize them at 0.
import threading

SEM_FUN1 = threading.Semaphore(0)
SEM_FUN2 = threading.Semaphore(0)

def fun1() -> None:
    for _ in range(5):
        SEM_FUN1.acquire()
        print(1)
        SEM_FUN2.release()

def fun2() -> None:
    for _ in range(5):
        SEM_FUN2.acquire()
        print(2)
        SEM_FUN1.release()

threading.Thread(target=fun1).start()
threading.Thread(target=fun2).start()
SEM_FUN1.release() # Trigger fun1
Output:
1
2
1
2
1
2
1
2
1
2
I have code like this:
function_1()
function_2()
Normally, function_1() takes 10 hours to finish.
But I want function_1() to run for only 2 hours; after 2 hours, function_1 must return and the program must continue with function_2(). It shouldn't wait for function_1() to complete. Is there a way to do this in Python?
What makes functions in Python able to pause their execution and resume later is the "yield" statement: your function then works as a generator object. You call next() on this object to have it start, or continue after the last yield:
import time

def function_1():
    start_time = time.time()
    while True:
        # do long stuff
        running_time = time.time() - start_time
        if running_time > 2 * 60 * 60:  # 2 hours
            yield  # <partial results can be yielded here, if you want>
            start_time = time.time()

runner = function_1()
while True:
    try:
        next(runner)  # runner.next() in Python 2
    except StopIteration:
        # function_1 has got to the end
        break
    # do other stuff
If you don't mind leaving function_1 running:
from threading import Thread
import time
Thread(target=function_1).start()
time.sleep(60*60*2)
Thread(target=function_2).start()
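A variant of the same idea: Thread.join(timeout) returns as soon as function_1 finishes or after 2 hours, whichever comes first, and daemon=True lets the process exit even if function_1 is still going. A sketch, assuming function_1 can safely be abandoned mid-run:

from threading import Thread

t = Thread(target=function_1, daemon=True)
t.start()
t.join(60 * 60 * 2)  # wait at most 2 hours
function_2()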
You can try the gevent module: start the function in a greenlet and kill that greenlet after some time.
Here is an example:
import gevent

# function which you can't modify
def func1(some_arg):
    # do something
    pass

def func2():
    # do something
    pass

if __name__ == '__main__':
    g = gevent.Greenlet(func1, 'Some Argument in func1')
    g.start()
    gevent.sleep(60 * 60 * 2)
    g.kill()
    # call the rest of the functions
    func2()
from multiprocessing import Process

p1 = Process(target=function_1)
p1.start()
p1.join(60 * 60 * 2)
if p1.is_alive():
    p1.terminate()
function_2()
I hope this helps
I just tested this using the following code
import time
from multiprocessing import Process

def f1():
    print(0)
    time.sleep(10000)
    print(1)

def f2():
    print(2)

if __name__ == '__main__':
    p1 = Process(target=f1)
    p1.start()
    p1.join(6)
    if p1.is_alive():
        p1.terminate()
    f2()
Output is as expected:
0
2
You can time the execution using the datetime module. Your optimizer function probably has a loop somewhere; inside the loop you can test how much time has passed since the function started.
import datetime

def function_1():
    t_end = datetime.datetime.now() + datetime.timedelta(hours=2)
    while not converged:  # 'converged' is your optimizer's own stopping condition
        # do your thing
        if datetime.datetime.now() > t_end:
            return
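A variant with time.monotonic(), which is immune to system clock adjustments (do_one_iteration is a hypothetical stand-in for one step of your optimizer's loop):

import time

def function_1(max_seconds=2 * 60 * 60):
    deadline = time.monotonic() + max_seconds
    converged = False
    while not converged:
        converged = do_one_iteration()  # hypothetical: one optimizer step
        if time.monotonic() > deadline:
            return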