How to reduce thread switching latency in Python

I have a Python 2.7 app with 3 producer threads and 1 consumer thread that are connected by a Queue.Queue. I'm using get and put, and the producer threads spend most of their time blocked in IO (reading from serial ports), essentially just calling serial.read()...
However, I seem to have what I would call a high latency between the time a producer thread puts to the queue and the time the consumer thread gets from the queue, like 25 ms (I'm running a single-processor BeagleBone Black (1 GHz) on Angstrom Linux).
I would think that if all the other threads are blocked, the elapsed time between put and get should be really small, a few microseconds or so, not tens of milliseconds, except when the consumer thread is actually busy (which is not the case here).
I've read some things online suggesting that Python is guilty of busy spinning and that the GIL is to blame. I would rather not dwell on the reason and just get something that is more responsive. I'm fine with the actual latency of serial transmission (about 1-2 ms).
The code looks basically like
import time
import Queue

q = Queue.Queue()

def a1():
    while True:
        p = read_serial_packet("/dev/ttyO1")
        p.timestamp = time.time()
        q.put(p)

def a2():
    while True:
        p = read_serial_packet("/dev/ttyO2")
        p.timestamp = time.time()
        q.put(p)

def a3():
    while True:
        p = read_serial_packet("/dev/ttyO3")
        p.timestamp = time.time()
        q.put(p)

def main():
    while True:
        p = q.get()
        d = time.time() - p.timestamp
        print str(d)
and there are 4 threads running a1, a2, a3 and main.
Here are some sample times
0.0119640827179
0.0178141593933
0.0154139995575
0.0192430019379
0.0185649394989
0.0225830078125
0.018187046051
0.0234098434448
0.0208261013031
0.0254039764404
0.0257620811462
Is this something that is "fixed" in Python 3?

As @fileoffset hinted, the answer seems to be switching from threading (where the GIL prevents threads from running Python code truly in parallel) to multiprocessing, which uses several Python processes instead of threads.
The conversion from threading to multiprocessing looks like this:
useMP = True  # or False if you want threading

if useMP:
    import multiprocessing
    import multiprocessing.queues
    import Queue  # only for the Queue.Empty exception; don't use Queue.Queue
else:
    import threading
    import Queue

...

if useMP:
    self.event_queue = multiprocessing.queues.Queue()
    t1 = multiprocessing.Process(target=self.upstream_thread)
    t2 = multiprocessing.Process(target=self.downstream_thread)
    t3 = multiprocessing.Process(target=self.scanner_thread)
else:
    self.event_queue = Queue.Queue()
    t1 = threading.Thread(target=self.upstream_thread)
    t2 = threading.Thread(target=self.downstream_thread)
    t3 = threading.Thread(target=self.scanner_thread)
The rest of the API looks the same.
There is one other important issue, though, that was not easy to migrate and is left as an exercise: catching Unix signals, such as SIGINT (Ctrl-C). Previously, the master thread caught the signal and all the other threads ignored it. Now, the signal is sent to all processes, so you have to be careful about catching KeyboardInterrupt and installing signal handlers. I don't think I did it the right way, so I am not going to elaborate... :)
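For what it's worth, one common pattern (a minimal sketch, not necessarily the "right way" referred to above; the producer function and queue are just illustrative) is to have each child ignore SIGINT so that only the parent reacts to Ctrl-C and then tears the children down:

import signal
import time
import multiprocessing

def run_ignoring_sigint(target, *args):
    # Runs in the child: ignore Ctrl-C so only the parent reacts to it.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    target(*args)

def producer(q):
    while True:
        q.put(time.time())
        time.sleep(0.1)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=run_ignoring_sigint, args=(producer, q))
    p.start()
    try:
        while True:
            print(q.get())
    except KeyboardInterrupt:
        p.terminate()
        p.join()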

You might try playing around with the value of the "check interval"
sys.setcheckinterval(50)
A brief explanation of the general concept can be found in these slides, starting around page 10.
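Note that sys.setcheckinterval is a Python 2 knob (the argument is a count of bytecode instructions between GIL checks); on Python 3.2+ it no longer has any effect and sys.setswitchinterval, which takes a time in seconds, is used instead. A hedged sketch covering both (the values are arbitrary, tune them by measurement):

import sys

if hasattr(sys, 'setswitchinterval'):
    # Python 3.2+: the GIL switch interval is a time in seconds
    sys.setswitchinterval(0.0005)
else:
    # Python 2: the check interval counts bytecode instructions
    sys.setcheckinterval(50)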

Related

Is it an issue to schedule too many tiny threads?

Is there any issue scheduling too many tiny threads that never overlap?
For instance, what "unexpected" behavior should be expected from something like this in python:
from threading import Timer

def print_num_thread(wait_for=0.1, num_thread=0):
    t = Timer(wait_for, print_num_thread, [wait_for, num_thread + 1])
    # do something really quick (much quicker than wait_for), like:
    print(f'thread number {num_thread}')
    t.start()

print_num_thread()
What are the advantages of this compared to something like:
from time import sleep

def print_num_while(wait_for=0.1, num_while=0):
    while True:
        # do something really quick like:
        print(f'loop number {num_while}')
        num_while += 1
        sleep(wait_for)

print_num_while()
And how about stopping the threads in the first scenario without setting t.daemon = True if using some resources?
And how about stopping the threads in the first scenario without setting t.daemon = True if using some resources?
Setting t.daemon to True does not keep the thread running after the program ends; it means the opposite: daemon threads are stopped abruptly when the main program exits, whereas non-daemon threads keep the interpreter alive until they finish. The standard threading module has no thread.kill; to stop a thread cleanly you normally set a flag or Event that it checks, and a Timer that has not fired yet can be cancelled with t.cancel().
To compare: start a daemon thread and a non-daemon thread in your Python REPL, exit the REPL, and check which one keeps the process alive.
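A sketch of that flag-plus-cancel approach for the recursive Timer example (the one-element list is just a mutable slot to remember the pending timer):

from threading import Timer, Event

stop_event = Event()
pending = [None]  # holds the most recently scheduled Timer

def print_num_thread(wait_for=0.1, num_thread=0):
    if stop_event.is_set():
        return
    # do something really quick here
    print('thread number %d' % num_thread)
    t = Timer(wait_for, print_num_thread, [wait_for, num_thread + 1])
    pending[0] = t
    t.start()

print_num_thread()
# ... later, to stop cleanly:
stop_event.set()
if pending[0] is not None:
    pending[0].cancel()  # cancel the timer that has not fired yet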
What are the advantages of this compared to something like:
...
The first example will not block the calling thread (each Timer fires on its own thread and print_num_thread returns immediately), while the second one blocks in its while loop.
To compare, try:
print_num_thread()
print("I GOT HERE")
vs:
print_num_while()
print("I GOT HERE")

Python multiprocessing - Is it possible to introduce a fixed time delay between individual processes?

I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same / similar time. I need to introduce a delay between processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduce a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, so I still run into problems when processing a large number of models. I therefore think the time delay needs to be built into the multiprocessing part of the code, but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing

def runmodels(arg):
    # interim solution: a random delay reduces (but does not eliminate)
    # the chance that any two runs start at the same time
    sleep(np.random.rand() * 120)
    subprocess.call(arg)  # this line actually fires off the model run

if __name__ == '__main__':
    arguments = [big list of runs in here
                 ]
    count = 12
    pool = multiprocessing.Pool(processes=count)
    r = pool.imap_unordered(runmodels, arguments)
    pool.close()
    pool.join()
multiprocessing.Pool() already limits the number of processes running concurrently.
You could use a lock to separate the starting times of the processes (not tested):
import threading
import multiprocessing

def init(lock):
    global starting
    starting = lock

def run_model(arg):
    starting.acquire()  # no other process can get it until it is released
    threading.Timer(1, starting.release).start()  # release in a second
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = multiprocessing.Pool(processes=12,
                                initializer=init, initargs=[multiprocessing.Lock()])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
One way to do this with threads and a semaphore:
from time import sleep
import subprocess
import threading

def runmodels(arg):
    subprocess.call(arg)
    sGlobal.release()  # release for the next launch

if __name__ == '__main__':
    threads = []
    global sGlobal
    sGlobal = threading.Semaphore(12)  # semaphore for at most 12 threads
    arguments = [big list of runs in here
                 ]
    for arg in arguments:
        sGlobal.acquire()  # blocks if more than 12 threads are running
        t = threading.Thread(target=runmodels, args=(arg,))
        threads.append(t)
        t.start()
        sleep(1)
    for t in threads:
        t.join()
The answer suggested by jfs caused problems for me as a result of starting a new thread with threading.Timer. If the worker just so happens to finish before the timer does, the timer is killed and the lock is never released.
I propose an alternative route, in which each successive worker will wait until enough time has passed since the start of the previous one. This seems to have the same desired effect, but without having to rely on another child process.
import multiprocessing as mp
import time

def init(shared_val):
    global start_time
    start_time = shared_val

def run_model(arg):
    with start_time.get_lock():
        wait_time = max(0, start_time.value - time.time())
        time.sleep(wait_time)
        start_time.value = time.time() + 1.0  # specify the interval here
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = mp.Pool(processes=12,
                   initializer=init, initargs=[mp.Value('d')])
    for _ in pool.imap_unordered(run_model, arguments):
        pass

Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another working on the next task.
I have tried to use the multiprocessing module, which offers a method to run a pool of workers asynchronously (e.g. using map_async), but there I can only set a "global" timeout after which all processes would be killed.
Is it possible to have an individual timeout after which only a single process that takes too long is killed and a new worker is added to the pool again instead (processing the next task and skipping the one that timed out)?
Here's a simple example to illustrate my problem:
def Check(n):
    import time
    if n % 2 == 0:  # select some (arbitrary) subset of processes
        print "%d timeout" % n
        while 1:
            # loop forever to simulate some process getting stuck
            pass
    print "%d done" % n
    return 0

from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(Check, range(10))
print result.get(timeout=1)
After the timeout all workers are killed and the program exits. I would like instead that it continues with the next subtask. Do I have to implement this behavior myself or are there existing solutions?
Update
It is possible to kill the hanging workers and they are automatically replaced. So I came up with this code:
import multiprocessing

jobs = pool.map_async(Check, range(10))
while 1:
    try:
        print "Waiting for result"
        result = jobs.get(timeout=1)
        break  # all clear
    except multiprocessing.TimeoutError:
        # kill all processes
        for c in multiprocessing.active_children():
            c.terminate()
print result
The problem now is that the loop never exits; even after all tasks have been processed, calling get yields a timeout exception.
The pebble Pool module has been built for solving these kinds of issues. It supports timeouts on individual tasks, allowing you to detect them and recover easily.
from pebble import ProcessPool
from concurrent.futures import TimeoutError

with ProcessPool() as pool:
    future = pool.schedule(function, args=[1, 2], timeout=5)
    try:
        result = future.result()
    except TimeoutError as error:
        print "Function took longer than %d seconds" % error.args[1]
For your specific example:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

results = []

with ProcessPool(max_workers=4) as pool:
    future = pool.map(Check, range(10), timeout=5)
    iterator = future.result()
    # iterate over all results; if a computation timed out,
    # print it and continue to the next result
    while True:
        try:
            result = next(iterator)
            results.append(result)
        except StopIteration:
            break
        except TimeoutError as error:
            print "function took longer than %d seconds" % error.args[1]

print results
Currently Python does not provide a native way to control the execution time of each distinct task in the pool from outside the worker itself.
So the easy way is to use wait_procs from the psutil module and implement the tasks as subprocesses.
If nonstandard libraries are not desirable, then you have to implement your own pool on top of the subprocess module, with the working cycle in the main process poll()-ing each worker and performing the required actions, roughly as sketched below.
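A bare-bones sketch of that subprocess-based working cycle (the command list, worker limit, and per-task timeout are placeholders):

import time
import subprocess

commands = [["sleep", "5"], ["sleep", "100"]]  # placeholder task list
max_workers, timeout = 4, 10.0
running = []  # (Popen, start_time) pairs

while commands or running:
    # top up the pool of workers
    while commands and len(running) < max_workers:
        running.append((subprocess.Popen(commands.pop(0)), time.time()))
    # poll the workers; kill any that exceed the per-task timeout
    still_running = []
    for proc, started in running:
        if proc.poll() is None:                  # still executing
            if time.time() - started > timeout:
                proc.kill()                      # took too long
                proc.wait()                      # reap it
            else:
                still_running.append((proc, started))
    running = still_running
    time.sleep(0.1)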
As for the updated problem, the pool becomes corrupted if you directly terminate one of the workers (it is a bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool becomes non-joinable.
You have to terminate the whole pool and then recreate it for the remaining tasks:
import multiprocessing
from multiprocessing import Pool

while True:
    pool = Pool(processes=4)
    jobs = pool.map_async(Check, range(10))
    print "Waiting for result"
    try:
        result = jobs.get(timeout=1)
        break  # all clear
    except multiprocessing.TimeoutError:
        # kill all processes
        pool.terminate()
        pool.join()
print result
UPDATE
Pebble is an excellent and handy library that solves this issue. Pebble is designed for the asynchronous execution of Python functions, whereas PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.
One more aspect is when third-party dependencies are not desirable: then PyExPool can be a good choice, being a single-file, lightweight implementation of a multi-process execution pool with per-job and global timeouts, the ability to group jobs into tasks, and other features.
PyExPool can be embedded into your sources and customized; it has a permissive Apache 2.0 license and production quality, being used in the core of one high-load scientific benchmarking framework.
Try a construction where each process is joined with a timeout on a separate thread. Then the main program never gets stuck, and processes that do get stuck are killed when the timeout expires. This technique is a combination of the threading and multiprocessing modules; a sketch follows.
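A compact sketch of that construction (the work function, timeout, and task list are placeholders): each task runs as a Process, a watchdog thread joins it with a timeout and terminates it if it is still alive, and the main thread only ever joins the watchdogs:

import time
from threading import Thread
from multiprocessing import Process

def work(n):
    time.sleep(n)  # placeholder for the real task

def run_with_timeout(target, args, timeout):
    p = Process(target=target, args=args)
    p.start()
    p.join(timeout)
    if p.is_alive():       # still running after the timeout
        p.terminate()
        p.join()

if __name__ == "__main__":
    watchdogs = []
    for n in (1, 100, 2):
        t = Thread(target=run_with_timeout, args=(work, (n,), 5.0))
        t.start()
        watchdogs.append(t)
    for t in watchdogs:
        t.join()  # the main program never blocks on a stuck worker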
Here is my way to maintain a minimum number of threads in memory. It is a combination of the threading and multiprocessing modules. It may be unusual compared to the techniques respected fellow members have explained above, but it may be worth considering. For the sake of explanation, I am taking a scenario of crawling a minimum of 5 websites at a time.
so here it is:-
# importing dependencies
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here.
    output.write(scrapeddata + "\n")
    pass
Next is the threadController function. This function controls the flow of threads into main memory. It keeps activating threads to maintain the threadNum "minimum" limit, i.e. 5, and it won't exit until all active threads (activeCount) have finished.
It maintains a minimum of threadNum (5) startProcess threads (these threads eventually start the processes from processList while joining them with a timeout of 60 seconds). After starting threadController, there are 2 threads which are not included in the above limit of 5, namely the main thread and the threadController thread itself. That is why threading.activeCount() != 2 is used.
def threadController():
    print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # starting the first thread; this makes activeCount = 3
    Thread(target=startProcess).start()
    # loop while the process list is not empty OR active threads have not finished
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and  # if the count of active threads is below the minimum AND
                len(processList) != 0):                    # processList is not empty
            Thread(target=startProcess).start()            # start startProcess as a separate thread **
The startProcess function, run as a separate thread, starts processes from processList. The purpose of this function (** started as a different thread) is that it becomes a parent thread for the processes. So while joining them with a timeout of 60 seconds blocks the startProcess thread from moving ahead, it does not stop threadController from doing its job. This way, threadController works as required.
def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00)  # joining the process with a timeout of 60 seconds

if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt", "r").read().split("\n")
    output = open("test.txt", "a")
    processList = []  # process list
    threadNum = 5     # number of thread-initiated processes to be run at one time
    # building the process list
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target=crawler, args=(domain,))
        processList.append(p)  # making a list of worker processes
    # starting threadController as a separate thread
    mt = Thread(target=threadController)
    mt.start()
    mt.join()  # won't let the main thread continue until threadController finishes
    output.close()
    print "Done"
Besides maintaining a minimum number of threads in memory, my aim was also to have something that could avoid stuck threads or processes in memory. I did this using the timeout. My apologies for any typing mistakes.
I hope this construction would help anyone in this world.
Regards,
Vikas Gautam

Return whichever expression returns first

I have two different functions f, and g that compute the same result with different algorithms. Sometimes one or the other takes a long time while the other terminates quickly. I want to create a new function that runs each simultaneously and then returns the result from the first that finishes.
I want to create that function with a higher order function
h = firstresult(f, g)
What is the best way to accomplish this in Python?
I suspect that the solution involves threading. I'd like to avoid discussion of the GIL.
I would simply use a Queue for this. Start the threads and the first one which has a result ready writes to the queue.
Code
from threading import Thread
from time import sleep
from Queue import Queue

def firstresult(*functions):
    queue = Queue()
    threads = []
    for f in functions:
        def thread_main(f=f):  # bind f now; a plain closure would see the loop variable change
            queue.put(f())
        thread = Thread(target=thread_main)
        threads.append(thread)
        thread.start()
    result = queue.get()
    return result

def slow():
    sleep(1)
    return 42

def fast():
    return 0

if __name__ == '__main__':
    print firstresult(slow, fast)
Live demo
http://ideone.com/jzzZX2
Notes
Stopping the threads is an entirely different topic. For that you need to add some state variable to the threads which they check at regular intervals; a sketch follows. As I want to keep this example short, I simply skipped that part and assumed that all workers get the time to finish their work even though the result is never read.
Skipping the discussion about the GIL as requested by the questioner. ;-)
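For illustration, a sketch of that state-variable idea (here an Event that the slow worker checks between steps; this is a one-shot example, since the Event stays set after the first race):

from threading import Thread, Event
from time import sleep
from Queue import Queue   # "queue" on Python 3

stop = Event()

def slow(steps=100):
    # a long computation that checks the stop flag between steps
    for _ in range(steps):
        if stop.is_set():
            return None          # lost the race, give up early
        sleep(0.1)
    return 42

def fast():
    return 0

def firstresult(*functions):
    out = Queue()
    for f in functions:
        Thread(target=lambda f=f: out.put(f())).start()
    result = out.get()
    stop.set()                   # tell the slower workers to stop
    return result

if __name__ == '__main__':
    print(firstresult(slow, fast))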
Now - unlike my suggestion on the other answer, this piece of code does exactly what you are requesting:
from multiprocessing import Process, Queue
import random
import time

def firstresult(func1, func2):
    queue = Queue()
    proc1 = Process(target=func1, args=(queue,))
    proc2 = Process(target=func2, args=(queue,))
    proc1.start(); proc2.start()
    result = queue.get()
    proc1.terminate(); proc2.terminate()
    return result

def algo1(queue):
    time.sleep(random.uniform(0, 1))
    queue.put("algo 1")

def algo2(queue):
    time.sleep(random.uniform(0, 1))
    queue.put("algo 2")

print firstresult(algo1, algo2)
Run each function in a new worker thread, the 2 worker threads send the result back to the main thread in a 1 item queue or something similar. When the main thread receives the result from the winner, it kills (do python threads support kill yet? lol.) both worker threads to avoid wasting time (one function may take hours while the other only takes a second).
Replace the word thread with process if you want.
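If you can depend on concurrent.futures (standard in Python 3.2+, available for 2.7 as the futures backport), a variation on this idea looks roughly like the sketch below; note that the losing callable cannot actually be killed and keeps running in the background until it finishes:

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def firstresult(*functions):
    pool = ThreadPoolExecutor(max_workers=len(functions))
    futures = [pool.submit(f) for f in functions]
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)           # do not wait for the slower workers
    return next(iter(done)).result()    # result (or exception) of the winner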
You will need to run each function in another process (with multiprocessing) or in a different thread.
If both are CPU bound, multithreading won't help much - exactly due to the GIL -
so multiprocessing is the way.
If the return value is a pickleable (serializable) object, I have this decorator I created that simply runs the function in the background, in another process:
https://bitbucket.org/jsbueno/lelo/src
It is not exactly what you want - both are non-blocking and start executing right away. The trick with this decorator is that it blocks (and waits for the function to complete) only when you try to use the return value.
But on the other hand - it is just a decorator that does all the work.
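The rough shape of such a decorator (a simplified sketch, not the lelo code itself: it assumes a fork-based start method, picklable return values, and it ignores exceptions) is a proxy that starts the process immediately and blocks only when the result is first asked for:

from multiprocessing import Process, Queue

def _call_and_put(func, args, kwargs, q):
    q.put(func(*args, **kwargs))

class LazyResult(object):
    """Proxy that blocks only when the value is first requested."""
    def __init__(self, proc, queue):
        self._proc, self._queue = proc, queue
        self._value, self._ready = None, False
    def get(self):
        if not self._ready:
            self._value = self._queue.get()  # blocks until the child is done
            self._proc.join()
            self._ready = True
        return self._value

def in_background(func):
    """Run the decorated function in another process; return a lazy proxy."""
    def wrapper(*args, **kwargs):
        q = Queue()
        p = Process(target=_call_and_put, args=(func, args, kwargs, q))
        p.start()
        return LazyResult(p, q)
    return wrapper

Usage would then be something like result = in_background(f)(x), with the actual wait happening only at result.get().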

Python: Pass or Sleep for long running processes?

I am writing a queue processing application which uses threads for waiting on and responding to queue messages delivered to the app. For the main part of the application, it just needs to stay active. For a code example like:
while True:
    pass

or

while True:
    time.sleep(1)
Which one will have the least impact on a system? What is the preferred way to do nothing, but keep a python app running?
I would imagine time.sleep() will have less overhead on the system. Using pass will cause the loop to immediately re-evaluate and peg the CPU, whereas using time.sleep will allow the execution to be temporarily suspended.
EDIT: just to prove the point, if you launch the python interpreter and run this:
>>> while True:
... pass
...
You can watch Python start eating up 90-100% CPU instantly, versus:
>>> import time
>>> while True:
... time.sleep(1)
...
Which barely even registers on the Activity Monitor (using OS X here but it should be the same for every platform).
Why sleep? You don't want to sleep, you want to wait for the threads to finish.
So
# store the threads you start in a your_threads list, then
for a_thread in your_threads:
    a_thread.join()
See: thread.join
If you are looking for a short, zero-cpu way to loop forever until a KeyboardInterrupt, you can use:
from threading import Event
Event().wait()
Note: Due to a bug, this only works on Python 3.2+. In addition, it appears to not work on Windows. For this reason, while True: sleep(1) might be the better option.
For some background, Event objects are normally used for waiting for long running background tasks to complete:
from threading import Thread, Event
from time import sleep

def do_task():
    sleep(10)
    print('Task complete.')
    event.set()

event = Event()
Thread(target=do_task).start()
event.wait()
print('Continuing...')
Which prints:
Task complete.
Continuing...
signal.pause() is another solution, see https://docs.python.org/3/library/signal.html#signal.pause
Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Not on Windows. (See the Unix man page signal(2).)
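A minimal usage sketch (Unix only; the handler and message are illustrative):

import signal

def handle_term(signum, frame):
    print('got SIGTERM, shutting down')

signal.signal(signal.SIGTERM, handle_term)
signal.pause()   # sleep until any signal arrives and its handler has run
print('exiting main thread')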
I've always seen/heard that using sleep is the better way to do it. Using sleep will keep your Python interpreter's CPU usage from going wild.
You don't give much context to what you are really doing, but maybe Queue could be used instead of an explicit busy-wait loop? If not, I would assume sleep would be preferable, as I believe it will consume less CPU (as others have already noted).
[Edited according to additional information in comment below.]
Maybe this is obvious, but anyway: what you could do in a case where you are reading information from blocking sockets is to have one thread read from the socket and post suitably formatted messages into a Queue, and then have the rest of your "worker" threads read from that queue; the workers will then block on reading from the queue without needing either pass or sleep. A rough sketch:
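This is only a sketch, with the socket setup and the message format left out:

from threading import Thread
from Queue import Queue   # "queue" on Python 3

def reader(sock, q):
    # the only thread that touches the socket: block on recv, parse, enqueue
    while True:
        data = sock.recv(4096)
        if not data:
            q.put(None)          # tell the workers we are done
            break
        q.put(data)              # or a parsed/formatted message

def worker(q):
    while True:
        msg = q.get()            # blocks; no pass or sleep needed
        if msg is None:
            q.put(None)          # let the next worker see the sentinel too
            break
        # handle msg here

Start one reader thread and as many worker threads as you need, all sharing the same Queue.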
Running a method as a background thread with sleep in Python:
import threading
import time

class ThreadingExample(object):
    """ Threading example class

    The run() method will be started and it will run in the background
    until the application exits.
    """

    def __init__(self, interval=1):
        """ Constructor

        :type interval: int
        :param interval: Check interval, in seconds
        """
        self.interval = interval

        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # daemonize thread
        thread.start()        # start the execution

    def run(self):
        """ Method that runs forever """
        while True:
            # Do something
            print('Doing something important in the background')
            time.sleep(self.interval)

example = ThreadingExample()
time.sleep(3)
print('Checkpoint')
time.sleep(2)
print('Bye')
