I have a function slow_function which takes about 200 seconds to process a job_title, and it reads from and writes to a global variable.
With the code below there is no improvement in performance; it returns the same results, but takes just as long. Am I missing something?
Code running five job categories in parallel:
import time
from threading import Thread

threads = []
start = time.time()
for job_title in self.job_titles:
    t = Thread(target=self.slow_function, args=(job_title,))
    threads.append(t)

# Start all threads
for x in threads:
    x.start()

# Wait for all of them to finish
for x in threads:
    x.join()

end = time.time()
print "New time taken for all jobs:", end - start
You should extract slow_function from the class and make it a module-level function, because it's not possible to share the instance's local context between processes. After that you can use this code:
import time
from multiprocessing import Pool

start = time.time()
pool = Pool()
results = pool.map(slow_function, self.job_titles)
for r in results:
    # update your `global` variables here
    pass
end = time.time()
print "New time taken for all jobs:", end - start
You need to use the multiprocessing module (https://docs.python.org/2/library/multiprocessing.html), since the threading module is limited by the GIL (https://docs.python.org/2/glossary.html#term-global-interpreter-lock).
Note that you cannot use global variables to exchange data between the spawned processes; see https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes.
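For example, here is a minimal sketch of that approach, returning results from the workers instead of touching globals (the dummy workload and job titles are placeholders, not from your code):

from multiprocessing import Pool
import time

def slow_function(job_title):
    time.sleep(1)  # stand-in for the real ~200-second job
    return job_title, len(job_title)  # return a result instead of writing a global

if __name__ == '__main__':
    job_titles = ['engineer', 'analyst', 'manager', 'designer', 'developer']
    pool = Pool()
    results = pool.map(slow_function, job_titles)  # list of (job_title, result) pairs
    combined = dict(results)  # merge everything back in the parent process
    print(combined)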
I have a bit of a long script, so a minimum viable example may not be easily possible.
I'm trying to follow some previous SO posts about threading - I wish to execute an exe dozens of times and to use as many CPU cores as possible. Early days here.
First I'm defining a list with elements that are dynamically populated
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
work_items = []
Then I'm defining a function, at the same indent level as all my other working functions:
# To help with executing my exe in parallel
def worker(tup):
    subprocess.call(tup)
Then, finally, inside the function that will call this function:
# Execute jobs
start = time.time()
with ThreadPool(4) as pool:
    work_results = pool.map(worker, work_items)
end = time.time()
print(end - start)
The line that is causing me grief is work_results = pool.map(worker, work_items). My linter in VSCode and the Python shell, when I attempt to test it, both report that worker is not defined. My understanding is that the function should be in scope, and it is defined.
Is there something here that stands out as an issue as to why it would be reporting that worker is an undefined function?
I've created a Python script by following an example from the book "Effective Python, Second Edition" on using threads to optimize blocking I/O operations. The code is as follows:
import select
import socket
import time
from threading import Thread
def slow_systemcall():
    # Run the Linux select system call with a 0.1-second timeout
    select.select([socket.socket()], [], [], 0.1)
# First, run it linearly
start = time.time()
for _ in range(5):
    slow_systemcall()
end = time.time()
delta = end - start
print(f"Took {delta:.3f} seconds")
# Now, run it using threads
start = time.time()
threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall())
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
end = time.time()
delta = end - start
print(f"Took {delta:.3f} seconds")
I was expecting the first print to be about "Took 0.510 seconds" and the second to be about "Took 0.108 seconds", with a drastic difference between the two.
However, what I'm getting is
"Took 0.520 secones"
and
"Took 0.519 secones"
I tested this with Python 3.8 on Mac and Python 3.6.9 on Linux. Both yield similar results: the multithreaded version does not seem to speed up the blocking I/O operations at all.
What did I do wrong?
EDIT: I noticed something odd and replaced this line
thread = Thread(target=slow_systemcall())
with this line
thread = Thread(target=slow_systemcall)
And it immediately works as intended. Why is this though?
To answer your edit: the parentheses are not part of the function name but are used to invoke it. Adding them results in calling slow_systemcall itself and passing its return value to the target argument.
You need to give the new Thread() the function object.
By writing Thread(target=slow_systemcall()) you invoke the function and then pass its result, instead of passing the function itself.
Thread(target=slow_systemcall) however passes the function, and the new thread calls it.
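A minimal example of the difference (greet here is just an illustrative function, not from the original script):

from threading import Thread

def greet():
    print("hello from a worker thread")

t = Thread(target=greet)  # passes the function object; the new thread calls it
t.start()
t.join()

# By contrast, Thread(target=greet()) would call greet() right here in the main
# thread and pass its return value (None) as the target, so the new thread
# would do nothing.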
I have a large dataset in a list that I need to do some work on.
I want to start x number of threads to work on the list at any given time, until everything in that list has been popped.
I know how to start x number of threads (let's say 20) at a given time (by using thread1...thread20.start()),
but how do I make it start a new thread when one of the first 20 threads finishes, so that at any given time there are 20 threads running until the list is empty?
what I have so far:
import threading

class queryData(threading.Thread):
    def __init__(self, threadID):
        threading.Thread.__init__(self)
        self.threadID = threadID

    def run(self):
        global lst
        # Get trade from list
        trade = lst.pop()
        tradeId = trade[0][1][:6]
        print tradeId

thread1 = queryData(1)
thread1.start()
Update
I have something going with the following code:
for i in range(20):
    threads.append(queryData(i))
for thread in threads:
    thread.start()

while len(lst) > 0:
    for iter, thread in enumerate(threads):
        thread.join()
        lock.acquire()
        threads[iter] = queryData(i)
        threads[iter].start()
        lock.release()
Now it starts 20 threads at the beginning, and then starts a new thread whenever one finishes.
However, it is not efficient, because it waits for the first thread in the list to finish, then the second, and so on.
Is there a better way of doing this?
Basically I need:
- Start 20 threads.
- While the list is not empty:
  - Wait for one of the 20 threads to finish.
  - Reuse it or start a new thread.
As I suggested in a comment, I think using a multiprocessing.pool.ThreadPool would be appropriate — because it would handle much of the thread management you're manually doing in your code automatically. Once all the threads are queued-up for processing via ThreadPool's apply_async() method calls, the only thing that needs to be done is wait until they've all finished execution (unless there's something else your code could be doing, of course).
I've translated the code in my linked answer to another related question so it's more similar to what you appear to be doing to make it easier to understand in the current context.
from multiprocessing.pool import ThreadPool
from random import randint
import threading
import time

MAX_THREADS = 5
print_lock = threading.Lock()  # Prevent overlapped printing from threads.

def query_data(trade):
    trade_id = trade[0][1][:6]
    time.sleep(randint(1, 3))  # Simulate variable working time for testing.
    with print_lock:
        print(trade_id)

def process_trades(trade_list):
    pool = ThreadPool(processes=MAX_THREADS)
    results = []
    while trade_list:
        trade = trade_list.pop()
        results.append(pool.apply_async(query_data, (trade,)))
    pool.close()  # Done adding tasks.
    pool.join()   # Wait for all tasks to complete.

def test():
    trade_list = [[['abc', ('%06d' % id) + 'defghi']] for id in range(1, 101)]
    process_trades(trade_list)

if __name__ == "__main__":
    test()
You can wait for a thread to complete with thread.join(). This call will block until that thread completes, at which point you can create a new one.
However, instead of respawning a Thread each time, why not recycle your existing threads?
This can be done by the use of tasks for example. You keep a list of tasks in a shared collection, and when one of your threads finishes a task, it retrieves another one from that collection.
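For example, a rough sketch of that task-queue pattern (the queue contents and the process function below are illustrative, not taken from your code):

import threading
import queue

def process(trade):
    print(trade)  # placeholder for the real per-item work

def worker(tasks):
    while True:
        try:
            trade = tasks.get_nowait()  # grab the next task, if any
        except queue.Empty:
            return  # nothing left, so let this thread exit
        process(trade)
        tasks.task_done()

tasks = queue.Queue()
for i in range(100):
    tasks.put('trade-%d' % i)

threads = [threading.Thread(target=worker, args=(tasks,)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()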
I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same or a similar time. I need to introduce a delay between the processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduce a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, so I still run into problems when processing a large number of models. I therefore think the time delay needs to be built into the multiprocessing part of the code, but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing

def runmodels(arg):
    # Interim solution: a random delay reduces the probability that any two runs
    # start at the same time, but it isn't a guaranteed solution
    sleep(np.random.rand(1, 1) * 120)
    subprocess.call(arg)  # this line actually fires off the model run

if __name__ == '__main__':
    arguments = [big list of runs in here
                 ]
    count = 12
    pool = multiprocessing.Pool(processes=count)
    r = pool.imap_unordered(runmodels, arguments)
    pool.close()
    pool.join()
multiprocessing.Pool() already limits the number of processes running concurrently.
You could use a lock to separate the starting times of the processes (not tested):
import threading
import multiprocessing

def init(lock):
    global starting
    starting = lock

def run_model(arg):
    starting.acquire()  # no other process can get it until it is released
    threading.Timer(1, starting.release).start()  # release in a second
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = multiprocessing.Pool(processes=12,
                                initializer=init, initargs=[multiprocessing.Lock()])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
One way to do this with threads and a semaphore:
from time import sleep
import subprocess
import threading

def runmodels(arg):
    subprocess.call(arg)
    sGlobal.release()  # release for next launch

if __name__ == '__main__':
    threads = []
    global sGlobal
    sGlobal = threading.Semaphore(12)  # semaphore for max 12 threads
    arguments = [big list of runs in here
                 ]
    for arg in arguments:
        sGlobal.acquire()  # block if more than 12 threads are running
        t = threading.Thread(target=runmodels, args=(arg,))
        threads.append(t)
        t.start()
        sleep(1)
    for t in threads:
        t.join()
The answer suggested by jfs caused problems for me as a result of starting a new thread with threading.Timer. If the worker just so happens to finish before the timer does, the timer is killed and the lock is never released.
I propose an alternative approach, in which each successive worker waits until enough time has passed since the start of the previous one. This seems to have the same desired effect, but without having to rely on threading.Timer.
import multiprocessing as mp
import time

def init(shared_val):
    global start_time
    start_time = shared_val

def run_model(arg):
    with start_time.get_lock():
        wait_time = max(0, start_time.value - time.time())
        time.sleep(wait_time)
        start_time.value = time.time() + 1.0  # Specify interval here
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = mp.Pool(processes=12,
                   initializer=init, initargs=[mp.Value('d')])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
I have 2 simple functions (loops over a range) that can run separately without any dependency. I'm trying to run these 2 functions using both the Python multiprocessing module and the threading module.
When I compared the output, I saw that the multiprocessing version takes about 1 second more than the multithreaded one.
I have read that multithreading is not that efficient because of the Global Interpreter Lock.
Based on the above statements:
1. Is it best to use multiprocessing if there is no dependency between the 2 processes?
2. How do I work out the number of processes/threads I can run on my machine for maximum efficiency?
3. Also, is there a way to measure the efficiency gained by using multithreading?
Multithreading version:
from multiprocessing import Process
import thread
import platform
import os
import time
import threading

class Thread1(threading.Thread):
    def __init__(self, threadindicator):
        threading.Thread.__init__(self)
        self.threadind = threadindicator

    def run(self):
        starttime = time.time()
        if self.threadind == 'A':
            process1()
        else:
            process2()
        endtime = time.time()
        print 'Thread 1 complete : Time Taken = ', endtime - starttime

def process1():
    starttime = time.time()
    for i in range(100000):
        for j in range(10000):
            pass
    endtime = time.time()

def process2():
    for i in range(1000):
        for j in range(1000):
            pass

def main():
    print 'Main Thread'
    starttime = time.time()
    thread1 = Thread1('A')
    thread2 = Thread1('B')
    thread1.start()
    thread2.start()
    threads = []
    threads.append(thread1)
    threads.append(thread2)
    for t in threads:
        t.join()
    endtime = time.time()
    print 'Main Thread Complete , Total Time Taken = ', endtime - starttime

if __name__ == '__main__':
    main()
Multiprocessing version:
from multiprocessing import Process
import platform
import os
import time

def process1():
    # print 'process_1 processor =', platform.processor()
    starttime = time.time()
    for i in range(100000):
        for j in range(10000):
            pass
    endtime = time.time()
    print 'Process 1 complete : Time Taken = ', endtime - starttime

def process2():
    # print 'process_2 processor =', platform.processor()
    starttime = time.time()
    for i in range(1000):
        for j in range(1000):
            pass
    endtime = time.time()
    print 'Process 2 complete : Time Taken = ', endtime - starttime

def main():
    print 'Main Process start'
    starttime = time.time()
    processlist = []

    p1 = Process(target=process1)
    p1.start()
    processlist.append(p1)

    p2 = Process(target=process2)
    p2.start()
    processlist.append(p2)

    for i in processlist:
        i.join()
    endtime = time.time()
    print 'Main Process Complete - Total time taken = ', endtime - starttime

if __name__ == '__main__':
    main()
If you have two CPUs available on your machine, you have two processes which don't have to communicate, and you want to use both of them to make your program faster, you should use the multiprocessing module, rather than the threading module.
The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing Python bytecode at a time. Therefore, multithreading won't improve the overall runtime of your application unless you have calls that are blocking (e.g. waiting for IO) or that release the GIL (e.g. numpy will do this for some expensive calls) for extended periods of time. However, the multiprocessing library creates separate subprocesses, and therefore several copies of the interpreter, so it can make efficient use of multiple CPUs.
However, in the example you gave, you have one process that finishes very quickly (less than 0.1 seconds on my machine) and another that takes around 18 seconds to finish. The exact numbers may vary depending on your hardware. In that case, nearly all the work is happening in one process, so you're really only using one CPU regardless. In this case, the increased overhead of spawning processes vs. threads is probably causing the process-based version to be slower.
If you make both processes do the 18 second nested loops, you should see that the multiprocessing code goes much faster (assuming your machine actually has more than one CPU). On my machine, I saw the multiprocessing code finish in around 18.5 seconds, and the multithreaded code finish in 71.5 seconds. I'm not sure why the multithreaded one took longer than around 36 seconds, but my guess is the GIL is causing some sort of thread contention issue which is slowing down both threads from executing.
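For reference, the change is just giving process2 the same nested loops as process1 (shown here for the multiprocessing script; the threaded script gets the same edit):

def process2():
    starttime = time.time()
    for i in range(100000):  # same workload as process1
        for j in range(10000):
            pass
    endtime = time.time()
    print 'Process 2 complete : Time Taken = ', endtime - starttime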
As for your second question, assuming there's no other load on the system, you should use a number of processes equal to the number of CPUs on your system. You can discover this by doing lscpu on a Linux system, sysctl hw.ncpu on a Mac system, or running dxdiag from the Run dialog on Windows (there's probably other ways, but this is how I always do it).
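Python can also report the CPU count directly, for example:

import multiprocessing
print(multiprocessing.cpu_count())  # number of CPUs the interpreter can see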
For the third question, the simplest way to figure out how much efficiency you're getting from the extra processes is just to measure the total runtime of your program, using time.time() as you were, or the time utility in Linux (e.g. time python myprog.py). The ideal speedup should be equal to the number of processes you're using, so a 4 process program running on 4 CPUs should be at most 4x faster than the same program with 1 process, assuming you get maximum benefit from the extra processes. If the other processes aren't helping you that much, it will be less than 4x.