I have the following problem:
I am running a large job that uses threading: it imports a pair of files, runs a calculation and returns a score. This all happens inside a Tkinter Python application.
For single pairs it works fine:
def run_single():
    # importing, calculating and scoring
    couple = [path1, path2]
    th_score = threading.Thread(target=scoring_function)
    th_score.start()
    ...
The problem comes when I want to do bulk imports: importing two directories, pairing up the matching file references and processing each pair in a loop. I want to use threading for each individual iteration and only move on to the next pair of files once all the threads from the previous scoring step are done. I tried:
def run_multiple():
    for pair in couples:
        couple = pair.copy()
        th_score = threading.Thread(target=scoring_function)
        th_score.start()
This was wrong because the threads don't synchronize: new threads start on top of others that are still running. The for loop keeps going even when the scoring_function of the current iteration hasn't finished (yes, I know that's the whole point of threading).
I tried using conditions and events, even as global variables so that scoring_function could control them, but all they do is freeze the program.
Any suggestion on what I should use?
In my experience, the best way to do threading is to use Python's threading.Thread class along with this little trick:
import threading

couples = [1, 2, 3]  # probably a list you created with some elements

class run_in_thread(threading.Thread):
    def run(self):
        for pair in couples:
            couple = pair.copy()

wanted_threads = 3
for a in range(wanted_threads):
    instance = run_in_thread()
    instance.start()
In this case, you start three threads.
You override the built-in run() method of threading.Thread; calling start() on an instance creates a new thread and executes run() in it.
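If the goal from the question is to finish one pair before starting the next, a minimal sketch (assuming scoring_function accepts the pair as an argument) is to wait on each thread with join() before starting the next one:

import threading

def run_multiple():
    for pair in couples:
        th_score = threading.Thread(target=scoring_function, args=(pair.copy(),))
        th_score.start()
        th_score.join()   # block here until this pair's scoring has finished

If several threads are started per pair, keep them in a list and join() each one before moving on to the next pair.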
I hope this helps.
For more documentation on threading, see the threading module docs.
Related
Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process

def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
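A minimal sketch of sending results back with multiprocessing.Queue (the function and loop counts here are just illustrative):

from multiprocessing import Process, Queue

def loop_a(q):
    for i in range(3):
        q.put(('a', i))        # send a result back to the parent process

if __name__ == '__main__':
    q = Queue()
    p = Process(target=loop_a, args=(q,))
    p.start()
    for _ in range(3):
        print(q.get())         # blocks until a result is available
    p.join()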
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate, but they offer true parallelism.
There are several possible options for what you want:
use loop
As many people have pointed out, this is the simplest way.
for i in range(10000):  # in Python 2, use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB can only run after taskA finishes (or vice versa); they can't run simultaneously.
multiprocessing
Another thought: run the two tasks as two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process

# args is a tuple and kwargs a dict of arguments for each task
p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
Merits: tasks can run simultaneously in the background, you can control them (end or stop them, etc.), tasks can exchange data, and they can be synchronized if they compete for the same resources.
Drawbacks: too heavy! The OS will frequently switch between them, and each process has its own data space even if the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
Threading is like multiprocessing, just lightweight; check out this post. Their usage is quite similar:
import threading

p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide coroutines, which are lighter-weight than threads; a minimal sketch follows this list, and their documentation covers the details.
Merits: more flexible and lightweight.
Drawbacks: an extra library is needed, and there is a learning curve.
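A small, hedged gevent sketch (assuming gevent is installed; the task function here is purely illustrative):

import gevent

def task(name):
    for i in range(3):
        print(name, i)
        gevent.sleep(0)   # yield so the other greenlet gets a turn

gevent.joinall([gevent.spawn(task, 'A'), gevent.spawn(task, 'B')])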
Why do you want to run the two processes at the same time? Is it because you think they will go faster? (There is a good chance they won't.) Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the Python threading module. However, threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU intensive, this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP etc., but without knowing more about Task A and Task B and why they need to run at the same time, it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()
threadB.start()

# Do work independent of loopA and loopB
threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about a single loop that does both: for i in range(10000): do Task A, then do Task B? Without more information I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, let's try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...)  # Arguments to be passed into the desired function.

multiple_results = []
for i in range(processes):  # Launches each worker with its option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,) + args))
results = np.array([res.get() for res in multiple_results])  # Retrieves results after every worker is finished!

for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example, that's how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!
I have a code which is basically running an infinite loop, and in each iteration of the loop I run some instructions. Some of these instructions have to run in "parallel", which I do by using multiprocessing. Here is an example of my code structure:
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def buy_fruit(fruit, number):
    print('I bought '+str(number)+' times the following fruit: '+fruit)
    return 'ok'

def func1(parameter1, parameter2):
    myParameters = (parameter1, parameter2)
    pool = ThreadPool(2)
    data = pool.starmap(func2, zip(myParameters))
    return 'ok'

def func2(parameter1):
    print(parameter1)
    return 'ok'

while True:
    myFruits = ('apple', 'pear', 'orange')
    myQuantities = (5, 10, 2)
    pool = ThreadPool(2)
    data = pool.starmap(buy_fruit, zip(myFruits, myQuantities))
    func1('hello', 'hola')
I agree it's a bit messy, because I create pools both in the main loop and inside functions.
Everything works well until the loop has run for a few minutes, and then I get an error:
"RuntimeError: can't start new thread"
I saw online that this is due to the fact that I have opened too many threads.
What is the simplest way to close all my Threads by the end of each loop iteration, so I can restart "fresh" at the start of the new loop iteration?
Thank you in advance for your time and help!
Best,
Julia
PS: The example code is just an example, my real function opens many threads in each loop and each function takes a few seconds to execute.
You are creating a new ThreadPool object inside the endless loop, which is the likely cause of your problem, because you never terminate the threads at the end of the loop. Have you tried creating the object outside of the endless loop?
pool = ThreadPool(2)
while True:
    myFruits = ('apple', 'pear', 'orange')
    myQuantities = (5, 10, 2)
    data = pool.starmap(buy_fruit, zip(myFruits, myQuantities))
Alternatively, and to answer your question, if your use case for some reason requires creating a new ThreadPool object in each loop iteration, use a context manager (the with statement) to make sure all threads are closed when the block is exited.
while True:
    myFruits = ('apple', 'pear', 'orange')
    myQuantities = (5, 10, 2)
    with ThreadPool(2) as pool:
        data = pool.starmap(buy_fruit, zip(myFruits, myQuantities))
Note, however, the noticeable performance difference this has compared to the first snippet. Creating and terminating threads is expensive, which is why the first example (which reuses one pool) will run much faster and is probably what you'll want to use.
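If you want to see the difference yourself, here is a rough, hedged timing sketch (the workload is a trivial placeholder and the numbers will vary by machine):

import time
from multiprocessing.dummy import Pool as ThreadPool

def noop(x):
    return x

start = time.time()
pool = ThreadPool(2)
for _ in range(200):
    pool.map(noop, range(4))        # reuse the same pool every iteration
pool.close()
pool.join()
print('reused pool:', time.time() - start)

start = time.time()
for _ in range(200):
    with ThreadPool(2) as pool:     # create and tear down a new pool every iteration
        pool.map(noop, range(4))
print('new pool each iteration:', time.time() - start)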
Regarding your edit involving "nested ThreadPools": I would suggest maintaining one single instance of your ThreadPool and passing references to it to your nested functions as required.
def func1(pool, parameter1, parameter2):
    ...
    ...

pool = ThreadPool(2)
while True:
    myFruits = ('apple', 'pear', 'orange')
    myQuantities = (5, 10, 2)
    data = pool.starmap(buy_fruit, zip(myFruits, myQuantities))
    func1(pool, 'hello', 'hola')
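A hedged sketch of what func1 could look like when it reuses the shared pool (func2 is the function from the question):

def func1(pool, parameter1, parameter2):
    myParameters = (parameter1, parameter2)
    # Reuse the shared pool instead of creating a new ThreadPool here.
    data = pool.starmap(func2, zip(myParameters))
    return 'ok'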
I'm writing code for a game, with the AI running as threads. They are stored in a list with the relevant information so that when their character 'dies', they can be efficiently removed from play by simply removing them from the list of active AIs.
The threads controlling them were initially called from within another thread, while a tkinter window controlling the player runs in the main thread - however, the AI threads caused the main thread to slow down. After searching for a way to decrease the priority of the thread creating the AI, I found that I would have to use multiprocessing.Process instead of threading.Thread and then alter its niceness - but when I try to do this, the AI threads don't work. The functions running this are defined within class AI():
def start():
    '''
    Creates a process to oversee all AI functions
    '''
    global activeAI, AIsentinel, done
    AIsentinel = multiprocessing.Process(target=AI.tick, args=(activeAI, done))
    AIsentinel.start()
and
def tick(activeAI, done):
    '''
    Calls all active AI functions as threads
    '''
    AIthreads = []
    for i in activeAI:
        # i[0] is the AI function, i[1] is the character class instance (user defined)
        # and the other list items are its parameters.
        AIthreads.append(threading.Thread(name=i[1].name, target=lambda: i[0](i[1], i[2], i[3], i[4], i[5])))
        AIthreads[-1].start()
    while not done:
        for x in AIthreads:
            if not x.is_alive():
                x.join()
                AIthreads.remove(x)
                for i in activeAI:
                    if i[1].name == x.name:
                        AIthreads.append(threading.Thread(name=i[1].name, target=lambda: i[0](i[1], i[2], i[3], i[4], i[5])))
                        AIthreads[-1].start()
The outputs of these threads should display in stdout, but when running the program, nothing appears - I assume it's because the threads don't start, but I can't tell if it's just because their output isn't displaying. I'd like a way to get this solution working, but I suspect this solution is far too ugly and not worth fixing, or simply cannot be fixed.
In the event this is as horrible as I fear, I'm also open to new approaches to the problem entirely.
I'm running this on a Windows 10 machine. Thanks in advance.
Edit:
The actual print statement is within another thread - when an AI performs an action, it adds a string describing its action to a queue, which is printed out by yet another thread (as if I didn't have enough of those). As an example:
battle_log.append('{0} dealt {1} damage to {2}!'.format(self.name,damage[0],target.name))
Which is read out by:
def battlereport():
    '''
    Displays battle updates.
    '''
    global battle_log, done
    print('\n')
    while not done:
        for i in battle_log:
            print(i)
            battle_log.remove(i)
I'm kind of new to multiprocessing. However, assume that we have a program as below. The program seems to work fine. Now to the question. In my opinion we will have 4 instances of SomeKindOfClass with the same name (a) at the same time. How is that possible? Moreover, is there a potential risk with this kind of programming?
from multiprocessing.dummy import Pool
import numpy
from theFile import SomeKindOfClass

n = 8
allOutputs = numpy.zeros(n)

def work(index):
    a = SomeKindOfClass()
    a.theSlowFunction()
    allOutputs[index] = a.output

pool = Pool(processes=4)
pool.map(work, range(0, n))
The name a is only local in scope within your work function, so there is no conflict of names here. Internally, Python keeps track of each class instance with a unique identifier. If you want to check this, you can print the object id using the id function:
print(id(a))
I don't see any issues with your code.
Actually, you will have 8 instances of SomeKindOfClass (one per call to work), but only 4 will ever be active at the same time.
multiprocessing vs multiprocessing.dummy
Your program will only work if you continue to use the multiprocessing.dummy module, which is just a wrapper around the threading module. You are still using "python threads" (not separate processes). "Python threads" share the same global state; "Processes" don't. Python threads also share the same GIL, so they're still limited to running one python bytecode statement at a time, unlike processes, which can all run python code simultaneously.
If you were to change your import to from multiprocessing import Pool, you would notice that the allOutputs array remains unchanged after all the workers finish executing (also, you would likely get an error because you're creating the pool in the global scope, you should probably put that inside a main() function). This is because multiprocessing makes a new copy of the entire global state when it makes a new process. When the worker modifies the global allOutputs, it will be modifying a copy of that initial global state. When the process ends, nothing will be returned to the main process and the global state of the main process will remain unchanged.
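Here is a minimal sketch of that behaviour (illustrative names; the __main__ guard matters because multiprocessing may re-import the module in each child process):

from multiprocessing import Pool
import numpy

allOutputs = numpy.zeros(4)

def work(index):
    allOutputs[index] = index + 1   # modifies the child process's copy only

if __name__ == '__main__':
    with Pool(4) as pool:
        pool.map(work, range(4))
    print(allOutputs)               # still all zeros in the parent process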
Sharing State Between Processes
Unlike threads, processes aren't sharing the same memory
If you want to share state between processes, you have to explicitly declare shared variables and pass them to each process, or use pipes or some other method to allow the worker processes to communicate with each other or with the main process.
There are several ways to do this, but perhaps the simplest is using the Manager class
import multiprocessing

def worker(args):
    index, array = args
    a = SomeKindOfClass()
    a.some_expensive_function()
    array[index] = a.output

def main():
    n = 8
    manager = multiprocessing.Manager()
    array = manager.list([0] * n)
    pool = multiprocessing.Pool(4)
    pool.map(worker, [(i, array) for i in range(n)])
    print(array)

if __name__ == '__main__':
    main()
You can declare class instances inside the pool workers, because each instance has a separate place in memory so they don't conflict. The problem is if you declare a class instance first, then try to pass that one instance into multiple pool workers. Then each worker has a pointer to the same place in memory, and it will fail (this can be handled, just not this way).
Basically pool workers must not have overlapping memory anywhere. As long as the workers don't try to share memory somewhere, or perform operations that may result in collisions (like printing to the same file), there shouldn't be any problem.
Make sure whatever they're supposed to do (like something you want printed to a file, or added to a broader namespace somewhere) is returned as a result at the end, which you then iterate through.
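A small, hedged sketch of that pattern, returning results from the workers instead of writing to shared state (SomeKindOfClass and theSlowFunction are from the question and assumed to be importable at module level):

from multiprocessing import Pool
from theFile import SomeKindOfClass

def work(index):
    a = SomeKindOfClass()                    # each worker builds its own instance
    a.theSlowFunction()
    return a.output                          # return the result instead of mutating a global

if __name__ == '__main__':
    with Pool(4) as pool:
        outputs = pool.map(work, range(8))   # results come back in input order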
If you are using multiprocessing you shouldn't worry - processes don't share memory (by default). So there is no risk in having several independent objects of class SomeKindOfClass - each of them will live in its own process. How does it work? Python runs your program and then starts 4 child processes. That's why it's very important to have the if __name__ == '__main__' guard before pool.map(work, range(0, n)). Otherwise you will get an infinite loop of process creation.
Problems could arise if SomeKindOfClass keeps state on disk - for example, writes something to a file or reads from it.
I am facing an issue that should be easy to resolve, yet I feel like I'm groping in the dark. I was writing a simple framework which consists of the following classes:
First there is an Algorithm class which simply contains numerical procedures:
class Algorithm(object):
    .
    .
    .
    @staticmethod
    def calculate(parameters):
        # do stuff
        return result
Then, there is an item class which holds paths to files, utility and status information. A Worker class subclasses QRunnable:
class Worker(QRunnable):
    def __init__(self, item, *args, **kwargs):
        super().__init__()
        self.item = item

    def run(self, *args, **kwargs):
        result = Algorithm.calculate(self.item.parameter)
        self.item.result = result
And in a Manager class the processes are started
class Manager(object):
    def __init__(self, *args, **kwargs):
        self.pool = QThreadPool()
        self.pool.setMaxThreadCount(4)
        self.items = [item1, item2, ...]

    def onEvent(self):
        for i in self.items:
            self.pool.start(i.requestWorker())  # returns a Worker
Now the problem: After doing this I notice 2 things:
The work is not done any faster (it's even a bit slower) than doing it with 1 thread.
The items get assigned the same results! For example result A, which is the correct result for item A, gets assigned to all items!
I could not find much documentation on this, so where did I go wrong?
All the best
Twerp
One limitation of the most common implementation of Python (CPython) is its Global Interpreter Lock, which means that only one thread can be executing Python bytecode at a time. Multiple Python threads can execute C-based Python subroutines simultaneously, and multiple Python threads can wait on I/O simultaneously, but they cannot execute Python code simultaneously. Because of that, it is common not to see any speedup when multithreading a CPU-bound Python program. Common workarounds are to spawn sub-processes instead of threads (since each sub-process uses its own copy of the Python interpreter, they won't interfere with each other), or to rewrite some or all of the Python program in another language that doesn't have this limitation.
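A hedged sketch of the sub-process workaround using concurrent.futures (calculate and the parameter tuples are stand-ins for the question's Algorithm.calculate and item parameters; the function must live at module level so it can be pickled):

from concurrent.futures import ProcessPoolExecutor

def calculate(parameters):
    # stand-in for the CPU-bound numerical procedure
    return sum(p * p for p in parameters)

if __name__ == '__main__':
    all_parameters = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]   # one tuple per item
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(calculate, all_parameters))
    print(results)   # each result matches its input, in order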