Problem description:
I am working with the simulator to extract a dataset from it. The idea is to run multiple processes to perform various tasks: for example, moving the vehicle in one process and collecting data in another. In the data collection process, 3 threads run to record three different data types, and the recording has to occur periodically. The data should also be recorded synchronously.
Sample code (details omitted):
import threading
import multiprocessing
import time

class DataRecorder:
    def __init__(self):
        """
        some parameters
        """
        pass

    def move_vehicle(self, path):
        pass

    def record_data1(self):
        pass

    def record_data2(self):
        pass

    def record_data3(self):
        pass

    def record_data(self):
        t1 = threading.Thread(target=self.record_data1)
        t2 = threading.Thread(target=self.record_data2)
        t3 = threading.Thread(target=self.record_data3)
        threads = [t1, t2, t3]
        for thread in threads:
            thread.start()
        while True:
            for thread in threads:
                if not thread.is_alive():
                    thread.start()  # leads to "threads can only be started once"
            for thread in threads:
                if thread.is_alive():
                    thread.join()
            time.sleep(1)

    def stop_recording(self, p1):
        if p1.is_alive():
            p1.terminate()

    def move_and_record(self, path):
        P1 = multiprocessing.Process(target=self.record_data)
        P1.start()
        self.move_vehicle(path)
        self.stop_recording(P1)
The problem:
RuntimeError: threads can only be started once.
Also, inside the while loop, the threads stop after the first iteration. I have tried it both with and without the .join() part.
I am also looking for an alternative way to solve this problem.
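For the alternative: a threading.Thread object can never be restarted, so one workable pattern is to give each recorder thread its own periodic loop and stop all three with a shared threading.Event. Below is a minimal sketch, with stub functions standing in for the real record methods and an assumed one-second interval:

import threading
import time

def record_data1():  # stand-ins for the real recorder methods
    pass

def record_data2():
    pass

def record_data3():
    pass

def periodic(record_fn, stop_event, interval=1.0):
    # Call record_fn once per interval until stop_event is set.
    while not stop_event.is_set():
        record_fn()
        stop_event.wait(interval)  # returns early if the event gets set

if __name__ == '__main__':
    stop_event = threading.Event()
    threads = [threading.Thread(target=periodic, args=(fn, stop_event))
               for fn in (record_data1, record_data2, record_data3)]
    for t in threads:
        t.start()
    time.sleep(5)        # stand-in for moving the vehicle
    stop_event.set()     # tell all recorders to finish their current cycle
    for t in threads:
        t.join()

Since the threads now live for the whole recording session, the parent process can still be terminated from outside exactly as in stop_recording above.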
Related
I'm using Python concurrent.futures: a pool of parent threads, where each parent thread starts child threads. When the ThreadPoolExecutor has fewer workers than the number of required parent threads, I get starvation and the program gets stuck.
What is the best approach to:
1. Use a fixed-size ThreadPoolExecutor
2. Avoid starvation
Please find below example code:
import time
import sys
import concurrent.futures

MAX_THREAD_EXECUTORS = 5
threadPool = concurrent.futures.ThreadPoolExecutor(MAX_THREAD_EXECUTORS)
threads = []
command_threads = []

def main():
    start_tests()
    join_threads()

def start_tests():
    for i in range(1, 14):
        threads.append(threadPool.submit(start_test_flow, i))

def start_test_flow(test):
    print(f"Start test flow for: {test}")
    execute_commands()
    join_command_threads()

def execute_commands():
    for i in range(1, 5):
        command_threads.append(threadPool.submit(start_command, i))

def start_command(command):
    print(f"Start command for: {command}")
    time.sleep(120)

def join_threads():
    for thread in threads:
        result = thread.result()
        print(f"test result={result}")

def join_command_threads():
    for thread in command_threads:
        result = thread.result()
        print(f"command result={result}")

if __name__ == '__main__':
    main()
    sys.exit(0)
The minimum number of threads you actually need is non-deterministic and depends on timing, but there is a number, 13 + 1 (one thread for each of the parent tasks plus at least one thread to run a child task), that guarantees you will never stall. What is most likely happening: you quickly create 5 parent tasks and then wait to submit further parent and child tasks because you only have 5 worker threads. But until the 4 child tasks (submitted in execute_commands) can be created and run to completion, a parent task cannot complete, and thus you are stuck.
Now, for example, insert a call to time.sleep(1) in function start_tests as follows:
def start_tests():
    for i in range(1, 14):
        threads.append(threadPool.submit(start_test_flow, i))
        time.sleep(1)
This will allow the 4 child threads to be created and there will be some progress. But depending on timing, you may eventually stall. To guarantee that you never stall, you would have to sleep long enough to allow all 4 child threads to complete before attempting to start the next parent thread.
The bottom line is that you just don't have enough worker threads (you would need 13 + 1) to guarantee that you won't stall.
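Another way to sidestep the stall entirely (a sketch under the same 13-parent/4-child structure): use separate executors for parent and child tasks, so a parent blocked on its children can never occupy the workers those children need:

import concurrent.futures

# Two pools: parents can block on child results without consuming
# the worker threads the children need to run.
parent_pool = concurrent.futures.ThreadPoolExecutor(5)
child_pool = concurrent.futures.ThreadPoolExecutor(5)

def start_command(command):
    return f"command {command} done"

def start_test_flow(test):
    # Children run in their own pool, so waiting here cannot starve them.
    futures = [child_pool.submit(start_command, i) for i in range(1, 5)]
    return [f.result() for f in futures]

if __name__ == '__main__':
    tests = [parent_pool.submit(start_test_flow, i) for i in range(1, 14)]
    for t in tests:
        print(t.result())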
I'm trying to run a function after my thread has completed but the function is not called. Code structure:
class MyClass():
    def functiontocall():  # uses data calculated in thread for plotting; only works when thread is complete
        pass  # do something with self.A

    def watchthread():
        thread()
        functiontocall()
        # Since this function depends on variable A, it throws an error.
        # I tried: if thread.join == True: functiontocall() but this did not call the function.

    def thread():
        def run():
            pythoncom.CoInitialize()
            # --- do stuff ---
            for i in range(1000):
                pass  # thousands of calculations while updating state in GUI
            A = result
            self.A = A
        thread = threading.Thread(target=run)
        thread.start()
Note: I removed 'self' for simplicity.
thread.join() should tell me when the thread has finished, but for some reason I still can't run functiontocall.
Is this a bad way of organizing threads in general?
Edit: I can call the function after the thread is finished, but I cannot access variables while the thread is running, e.g. 0-100% progress for a progress bar in my GUI. When I use:
def watchthread():
    thread()
    thread.join()
    functiontocall()
I cannot update the status of the thread in my GUI. It just waits until the calculations are finished and then runs functiontocall().
Because you're using threads, once the thread has started Python moves on to the next thing; it will not wait for the thread to finish unless you ask it to.
With your code, if you want to wait for the thread function to finish before moving on, then it doesn't sound like you need threading: a normal function would run, complete, and then Python would move on to running functiontocall().
If there's a reason you need to use threads which isn't coming across in the example, then I would suggest using thread.join():
threads = []  # list to hold threads if you have more than one
t = threading.Thread(target=run)
threads.append(t)
t.start()
for thread in threads:  # wait for all threads to finish
    thread.join()
functiontocall()  # will only run after all threads are done
Again, I'd suggest reconsidering whether threads are what you need here, as it doesn't seem apparent.
To update this answer based on the new information: this may be the way you want to make the variable accessible. Here the worker thread updates the class variable A, and your GUI update function reads it periodically and updates your GUI.
import threading

class ThisClass():
    def __init__(self):
        self.A = 0
        self.lock = threading.Lock()

    def function_to_call(self):
        while self.A != 100:  # assuming this is a progress bar to 100%
            pass  # update progress in GUI here

    def run(self):
        # does calculations
        with self.lock:  # prevent threads accessing the variable at the same time
            self.A += calculations

    def progress(self):
        threads = []  # list to hold threads if you have more than one
        t = threading.Thread(target=self.run)
        threads.append(t)
        f = threading.Thread(target=self.function_to_call)
        threads.append(f)
        for thread in threads:
            thread.start()
        for thread in threads:  # wait for all threads to finish
            thread.join()
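Tying it together, driving this could look like the following (a sketch; progress() blocks until both threads finish):

tc = ThisClass()
tc.progress()  # starts the calculation and the GUI updater, then waits for both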
I have been trying to use threads in Python. I am working on a Pi hardware project.
Here's the problem:
When I create a thread and call it like this, the loop keeps creating new threads before the old ones are completed, slowing the program down (printing threading.active_count() shows 20+ active threads).
while True:
    t4 = Thread(target=myFunc, args=())
    t4.start()
    print("Hello World")
I need a threading process that runs the same function over and over on a SINGLE thread without affecting or delaying my main program. i.e. when a thread has completed executing the function, run it again... but my main should still be printing "Hello World" as normal.
I've found one way to stop it crashing, which is to sit and "wait" until the thread is finished, and then start again. However, this is a blocking approach, and completely defeats the purpose of threading.
while True:
    t4 = Thread(target=myFunc, args=())
    t4.start()
    t4.join()
    print("Hello World")
Any suggestions?
You can use a multiprocessing.pool.ThreadPool to manage both the starting of new threads and limiting the maximum number of them executing concurrently.
from multiprocessing.pool import ThreadPool
from random import uniform
import threading
import time

MAX_THREADS = 5  # Number of threads that can run concurrently.
print_lock = threading.Lock()  # Prevent overlapped printing from threads.

def myFunc():
    time.sleep(uniform(0, 1))  # Pause a variable amount of time.
    with print_lock:
        print('myFunc')

def test():
    pool = ThreadPool(processes=MAX_THREADS)
    for _ in range(100):  # Submit as many tasks as desired.
        pool.apply_async(myFunc, args=())
    pool.close()  # Done adding tasks.
    pool.join()   # Wait for all tasks to complete.
    print('done')

if __name__ == '__main__':
    test()
I need a threading process that runs the same function over and over on a SINGLE thread
This snippet creates a single thread that continually calls myFunc().
def threadMain():
    while True:
        myFunc()

t4 = Thread(target=threadMain, args=())
t4.start()
You can also make it a daemon thread with setDaemon(True) from the threading.Thread class, so it won't keep the program alive on exit; more here: https://docs.python.org/2/library/threading.html#threading.Thread.daemon
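Extending the snippet above, that could look like this (daemon threads are killed automatically when the main program exits, so no join is needed; the flag must be set before start()):

t4 = Thread(target=threadMain, args=())
t4.daemon = True  # equivalent to t4.setDaemon(True)
t4.start()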
Make a delegate thread - i.e. a thread to run your other threads in sequence:
def delegate(*args):
    while True:
        t = Thread(target=myFunc, args=args)  # or just call myFunc(*args) instead of a thread
        t.start()
        t.join()

t = Thread(target=delegate, args=())
t.start()

while True:
    print("Hello world!")
Or even better, redesign your myFunc() to run its logic within a while True: ... loop and start the thread only once.
I'd also advise you to add some sort of delay (e.g. time.sleep()) if you're not performing any work in your threads, to help with context switching.
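A minimal sketch of that redesign, combining both suggestions (the one-second delay is an arbitrary choice):

import threading
import time

def myFunc():
    pass  # stand-in for the real work

def myFuncLoop():
    while True:        # the loop now lives inside the single thread
        myFunc()
        time.sleep(1)  # small delay to help with context switching

t = threading.Thread(target=myFuncLoop, daemon=True)
t.start()

while True:
    print("Hello World")  # the main thread carries on unblocked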
I have a large dataset in a list that I need to do some work on.
I want to start x threads to work on the list at any given time, until everything in that list has been popped.
I know how to start x threads (let's say 20) at a given time (by using thread1...thread20.start()),
but how do I make it start a new thread when one of the first 20 threads finishes? So at any given time there are 20 threads running, until the list is empty.
what I have so far:
class queryData(threading.Thread):
    def __init__(self, threadID):
        threading.Thread.__init__(self)
        self.threadID = threadID
    def run(self):
        global lst
        # Get trade from list
        trade = lst.pop()
        tradeId = trade[0][1][:6]
        print tradeId

thread1 = queryData(1)
thread1.start()
Update
I have something going with the following code:
for i in range(20):
    threads.append(queryData(i))
for thread in threads:
    thread.start()

while len(lst) > 0:
    for iter, thread in enumerate(threads):
        thread.join()
        lock.acquire()
        threads[iter] = queryData(i)
        threads[iter].start()
        lock.release()
Now it starts 20 threads in the beginning... and then keeps starting a new thread when one finishes.
However, it is not efficient, as it waits for the first one in the list to finish, then the second, and so on.
Is there a better way of doing this?
Basically I need to:
- Start 20 threads
- While the list is not empty:
  - wait for 1 of the 20 threads to finish
  - reuse it or start a new thread
As I suggested in a comment, I think using a multiprocessing.pool.ThreadPool would be appropriate, because it would automatically handle much of the thread management you're doing manually in your code. Once all the tasks are queued up for processing via ThreadPool's apply_async() method, the only thing that needs to be done is wait until they've all finished execution (unless there's something else your code could be doing, of course).
I've translated the code in my linked answer to another related question so it's more similar to what you appear to be doing to make it easier to understand in the current context.
from multiprocessing.pool import ThreadPool
from random import randint
import threading
import time

MAX_THREADS = 5
print_lock = threading.Lock()  # Prevent overlapped printing from threads.

def query_data(trade):
    trade_id = trade[0][1][:6]
    time.sleep(randint(1, 3))  # Simulate variable working time for testing.
    with print_lock:
        print(trade_id)

def process_trades(trade_list):
    pool = ThreadPool(processes=MAX_THREADS)
    results = []
    while trade_list:
        trade = trade_list.pop()
        results.append(pool.apply_async(query_data, (trade,)))
    pool.close()  # Done adding tasks.
    pool.join()   # Wait for all tasks to complete.

def test():
    trade_list = [[['abc', ('%06d' % id) + 'defghi']] for id in range(1, 101)]
    process_trades(trade_list)

if __name__ == "__main__":
    test()
You can wait for a thread to complete with thread.join(). This call blocks until that thread completes, at which point you can create a new one.
However, instead of respawning a thread each time, why not recycle your existing threads?
This can be done with tasks, for example: you keep a list of tasks in a shared collection, and when one of your threads finishes a task, it retrieves another one from that collection, as sketched below.
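A minimal sketch of that task-recycling idea, using queue.Queue as the shared collection (Python 3 spelling of the import; lst here is a stand-in for your real trade list):

import threading
import queue

NUM_WORKERS = 20
lst = list(range(100))  # stand-in for the real trade list

def worker(tasks):
    while True:
        trade = tasks.get()
        if trade is None:  # sentinel value: no more work
            break
        # ... process the trade here ...
        tasks.task_done()

tasks = queue.Queue()
for trade in lst:
    tasks.put(trade)
for _ in range(NUM_WORKERS):  # one sentinel per worker so they all exit
    tasks.put(None)

workers = [threading.Thread(target=worker, args=(tasks,))
           for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

Each of the 20 threads is started exactly once and keeps pulling work until the queue runs dry, which is exactly the "reuse a thread" behavior asked for.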
So, I have this basic example that I wrote where I use multiple threads and a single Queue, calling different functions independently and doing certain tasks. Does this logic look right, or is there any scope for improvement?
I didn't use a separate class for threads as shown in http://www.ibm.com/developerworks/aix/library/au-threadingpython/ as that would unnecessarily complicate the workflow I am trying to implement, wherein each thread calls a separate function and puts the result in the same queue, which I can use later to analyze the results.
from Queue import Queue
from threading import Thread

class Scheduler():
    def __init__(self):
        self.id = 10
    def add(self, q, id):
        self.id += id
        q.put('added %d' % self.id)
        q.task_done()
    def mul(self, q, id):
        self.id *= id
        q.put('multiplied : %d' % self.id)
        q.task_done()

if __name__ == '__main__':
    id = 5
    sch1 = Scheduler()
    sch2 = Scheduler()
    q = Queue()
    t1 = Thread(target=sch1.add, args=(q, id,))
    t1.start()
    t2 = Thread(target=sch2.mul, args=(q, id,))
    t2.start()
    print q.get()
    print q.get()
    q.join()
There are a couple of issues here:
First, incrementing self.id via += and *= is not atomic, so if you were to run multiple add and/or mul methods concurrently on the same Scheduler object, self.id could end up being calculated incorrectly because two or more threads step on each other. You can fix that by protecting the increment operations with a threading.Lock.
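A sketch of that fix; only the increment changes, and putting the q.put inside the lock keeps the reported value consistent with the increment:

import threading

class Scheduler():
    def __init__(self):
        self.id = 10
        self.lock = threading.Lock()
    def add(self, q, id):
        with self.lock:  # make the read-modify-write atomic
            self.id += id
            q.put('added %d' % self.id)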
Second, you're misusing the Queue.task_done/Queue.join methods. The idea behind task_done and join is to have a producer thread put items onto the Queue and then, after it has added all its work items, call queue.join() to wait for them all to be processed by one or more consumers. The consumers call queue.get(), process the work item, and then call queue.task_done() to signal that they're done processing it. You've got this a bit backwards - you're calling queue.put and queue.task_done from the same thread. The way you're using the Queue, it really doesn't make sense to use this pattern; you're just using the Queue to pass results back to the main thread. You might as well just do this:
from Queue import Queue
from threading import Thread

class Scheduler():
    def __init__(self):
        self.id = 10
    def add(self, q, id):
        self.id += id
        q.put('added %d' % self.id)
    def mul(self, q, id):
        self.id *= id
        q.put('multiplied : %d' % self.id)

if __name__ == '__main__':
    id = 5
    sch1 = Scheduler()
    sch2 = Scheduler()
    q = Queue()
    t1 = Thread(target=sch1.add, args=(q, id,))
    t1.start()
    t2 = Thread(target=sch2.mul, args=(q, id,))
    t2.start()
    print q.get()
    print q.get()
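For contrast, a minimal sketch of the producer/consumer pattern that task_done/join is actually designed for (shown with the Python 3 module name; the consumer is a daemon thread so it doesn't block interpreter exit):

import queue
import threading

q = queue.Queue()

def consumer():
    while True:
        item = q.get()
        # ... process the item here ...
        q.task_done()  # one task_done per completed get

threading.Thread(target=consumer, daemon=True).start()

for item in range(10):  # the producer puts the work items
    q.put(item)
q.join()  # blocks until every item has been marked task_done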