True Concurrency in Python

I am relatively new to concurrency in Python, and I am working on code that has to call functions outside of my own code. I cannot edit those functions, but I need them to run concurrently. I've tried a few different approaches: multiprocessing, threading, and asyncio. AsyncIO would come closest to what I want if every function I was calling were defined as a coroutine, but they're not.
The functions I'm calling will block, sometimes for 15-30 minutes. During that time, I need other functions doing other things. The code below illustrates my problem. If you run it, you'll see that whether using threads or processes, the tasks always run serially. I need them to run simultaneously. I understand that the output blocks until the entire script runs, but the tasks themselves should not.
What am I missing? With so many choices for concurrency or at least apparent concurrency in Python, I would think this is easier than I'm finding it.
#!/usr/bin/python3
from datetime import datetime
from multiprocessing import Process
import sys
from threading import Thread
from time import sleep

def main():
    # Doing it with the multiprocess module
    print("Using MultiProcess:")
    useprocs()

    print("\nUsing Threading:")
    usethreads()

def useprocs():
    procs = []

    task1 = Process(target=blockingfunc('Task1'))
    task1.start()
    procs.append(task1)

    task2 = Process(target=blockingfunc('Tast2'))
    task2.start()
    procs.append(task2)

    task1.join()
    task2.join()

    print('All processes completed')

def usethreads():
    threads = []

    task3 = Process(target=blockingfunc('Task3'))
    task3.start()
    threads.append(task3)

    task4 = Process(target=blockingfunc('Task4'))
    task4.start()
    threads.append(task4)

    task3.join()
    task4.join()

    print('All threads completed')

def blockingfunc(taskname):
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(current_time, "Starting task: ", taskname)
    sleep(5)
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(current_time, taskname, "completed")

if __name__ == '__main__':
    try:
        main()
    except:
        sys.exit(1)

Note that the program you posted imports Thread but never uses it.
More importantly, in a line like:
task1 = Process(target=blockingfunc('Task1'))
you're calling blockingfunc('Task1') and passing what it returns (None) as the value of the target argument. Not at all what you intended. What you intended:
task1 = Process(target=blockingfunc, args=['Task1'])
Then, as intended, blockingfunc isn't actually invoked before you call the start() method.
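For illustration, here is a rough sketch of how useprocs() and usethreads() from the question might look with that fix applied (and with the threading version actually using Thread). Assuming the blockingfunc defined above, the two tasks in each pair now overlap instead of running back to back:

from multiprocessing import Process
from threading import Thread

def useprocs():
    # Pass the callable and its arguments separately; Process invokes it in the child.
    task1 = Process(target=blockingfunc, args=('Task1',))
    task2 = Process(target=blockingfunc, args=('Task2',))
    for p in (task1, task2):
        p.start()          # both start before either is joined
    for p in (task1, task2):
        p.join()
    print('All processes completed')

def usethreads():
    # Same pattern with Thread, which the original script imported but never used.
    task3 = Thread(target=blockingfunc, args=('Task3',))
    task4 = Thread(target=blockingfunc, args=('Task4',))
    for t in (task3, task4):
        t.start()
    for t in (task3, task4):
        t.join()
    print('All threads completed')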

Related

Threads not executing concurrently

I am working with a simple Python script that controls a sensor and reads measurements from it. I want to take different measurement types concurrently; the function below can be used for each type of measurement:
def measure(measurement_type, num_iterations):
    file = open(measurement_type, 'w')
    writer = csv.writer(file)
    for i in range(num_iterations):
        pm2.5, pm10 = sensor.query()
        writer.writerow(pm2.5, pm10, curr_time())
        time.sleep(60)
    file.close()
    upload_data(file, measurement_type)
I attempt to invoke multiple calls to this function on separate threads in order to obtain files describing measurements over various time windows (hourly, daily, weekly, etc.):
if __name__ == '__main__':
    sensor = SDS011("/dev/ttyUSB0")
    sensor.sleep(sleep=False)
    print("Preparing sensor...")
    time.sleep(15)
    print("Sensor is now running:")
    try:
        while True:
            Thread(target=take_measurements('hourly', 60)).start()
            Thread(target=take_measurements('daily', 1440)).start()
            Thread(target=take_measurements('weekly', 10080)).start()
            Thread(target=take_measurements('monthly', 43800)).start()
    except KeyboardInterrupt:
        clean_exit()
Only one of these threads ever runs at a given time, and which one executes appears random. It may be worth noting that this script is running on a Raspberry Pi. My first thought was that multiple threads attempting to access the sensor could create a race condition, but I would not expect the script to keep running any threads if that occurred.
When you call your function directly in the target argument, Python evaluates the call right there: it runs the function's code and passes the return value to Thread.
The threading module has a way to pass arguments for your function without calling it until the moment you start the thread: the args parameter. Hope the example below helps:
from time import sleep
from random import randint
from threading import Thread

def something(to_print):
    sleep(randint(1, 3))
    print(to_print)

threadlist = []
threadlist.append(Thread(target=something, args=["A"]))
threadlist.append(Thread(target=something, args=["B"]))
threadlist.append(Thread(target=something, args=["C"]))

for thread in threadlist:
    thread.start()
The output order will differ from run to run:
(.venv) remzi in ~/Desktop/playground > python test.py
A
C
B
(.venv) remzi in ~/Desktop/playground > python test.py
C
A
B

Process Pool Executor runs code outside of scope

I'm trying to run a bunch of processes in parallel with ProcessPoolExecutor from concurrent.futures in Python.
The processes all run in parallel in a while loop, which is great, but for some reason the code outside of the main method runs repeatedly. I saw another answer say to use the if __name__ == "__main__" check to fix this, but it still doesn't work.
Any ideas how I can just get the code inside the main method to run? My object keeps getting reset repeatedly.
EDIT: I ran my code using ThreadPoolExecutor instead and it fixed the problem, although I'm still curious about this.
import concurrent.futures
import time
from myFile import myObject

obj = myObject()

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        while condition:
            for index in range(0, 10):
                executor.submit(obj.function, index, index + 1)
            executor.submit(obj.function2)
            time.sleep(5)
            print("test")

if __name__ == "__main__":
    main()
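For what it's worth, a likely explanation is that ProcessPoolExecutor's worker processes re-import your module (this is what happens under the spawn start method, the default on Windows and macOS), so module-level statements such as obj = myObject() run again in every worker, whereas ThreadPoolExecutor shares the one already-imported module. A minimal sketch of one way around it, assuming myObject is picklable and with the while condition loop omitted for brevity, is to create the object only inside the guarded main():

import concurrent.futures
import time
from myFile import myObject  # module and class names taken from the question

def main():
    obj = myObject()  # created once, in the parent process only
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for index in range(0, 10):
            # each submit pickles obj and sends a copy of it to a worker
            executor.submit(obj.function, index, index + 1)
        executor.submit(obj.function2)
        time.sleep(5)
        print("test")

if __name__ == "__main__":
    main()

Note that each worker operates on its own pickled copy, so changes the workers make to the object are not reflected back in the parent process.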

How to achieve multiprocessing with a time constraint in Python

I have two functions that need to run in parallel using a scheduler. I implemented this with multiprocessing, but one process blocks the other. How can I achieve functionality where, let's say, one function runs every 5 minutes and performs some task while the other performs its task every 2 minutes? The two functions are different.
I have used a scheduler to run both functions, but it blocks one function until the other is finished.
For example:
import time
from datetime import datetime

import schedule

def count1():
    now = datetime.now()
    start_time = now.strftime("%H:%M:%S")
    time.sleep(5)
    now = datetime.now()
    end_time = now.strftime("%H:%M:%S")

def count2():
    now = datetime.now()
    start_time = now.strftime("%H:%M:%S")
    time.sleep(5)
    now = datetime.now()
    end_time = now.strftime("%H:%M:%S")

if __name__ == '__main__':
    schedule.every(5).seconds.do(count1)
    schedule.every(15).seconds.do(count2)

    while True:
        # Checks whether a scheduled task is pending to run or not
        schedule.run_pending()
        time.sleep(1)
I want to run both functions in parallel without blocking each other. How do I achieve this?
I assume you are using the schedule package, which is described in the first paragraph of its documentation as an in-process scheduler – in other words, it won't give you parallelism. The documentation also includes an FAQ entry on running jobs in parallel.
Bottom line: if you want parallelism, you'll need to set up your own threads or processes, or find a different scheduling package that does that stuff.
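As a rough sketch of the FAQ recipe (each job is handed off to a thread so a long-running job does not hold up the scheduler loop), assuming the count1 and count2 functions from the question:

import threading
import time

import schedule

def run_threaded(job_func):
    # hand the job to a worker thread so run_pending() returns immediately
    threading.Thread(target=job_func).start()

schedule.every(5).seconds.do(run_threaded, count1)
schedule.every(15).seconds.do(run_threaded, count2)

while True:
    schedule.run_pending()
    time.sleep(1)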

In Python, run a scheduled-event function and a while-loop function with a multiprocessing Pool

I have two functions. One is calen(), which checks the calendar and schedules something to execute, and the other, tech(), is an infinite while loop. I tried to run them with multiprocessing, couldn't see anything printing on the shell, and ended up with the following code, which at least shows the first process's output.
But while the first process (the calendar event running with apscheduler) shows all the pending jobs, the second job/function, the infinite loop, never starts!
How can I run both with multiprocessing/subprocess/multithreading while still seeing the output from both functions in the shell (or anywhere)?
def trade():
    return (calen(), tech())

with Pool(cpu_count()) as p:
    results = p.map(trade())
    print(list(results))
Previously I also tried:
if __name__ == '__main__':
    with Pool(processes=2) as pool:
        r1 = pool.apply_async(calen, ())
        r2 = pool.apply_async(tech, ())
        print(r1.get(timeout=120))
        print(r2.get(timeout=120))
I would appreciate it if anyone could show how to run the while loop and the scheduled event together while keeping the output visible.
I guess I was making a mistake with apscheduler. Apscheduler itself can run jobs on a calendar schedule and also at intervals, like a while loop.
The while loop should be executed from apscheduler, not as a separate function.
Instead, I had tried to do them separately: one with apscheduler and another as an ordinary while loop. Once apscheduler started, it blocked every other operation.
This helped me: https://devcenter.heroku.com/articles/clock-processes-python
It's actually a good solution for multiprocessing as well (as far as I have understood).
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

@sched.scheduled_job('interval', minutes=3)
def timed_job():
    print('This job is run every three minutes.')

@sched.scheduled_job('cron', day_of_week='mon-fri', hour=17)
def scheduled_job():
    print('This job is run every weekday at 5pm.')

sched.start()
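Along the same lines, here is a minimal sketch of what "let apscheduler own the while loop too" could look like: the calendar-driven job and the repeated work are both registered on one BlockingScheduler. The job names and intervals below are made up for illustration, not taken from the question:

from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

@sched.scheduled_job('cron', day_of_week='mon-fri', hour=9)
def calendar_job():
    # stands in for the calendar-checking calen() from the question
    print('Checking the calendar...')

@sched.scheduled_job('interval', seconds=10)
def tech_job():
    # stands in for one iteration of the tech() while loop;
    # an interval trigger replaces the explicit infinite loop
    print('Running one technical check...')

sched.start()  # blocks here; the scheduler runs both jobs from its thread pool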

Python multiprocessing - Is it possible to introduce a fixed time delay between individual processes?

I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same / similar time. I need to introduce a delay between processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduce a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, so I still run into problems when processing a large number of models. I therefore think the time delay needs to be built into the multiprocessing part of the code, but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing

def runmodels(arg):
    sleep(np.random.rand(1,1)*120)  # interim solution to reduce the probability that any two runs start at the same time; not a guaranteed solution
    subprocess.call(arg)  # this line actually fires off the model run

if __name__ == '__main__':
    arguments = [big list of runs in here
                 ]
    count = 12
    pool = multiprocessing.Pool(processes=count)
    r = pool.imap_unordered(runmodels, arguments)
    pool.close()
    pool.join()
multiprocessing.Pool() already limits the number of processes running concurrently.
You could use a lock to separate the starting times of the processes (not tested):
import threading
import multiprocessing

def init(lock):
    global starting
    starting = lock

def run_model(arg):
    starting.acquire()  # no other process can get it until it is released
    threading.Timer(1, starting.release).start()  # release in a second
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = multiprocessing.Pool(processes=12,
                                initializer=init, initargs=[multiprocessing.Lock()])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
One way to do this with threads and a semaphore:
from time import sleep
import subprocess
import threading

def runmodels(arg):
    subprocess.call(arg)
    sGlobal.release()  # release for next launch

if __name__ == '__main__':
    threads = []
    global sGlobal
    sGlobal = threading.Semaphore(12)  # semaphore for max 12 threads
    arguments = [big list of runs in here
                 ]
    for arg in arguments:
        sGlobal.acquire()  # block if more than 12 threads
        t = threading.Thread(target=runmodels, args=(arg,))
        threads.append(t)
        t.start()
        sleep(1)
    for t in threads:
        t.join()
The answer suggested by jfs caused problems for me because it starts a new timer thread with threading.Timer. If the worker happens to finish before the timer fires, the timer is killed and the lock is never released.
I propose an alternative route, in which each successive worker waits until enough time has passed since the start of the previous one. This has the same desired effect, but without relying on an extra timer thread.
import multiprocessing as mp
import time

def init(shared_val):
    global start_time
    start_time = shared_val

def run_model(arg):
    with start_time.get_lock():
        wait_time = max(0, start_time.value - time.time())
        time.sleep(wait_time)
        start_time.value = time.time() + 1.0  # specify interval here
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = mp.Pool(processes=12,
                   initializer=init, initargs=[mp.Value('d')])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
