I have two functions that need to run in parallel on a schedule. I implemented this with multiprocessing, but one process still blocks the other. How can I achieve this when, say, one function runs every 5 minutes and performs some task while the other performs a different task every 2 minutes? The two functions are different.
I have used a scheduler to run both functions, but each one blocks the other until it finishes.
For example:
import time
from datetime import datetime

import schedule

def count1():
    now = datetime.now()
    start_time = now.strftime("%H:%M:%S")
    time.sleep(5)
    now = datetime.now()
    end_time = now.strftime("%H:%M:%S")

def count2():
    now = datetime.now()
    start_time = now.strftime("%H:%M:%S")
    time.sleep(5)
    now = datetime.now()
    end_time = now.strftime("%H:%M:%S")

if __name__ == '__main__':
    schedule.every(5).seconds.do(count1)
    schedule.every(15).seconds.do(count2)

    while True:
        # Check whether a scheduled task is pending to run
        schedule.run_pending()
        time.sleep(1)
I want to run both functions in parallel without blocking each other. How do I achieve this?
I assume you are using the schedule package, which is described in the first paragraph of its documentation as an in-process scheduler – in other words, it won't give you parallelism. The documentation also includes an FAQ entry on running jobs in parallel.
Bottom line: if you want parallelism, you'll need to set up your own threads or processes, or find a different scheduling package that handles that for you.
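If you want to stay with schedule, its FAQ suggests handing each job off to its own thread so that run_pending() returns immediately instead of waiting for the job to finish. A minimal sketch of that pattern, reusing the count1 and count2 functions from your post:
import threading
import time

import schedule

def run_threaded(job_func):
    # Start the job in its own daemon thread so run_pending() returns immediately
    job_thread = threading.Thread(target=job_func, daemon=True)
    job_thread.start()

if __name__ == '__main__':
    # schedule passes the extra argument (count1/count2) through to run_threaded
    schedule.every(5).seconds.do(run_threaded, count1)
    schedule.every(15).seconds.do(run_threaded, count2)

    while True:
        schedule.run_pending()
        time.sleep(1)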
Related
I am relatively new to concurrency in Python and I am working on some code that has to call functions outside of my code. I cannot edit those functions, but I need them to run concurrently. I've tried a few different solutions such as multiprocessing, threading, and asyncio. Asyncio comes closest to what I want, but only if every function I was calling were defined with it, and they're not.
The functions I'm calling will block, sometimes for 15-30 minutes. During that time, I need other functions to be doing other things. The code below illustrates my problem. If you run it, you'll see that whether using threads or processes, the tasks always run serially. I need them to run simultaneously. I understand that the output blocks until the entire script runs, but the tasks themselves should not.
What am I missing? With so many choices for concurrency or at least apparent concurrency in Python, I would think this is easier than I'm finding it.
#!/usr/bin/python3
from datetime import datetime
from multiprocessing import Process
import sys
from threading import Thread
from time import sleep

def main():
    # Doing it with the multiprocess module
    print("Using MultiProcess:")
    useprocs()
    print("\nUsing Threading:")
    usethreads()

def useprocs():
    procs = []
    task1 = Process(target=blockingfunc('Task1'))
    task1.start()
    procs.append(task1)
    task2 = Process(target=blockingfunc('Tast2'))
    task2.start()
    procs.append(task2)
    task1.join()
    task2.join()
    print('All processes completed')

def usethreads():
    threads = []
    task3 = Process(target=blockingfunc('Task3'))
    task3.start()
    threads.append(task3)
    task4 = Process(target=blockingfunc('Task4'))
    task4.start()
    threads.append(task4)
    task3.join()
    task4.join()
    print('All threads completed')

def blockingfunc(taskname):
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(current_time, "Starting task: ", taskname)
    sleep(5)
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(current_time, taskname, "completed")

if __name__ == '__main__':
    try:
        main()
    except:
        sys.exit(1)
Note that the program you posted imports Thread but never uses it.
More importantly, in a line like:
task1 = Process(target=blockingfunc('Task1'))
you're calling blockingfunc('Task1') and passing what it returns (None) as the value of the target argument. Not at all what you intended. What you intended:
task1 = Process(target=blockingfunc, args=['Task1'])
That way, blockingfunc isn't actually invoked until you call the start() method, as intended.
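For completeness, here is a sketch of how the two helpers could look once fixed (reusing blockingfunc from your post, and actually using Thread in usethreads):
from multiprocessing import Process
from threading import Thread

def useprocs():
    procs = [Process(target=blockingfunc, args=('Task1',)),
             Process(target=blockingfunc, args=('Task2',))]
    for p in procs:
        p.start()   # both processes now run concurrently
    for p in procs:
        p.join()
    print('All processes completed')

def usethreads():
    threads = [Thread(target=blockingfunc, args=('Task3',)),
               Thread(target=blockingfunc, args=('Task4',))]
    for t in threads:
        t.start()   # both threads now run concurrently
    for t in threads:
        t.join()
    print('All threads completed')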
I would like to run a function asynchronously in Python, calling the function repeatedly at a fixed time interval. This Java class has functionality similar to what I want. I was hoping for something in Python like:
pool = multiprocessing.Pool()
pool.schedule(func, args, period)
# other code to do while that runs in the background
pool.close()
pool.join()
Are there any packages which provide similar functionality? I would prefer something simple and lightweight.
How could I implement this functionality in python?
This post is similar, but asks for an in-process solution. I want a multiprocess async solution.
Here is one possible solution. One caveat: func needs to return faster than the period, otherwise it won't be called as often as the period, and if it later speeds up again it will be scheduled faster than the period while it catches up. This approach seems like a lot of work, but then again parallel programming is often tough. I would appreciate a second look at the code to make sure I don't have a deadlock waiting somewhere.
import multiprocessing, time, math

def func():
    print('hello its now {}'.format(time.time()))

def wrapper(f, period, event):
    last = time.time() - period
    while True:
        now = time.time()
        # wait() returns True if the event is set, otherwise False after the timeout
        if event.wait(timeout=(last + period - now)):
            break
        else:
            f()
            last += period

def main():
    period = 2
    # event is the poison pill; setting it breaks the infinite loop in wrapper
    event = multiprocessing.Event()
    process = multiprocessing.Process(target=wrapper, args=(func, period, event))
    process.start()

    # burn some cpu cycles, takes about 20 seconds on my machine
    x = 7
    for i in range(50000000):
        x = math.sqrt(x**2)

    event.set()
    process.join()
    print('x is {} by the way'.format(x))

if __name__ == '__main__':
    main()
I have 2 simple functions (each loops over a range) that can run separately without any dependency. I'm trying to run these 2 functions both with the Python multiprocessing module and with the threading module.
When I compared the output, I saw that the multiprocessing version takes 1 second more than the multithreading version.
I read that multithreading is not that efficient because of the Global Interpreter Lock.
Based on the above:
1. Is it best to use multiprocessing if there is no dependency between the 2 processes?
2. How do I calculate the number of processes/threads that I can run on my machine for maximum efficiency?
3. Also, is there a way to calculate the efficiency of the program when using multithreading?
Multithreading version:
from multiprocessing import Process
import thread
import platform
import os
import time
import threading

class Thread1(threading.Thread):
    def __init__(self, threadindicator):
        threading.Thread.__init__(self)
        self.threadind = threadindicator

    def run(self):
        starttime = time.time()
        if self.threadind == 'A':
            process1()
        else:
            process2()
        endtime = time.time()
        print 'Thread 1 complete : Time Taken = ', endtime - starttime

def process1():
    starttime = time.time()
    for i in range(100000):
        for j in range(10000):
            pass
    endtime = time.time()

def process2():
    for i in range(1000):
        for j in range(1000):
            pass

def main():
    print 'Main Thread'
    starttime = time.time()
    thread1 = Thread1('A')
    thread2 = Thread1('B')
    thread1.start()
    thread2.start()
    threads = []
    threads.append(thread1)
    threads.append(thread2)
    for t in threads:
        t.join()
    endtime = time.time()
    print 'Main Thread Complete , Total Time Taken = ', endtime - starttime

if __name__ == '__main__':
    main()
Multiprocessing version:
from multiprocessing import Process
import platform
import os
import time

def process1():
    # print 'process_1 processor =', platform.processor()
    starttime = time.time()
    for i in range(100000):
        for j in range(10000):
            pass
    endtime = time.time()
    print 'Process 1 complete : Time Taken = ', endtime - starttime

def process2():
    # print 'process_2 processor =', platform.processor()
    starttime = time.time()
    for i in range(1000):
        for j in range(1000):
            pass
    endtime = time.time()
    print 'Process 2 complete : Time Taken = ', endtime - starttime

def main():
    print 'Main Process start'
    starttime = time.time()
    processlist = []
    p1 = Process(target=process1)
    p1.start()
    processlist.append(p1)
    p2 = Process(target=process2)
    p2.start()
    processlist.append(p2)
    for i in processlist:
        i.join()
    endtime = time.time()
    print 'Main Process Complete - Total time taken = ', endtime - starttime

if __name__ == '__main__':
    main()
If you have two CPUs available on your machine, you have two processes which don't have to communicate, and you want to use both of them to make your program faster, you should use the multiprocessing module, rather than the threading module.
The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing Python bytecode at a time. Therefore, multithreading won't improve the overall runtime of your application unless you have calls that are blocking (e.g. waiting for IO) or that release the GIL (e.g. numpy will do this for some expensive calls) for extended periods of time. However, the multiprocessing library creates separate subprocesses, and therefore several copies of the interpreter, so it can make efficient use of multiple CPUs.
However, in the example you gave, you have one process that finishes very quickly (less than 0.1 seconds on my machine) and another that takes around 18 seconds to finish; the exact numbers may vary depending on your hardware. In that case, nearly all the work happens in one process, so you're really only using one CPU regardless. The increased overhead of spawning processes versus threads is probably what makes the process-based version slower here.
If you make both processes do the 18-second nested loops, you should see the multiprocessing code go much faster (assuming your machine actually has more than one CPU). On my machine, the multiprocessing code finished in around 18.5 seconds and the multithreaded code in 71.5 seconds. I'm not sure why the multithreaded version took longer than about 36 seconds, but my guess is that the GIL is causing some sort of contention that slows down both threads.
As for your second question, assuming there's no other load on the system, you should use a number of processes equal to the number of CPUs on your system. You can discover this by doing lscpu on a Linux system, sysctl hw.ncpu on a Mac system, or running dxdiag from the Run dialog on Windows (there's probably other ways, but this is how I always do it).
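If you'd rather get this number from within Python itself, the standard library can report it directly; a small sketch:
import multiprocessing

print(multiprocessing.cpu_count())  # number of CPUs the OS reports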
For the third question, the simplest way to figure out how much efficiency you're getting from the extra processes is just to measure the total runtime of your program, using time.time() as you were, or the time utility in Linux (e.g. time python myprog.py). The ideal speedup should be equal to the number of processes you're using, so a 4 process program running on 4 CPUs should be at most 4x faster than the same program with 1 process, assuming you get maximum benefit from the extra processes. If the other processes aren't helping you that much, it will be less than 4x.
I am trying to run a Python application that executes actions at specified intervals. The code below constantly consumes 100% of a CPU core.
import time

def action_print():
    print "hello there"

interval = 5
next_run = 0

while True:
    while next_run > time.time():
        pass
    next_run = time.time() + interval
    action_print()
I would like to avoid putting the process to sleep, as there will be more actions to execute at various intervals.
Please advise.
If you know when the next run will be, you can simply use time.sleep:
import time

interval = 5
next_run = 0

while True:
    time.sleep(max(0, next_run - time.time()))
    next_run = time.time() + interval
    action_print()
If you want other threads to be able to interrupt you, use an event like this:
import time, threading

interval = 5
next_run = 0
interruptEvent = threading.Event()

while True:
    interruptEvent.wait(max(0, next_run - time.time()))
    interruptEvent.clear()
    next_run = time.time() + interval
    action_print()
Another thread can now call interruptEvent.set() to wake up yours.
In many cases, you will also want to use a Lock to avoid race conditions on shared data. Make sure to clear the event while you hold the lock.
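A rough sketch of that pattern, with illustrative names that are not from your code: the notifying thread publishes data and sets the event while holding the lock, and the waiting loop clears the event and drains the data under the same lock right after wait() returns.
import threading

lock = threading.Lock()
shared_items = []
interruptEvent = threading.Event()

def notify(item):
    # Producer side: publish data, then wake the waiting loop.
    with lock:
        shared_items.append(item)
        interruptEvent.set()

def drain():
    # Consumer side: call right after interruptEvent.wait() returns.
    with lock:
        interruptEvent.clear()
        pending = list(shared_items)
        del shared_items[:]
    return pending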
You should also be aware that under CPython, only one thread can execute Python code at a time. Therefore, if your program is CPU-bound across multiple threads and you're running it under CPython or PyPy, you should substitute threading with multiprocessing.
Presumably you do not want to write time.sleep(interval), but replacing pass with time.sleep(0.1) will almost completely free up your CPU while still giving you flexibility in the while predicate.
Alternatively, you could use a thread for each event you are scheduling and call time.sleep(interval) in it, but this will still tie up your CPU.
Bottom line: your while/pass loop spins as fast as it can, consuming all of your CPU.
Is it possible to schedule an event in Python without multithreading?
I am trying to achieve something like scheduling a function to execute every x seconds.
Maybe sched?
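For example, a repeating job with the standard-library sched module could look roughly like this (single-threaded; run() blocks until the event queue is empty):
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def repeat(interval):
    print('tick at ' + time.strftime('%H:%M:%S'))
    scheduler.enter(interval, 1, repeat, (interval,))  # reschedule itself

scheduler.enter(5, 1, repeat, (5,))
scheduler.run()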
You could use a combination of signal.alarm and a signal handler for SIGALRM, like this, to repeat the function every 5 seconds.
import signal

def handler(sig, frame):
    print("I am done this time")
    signal.alarm(5)  # schedule this to happen again

signal.signal(signal.SIGALRM, handler)
signal.alarm(5)
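Note that the alarm can only fire while your process is still running, so the main thread needs to stay alive; a minimal way to do that on POSIX systems (continuing the snippet above) would be something like:
while True:
    signal.pause()  # block until the next signal (here SIGALRM) arrives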
The other option is to use the sched module that comes along with Python but I don't know whether it uses threads or not.
Sched is probably the way to go for this, as @eumiro points out. However, if you don't want to do that, then you could do this:
import time

while 1:
    # call your event
    time.sleep(x)  # wait for x many seconds before calling the script again
Without threading, it seldom makes sense to call a function periodically, because your main thread is blocked by the waiting and simply does nothing else in the meantime.
However, if you really want to do so:
import time

for x in range(3):
    print('Loop start')
    time.sleep(2)
    print('Calling some function...')
Is this what you really want?
You could use Celery:
Celery is an open source asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker nodes. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
and a code example:
You probably want to see some code by now, so here's an example task adding two numbers:
from celery.decorators import task

@task
def add(x, y):
    return x + y
You can execute the task in the background, or wait for it to finish:
>>> result = add.delay(4, 4)
>>> result.wait()  # wait for and return the result
8
This is of more general use than the problem you describe requires, though.