Multiprocessing subject to timer - python

I have a large list L to operate over. Let f() be the function which operates on L. f() takes another variable, which expires every 15 minutes and needs to be renewed. Here is an example, in serial:
import datetime as dt

def main():
    L = openList()
    # START THE CLOCK
    clockStart = dt.datetime.now()
    clockExp = clockStart + dt.timedelta(seconds=900)
    a = getRenewed()
    for item in L:
        f(item, a)  # operate on item given a
        # CHECK TIME REMAINING
        clockCur = dt.datetime.now()
        clockRem = (clockExp - clockCur).total_seconds()
        # RENEW a IF NEEDED
        if clockRem < 5:  # renew with 5 seconds left
            clockStart = dt.datetime.now()
            clockExp = clockStart + dt.timedelta(seconds=900)
            a = getRenewed()
Since f() takes a few seconds (or sometimes longer), I would like to parallelize the code. Any tips for how to do that given the timer? I envision sharing clockExp and "a" across processes: when a process sees clockRem < 5, it calls getRenewed() and shares the new "a" and clockExp with the others, and so on.

If getRenewed is idempotent (that is, you can call it multiple times without side effects), you can simply move your existing timer code to your worker processes, and let them each call it once when they notice their own timer has run down. This only requires synchronization for the items from the list that you pass in, and multiprocessing.Pool can handle that easily enough:
import multiprocessing
import datetime as dt

def setup_worker():
    global clockExp, a
    clockStart = dt.datetime.now()
    clockExp = clockStart + dt.timedelta(seconds=900)
    a = getRenewed()

def worker(item):
    global clockExp, a
    clockCur = dt.datetime.now()
    clockRem = (clockExp - clockCur).total_seconds()
    if clockRem < 5:  # renew with 5 seconds left
        clockStart = dt.datetime.now()
        clockExp = clockStart + dt.timedelta(seconds=900)
        a = getRenewed()
    f(item, a)

def main(L):
    pool = multiprocessing.Pool(initializer=setup_worker)
    pool.map(worker, L)
If getRenewed is not idempotent, things will need to be a little more complicated. You won't be able to call it in each worker process, so you'll need to set up some kind of communication method between your processes so they can each get the latest version when it is available.
I'd suggest using a multiprocessing.Queue to pass the a value from the main process to the workers. You can still use a Pool for the list items, you just need to make sure you use it asynchronously from the main process. Like this, perhaps:
import multiprocessing
import os
import queue  # the stdlib module, needed for the queue.Empty exception

def setup_worker2(q):
    global a_queue, a, clockExp
    a_queue = q
    a = a_queue.get()  # wait for the first `a` value
    clockStart = dt.datetime.now()
    clockExp = clockStart + dt.timedelta(seconds=900)

def worker2(item):
    global a, clockExp
    clockCur = dt.datetime.now()
    clockRem = (clockExp - clockCur).total_seconds()
    if clockRem < 60:  # start checking for a new `a` value 60 seconds before it's needed
        try:
            a = a_queue.get_nowait()
            clockStart = dt.datetime.now()
            clockExp = clockStart + dt.timedelta(seconds=900)
        except queue.Empty:
            pass
    return f(item, a)
def main2(L):
    a_queue = multiprocessing.Queue()  # the queue that carries the `a` values
    pool = multiprocessing.Pool(initializer=setup_worker2, initargs=(a_queue,))
    result = pool.map_async(worker2, L)  # send the items to the pool asynchronously
    while True:  # loop for sending `a` values through the queue
        a = getRenewed()  # get a new value
        for _ in range(os.cpu_count()):
            a_queue.put(a)  # send one copy per worker process
        result.wait(900 - 5)  # sleep for ~15 minutes, or until the result is ready
        if result.ready():  # note: wait() never raises; check ready() instead
            break  # the work is done, so stop looping
The workers still need to have some timing code, because otherwise you'd face a race condition where one worker might consume two of the a values sent down the queue in a single batch from the main process. That could happen if some of the calls to f are significantly slower than others (which is likely if they involve downloading things from the web).
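If you want extra protection against that race (a sketch of my own, not part of the original answer; getRenewed, f, and dt are the asker's names), the main process can tag each queued value with its expiry time, so a worker that reads the queue late can discard entries that are already stale:

def put_renewed(a_queue, n_workers):
    # main-process side: send (value, expiry) pairs, one copy per worker
    a = getRenewed()
    expiry = dt.datetime.now() + dt.timedelta(seconds=900)
    for _ in range(n_workers):
        a_queue.put((a, expiry))

def get_fresh(a_queue, a, expiry):
    # worker side: drain stale entries, keeping the newest value seen
    try:
        while (expiry - dt.datetime.now()).total_seconds() < 5:
            a, expiry = a_queue.get_nowait()
    except queue.Empty:
        pass
    return a, expiry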

Related

When I used a timer decorator for multiple processes, the result is wrong

Here is my code.
# this decorator is used to record the running time.
def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        func(*args, **kwargs)
        stop_time = time.time()
        cost_time = stop_time - start_time
        print(f'cost time: {cost_time} s!')
    return wrapper

# caculate_and_save is my target function.
@timer
def multi_process():
    process_list = []
    gzdhb_df = pd.read_excel(io='./raw_data/各站点海拔.xlsx')
    for province in province_list:
        province_array = gzdhb_df[gzdhb_df['省份'] == province].values
        p = Process(target=caculate_and_save, kwargs={'province': province, 'province_data': province_array})
        process_list.append(p)
        p.start()
    l = len(process_list)
    while True:
        for p in process_list:
            if not p.is_alive():
                l -= 1
        if l <= 0:
            break
The printout of the program is shown in the figure below. Why does the runtime print as soon as the first process finishes, instead of after all processes have finished?
[screenshot of the printout]
First, as an aside, your timer decorator will not work with functions that return results. You should save the result of calling func and then finally return that result after you print out the timings. Now for your main problem:
You have started some number of Process instances in process_list. Let's imagine that in your while True: loop you finally find a process that is no longer alive; call it p1. You then subtract 1 from variable l. On the next pass through the loop, even if no further processes have terminated, p1 is still not alive, so you decrement l again on its account. In short, once the first dead process is discovered, every subsequent pass through the loop decrements l at least once for that same process. Eventually l goes negative and you break out of the loop with possibly only a single process having terminated.
I would suggest you wait for all processes to terminate by calling join:
import functools
import time

# this decorator is used to record the running time.
def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        stop_time = time.time()
        cost_time = stop_time - start_time
        print(f'cost time: {cost_time} s!')
        return result
    return wrapper

# caculate_and_save is my target function.
@timer
def multi_process():
    process_list = []
    gzdhb_df = pd.read_excel(io='./raw_data/各站点海拔.xlsx')
    for province in province_list:
        province_array = gzdhb_df[gzdhb_df['省份'] == province].values
        p = Process(target=caculate_and_save, kwargs={'province': province, 'province_data': province_array})
        process_list.append(p)
        p.start()
    for p in process_list:
        p.join()
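For illustration (my addition, not part of the original answer), the repaired decorator now both reports the elapsed time and propagates the wrapped function's return value:

@timer
def add(a, b):
    time.sleep(1)
    return a + b

print(add(2, 3))  # prints the cost-time line, then 5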

Python CPU Scheduler Simulator

So I have FCFS and SJF CPU scheduling algorithms in a simulator; however, I'm struggling to implement the shortest remaining time first (SRTF) algorithm.
This is what I have so far.
def srtf(submit_times, burst_times):
    """First Come First Serve Algorithm returns the time metrics"""
    cpu_clock = 0
    job = 0
    response_times = []
    turn_around_times = []
    wait_times = []
    total_jobs = []
    remaining_burst_times = []
    for stuff in range(len(submit_times)):
        total_jobs.append(tuple((submit_times[stuff], burst_times[stuff])))
        remaining_burst_times.append(burst_times[stuff])
    while job < len(submit_times):
        if cpu_clock < int(submit_times[job]):
            cpu_clock = int(submit_times[job])
        ready_queue = []
        for the_job in total_jobs:
            job_time = int(the_job[0])
            if job_time <= cpu_clock:
                ready_queue.append(the_job)
        short_job = ready_queue_bubble(ready_queue)
        submit, burst = short_job[0], short_job[1]
        next_event = cpu_clock + int(burst)
        response_time = cpu_clock - int(submit)
        response_times.append(response_time)
        remaining_burst_times[job] = next_event - cpu_clock
        # cpu_clock = next_event
        if remaining_burst_times[job] == 0:
            turn_around_time = next_event - int(submit)
            wait_time = turn_around_time - int(burst)
            turn_around_times.append(turn_around_time)
            wait_times.append(wait_time)
        else:
            pass
        job += 1
        total_jobs.remove(short_job)
        remaining_burst_times.remove(short_job[1])
    return response_times, turn_around_times, wait_times
Basically the function takes in a list of submit times and burst times and returns lists for the response, turnaround, and wait times. I have been trying to adapt the remnants of my shortest-job-first ready-queue code, to no avail.
Can anyone point me in the right direction?
It's not a very simple simulation due to preemption. Designing simulations is all about representing 1) the state of the world and 2) events that act on the world.
The state of the world here is:

- Processes. These have their own internal state:
  - Submit time (immutable)
  - Burst time (immutable)
  - Remaining time (mutable)
  - Completion time (mutable)
- Wall clock time.
- Next process to be submitted.
- Running process.
- Run start time (of the currently running process).
- Waiting runnable processes (i.e. past submit with remaining > 0).

There are only two kinds of events:

- A process's submit time occurs.
- The running process completes.
When there are no more processes waiting to be submitted, and no process is running, the simulation is over. You can get the statistics you need from the processes.
The algorithm initializes the state, then executes a standard event loop:
processes = list of Process built from parameters, sorted by submit time
wall_clock = 0
next_submit = 0   # index in list of processes
running = None    # index of running process
run_start = None  # start of current run
waiting = []

while True:
    event = GetNextEvent()
    if event is None:
        break
    wall_clock = event.time
    if event.kind == 'submit':
        UpdateStateForProcessSubmit()      # update state for new process submission
    else:  # event.kind is 'completion'
        UpdateStateForProcessCompletion()  # update state for running process completion
An important detail is that if completion and submit events happen at the same time, process the completion first. The other way 'round makes update logic complicated; a running process with zero time remaining is a special case.
The "update state" methods adjust all the elements of the state according to the srtf algorithm. Roughly like this...
def UpdateStateForProcessCompletion():
    # End the run of the running process.
    processes[running].remaining = 0
    processes[running].completion_time = wall_clock
    # Schedule a new one, if any are waiting.
    running = PopShortestTimeRemainingProcess(waiting)
    run_start_time = wall_clock if running else None
A new submit is more complex.
def UpdateStateForProcessSubmit():
    new_process = next_submit
    next_submit += 1
    new_time_remaining = processes[new_process].remaining
    # Maybe preempt the running process.
    if running:
        # Get updated remaining time to run.
        running_time_remaining = processes[running].remaining - (wall_clock - run_start)
        # We only need to look at new and running processes.
        # Waiting ones can't win because they already lost to the running one.
        if new_time_remaining < running_time_remaining:
            # Preempt.
            processes[running].remaining = running_time_remaining
            waiting.append(running)
            running = new_process
            run_start_time = wall_clock
        else:
            # New process waits. Nothing else changes.
            waiting.append(new_process)
    else:
        # Nothing's running. Run the newly submitted process.
        running = new_process
        run_start_time = wall_clock
The only thing left is getting the next event. You need only inspect processes[next_submit].submit and run_start + processes[running].remaining. Choose the smallest. The event has that time and the respective type. Of course you need to deal with the cases where next_submit and/or running are None.
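A sketch of GetNextEvent under the state names above (my own filling-in; the answer leaves this part as prose). It returns an Event with the .time and .kind fields used by the event loop:

from collections import namedtuple
Event = namedtuple('Event', ['time', 'kind'])

def GetNextEvent():
    candidates = []
    if running is not None:
        # completion goes first so min() prefers it on a time tie
        candidates.append(Event(run_start + processes[running].remaining, 'completion'))
    if next_submit is not None:
        candidates.append(Event(processes[next_submit].submit, 'submit'))
    return min(candidates, key=lambda e: e.time) if candidates else None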
I may not have everything perfect here, but it's pretty close.
Addition
Hope you're done with your homework by this time. This is fun to code up. I ran it on this example, and the trace matches well. Cheers
import heapq as pq

class Process(object):
    """A description of a process in the system."""
    def __init__(self, id, submit, burst):
        self.id = id
        self.submit = submit
        self.burst = burst
        self.remaining = burst
        self.completion = None
        self.first_run = None

    @property
    def response(self):
        return None if self.first_run is None else self.first_run - self.submit

    @property
    def turnaround(self):
        return None if self.completion is None else self.completion - self.submit

    @property
    def wait(self):
        return None if self.turnaround is None else self.turnaround - self.burst

    def __repr__(self):
        return f'P{self.id} @ {self.submit} for {self.burst} ({-self.remaining or self.completion})'

def srtf(submits, bursts):
    # Make a list of processes in submit time order.
    processes = [Process(i + 1, submits[i], bursts[i]) for i in range(len(submits))]
    processes_by_submit_asc = sorted(processes, key=lambda x: x.submit)
    process_iter = iter(processes_by_submit_asc)

    # The state of the simulation:
    wall_clock = 0                          # Wall clock time.
    next_submit = next(process_iter, None)  # Next process to be submitted.
    running = None                          # Running process.
    run_start = None                        # Time the running process started running.
    waiting = []                            # Heap of waiting processes. Pop gets min remaining.

    def run(process):
        """Switch the running process to the given one, which may be None."""
        nonlocal running, run_start
        running = process
        if running is None:
            run_start = None
            return
        running.first_run = running.first_run or wall_clock
        run_start = wall_clock

    while next_submit or running:
        print(f'Wall clock: {wall_clock}')
        print(f'Running: {running} since {run_start}')
        print(f'Waiting: {waiting}')
        # Handle completion first, if there is one.
        if running and (next_submit is None or run_start + running.remaining <= next_submit.submit):
            print('Complete')
            wall_clock = run_start + running.remaining
            running.remaining = 0
            running.completion = wall_clock
            run(pq.heappop(waiting)[1] if waiting else None)
            continue
        # Handle a new submit, if there is one.
        if next_submit and (running is None or next_submit.submit < run_start + running.remaining):
            print(f'Submit: {next_submit}')
            new_process = next_submit
            next_submit = next(process_iter, None)
            wall_clock = new_process.submit
            new_time_remaining = new_process.remaining
            if running:
                # Maybe preempt the running process. Otherwise new process waits.
                running_time_remaining = running.remaining - (wall_clock - run_start)
                if new_time_remaining < running_time_remaining:
                    print('Preempt!')
                    running.remaining = running_time_remaining
                    pq.heappush(waiting, (running_time_remaining, running))
                    run(new_process)
                else:
                    pq.heappush(waiting, (new_time_remaining, new_process))
            else:
                run(new_process)

    for p in processes:
        print(f'{p} {p.response} {p.turnaround} {p.wait}')
    return ([p.response for p in processes],
            [p.turnaround for p in processes],
            [p.wait for p in processes])

submits = [6, 3, 4, 1, 2, 5]
bursts = [1, 3, 6, 5, 2, 1]
print(srtf(submits, bursts))

Time out a function if it is taking more than 1 minute in Python

I am running a multiprocessing task in Python; how can I time out a function after 60 seconds?
What I have done is shown in the snippet below:
import time
import multiprocessing as mp
from multiprocessing import Pool
from multiprocessing import Queue

def main():
    global f
    global question
    global queue
    queue = Queue()
    processes = []
    question = [16,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,18,19,21,20,23]
    cores = 5
    loww = 0
    chunksize = int((len(question)-loww)/cores)
    splits = []
    for i in range(cores):
        splits.append(loww+1+((i)*chunksize))
    splits.append(len(question)+1)
    print("", splits)
    args = []
    for i in range(cores):
        a = []
        arguments = (i, splits[i], splits[i+1])
        a.append(arguments)
        args.append(a)
    print(args)
    p = Pool(cores)
    p.map(call_process, args)
    p.close()
    p.join()
def call_process(args):
    ## end this whole block if it is taking more than 1 minute
    starttime = time.time()
    lower = args[0][1]
    upper = args[0][2]
    for x in range(lower, upper):
        if time.time() >= starttime + 60:
            break
        a = question[x-1]
        try:
            pass
            # a lot of functions are called and returned here
        except:
            continue
        # write item to file
        print('a = ', a)
    return a

main()
I want to ensure that the call_process() method does not run for more than a minute for a particular value. Currently, I am using if time.time() >= starttime + 60: break which would not work effectively as I have different functions and things happening in the try and except block. What can I do?
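The thread ends here without an answer in this extract. One common pattern (a sketch of mine, not from the thread) is to run each time-limited chunk in its own child process, joined from the parent with a timeout, because a process can be terminated from outside no matter what is happening inside the try/except block:

import multiprocessing as mp

def run_with_timeout(target, args=(), timeout=60):
    """Run target(*args) in a child process; kill it after `timeout` seconds.
    Returns True if it finished in time, False if it had to be terminated."""
    p = mp.Process(target=target, args=args)
    p.start()
    p.join(timeout)    # block for at most `timeout` seconds
    if p.is_alive():
        p.terminate()  # hard-stop the overrunning call
        p.join()
        return False
    return True

Two caveats: results would have to come back through a Queue or Pipe, since the child's return value is not visible to the parent; and this wrapper must be driven from the main process rather than from inside a Pool worker, because Pool workers are daemonic and cannot spawn children.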

How to run Threads at the same time and stop one of them without stopping other Thread?

I'm developing a reminder app. I'm asking this question for the 4th time. My issue is, I have 2 Threads. I'm using these Threads as reminders.
When the reminder date comes, the Thread stops.
Here's an example:
import datetime
from threading import Thread
# Current date & time: 12:30, 8/21/2020
current = datetime.datetime.now()
# First reminder: 12:35, 8/21/2020
a = datetime.datetime(2020, 8, 21, 12, 35)
# Second reminder: 12:40, 8/21/2020
b = datetime.datetime(2020, 8, 21, 12, 40)
Let's say I created 2 Threads. One Thread is waiting for a and the other one is waiting for b.
Everything is working nicely. Those Threads will wait until the reminder date comes, and then they will stop automatically using a flag.
BUT when I attempt to stop Thread a, the program stops Thread b too.
How to prevent this? Here's my full code:
import threading
from datetime import datetime, timedelta
import time

class Reminder:
    # Target function
    def createThread(self, check):
        # Specific date (10 secs later than the current date)
        b = datetime.now() + timedelta(seconds=10)
        # Set flag (in this case it's 'e')
        global e
        e = check  # gets value from parameter
        while True:
            # Current time
            a = datetime.now()
            # If user wants to stop Thread and e equals True, break
            if e == True:
                print("** REMINDER STOPPED ** -> Stopped.\n")
                break
            # If current time and set time are equal, break
            else:
                if a >= b:
                    print("** REMINDER NOTIFICATION ** -> Worked.")
                    break

    def exec(self):
        # Global Thread name
        global t
        # Set thread, sleep 1 second and stop the thread
        t = threading.Thread(target=self.createThread, args=[False])
        # Start thread
        t.start()
        # Wait
        time.sleep(1)

    def stop(self):
        global e
        e = True  # Stop thread

# This while statement checks the program is still running
while True:
    # Input
    se = input("\nYup?")
    print(se)
    # Call class
    r = Reminder()
    # If input equals 'e', run the Thread
    if se == 'e':
        r.exec()
    # If input equals 'd', stop the Thread
    if se == 'd':
        r.stop()
I want to set Threads by their name and delete them by their name.
Such as when I type delete a, the program stops Thread a, but Thread b should keep running.
Thanks.
Instead of one shared flag for all threads, you need a flag per thread.
The simplest way to do that is to have a set or dict containing the thread objects or their thread ids.
If you want to retrieve and stop threads by name, a dictionary is likely your best bet. You can spin up a thread and associate it with a string.
spool = dict()

# add a new thread under a name
spool["first"] = threading.Thread(target=self.createThread, args=[False])
spool["first"].start()

# retrieve the thread by name and wait for it to finish
# (join() only waits; actually stopping it needs a per-thread flag)
thread = spool["first"]
thread.join()
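To actually stop one reminder without touching the others, each thread needs its own flag rather than the shared global e. A minimal sketch (my own, building on the dict idea above; the names are hypothetical):

import threading
from datetime import datetime, timedelta

reminders = {}  # name -> (thread, its own stop flag)

def add_reminder(name, when):
    flag = threading.Event()
    def worker():
        # wake up once a second; exit early only if *this* flag is set
        while not flag.wait(timeout=1):
            if datetime.now() >= when:
                print(f"** REMINDER ** {name}")
                break
    t = threading.Thread(target=worker)
    reminders[name] = (t, flag)
    t.start()

add_reminder("a", datetime.now() + timedelta(seconds=5))
add_reminder("b", datetime.now() + timedelta(seconds=10))
reminders["a"][1].set()  # stops only "a"; "b" keeps running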
That being said, it would likely be both simpler and more efficient to simply store an ordered list of the reminders, and have a single thread iterating over them until one matches the current time or whatever reminder conditional you want to have.
import threading
import time
from datetime import datetime

class Reminder:
    def __init__(self, time, msg):
        self.time = time
        self.msg = msg

def daemon(reminders):
    while True:
        for reminder in reminders:
            if reminder.time <= datetime.now():
                print(reminder.msg)  # or other actions ...
        time.sleep(1)  # avoid a busy loop

reminders = []
d = threading.Thread(target=daemon, args=[reminders])
d.start()
# add more reminders and other program actions
You could also simply use threading.Timer to execute a function after a given number of seconds. See the example below, modified from the docs:
from threading import Timer

def hello():
    print("hello, world")

year = 60 * 60 * 24 * 365
t = Timer(year, hello)
t.start()  # after 1 year, "hello, world" will be printed

Is it possible to execute a function every x seconds in Python while it is performing pool.map?

I am running pool.map on a big data array, and I want to print a progress report to the console every minute.
Is it possible? As I understand it, Python is a synchronous language and can't do this the way Node.js can.
Perhaps it can be done with threading.. or how?
from time import sleep
from multiprocessing.pool import ThreadPool

finished = 0

def make_job():
    sleep(1)
    global finished
    finished += 1

# I want to call this function every minute
def display_status():
    print('finished: ' + str(finished))

def main():
    data = [...]
    pool = ThreadPool(45)
    results = pool.map(make_job, data)
    pool.close()
    pool.join()
You can use a permanent threaded timer, like those from this question: Python threading.timer - repeat function every 'n' seconds
from threading import Timer, Event

class PerpetualTimer(object):
    # give it a cycle time (t) and a callback (hFunction)
    def __init__(self, t, hFunction):
        self.t = t
        self.stop = Event()
        self.hFunction = hFunction
        self.thread = Timer(self.t, self.handle_function)

    def handle_function(self):
        self.hFunction()
        self.thread = Timer(self.t, self.handle_function)
        if not self.stop.is_set():
            self.thread.start()

    def start(self):
        self.stop.clear()
        self.thread.start()

    def cancel(self):
        self.stop.set()
        self.thread.cancel()
Basically this is just a wrapper for a Timer object that creates a new Timer object every time your desired function is called. Don't expect millisecond accuracy (or even close) from this, but for your purposes it should be ideal.
Using this your example would become:
finished = 0

def make_job():
    sleep(1)
    global finished
    finished += 1

def display_status():
    print('finished: ' + str(finished))

def main():
    data = [...]
    pool = ThreadPool(45)
    # set up the monitor to run the function every minute
    monitor = PerpetualTimer(60, display_status)
    monitor.start()
    results = pool.map(make_job, data)
    pool.close()
    pool.join()
    monitor.cancel()
EDIT:
A cleaner solution may be (thanks to comments below):
from threading import Event, Thread

class RepeatTimer(Thread):
    def __init__(self, t, callback, event):
        Thread.__init__(self)
        self.stop = event
        self.wait_time = t
        self.callback = callback
        self.daemon = True

    def run(self):
        while not self.stop.wait(self.wait_time):
            self.callback()
Then in your code:
def main():
    data = [...]
    pool = ThreadPool(45)
    stop_flag = Event()
    RepeatTimer(60, display_status, stop_flag).start()
    results = pool.map(make_job, data)
    pool.close()
    pool.join()
    stop_flag.set()
One way to do this is to use the main thread as the monitoring one. Something like the below should work:
def main():
    data = [...]
    results = []
    step = 0
    pool = ThreadPool(16)
    pool.map_async(make_job, data, callback=results.extend)
    pool.close()
    while True:
        if results:
            break
        step += 1
        sleep(1)
        if step % 60 == 0:
            print("status update" + ...)
I've used .map_async() instead of .map(), since the latter is synchronous. Also you will probably need to replace results.extend with something more efficient. And finally, due to the GIL, the speed improvement may be much smaller than expected.
BTW, it is a little bit funny that you wrote that Python is synchronous in a question that asks about ThreadPool ;).
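As a further variant (a sketch of my own, not from the thread; it reuses the asker's make_job), pool.imap_unordered yields each result as soon as its job finishes, which makes periodic progress reporting straightforward without a separate timer thread:

import time
from multiprocessing.pool import ThreadPool

def main():
    data = [...]
    pool = ThreadPool(45)
    done = 0
    last_report = time.time()
    for _ in pool.imap_unordered(make_job, data):
        done += 1
        if time.time() - last_report >= 60:
            print('finished: %d of %d' % (done, len(data)))
            last_report = time.time()
    pool.close()
    pool.join()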
Consider using the time module. The time.time() function returns the current UNIX time.
For example, calling time.time() right now returns 1410384038.967499. One second later, it will return 1410384039.967499.
The way I would do this would be to use a while loop in place of results = pool.map(...), and on every iteration run a check like this:
last_time = time.time()
while (...):
    new_time = time.time()
    if new_time > last_time + 60:
        print("status update" + ...)
        last_time = new_time
    # (your computation here)
So that will check if (at least) a minute has elapsed since your last status update. It should print a status update approximately every sixty seconds.
Sorry that this is an incomplete answer, but I hope this helps or gives you some useful ideas.
