How to schedule a real-time cyclic task in Python?

We are a team of bachelor students currently working on building a legged robot. At the moment our interface to the robot is written in Python using an SDK for the master board we are using.
In order to communicate with the master board SDK, we need to send a command every millisecond.
To allow us to run tasks periodically, we have applied the rt-preempt patch to our Linux kernel (Ubuntu 20.04 LTS, kernel 5.10.27-rt36).
We are very new to writing real-time applications and have run into an issue where our task sometimes has a much smaller time step than specified. In the figure below we have plotted the duration of each cycle of the while loop in which the command is sent to the SDK (the x-axis is time in seconds and the y-axis is the elapsed time of one iteration, also in seconds).
As seen in the plot, one step is much smaller than the rest. This seems to happen at exactly the same time mark every time we run the script.
[Figure: cyclic_task_plot, per-iteration cycle time plotted against elapsed time]
We set the priority of the entire script using:
import os

pid = os.getpid()
sched = os.SCHED_FIFO                  # first-in, first-out real-time scheduling policy
param = os.sched_param(98)             # static priority 98 (SCHED_FIFO allows 1-99)
os.sched_setscheduler(pid, sched, param)
Our cyclic task looks like this:
dt is set to 0.001
while(_running):
    if direction:
        q = q + 0.0025
        if (q > np.pi/2).any():
            direction = False
    else:
        q = q - 0.0025
        if (q < -np.pi/2).any():
            direction = True

    master_board.track_reference(q, q_prime)

    # Terminate if duration has passed
    if (time.perf_counter() - program_start > duration):
        _running = False

    cycle_time = time.perf_counter() - cycle_start
    time.sleep(dt - cycle_time)
    cycle_start = time.perf_counter()

    timestep_end = time.perf_counter()
    time_per_timestep_array.append(timestep_end - timestep_start)
    timestep_start = time.perf_counter()
We suspect the issue has to do with the way we define the sleep amount. cycle_time is meant to be the time taken by the calculations above time.sleep(), so that sleep time + cycle time = 1 ms. However, we are not sure how to do this properly, and we are struggling to find resources on the subject.
How should one properly define a task such as this for a real time application?
We have quite loose requirements (several milliseconds), but it is very important to us that the timing is deterministic, as this is part of our thesis and we need to understand what is going on.
Any answers to our question or relevant resources are greatly appreciated.
Link to the full code: https://drive.google.com/drive/folders/12KE0EBaLc2rkTZK2FuX_goMF4MgWtknS?usp=sharing

timestep_end = time.perf_counter()
time_per_timestep_array.append(timestep_end - timestep_start)
timestep_start = time.perf_counter()
You're recording the time between timestep_start from the previous cycle and timestep_end from the current cycle. This interval does not accurately represent the cycle time step (even if we assume that no task preemption takes place); it excludes the time consumed by the array append function. Since the outlier seems to happen at the same exact time mark every time we run the script, we could suspect that at this point the array exceeds a certain size where an expensive memory reallocation has to take place. Regardless of the real reason, you should remove such timing inaccuracies by recording the time between cycle starts:
timestep_end = cycle_start
time_per_timestep_array.append(timestep_end - timestep_start)
timestep_start = cycle_start
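A related point on structuring the loop itself: instead of sleeping for a relative amount computed from the measured cycle time, a common pattern for cyclic tasks is to sleep until an absolute deadline, so that small sleep errors do not accumulate. A minimal sketch under that idea (do_cyclic_work() is a placeholder for the control code, not part of the original script):

import time

dt = 0.001                                  # desired period in seconds
next_deadline = time.perf_counter() + dt    # absolute time of the next cycle

while _running:
    do_cyclic_work()                        # placeholder for the control code above

    next_deadline += dt                     # advance the deadline by exactly one period
    remaining = next_deadline - time.perf_counter()
    if remaining > 0:
        time.sleep(remaining)               # sleep only for the time left in this cycle
    else:
        next_deadline = time.perf_counter() # cycle overran: resynchronize instead of sleeping

With this structure the long-run average period stays at dt even when individual sleeps wake slightly late, which also makes the per-cycle measurements easier to interpret.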

Related

Python Pool Multiprocessing Poor CPU Usage

I have a bunch of independent N-body sims I want to run in parallel in Python. The walltime for individual sims is going to vary dramatically depending on the parameters of the bodies in the sims. It seemed like the best way to do this would be to build a pool of processes with the multiprocessing module, give them the sim jobs with the starmap() function, and have them save the results to separate files based on the process ID. However, I've been getting awful parallel performance. There is no speedup between 2 and 4 processes (I have 4 CPUs on my laptop) and the Unix time utility seems to think that the CPU usage percentage is ~150%, which is terrible. Below is my code:
import rebound
import numpy as np
import multiprocessing as mp

def two_orbits_one_pool(orbit1, orbit2):
    #######################################
    print('process number', mp.current_process().name)
    #######################################
    # build simulation
    sim = rebound.Simulation()
    # add sun
    sim.add(m=1.)
    # add two overlapping orbits
    sim.add(primary=sim.particles[0], m=orbit1['m'], a=orbit1['a'], e=orbit1['e'], inc=orbit1['i'], \
            pomega=orbit1['lop'], Omega=orbit1['lan'], M=orbit1['M'])
    sim.add(primary=sim.particles[0], m=orbit2['m'], a=orbit2['a'], e=orbit2['e'], inc=orbit2['i'], \
            pomega=orbit2['lop'], Omega=orbit2['lan'], M=orbit2['M'])
    sim.move_to_com()
    # integrate for 10 orbits of orbit1
    P = 2.*np.pi * np.sqrt(orbit1['a']**3)
    sim.automateSimulationArchive("archive-{}.bin".format(mp.current_process().name), interval=P)
    sim.integrate(10.*P)

if __name__ == "__main__":
    # orbit definitions
    N_M = 10
    N_lop = 10
    m = 1e-6
    a, e = 1., 0.3
    inc, lop, lan = 0., 0., 0.
    M = np.linspace(0., 2*np.pi, endpoint=False, num=N_M)
    dlop = np.linspace(0., 0.05, num=N_lop)
    # orbit dictionaries
    args = []
    for i in range(dlop.shape[0]):
        for j in range(M.shape[0]):
            for k in range(M.shape[0]):
                args.append( ( {'m':m, 'a':a, 'e':e, 'i':inc, \
                                'lop':lop, 'lan':lan, 'M':M[j]},
                               {'m':m, 'a':a, 'e':e, 'i':inc, \
                                'lop':lop+dlop[i], 'lan':lan, 'M':M[k]} ) )
    # fill the pool with orbit jobs
    with mp.Pool() as pool:
        pool.starmap(two_orbits_one_pool, args)
Could someone explain why this is performing so poorly? I'm much more used to OpenMP and MPI; I'm not that familiar with parallel programming in Python. Overall, I've been quite disappointed in the multiprocessing module. I think maybe I should try using the numba module instead?
EDIT:
In response to Roland Smith's answer, I profiled the integration and save time for my code. Here is a stripplot showing the results. As you can see, both Roland Smith's and J_H's suggestions were correct: there is a subset of initial conditions that result in extremely long integration times due to close encounters between the bodies. However, in general, the save time was about 5 times longer than the integration time. The job suffers from stragglers and is disk I/O bound.
If there is no discernible speedup, then your code is probably not CPU-bound.
In general, writing to a disk (even an SSD) is much slower than running code on the CPU.
If several worker processes are writing significant amounts of data to disk, that might be the bottleneck.
To diagnose the problem, you have to measure.
You should separate the calculations from the saving of the data; e.g. run sim.integrate() followed by sim.simulationarchive_snapshot() 10 times, sandwiching each of those calls between time.monotonic() calls. Then return the average time of the integration and snapshot steps, as shown below.
import time

def two_orbits_one_pool(orbit1, orbit2):
    #######################################
    print('process number', mp.current_process().name)
    #######################################
    # build simulation
    sim = rebound.Simulation()
    # add sun
    sim.add(m=1.)
    # add two overlapping orbits
    sim.add(primary=sim.particles[0], m=orbit1['m'], a=orbit1['a'], e=orbit1['e'], inc=orbit1['i'], \
            pomega=orbit1['lop'], Omega=orbit1['lan'], M=orbit1['M'])
    sim.add(primary=sim.particles[0], m=orbit2['m'], a=orbit2['a'], e=orbit2['e'], inc=orbit2['i'], \
            pomega=orbit2['lop'], Omega=orbit2['lan'], M=orbit2['M'])
    sim.move_to_com()
    # integrate for 10 orbits of orbit1, one snapshot per orbit
    P = 2.*np.pi * np.sqrt(orbit1['a']**3)
    arname = "archive-{}.bin".format(mp.current_process().name)
    itime, stime = 0.0, 0.0
    for k in range(10):
        start = time.monotonic()
        sim.integrate((k + 1)*P)       # integrate() takes an absolute end time
        itime += time.monotonic() - start
        start = time.monotonic()
        sim.simulationarchive_snapshot(arname)
        stime += time.monotonic() - start
    return (mp.current_process().name, itime/10, stime/10)

# Run the calculations
with mp.Pool() as pool:
    data = pool.starmap(two_orbits_one_pool, args)

# Print the times that it took.
for name, itime, stime in data:
    print(f"worker {name}: itime {itime} s, stime {stime} s")
That should tell you what the bottleneck is.
Possible solutions if writing to disk is the bottleneck:
Use an SSD to store the simulation results.
Use a RAM-disk to store the simulation results. (Although compared to an SSD not a huge performance boost.)
Check if you can tune your OS for maximum write performance.
Edit 1: Given your measurement results, the obvious performance improvement is to save less often.
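For instance, with the automateSimulationArchive call from the question, widening the snapshot interval is a one-line change (a sketch; the right interval depends on how much output you actually need):

# one snapshot per orbit of orbit1 (original) versus one per ten orbits (much less disk traffic)
sim.automateSimulationArchive("archive-{}.bin".format(mp.current_process().name),
                              interval=10.*P)
sim.integrate(10.*P)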
Another option that might be worth looking at is staggering the writes. That only makes sense if there is significant overlap between the writes from different processes, and if those concurrent writes can saturate the disk I/O subsystem. So you'd have to measure that first.
If there is overlap, create a Lock object in the parent process. Then acquire the lock before (explicitly) saving, and release it after. This won't work with automateSimulationArchive.
A last option is to write your own save function using mmap. Using mmap is somewhat clunky compared to normal file handling in Python. But it can significantly improve performance. However I am unsure that the gains justify the effort in this case.
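For the lock-based staggering mentioned above, a minimal sketch (assuming the explicit-snapshot version of the worker shown earlier; the lock is shared with the workers through the pool initializer):

import multiprocessing as mp

_save_lock = None

def init_worker(lock):
    # runs once in each worker process and stores the shared lock
    global _save_lock
    _save_lock = lock

def two_orbits_one_pool(orbit1, orbit2):
    # build the simulation and set arname exactly as in the timing example above
    ...
    with _save_lock:                        # only one process writes a snapshot at a time
        sim.simulationarchive_snapshot(arname)

if __name__ == "__main__":
    lock = mp.Lock()
    with mp.Pool(initializer=init_worker, initargs=(lock,)) as pool:
        pool.starmap(two_orbits_one_pool, args)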
The straggler effect can have a big impact on such jobs.
straggler effect
Suppose you have N tasks for N cores,
and each task has a different duration.
Order by duration to find min_time and max_time.
All N cores will be busy up through min_time,
but then they go idle, one by one.
Just before max_time, only a single "straggler" core is being used.
predictions
If you can make a decent guess about task duration beforehand,
use that to sort them in descending order.
For T tasks > N cores, schedule the long tasks first.
Then N tasks run for a while, the shortest of those completes,
and the now-idle core picks up a task of "medium" duration.
By the time we get to the T-th task, each core has a random
amount of work still to do, and we're scheduling a "short" task.
So cores are mostly busy doing useful work, right up till near the end.
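A sketch of that scheduling idea for the job list from the question; estimate_duration() is hypothetical and would need a guess that fits your problem (e.g. initial conditions likely to produce close encounters get a larger estimate):

# Hypothetical cost estimate: larger value = expected to run longer.
def estimate_duration(orbit1, orbit2):
    return -abs(orbit1['M'] - orbit2['M'])      # assumption: closer initial phases run longer

# Hand out the expected-longest jobs first.
args.sort(key=lambda pair: estimate_duration(*pair), reverse=True)

with mp.Pool() as pool:
    pool.starmap(two_orbits_one_pool, args)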
logging
If you cannot make a useful duration estimate a priori,
at least record the start times and durations.
Use that to figure out whether stragglers are causing you grief,
or if it's something else like L3 cache thrashing.
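A lightweight way to collect that data, as a sketch (wrapping the worker from the question and reusing its imports; all workers run on the same machine, so the monotonic clock values are comparable across processes):

import time

def timed_worker(orbit1, orbit2):
    start = time.monotonic()
    two_orbits_one_pool(orbit1, orbit2)
    return (mp.current_process().name, start, time.monotonic() - start)

with mp.Pool() as pool:
    results = pool.starmap(timed_worker, args)

for name, start, duration in results:
    print(f"{name}: started at {start:.1f} s, ran for {duration:.1f} s")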

How to optimize a script to leave 1 second between spoken words using the speech module

The problem is that when I run my script it takes longer than the expected 1 second before it says the next command. I think this has something to do with the speech command. What can I do to optimize this?
Edit: link to the speech module: https://pypi.python.org/pypi/speech/0.5.2
Edit 2: per request, I measured the sleep time only, using datetime:
2016-06-29 18:39:42.953000
2016-06-29 18:39:43.954000
I found that it was pretty accurate.
Edit 3: I tried the built-in win32com.client module and it didn't work either.
import speech
import time
import os

def exercise1():
    speech.say("exercise1")
    time.sleep(0.5)
    for n in range(0, rep*2):    # rep is defined elsewhere in the script
        speech.say("1")
        time.sleep(1)
        speech.say("2")
        time.sleep(1)
        speech.say("3")
        time.sleep(1)
        speech.say("switch")
Refer to the post here: How accurate is python's time.sleep()?
It says:
"The accuracy of the time.sleep function depends on the accuracy of
your underlying OS's sleep accuracy. For non-realtime OS's like a
stock Windows the smallest interval you can sleep for is about
10-13ms. I have seen accurate sleeps within several milliseconds of
that time when above the minimum 10-13ms."
As you say in the comments, sleep(1) is fairly accurately 1s.
What you want to do to make each part take 1 second is to time the "say" call and then sleep for the remaining time to fill out the second. Something like this:
start = time.time()
speech.say("whatever")
end = time.time()
# Wait however long will bring the total time up to 1 second
time.sleep(max(0.0, 1.0 - (end - start)))

time.time() drift over repeated calls

I am getting a timestamp every time a key is pressed like this:
init_timestamp = time.time()
while (True):
    c = getch()
    offset = time.time() - init_timestamp
    print("%s,%s" % (c, offset), file=f)
(getch from this answer).
I am verifying the timestamps against an audio recording of me actually typing the keys. After lining the first timestamp up with the waveform, subsequent timestamps drift slightly but consistently. By this I mean that the saved timestamps are later than the keypress waveforms and get later and later as time goes on.
I am reasonably sure the waveform timing is correct (i.e. the recording is not fast or slow), because in the recording I also included the ticking of a very accurate clock which lines up perfectly with the second markers.
I am aware that there are unavoidable limits to the accuracy of time.time(), but this does not seem to account for what I'm seeing - if it was equally wrong on both sides that would be acceptable, but I do not want it to gradually diverge more and more from the truth.
Why would I be seeing this drifting behaviour and what can I do to avoid it?
Just solved this by using time.monotonic() instead of time.time(). time.time() seems to use gettimeofday (at least here it does), which is apparently really bad for measuring wall-time differences because of NTP syncing issues:
gettimeofday() and time() should only be used to get the current time if the current wall-clock time is actually what you want. They should never be used to measure time or schedule an event X time into the future.
You usually aren't running NTP on your wristwatch, so it probably won't jump a second or two (or 15 minutes) in a random direction because it happened to sync up against a proper clock at that point. Good NTP implementations try to not make the time jump like this. They instead make the clock go faster or slower so that it will drift to the correct time. But while it's drifting you either have a clock that's going too fast or too slow. It's not measuring the passage of time properly.
(link). So basically measuring differences between time.time() calls is a bad idea.
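Concretely, the change is just swapping the clock in the loop from the question:

import time

init_timestamp = time.monotonic()            # monotonic clock: unaffected by NTP adjustments
while (True):
    c = getch()
    offset = time.monotonic() - init_timestamp
    print("%s,%s" % (c, offset), file=f)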
Depending on which OS you are using, you will need to use either time.time() or time.clock().
For Windows you will need to use time.clock; this gives you wall-clock seconds as a float. If I remember correctly, time.time() on Windows is only accurate to within about 16 ms.
For POSIX systems (Linux, macOS) you should be using time.time(); it returns the number of seconds since the epoch as a float.
In your code, add the following to make your application a little more cross-platform compatible.
import os

if os.name == 'posix':
    from time import time as get_time
else:
    from time import clock as get_time

# now use get_time() to return the timestamp
init_timestamp = get_time()
while (True):
    c = getch()
    offset = get_time() - init_timestamp
    print("%s,%s" % (c, offset), file=f)
    ...

Simplest way to time the output of a function

I am accessing a web API that seems to mysteriously hang every once in a while. Right now I am using print to do some simple logging. I am not familiar with threads or anything like it, and I'm hoping that there's just a simple way to keep a check on how long it's been since a new print statement was returned and gracefully quit my function if a maximum time interval has been reached. Thanks for any input.
Use time.time() from the time module to get the time in seconds; from the doc:
"time() -> floating point number
Return the current time in seconds since the Epoch. Fractions of a second may be present if the system clock provides them."
Use it in code as:
import time

tic = time.time()    # start
while True:
    do_big_job()
    toc = time.time()
    if toc - tic > timeout:
        break

Python -- time.sleep() offset by code duration

I have a function that runs a tick() for all players and objects within my game server. I do this by looping through a set every 0.1 seconds. I need it to be a solid 0.1 s; a lot of timing and math depends on this pause being as close as possible to 0.1 seconds. To achieve this, I added this to the tick thread:
start_time = time.time()

# loops and code and stuff for tick thread in here...

time_lapsed = time.time() - start_time  # get the time it took to run the above code

if 0.1 - time_lapsed > 0:
    time.sleep(0.1 - time_lapsed)
else:
    print "Server is overloaded!"
    # server lag is greater than .1, so don't sleep, and just eat it on this run.
    # the goal is to never see this.
My question is: is this the best way to do this? If the duration of my loop is 0.01, then time_lapsed == 0.01, and the sleep should only be for 0.09. I ask because it doesn't seem to be working. I started getting the overloaded-server message the other day, and the server was most definitely not overloaded. Any thoughts on a good way to "dynamically" control the sleep? Maybe there's a different way to run code every tenth of a second without sleeping?
It would be better to base your "timing and math" on the amount of time actually passed since the last tick(). Depending on "very exact" timings will be fragile at the best of times.
Update: what I mean is that your tick() method would take an argument, say "t", of the elapsed time since the last call. Then, to do movement you'd store each thing's position (say in pixels) and velocity (in "pixels/second") so the magnitude of its movement for that call to tick() becomes "velocity * t".
This has the additional benefit of decoupling your physics simulation from the frame-rate.
I see pygame mentioned below: their pygame.time.Clock.tick() method is meant to be used this way; it returns the number of milliseconds since the last time you called it.
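A minimal sketch of the elapsed-time approach (the names here are illustrative, not from the question's code):

import time

last = time.monotonic()
while running:
    now = time.monotonic()
    t = now - last                       # seconds actually elapsed since the last tick
    last = now

    for thing in things:
        thing.x += thing.velocity * t    # movement scales with real elapsed time

    time.sleep(0.1)                      # rough pacing; its exact accuracy no longer matters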
Other Python threads may run in between, leaving your thread less time. Also, time.time() is subject to system time adjustments; it can be set back.
There is a similar function Clock.tick() in pygame. Its purpose is to limit the maximum frame rate.
To avoid outside influence you could keep an independent frame/turn-based counter to measure the game time.
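With pygame, a sketch of that pattern (assuming pygame is installed; running and update_world() are illustrative):

import pygame

pygame.init()
clock = pygame.time.Clock()
game_time_ms = 0                     # independent game-time counter, immune to wall-clock changes

while running:
    dt_ms = clock.tick(10)           # cap at 10 ticks per second; returns elapsed milliseconds
    game_time_ms += dt_ms
    update_world(dt_ms / 1000.0)     # advance the simulation by the real elapsed time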
