I am building a device based on a Raspberry Pi. It will have several functions that must run concurrently, and asyncio looks like a reasonable choice for this (I could write it all in C++ with threads, but the Python code is much more compact).
One of the functions drives a stepper motor via GPIO pulses. These pulses should be 5-10 microseconds long. Is there a way to sleep for sub-millisecond intervals with asyncio.sleep()?
Is there a way to sleep for sub-millisecond intervals with asyncio.sleep()?
On Linux asyncio uses the epoll_wait system call, which specifies its timeout in milliseconds, so anything sub-millisecond won't work, despite asyncio.sleep() accepting such values.
You can test it on your machine by running the following program:
import asyncio, os

SLEEP_DURATION = 5e-3  # 5 ms sleep

async def main():
    while True:
        # suspend execution
        await asyncio.sleep(SLEEP_DURATION)
        # execute a syscall visible in strace output
        os.stat('/tmp')

asyncio.run(main())
Save the program e.g. as sleep1.py and run it under strace, like this:
$ strace -fo trc -T python3.7 sleep1.py
<wait a second or two, then press Ctrl-C to interrupt>
The trc file will contain reasonably precise timings of what goes on under the hood. After the Python startup sequence, the program basically does the following in an infinite loop:
24015 getpid() = 24015 <0.000010>
24015 epoll_wait(3, [], 1, 5) = 0 <0.005071>
24015 epoll_wait(3, [], 1, 0) = 0 <0.000010>
24015 stat("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=45056, ...}) = 0 <0.000014>
We see a call to getpid(), two calls to epoll_wait, and finally the call to stat. The first epoll_wait is the relevant one: it specifies the timeout in milliseconds and sleeps for approximately the desired period. If we lower the sleep duration to sub-millisecond values, e.g. 100e-6, strace shows that asyncio still requests a 1 ms timeout from epoll_wait, and gets as much. The same happens with timeouts down to 15 us. If you specify a timeout of 14 us or smaller, asyncio actually requests a no-timeout poll, and epoll_wait completes in 8 us. However, the second epoll_wait also takes 8 us, so you can't really count on microsecond resolution in any way, shape, or form.
Even if you use threads and busy-looping, you are likely to encounter synchronization issues with the GIL. This should likely be done in a lower-level language such as C++ or Rust, and even then you'll need to be careful about the OS scheduler.
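If you do want to try microsecond-scale pulses from Python anyway, one common workaround is to keep asyncio for the slow coordination work and busy-wait on time.perf_counter() for the pulse itself. A minimal sketch, with the caveat above about the GIL and the scheduler still applying; set_pin/clear_pin are placeholders for whatever GPIO library you use, not a real API:

```python
import time

def busy_wait(duration_s):
    """Spin on the high-resolution clock; burns CPU but avoids the
    scheduler, so it can reach microsecond-scale delays."""
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        pass

def pulse(set_pin, clear_pin, width_s=5e-6):
    """Emit one pulse of roughly `width_s` seconds.
    `set_pin`/`clear_pin` stand in for real GPIO calls."""
    set_pin()
    busy_wait(width_s)
    clear_pin()

# Measure the achieved width with no-op pin functions:
start = time.perf_counter()
pulse(lambda: None, lambda: None, 5e-6)
elapsed = time.perf_counter() - start
print(f"pulse width: {elapsed * 1e6:.1f} us")
```

The busy-wait guarantees a minimum width; the maximum is still at the mercy of the OS preempting the process mid-pulse.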
[Update]
I'm using Twisted 22.4.0 and Python 3.9.6
I'm trying to write an asynchronous application that must run an event loop at 250Hz. So far, Twisted is simply not fast enough for my application (but I would like to know if it's possible to fix this). On a Windows 10 i5 laptop, the highest frequency I can achieve in a LoopingCall is around 50Hz. At 50Hz the following code runs OK and successfully prints "took 1.002 sec", but at 100Hz it typically takes 1.5 seconds to run, and I need my code to run with a 0.004 s period (250Hz).
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
import time

class Loop():
    def __init__(self, hz):
        self.hz = hz
        self.lc = LoopingCall(self.fast_task)
        self.num_calls = 0
        self.lc.start(1/hz)
        reactor.run()  # **Forgot to add this the first time**

    def fast_task(self):
        if self.num_calls == 0:
            self.start_time = time.time()
        if self.num_calls == self.hz:
            print("Stopping reactor...")
            print(f"took: {time.time() - self.start_time} sec")
            reactor.stop()
            return
        self.num_calls += 1

if __name__ == "__main__":
    l = Loop(100)
The above code typically takes ~1.5s to run.
My question:
Is there any way to speed this event loop up in Twisted on Windows?
I've run some similar code in asyncio and asyncio can definitely handle a 250Hz loop on my laptop. So one of the next things I tried was using the asyncioreactor with Twisted. Turns out, it still takes the same amount of time as the above code which doesn't use asyncioreactor.
But, I like the simplicity of Twisted for my use case - I need a few TCP servers and clients and a few UDP servers and clients plus some other heavy I/O processing.
One other note: I did find this ticket (https://twistedmatrix.com/trac/ticket/2424) for Twisted, in which I found out that the author of Twisted - Glyph - chose not to move to monotonic-based time unless there was an API change, which to my knowledge hasn't been implemented (except maybe when used with asyncioreactor?). This also gives me other concerns about using Twisted as a reliable, high-frequency event loop, such as NTP clock adjustments. Now, it may be that using asyncio under the hood (with asyncioreactor) takes care of this problem, but it certainly doesn't seem to offer any speed advantage.
[Update 2] This may have fixed my problem:
I adjusted the windows sleep resolution with the following code, and now my LoopingCall seems to run the above code reliably in 1 sec at 250Hz, and reliably up to 1000Hz:
from ctypes import windll
windll.winmm.timeBeginPeriod(1) # This sets the time sleep resolution to 1 ms
[Update 3]
I've included the code I used to create the loop with asyncioreactor.
Note: you'll notice that I'm using WindowsSelectorEventLoopPolicy() - this is due to not having the latest Visual C++ libraries installed (not sure if that's important info here, though)
Note 2: I'm new to twisted, so I could be using this incorrectly (the usage of asyncioreactor, or the actual LoopingCall - although the LoopingCall seems pretty straightforward)
Note 3:
I'm running on Windows 10 v21H2, Processor: 1.6GHz i5
The v21H2 is important here since it's after v2004:
From: https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
Prior to Windows 10, version 2004, this function affects a global
Windows setting. For all processes Windows uses the lowest value (that
is, highest resolution) requested by any process. Starting with
Windows 10, version 2004, this function no longer affects global timer
resolution. For processes which call this function, Windows uses the
lowest value (that is, highest resolution) requested by any process.
For processes which have not called this function, Windows does not
guarantee a higher resolution than the default system resolution.
To see if I could prove this out, I tried running Windows Media Player, Skype, and other programs while not calling timeBeginPeriod(1) (the thought being that another program in another process would set a lower resolution, which would affect my program). But this didn't change the timings you see below.
Note 4:
Timings for a 3 second run (3 runs each) @ 1000Hz:
asyncioreactor with timeBeginPeriod(1): [3.019, 3.029, 3.009]
asyncioreactor with no timeBeginPeriod(1): [42.859, 43.65, 43.152]
no asyncioreactor with timeBeginPeriod(1): [3.012, 3.519, 3.146]
no asyncioreactor, no timeBeginPeriod(1): [45.247, 44.957, 45.325]
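To check the effective sleep resolution on your own machine rather than infer it from loop timings, a small portable sketch like the following measures the average real duration of a 1 ms time.sleep(). On Windows without timeBeginPeriod(1), the average typically lands near the default ~15.6 ms timer tick (which matches the ~43 s runs above); with it, near 1-2 ms:

```python
import time

def avg_sleep(duration_s, n=50):
    """Average wall-clock duration of n calls to time.sleep()."""
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(duration_s)
    return (time.perf_counter() - start) / n

measured = avg_sleep(0.001)
print(f"requested 1.000 ms, got {measured * 1e3:.3f} ms on average")
```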
My implementation using asyncioreactor
import asyncio
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
from twisted.internet import asyncioreactor
asyncioreactor.install()
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
import time
from ctypes import windll
windll.winmm.timeBeginPeriod(1)

class Loop():
    def __init__(self, hz=1000):
        self.hz = hz
        ...
        ...
Fairly new to Python; working on a Raspberry Pi 4 with Python 3.4.3.
Got code working that listens for 2 distinct alarms in my lab - one for a -80 freezer getting too warm, and the other for a -20 freezer. The code listens on a microphone, streams data, Fourier-transforms it, detects the peaks I'm interested in, and triggers events when they're found - eventually it's going to email me and my team if an alarm goes off, but I'm still just testing with print() calls atm. Let's call them Alarm A/Event A and Alarm B/Event B.
I want it to trigger Event A when Alarm A is detected, but then wait 1 hour before triggering Event A again (if Alarm A is still going off/goes off again in an hour).
Meanwhile, though, I also want it to continue listening for Alarm B and trigger Event B if detected - again, only once per hour.
Since I can't just do time.sleep, then, I'm trying to do it with Threads - but am having trouble starting, stopping, and restarting a thread for the 1 hour (currently just 10 second for testing purposes) delay.
I have variables CounterA and CounterB set to 0 to start. When Alarm A is detected I have the program execute EventA and up CounterA to 1; ditto for AlarmB/EventB/CounterB. EventA and EventB are only triggered if CounterA and CounterB are <1.
I'm having a real hard time resetting the counters after a time delay, though. Either I end up stalling the whole program after an event is triggered, or I get the error that threads can only be started once.
Here are the relevant sections of the code:
import time
import threading

CounterA = 0
CounterB = 0

def Aresetter():
    time.sleep(10)
    global CounterA
    CounterA = CounterA - 1
    thA.join()

def Bresetter():
    time.sleep(10)
    global CounterB
    CounterB = CounterB - 1
    thB.join()

thA = threading.Thread(target=Aresetter)
thB = threading.Thread(target=Bresetter)

if any(#Alarm A detection) and CounterA < 1:
    print('Alarm A!')
    CounterA = CounterA + 1
    thA.start()
elif any(#Alarm B detection) and CounterB < 1:
    print('Alarm B!')
    CounterB = CounterB + 1
    thB.start()
else:
    pass
I think the crux of my problem is that I can't have the resetter functions join the threads to main once they're finished with their delayed maths - but I also don't know how to do that in the main program without making it wait for the same amount of time and thus stalling everything...
You don't need threads for this at all.
Just keep track of the last time (time.time()) you triggered each alarm, and don't trigger them if less than 60 minutes (or whatever the threshold is) has elapsed since the last time.
Something like (semi pseudocode)...
import time

last_alarm_1 = 0  # a long time ago, so alarm can trigger immediately

# ...

if alarm_1_cond_met():
    now = time.time()
    if now - last_alarm_1 > 60 * 60:  # seconds
        send_alarm_1_mail()
        last_alarm_1 = now
Repeat for alarm 2 :)
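A slightly fleshed-out version of that idea, with illustrative names and actions: keeping the timestamps in one dict lets both alarms share the same code path, and time.monotonic() makes the cooldown immune to NTP or manual clock changes:

```python
import time

ALARM_COOLDOWN = 60 * 60  # seconds between repeat notifications
last_fired = {"A": float("-inf"), "B": float("-inf")}

def maybe_fire(name, action):
    """Run `action` for alarm `name` unless it already fired within the
    cooldown. time.monotonic() never jumps with system-clock changes."""
    now = time.monotonic()
    if now - last_fired[name] >= ALARM_COOLDOWN:
        last_fired[name] = now
        action()
        return True
    return False

fired_first = maybe_fire("A", lambda: print("Alarm A!"))
fired_second = maybe_fire("A", lambda: print("Alarm A!"))  # suppressed: inside cooldown
print(fired_first, fired_second)
```

Each alarm's detection branch in the main listening loop then just calls maybe_fire with its own name and action; no threads or sleeps involved.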
AKX has a better solution to your problem, but you should be aware of what this does when Aresetter() is called by the thA thread:
def Aresetter():
...
thA.join()
The thA.join() method doesn't do anything to the thA thread. All it does is, it waits for the thread to die, and then it returns. But, if it's the thA thread waiting for itself to die, it's going to be waiting for a very long time.
Also, there's this:
How to...restart a thread?
You can't. I won't explore why it would even make sense - you just can't do that. It's not how threads work. If you want your program to do the same task more than one time "in another thread," you have a couple of options:
Create a new thread to do the task each time.
Create a single thread that does the same task again and again, possibly sleep()ing in between, or possibly awaiting some message/signal/trigger before each repetition.
Submit a task to a thread pool* each time you want the thing to be done.
Option (2) could be better than option (1) because creating and destroying threads is a lot of work. With option (2) you're only doing that once.
Option (1) could be better than option (2) because threads use a significant amount of memory. If the thread doesn't exist when it's not needed, then that memory could be used by something else.
Option (3) could be better than the both of them if the same thread pool is also used for other purposes in your program. The marginal cost of throwing a few more tasks at an already-existing thread pool is trivial.
* Python does ship a ready-made thread pool: concurrent.futures.ThreadPoolExecutor (see also https://stackoverflow.com/a/64373926/801894). It's also not that hard to create your own simple thread pool.
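A minimal sketch of option (3) using concurrent.futures.ThreadPoolExecutor, with a task that stands in for the question's delayed counter reset (names and the short delay are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def delayed_reset(counter_name, delay_s):
    """Sleep, then report which counter should be reset.
    Stands in for the Aresetter/Bresetter logic from the question."""
    time.sleep(delay_s)
    return counter_name

# One pool reused for every reset task; no "thread can only be
# started once" problem, because each submit() is a fresh task.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(delayed_reset, name, 0.01) for name in ("A", "B")]
    results = [f.result() for f in futures]
print(results)  # → ['A', 'B']
```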
In Python, I am making a cube game (like Minecraft pre-classic) that renders chunk by chunk (16x16 blocks). It only renders blocks that are exposed (i.e. not covered on all sides). Even though this method is fast when the terrain is shallow (like 16x16x2, which is 512 blocks in total), once I make the terrain higher (like 16x16x64, which is 16384 blocks in total), rendering each chunk takes roughly 0.03 seconds, meaning that when I render multiple chunks at once the game freezes for about a quarter of a second. I want to render the chunks "asynchronously", meaning that the program will keep drawing frames and calling the chunk render function multiple times, no matter how long it takes.
I tried to make another program in order to test it:
import threading

def run():
    n = 1
    for i in range(10000000):
        n += 1
    print(n)

print("Start")
threading.Thread(target=run()).start()
print("End")
I know that creating such a lot of threads is not the best solution, but nothing else worked.
Threading, however, didn't work, as this is what the output looked like:
>>> Start
>>> 10000001
>>> End
It also took about a quarter of a second to complete, which is about how long the multiple chunk rendering takes.
Then I tried to use async:
import asyncio

async def run():
    n = 1
    for i in range(10000000):
        n += 1
    print(n)

print("Start")
asyncio.run(run())
print("End")
It did the exact same thing.
My questions are:
Can I run a function without stopping/pausing the program execution until it's complete?
Did I use the above correctly?
Yes. No. The answer is complicated.
First, your example has at least one error in it:
print("Start")
threading.Thread(target=run).start() #notice the missing parenthesis after run
print("End")
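With that fixed, the thread actually runs concurrently with the main program; here is a complete, runnable version (collecting the count in a list instead of printing it, purely so the result can be inspected after join()):

```python
import threading

result = []

def run():
    n = 1
    for _ in range(10_000_000):
        n += 1
    result.append(n)

print("Start")
t = threading.Thread(target=run)  # note: run, not run()
t.start()
print("End")   # printed while the thread is still counting
t.join()       # wait for the thread to finish
print(result[0])
```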
You can use multithreading for your game of course, but it can come at a disadvantage of code complexity because of synchronization and you might not gain any performance because of GIL.
asyncio is probably not for this job either, since you don't need to highly parallelize many tasks and it has the same problems with GIL as multithreading.
The usual solution for this kind of problem is to divide your work into small batches and only process the next batch if you have time to do so on the same frame, kind of like so:
def runBatch(batch):
    for x in batch:
        print(x)

batches = [range(x, x + 200) for x in range(0, 10000, 200)]

while True:  # main loop
    while batches and timeToNextFrame() > 15:
        runBatch(batches.pop())
    renderFrame()  # or whatever
However, in this instance, optimizing the algorithm itself could be even better than any other option. One thing that Minecraft does is it subdivides chunks into subchunks (you can mostly ignore subchunks that are full of blocks). Another is that it only considers the visible surfaces of the blocks (renders only those sides of the block that could be visible, not the whole block).
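The "visible surfaces only" idea can be illustrated with a toy face-counting sketch (assumed representation: block coordinates in a set; a face needs geometry only if the neighbouring cell is empty):

```python
def visible_faces(solid):
    """Count block faces touching air/boundary.
    `solid` is a set of (x, y, z) tuples for filled blocks."""
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for (x, y, z) in solid:
        for dx, dy, dz in neighbors:
            if (x + dx, y + dy, z + dz) not in solid:
                count += 1
    return count

# A solid 2x2x2 cube: each of the 8 blocks has 3 exposed faces.
cube = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
print(visible_faces(cube))  # → 24
```

For a filled 16x16x64 chunk the interior faces vanish entirely, which is why this cuts the geometry far more than whole-block culling does.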
asyncio only works asynchronously when your function is waiting on an I/O task, like a network call or disk I/O. For non-I/O (CPU-bound) tasks to execute asynchronously, multi-threading is the only option, so create all your threads and then wait for them to complete using the thread join() method:
from threading import Thread
import time

def draw_pixels(arg):
    time.sleep(arg)
    print(arg)

threads = []
args = [1, 2, 3, 4, 5]
for arg in args:
    t = Thread(target=draw_pixels, args=(arg,))
    t.start()
    threads.append(t)

# join all threads
for t in threads:
    t.join()
Consider the following two approaches.
python sched:
from time import time, sleep
from sched import scheduler

def daemon(local_handler):
    print 'hi'
    local_handler.enter(3, 1, daemon, (local_handler,))

if __name__ == '__main__':
    handler = scheduler(time, sleep)
    handler.enter(0, 1, daemon, (handler,))
    handler.run()
python loop + sleep:
from time import sleep

while True:
    print 'hello'
    sleep(3)
What is the difference between sched and loop + sleep? And will sched stop working when the system time is changed?
A big difference is that the delay between multiple tasks is calculated as necessary. That means your loop will take:
time it needs to print("hello") or do the task that you need to do
time it takes to sleep(3)
while if you change the order in your scheduler to:
local_handler.enter(3, 1, daemon, (local_handler,))
do_the_task
your next task will be run either after 3 seconds, or immediately after do_the_task if it took longer than 3 seconds.
So the decision really comes down to: do you want your task executed every X time units, or with X time units space between executions.
Assuming you're using the typical (time, sleep) parameters, if the system time is changed, you'll get the next task run after the expected amount of time (sleep takes care of this, unless some signals were received in the meantime), but your next scheduled task time will be shifted. I believe that the next execution time will not be what you'd normally expect.
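If you want the "every X time units" behaviour without sched, computing each deadline from a monotonic clock gives a drift-free loop that is also immune to system-time changes. A sketch (the period and iteration count are arbitrary, chosen small so it finishes quickly):

```python
import time

def run_every(period_s, task, iterations):
    """Call `task` at a fixed rate: each deadline is derived from the
    start time, so a slow task delays one tick but the rate recovers."""
    next_deadline = time.monotonic()
    for _ in range(iterations):
        task()
        next_deadline += period_s
        delay = next_deadline - time.monotonic()
        if delay > 0:  # skip sleeping if the task overran the period
            time.sleep(delay)

ticks = []
start = time.monotonic()
run_every(0.01, lambda: ticks.append(time.monotonic() - start), 5)
print([f"{t:.3f}" for t in ticks])
```

Replacing `next_deadline += period_s` with `next_deadline = time.monotonic() + period_s` gives the other semantics: X time units of space between executions.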
The difference between the two is that scheduler is more pythonic than loop + sleep for two reasons: elegance and modularity.
Long loops easily become difficult to read and require a lot more code to be written within. However, with a scheduler, a specific function can be called on a delay, containing all of the code within. This makes code much more readable and allows for moving code into classes and modules to be called within the main loop.
Python knows what the current time is by checking the local system. If the local system's time is changed, then that will affect a currently running program or script.
Because python sched uses the system time to schedule its next iteration, while sleep just waits a relative interval measured by the CPU clock, so it is not affected by system-time changes.
Running Python 2.6 and 2.7 on Windows 7 and Server 2012
Event::wait causes a delay when used with a timeout, even though the timeout is never triggered because the event is set in time. I don't understand why.
Can someone explain?
The following program shows this and gives a possible explanation;
'''Shows that using a timeout in Event::wait (same for Queue::wait) causes a
delay. This is perhaps caused by a polling loop inside the wait implementation.
This polling loop sleeps some time depending on the timeout.
Probably wait timeout > 1ms => sleep = 1ms
A wait with timeout can take at least this sleep time even though the event is
set or queue filled much faster.'''
import threading

event1 = threading.Event()
event2 = threading.Event()

def receiver():
    '''wait 4 event2, clear event2 and set event1.'''
    while True:
        event2.wait()
        event2.clear()
        event1.set()

receiver_thread = threading.Thread(target=receiver)
receiver_thread.start()

def do_transaction(timeout):
    '''Performs a transaction; clear event1, set event2 and wait for thread to set event1.'''
    event1.clear()
    event2.set()
    event1.wait(timeout=timeout)

while True:
    # With timeout None this runs fast and CPU bound.
    # With timeout set to some value this runs slow and not CPU bound.
    do_transaction(timeout=10.0)
Looking at the source code for wait() method of the threading.Condition class, there are two very different code paths. Without a timeout, we just wait on a lock forever, and when we get the lock, we return immediately.
However, with a timeout you cannot simply wait on the lock forever, and the low-level lock provides no timeout implementation. So the code sleeps for exponentially longer periods of time, after each sleep checking if the lock can be acquired. The relevant comment from the code:
# Balancing act: We can't afford a pure busy loop, so we
# have to sleep; but if we sleep the whole timeout time,
# we'll be unresponsive. The scheme here sleeps very
# little at first, longer as time goes on, but never longer
# than 20 times per second (or the timeout time remaining).
So in a typical scenario where the condition/event is not notified within a short period of time, you will see a delay of around 25 ms on average: a random incoming event arrives, on average, with half of the maximum 50 ms sleep time still remaining before the sleep ends.
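Worth noting, since the question is about Python 2.6/2.7: this polling loop is specific to Python 2. Since Python 3.2 the low-level lock accepts acquire(timeout=...), so Condition.wait (and hence Event.wait) no longer sleeps in increments. The question's transaction loop can be timed directly to check; on Python 3 the per-transaction cost with a (never-expiring) timeout is microseconds, not milliseconds:

```python
import threading
import time

event1 = threading.Event()
event2 = threading.Event()

def receiver():
    # Echo thread: wait for event2, answer via event1.
    while True:
        event2.wait()
        event2.clear()
        event1.set()

threading.Thread(target=receiver, daemon=True).start()

def transaction(timeout):
    event1.clear()
    event2.set()
    event1.wait(timeout=timeout)

# Time 100 transactions with a timeout that is never actually hit.
start = time.perf_counter()
for _ in range(100):
    transaction(timeout=10.0)
per_call = (time.perf_counter() - start) / 100
print(f"{per_call * 1e6:.0f} us per transaction")
```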