I have a Telegram bot built with aiogram, and I need to run a few background tasks that check Google Sheets. Here is my current code:
async def on_startup(_):
    asyncio.create_task(loop1())
    asyncio.create_task(loop2())
    asyncio.create_task(loop3())

if __name__ == '__main__':
    executor.start_polling(dp, skip_updates=True, on_startup=on_startup)
Code in each function looks like this:
async def loop1():
    while True:
        # Iterating over data
        time.sleep(PAUSE)
The code runs on a single-threaded VPS, so the loops end up "pausing" and waiting for each other rather than running in parallel. Is there a better way to run these background tasks?
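For context, time.sleep() blocks the event loop's only thread, so every other coroutine stalls while one loop pauses. A minimal non-blocking sketch of one such loop (assuming PAUSE is defined as above) would await asyncio.sleep() instead:

import asyncio

async def loop1():
    while True:
        # ... iterate over the Google Sheets data ...
        # await yields control back to the event loop, letting
        # loop2/loop3 keep running while this loop is pausing
        await asyncio.sleep(PAUSE)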
I have one program that collects data from a websocket and processes it; if certain conditions are met, I want to call another function that does something with the data.
This is easy enough, but I want the program that collects the data from the websocket to keep running.
I have 'fixed' this quite inelegantly by writing the data to a database and letting the second program check the database every few seconds. But I don't want to use this solution, since I occasionally get "database is locked" errors.
Is there a way to start program B from program A while program A keeps running?
I have looked at multithreading and multiprocessing, and I feel they could be a way to solve this, but while I grasp the basics, they are still a bit too difficult for me to use.
Is there an easier way? And if not, should I study multithreading or multiprocessing further?
(Or if anyone knows a good guide/video, that would be great too!)
I suggest launching a worker thread that waits for data to process. The main thread listens to the websocket and sends data to the worker through a pipe.
The logic of the worker is:

while True:
    data = peek_data_or_sleep(pipe)
    process_data(data)
This way you won't get thousands of workers when incoming traffic is high.
So the key point is how to send data to the worker, usually via a pipe or a message queue.
I've used Celery with RabbitMQ as the message queue: send data to Celery from the Django server, and Celery calls your function in another process.
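A minimal sketch of that worker pattern using only the standard library (queue.Queue standing in for the pipe; process_data is a placeholder for your real processing function):

import queue
import threading

def process_data(data):
    print("processing:", data)  # placeholder for the real work

work_queue = queue.Queue()

def worker():
    # queue.get() blocks (sleeps) until data arrives, so there is
    # no busy-waiting and no database polling
    while True:
        data = work_queue.get()
        if data is None:  # sentinel value tells the worker to stop
            break
        process_data(data)

t = threading.Thread(target=worker)
t.start()

# wherever the websocket delivers a message, hand it to the worker:
work_queue.put({"price": 42.0})
work_queue.put(None)  # sentinel: let the worker exit cleanly
t.join()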
Here is an example assuming you are using asyncio for WebSockets:
import asyncio
from time import sleep

async def web_socket(queue: asyncio.Queue):
    for i in range(5):
        await asyncio.sleep(1.0)
        await queue.put(f"Here is message n°{i}!")
    await queue.put(None)

def expensive_work(message: str):
    sleep(0.5)
    print(message)

async def worker(queue: asyncio.Queue):
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.to_thread(expensive_work, message)

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(
        web_socket(queue),
        worker(queue),
    )

if __name__ == "__main__":
    asyncio.run(main())
The web_socket() function simulates a websocket listener which receives messages. For each received message, it puts it into a queue that is shared with another task running concurrently, which processes the message.
The expensive_work() function simulates the processing task to apply to each message.
The worker() function runs concurrently with the websocket listener. It reads values from the shared queue and processes them. If the processing is really expensive (for instance, a CPU-bound task), consider running it in a ProcessPoolExecutor to avoid blocking the event loop.
Finally, the main() function creates the shared queue, launches the two tasks concurrently with asyncio.gather() and then awaits the completion of both tasks.
If you are using threads and blocking I/O, the solution is essentially the same, but with threading.Thread and queue.Queue. Beware of mixing multithreading and asyncio concurrency; if you need both, look up how to do it properly.
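To make the ProcessPoolExecutor suggestion concrete, here is one possible sketch of a process-backed worker(), reusing the names from the example above (this variant is not part of the original answer):

import asyncio
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def expensive_work(message: str):
    # must stay a module-level function so it can be pickled
    sleep(0.5)
    print(message)

async def worker(queue: asyncio.Queue):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        while True:
            message = await queue.get()
            if message is None:
                break
            # offload the CPU-bound call to another process so the
            # event loop stays free for the websocket listener
            await loop.run_in_executor(pool, expensive_work, message)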
Basically, I'm using pyppeteer to connect to an existing browser session, which requires me to periodically time.sleep() the thread in order for the browser to behave normally. Using asyncio.sleep() still causes dynamic HTML websites to behave strangely; I suspect the underlying JavaScript detects the puppeteer connection to the browser, something time.sleep() seems to block by (if I had to guess) temporarily pausing that connection.
What I need is a way to pause the part of the Python Telegram script that connects to the webpages, similar to what time.sleep() does, but without pausing everything else the Telegram bot script is doing.
I suspect I could do this by disconnecting from the browser and reconnecting, but I expect this would mess up the ordering of the currently active pages (from working with pyppeteer for a while, it seems incapable of ordering webpages identically between browser connections, especially when the page titles are identical) and cause other errors in my code.
So, to the actual question: can I pause parts of an asyncio event loop in a way that is functionally identical to time.sleep() but isn't asyncio.sleep()? The latter doesn't seem to work, probably because it switches from the current task to maintaining the background threads that handle the browser connection.
The reason the Telegram bot is involved is that my code triggers the pyppeteer code from a Telegram command; however, while the thread is sleeping in time.sleep(), the bot is unable to respond to Telegram commands because the entire script is paused.
can I pause parts of an asyncio event loop
If I understood your problem correctly, you should create multiple tasks and then use asyncio.sleep() wherever you want to pause a given task; this will not affect the other tasks. For example:
import asyncio

async def task1():
    for i in range(5):
        print("task1: pausing task for 1 sec")
        await asyncio.sleep(1)
        print("task1: resumed again.")
        print("task1: doing some work ...")

async def task2():
    print("task2: working on ...")
    await asyncio.sleep(10)
    print("task2: work finished")

async def main():
    t1 = asyncio.create_task(task1())
    t2 = asyncio.create_task(task2())
    print("doing some other tasks in main coro")
    await asyncio.sleep(5)
    print("main coro tasks done. waiting for t1, t2 to finish")
    await asyncio.wait({t1, t2})

asyncio.run(main())
I could do this by disconnecting from the browser connection and reconnecting but I suspect this would mess up the ordering of the current active pages
If you expect that no page will auto-reload, then before disconnecting you can tag each page with some kind of attribute, for example by changing the page titles, and use those tags to identify each page when you reconnect.
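A hypothetical sketch of that tagging idea with pyppeteer (browser stands for your existing Browser connection; the tag format is made up for illustration):

async def tag_pages(browser):
    # before disconnecting: prefix every page title with a stable tag
    for i, page in enumerate(await browser.pages()):
        await page.evaluate(
            '(tag) => { document.title = tag + document.title; }',
            f"tab-{i:03d}:",  # zero-padded so the tags sort correctly
        )

async def pages_in_original_order(browser):
    # after reconnecting: recover the original order from the tags
    tagged = [(await page.title(), page) for page in await browser.pages()]
    return [page for _, page in sorted(tagged, key=lambda pair: pair[0])]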
I want to do the following:
while True:
    if condition():
        perform_task_in_background()
    perform_other_task_without_interruption()
What's the best way to do this with asyncio? In particular, I want perform_other_task_without_interruption() to run without interruption while perform_task_in_background() is running in the background.
You can run the task in a separate thread using run_in_executor:
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

# inside a coroutine (an event loop must be running):
loop = asyncio.get_running_loop()
loop.run_in_executor(executor, perform_task_in_background)
This example runs the task in a thread, but you can also run it in a separate process (and thus potentially on a dedicated CPU core) if performance is crucial. See ProcessPoolExecutor for that.
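Put together with the loop from the question, a runnable sketch might look like this (condition, perform_task_in_background, and perform_other_task_without_interruption are stand-ins for your own functions):

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def condition() -> bool:
    return True  # stand-in for your real check

def perform_task_in_background():
    time.sleep(2)  # simulate slow, blocking work
    print("background task done")

async def perform_other_task_without_interruption():
    print("other task running")
    await asyncio.sleep(0.5)

async def main():
    executor = ThreadPoolExecutor(max_workers=2)
    loop = asyncio.get_running_loop()
    for _ in range(5):  # stand-in for `while True`
        if condition():
            # schedule the blocking task on the thread pool; don't await
            # the future here, so the loop continues uninterrupted
            loop.run_in_executor(executor, perform_task_in_background)
        await perform_other_task_without_interruption()

if __name__ == "__main__":
    asyncio.run(main())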
I'm trying to understand whether asyncio is a necessary part of Python's definition of coroutines or simply a convenience package.
Can I run this program without asyncio?
import time

async def clk():
    time.sleep(0.1)

async def process():
    for _ in range(2):
        await clk()
        time.sleep(0.2)
    print("I am DONE waiting!")

def run():
    await process()

if __name__ == "__main__":
    run()
I get the error that run() is not defined with async, which I understand, but there seems to be an infinite regress up to the top level. Interestingly, this code runs (without the run() function) in a Jupyter notebook: I just type await process().
To run async functions, you need to provide an event loop. One of the main things asyncio does is provide such a loop: when you execute asyncio.run(process()), it creates and runs a loop internally.
The reason this code works in a notebook is that Jupyter (as well as the IPython REPL) provides a loop under the hood; other third-party libraries, such as trio and curio, provide their own loops as well.
That said, you can freely provide your own loop instead of using a library. But in practice there is little point in doing so, since asyncio is part of the Python standard library.
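For illustration only, here is a hand-rolled sketch of driving the question's coroutines without asyncio; it works only because nothing here ever truly suspends (a real awaitable would need an actual scheduler):

import time

async def clk():
    time.sleep(0.1)

async def process():
    for _ in range(2):
        await clk()
        time.sleep(0.2)
    print("I am DONE waiting!")

def run(coro):
    # step the coroutine manually; send(None) resumes it until it
    # either suspends or finishes by raising StopIteration
    try:
        while True:
            coro.send(None)
    except StopIteration:
        pass

if __name__ == "__main__":
    run(process())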
I started learning Python's asyncio module. I was writing a basic program which basically waits for some time and prints the result.
async def say_helloworld(wait: int) -> None:
    print(f"starting say_helloworld({wait})")
    await asyncio.sleep(wait)
    print(f"say_helloworld({wait}) says hello world!")
I wrote my coroutine as given above and created some tasks.
tasks = (asyncio.create_task(say_helloworld(i)) for i in range(10))
I then wrapped the tasks in another coroutine.
async def main(tasks):
    for task in tasks:
        await task
Finally, I ran the wrapper coroutine:
asyncio.run(main(tasks))
My expectation was that the program would finish in around 10 seconds, since each task would wait concurrently rather than one after another. But the program took nearly 45 seconds (synchronous?).
What am I missing? Why is it running like a synchronous program?
I wrote my coroutine as given above and created some tasks.
The problem is that your code, as written, doesn't actually create the tasks in advance; it just provides a recipe for creating them when needed. tasks is initialized to:
tasks = (asyncio.create_task(say_helloworld(i)) for i in range(10))
The above is a generator expression that creates tasks only as you iterate over it. So, while the intention was to create the tasks in advance and then await them while they run in parallel, the actual implementation creates a single task in each iteration and then awaits it immediately. This of course leads to the undesirable result of the tasks being executed sequentially.
The fix is to build an actual list by switching from (asyncio.create_task... ) to [asyncio.create_task... ]. You will also need to do this inside a coroutine, so that an event loop is running for the tasks. For example:
# ... say_helloworld defined as before ...

async def main():
    tasks = [asyncio.create_task(say_helloworld(i)) for i in range(10)]
    for task in tasks:
        await task

asyncio.run(main())
This results in ten "starting ..." messages, followed by the "says hello..." messages appearing as each task's sleep elapses, with the whole run taking about 10 seconds, which was the initial goal.
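As a side note (not part of the original answer), the same concurrent behavior can be achieved with asyncio.gather(), which wraps the coroutines in tasks for you:

import asyncio

# ... say_helloworld defined as before ...

async def main():
    # gather() schedules all the coroutines concurrently and awaits them
    await asyncio.gather(*(say_helloworld(i) for i in range(10)))

asyncio.run(main())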