Asyncio is blocking when using FastAPI - Python

I have a function that makes a POST request and does a lot of processing. All of that takes about 30 seconds.
I need to execute this function every 6 minutes, so I used asyncio for that... but it's not asynchronous: my API is blocked until the end of the function. Later I will have processing that takes 5 minutes to execute.
import asyncio

def update_all():
    # do the request and the processing (takes ~30 seconds)
    ...

async def run_update_all():
    while True:
        await asyncio.sleep(6 * 60)
        update_all()

loop = asyncio.get_event_loop()
loop.create_task(run_update_all())
So I don't understand why, while update_all() is executing, all incoming requests are left pending, waiting for update_all() to finish, instead of being handled asynchronously.

I found an answer thanks to the hint from larsks.
I did this:
def update_all():
    # do the synchronous POST request and the processing that takes a long time
    ...

async def launch_async():
    loop = asyncio.get_event_loop()
    while True:
        await asyncio.sleep(120)
        loop.run_in_executor(None, update_all)

asyncio.create_task(launch_async())
With that code I'm able to launch a synchronous function every X seconds without blocking the main thread of FastAPI :D
I hope that will help other people in the same situation as me.
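For anyone wiring this into an actual FastAPI app, here is a minimal self-contained sketch of the same pattern (the /ping endpoint, the 30-second sleep and the 120-second interval are illustrative assumptions, not taken from the original post):

import asyncio
import time

from fastapi import FastAPI

app = FastAPI()

def update_all():
    # placeholder for the blocking POST request + processing (~30 s)
    time.sleep(30)

async def launch_async():
    loop = asyncio.get_running_loop()
    while True:
        # hand the blocking work to the default thread pool so the
        # event loop stays free to serve incoming requests
        loop.run_in_executor(None, update_all)
        await asyncio.sleep(120)

@app.on_event("startup")
async def start_background_updater():
    asyncio.create_task(launch_async())

@app.get("/ping")
async def ping():
    # stays responsive even while update_all() is running
    return {"status": "ok"}

The point is that update_all() runs in the default thread pool, so the event loop, and therefore every other endpoint, keeps serving requests while it works.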

Related

Python: create both a sync and an async function in the same main

I need to have two functions in my Python 3.11 code.
One function must be sync: it retrieves some data from a local machine, so I need to wait for it to finish.
The other function must be async: it takes the data from the first function and sends it to the server. Since I don't know how long that can take (5 seconds to 30 seconds), this function must not interrupt the first one.
Practically, the second function always starts when the first one finishes, but the first one always starts again and doesn't care about the second one. This code runs 24/7.
My attempt:
import time
import asyncio

async def task1():
    print("Recover data... waiting")
    time.sleep(3)
    print("End data recover")
    return "slow"

async def task2(p):
    print("I'm so" + p)
    time.sleep(10)
    print("END--->")

async def main():
    while True:
        print("create task1 and wait to finish")
        x = await task1()
        print("create task2 and not wait to finishing")
        asyncio.create_task(task2(x))

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.create_task(main())
loop.run_forever()
I don't need to use asyncio as a requirement; I just want to meet the goal without running out of all the memory on the machine. Thanks.
basically, "yes". Like in: that is the way you do it. If you need some task to complete before goind on with code in a given place, the thing to do is to wait for that data - if it is in an async func, then you use the await keyword, just as depicted in the code above.
If there are other tasks in parallel to calling main, then it would be nice if the code in task1 would return execution to the asyncio.loop while it waits for its result - That code could run in another thread, with the use of await asyncio.run_in_executor(sync_call, args) or simply await asyncio.sleep(...)`
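As a concrete illustration of that suggestion, here is a minimal sketch where task1 hands its blocking read to a worker thread and task2 uses asyncio.sleep instead of time.sleep (the read_local_data helper and the sleep durations are assumptions for the example, not part of the question):

import asyncio
import time

def read_local_data():
    # hypothetical stand-in for the slow, blocking local read
    time.sleep(3)
    return "slow"

async def task1():
    loop = asyncio.get_running_loop()
    # run the blocking read in a worker thread so the loop stays free
    return await loop.run_in_executor(None, read_local_data)

async def task2(p):
    print("sending " + p)
    # use asyncio.sleep (not time.sleep) so other tasks can run meanwhile
    await asyncio.sleep(10)
    print("sent")

async def main():
    while True:
        x = await task1()              # wait for the local read
        asyncio.create_task(task2(x))  # fire-and-forget the upload

asyncio.run(main())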

Faster way to iterate over dataframe?

I have a dataframe where each row is a record and I need to send each record in the body of a post request. Right now I am looping through the dataframe to accomplish this. I am constrained by the fact that each record must be posted individually. Is there a faster way to accomplish this?
Iterating over the data frame is not the issue here. The issue is that you have to wait for the server to respond to each of your requests. A network request takes eons compared to the CPU time needed to iterate over the data frame. In other words, your program is I/O bound, not CPU bound.
One way to speed it up is to use coroutines. Let's say you have to make 1,000 requests. Instead of firing one request, waiting for the response, then firing the next request and so on, you fire all 1,000 requests at once and tell Python to wait until you have received all 1,000 responses.
Since you didn't provide any code, here's a small program to illustrate the point:
import aiohttp
import asyncio
import numpy as np
import time
from typing import List

async def send_single_request(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as response:
        return await response.json()

async def send_all_requests(urls: List[str]):
    async with aiohttp.ClientSession() as session:
        # Make 1 coroutine for each request
        coroutines = [send_single_request(session, url) for url in urls]
        # Wait until all coroutines have finished
        return await asyncio.gather(*coroutines)

# We will make 10 requests to httpbin.org. Each request will take at least d
# seconds. If you were to fire them sequentially, they would have taken at
# least delays.sum() seconds to complete.
np.random.seed(42)
delays = np.random.randint(0, 5, 10)
urls = [f"https://httpbin.org/delay/{d}" for d in delays]

# Instead, we will fire all 10 requests at once, then wait until all 10 have
# finished.
t1 = time.time()
result = asyncio.run(send_all_requests(urls))
t2 = time.time()

print(f"Expected time: {delays.sum()} seconds")
print(f"Actual time: {t2 - t1:.2f} seconds")
Output:
Expected time: 28 seconds
Actual time: 4.57 seconds
You have to read up a bit on coroutines and how they work, but for the most part they are not too complicated for your use case. This comes with a couple of caveats:
All your requests must be independent of each other.
The rate limit on the server must be sufficient to handle your workload. For example, if it restricts you to 2 requests per minute, there is no way around that other than upgrading to a different service tier (though you can throttle your own side to stay under it, as in the sketch below).
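If you do need to cap how many requests are in flight at once (for example to stay under a rate limit), a common pattern is an asyncio.Semaphore. Here is a minimal sketch along the lines of the code above; the limit of 5 concurrent requests is an arbitrary assumption:

import asyncio
from typing import List

import aiohttp

async def send_single_request(session: aiohttp.ClientSession, url: str,
                              sem: asyncio.Semaphore):
    async with sem:  # at most N requests run concurrently
        async with session.get(url) as response:
            return await response.json()

async def send_all_requests(urls: List[str], max_concurrency: int = 5):
    sem = asyncio.Semaphore(max_concurrency)
    async with aiohttp.ClientSession() as session:
        coroutines = [send_single_request(session, url, sem) for url in urls]
        return await asyncio.gather(*coroutines)

# usage: asyncio.run(send_all_requests(urls, max_concurrency=5))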

Does using asyncio loops inside threads decrease performance?

I am creating a system where I have to query a remote server periodically, about 10,000 times a second. That is a bit much, but it is still experimental and I own the server, so there are no issues with exceeding its load or anything.
How I did that is to spin up 50 processes, and each process spins up about 200 threads, each running an event loop over 2 asyncio tasks forever.
The loop looks like this:
async def getDataPeriodically(item):
    while True:
        self.getNewData(item)
        await asyncio.sleep(replayInterval)

entriesLoop = asyncio.get_event_loop()
entriesLoop.create_task(getDataPeriodically("X"))
entriesLoop.create_task(getDataPeriodically("Y"))
entriesLoop.run_forever()
The issue I had is that although replayInterval is set to 0.5 seconds, or even 1 second, self.getNewData wouldn't finish the HTTP request on time. Sometimes it finishes 10 seconds later, and sometimes even 2 minutes later.
I would like to know if running an asyncio loop inside a thread decreases efficiency or goes against the concurrency logic of the thread?
If you can change getNewData(), you do not need the await calls.
Threads can update object attributes directly, so you can pass in a dictionary (or another object) and monitor a specific attribute.
This doesn't answer your question about asyncio, but it may help with your overall problem?
....
def getNewData(self, obj):
    # Request data
    # Once data is received
    obj['dataReceived'] = True
....

def getDataPeriodically(item):
    obj = {'dataReceived': False}
    while True:
        self.getNewData(item, obj)
        while not obj['dataReceived']:  # Wait for getNewData to receive data
            pass
        # Do whatever with data
        obj['dataReceived'] = False  # Prep for next HTTP request

thread = threading.Thread(target=getDataPeriodically, args=(item,))
thread.daemon = True
thread.start()
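One caveat about that sketch: the inner while not obj['dataReceived']: pass loop busy-waits and burns a full CPU core while it polls. A threading.Event lets the waiting thread sleep until the data arrives. A minimal variation, where the fetch runs in its own thread and the time.sleep(1) stands in for the real HTTP request, might look like this:

import threading
import time

def get_new_data(item, done: threading.Event):
    # hypothetical stand-in for the HTTP request; runs in its own thread
    time.sleep(1)
    done.set()  # signal that the data has arrived

def get_data_periodically(item, replay_interval=0.5):
    while True:
        done = threading.Event()
        threading.Thread(target=get_new_data, args=(item, done)).start()
        done.wait()  # sleeps until set() is called, no CPU spinning
        # ... do whatever with the data ...
        time.sleep(replay_interval)

thread = threading.Thread(target=get_data_periodically, args=("X",), daemon=True)
thread.start()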

Asyncio: how to run something once a day that must complete

I am pretty new to asyncio and I am using it with discord.py to create a bot. Once a day, I need to update a spreadsheet, but the problem is that the spreadsheet has gotten a little long, so it now triggers the loop's default timeout. Is there any way to overcome this? I have seen run_until_complete, but as you see below there is an await asyncio.sleep(86400), which from my understanding will not work with run_until_complete because it will wait for a day? I would also be fine with just changing the timeout for that function and then changing it back after it is complete, but I have not been able to find any resources.
Here is the function that needs to repeat every day:
async def updateSheet():
    while True:
        print("Updating Sheet at " + datetime.now().strftime("%H:%M"))
        user.updateAllUsers(os.getenv('CID'), os.getenv('CS'), subs)  # This is the function that takes too long
        print("Done Updating")
        await asyncio.sleep(86400)
and here is how I am adding it to the loop (because I am using Discord.py):
@client.event
async def on_ready():
    print('We have logged in as {0.user}'.format(client))
    client.loop.create_task(updateSheet())
Any and all help will be appreciated, since as long as this is down my project loses precious time. :)
If something is blocking, the direct method would be to try to convert it to a task, which might not be possible in your case. So we would have to use something like APScheduler to schedule jobs.
sched = Scheduler()
sched.start()

@sched.cron_schedule(day='mon-fri')
def task():
    user.updateAllUsers(os.getenv('CID'), os.getenv('CS'), subs)
Make sure you do this in a separate file, and use the async scheduler for tasks.
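For reference, with APScheduler 3.x the asyncio-friendly variant of that idea looks roughly like this (the 24-hour interval and the placeholder job body are assumptions mirroring the question, not a tested Discord setup):

import asyncio
from apscheduler.schedulers.asyncio import AsyncIOScheduler

def update_sheet_job():
    # placeholder for the long, blocking spreadsheet update from the question
    print("updating sheet...")

async def main():
    scheduler = AsyncIOScheduler()
    # run once every 24 hours; by default plain functions execute in a thread
    # pool, so the blocking job does not stall the asyncio event loop
    scheduler.add_job(update_sheet_job, 'interval', hours=24)
    scheduler.start()
    await asyncio.Event().wait()  # keep the loop alive

asyncio.run(main())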
You can simply measure how much time the function takes to execute and subtract it from 86400:
import time

async def updateSheet():
    while True:
        start = time.monotonic()
        print("Updating Sheet at " + datetime.now().strftime("%H:%M"))
        user.updateAllUsers(os.getenv('CID'), os.getenv('CS'), subs)  # This is the function that takes too long
        end = time.monotonic()
        total = end - start
        sleep_time = 86400 - total
        await asyncio.sleep(sleep_time)
I really suggest that you run the blocking functions in a non-blocking way; refer to one of my previous answers for more info (What does "blocking" mean).
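For example, the loop above could hand the blocking call off to a worker thread with asyncio.to_thread (available since Python 3.9); a minimal sketch, with update_all_users standing in as a placeholder for the question's user.updateAllUsers call:

import asyncio
import time

def update_all_users():
    # placeholder for the blocking spreadsheet update
    time.sleep(5)

async def update_sheet():
    while True:
        start = time.monotonic()
        # run the blocking call in a worker thread so the bot stays responsive
        await asyncio.to_thread(update_all_users)
        elapsed = time.monotonic() - start
        await asyncio.sleep(86400 - elapsed)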

Python run multiple background loops independently

In one of my projects, I need to run three different database updater functions at different intervals.
For instance, function one needs to run every 30 seconds, function two needs to run every 60 seconds, and function three every 5 minutes (notably due to API call restrictions).
I've been trying to achieve this in Python, looking up every possible solution, but I cannot seem to find anything that works for my use case. I am rather new to Python.
Here is (somewhat) what I have, using asyncio:
import asyncio

def updater1(url1, url2, time):
    print(f"Doing my thing here every {time} seconds")

def updater2(url1, url2, time):
    print(f"Doing my thing here every {time} seconds")

def updater3(url, time):
    print(f"Doing my thing here every {time} seconds")

async def func1():
    updater1(rankUrl, statsUrl, 30)
    await asyncio.sleep(30)

async def func2():
    updater2(rankUrl, statsUrl, 60)
    await asyncio.sleep(60)

async def func3():
    updater3(url, 300)
    await asyncio.sleep(300)

# Initiate async loops
while True:
    asyncio.run(func1())
    asyncio.run(func2())
    asyncio.run(func3())
The issue is that these tasks run one after the other, while what I am trying to achieve is for them to run independently of each other, each starting when the script is initiated and following its own individual loop time.
Any idea how this could be done is much appreciated - I am open to new concepts and ideas if you have any for me to explore :)
Don't use asyncio.run() on individual coroutines, as asyncio.run() is itself not asynchronous. The call to asyncio.run() won't return until the funcN() coroutine is done.
Create a single top-level coroutine that then runs others as tasks:
async def main():
    task1 = asyncio.create_task(func1())
    task2 = asyncio.create_task(func2())
    task3 = asyncio.create_task(func3())
    await asyncio.wait([task1, task2, task3])
The above kicks off three independent tasks, then waits for all 3 to complete.
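Note that for the funcN() coroutines to keep repeating at their own intervals, each one also needs its own while True loop. A minimal self-contained sketch of the whole thing might look like this (the print bodies are placeholders for the real updater calls):

import asyncio

async def repeat_every(interval, label):
    # run one updater on its own schedule, independently of the others
    while True:
        print(f"Doing my thing here every {interval} seconds ({label})")
        await asyncio.sleep(interval)

async def main():
    task1 = asyncio.create_task(repeat_every(30, "updater1"))
    task2 = asyncio.create_task(repeat_every(60, "updater2"))
    task3 = asyncio.create_task(repeat_every(300, "updater3"))
    await asyncio.wait([task1, task2, task3])

asyncio.run(main())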
