Does using asyncio loops inside threads decrease performance? - python

I am creating a system where I have to query a remote server periodically, about 10,000 times a second. That is a bit much, but the system is still experimental and I own the server, so there are no issues with exceeding load or anything.
The way I did that is to spin up 50 processes, and each process spins up about 200 threads, with each thread running an event loop over 2 asyncio tasks forever.
The loop looks like this:
async def getDataPeriodically(item):
    while True:
        self.getNewData(item)
        await asyncio.sleep(replayInterval)

entriesLoop = asyncio.get_event_loop()
entriesLoop.create_task(getDataPeriodically("X"))
entriesLoop.create_task(getDataPeriodically("Y"))
entriesLoop.run_forever()
The issue I had is that although replayInterval is set to 0.5 seconds, or even 1 second, self.getNewData wouldn't finish the HTTP request on time. Sometimes it finishes 10 seconds later and sometimes even 2 minutes later.
I would like to know whether running an asyncio event loop inside a thread decreases efficiency or works against the concurrency the threads are supposed to provide?

If you can change getNewData(), you do not need the await calls.
Threads can update object attributes directly, so you can pass in a dictionary (or other object) and monitor a specific attribute.
This doesn't answer your question about asyncio, but it may help with your overall problem.
....
def getNewData(self, item, obj):
    # Request data
    # Once data is received:
    obj['dataReceived'] = True
....
def getDataPeriodically(item):
    obj = {'dataReceived': False}
    while True:
        self.getNewData(item, obj)
        while not obj['dataReceived']:  # Wait for getNewData to receive data
            pass
        # Do whatever with the data
        obj['dataReceived'] = False  # Prep for next HTTP request

thread = threading.Thread(target=getDataPeriodically, args=(item,))
thread.daemon = True
thread.start()
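If you would rather keep asyncio, another option (not from the answer above) is to hand the blocking call to an executor so the event loop itself is never blocked while the request runs. A minimal sketch, assuming getNewData and replayInterval are the blocking call and interval from the question:

import asyncio

async def get_data_periodically(fetch, item, replay_interval):
    # fetch is assumed to be a blocking callable such as self.getNewData
    loop = asyncio.get_running_loop()
    while True:
        # The blocking HTTP request runs in the default thread pool, so the
        # event loop stays free to keep the other periodic task on schedule.
        await loop.run_in_executor(None, fetch, item)
        await asyncio.sleep(replay_interval)

async def main(fetch, replay_interval):
    await asyncio.gather(
        get_data_periodically(fetch, "X", replay_interval),
        get_data_periodically(fetch, "Y", replay_interval),
    )

# e.g. asyncio.run(main(self.getNewData, replayInterval)) from inside your class

Note that each event loop lazily creates its own default executor, so this adds worker threads on top of the 200 threads you already start per process.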

Related

Asyncio is blocking using FastAPI

I have a function that makes a POST request and does a lot of processing. All of that takes about 30 seconds.
I need to execute this function every 6 minutes, so I used asyncio for that... But it is not asynchronous: my API is blocked until the end of the function. Later I will have processing that takes 5 minutes to execute.
def update_all():
    # do request and treatment (30 secs)
    ...

async def run_update_all():
    while True:
        await asyncio.sleep(6 * 60)
        update_all()

loop = asyncio.get_event_loop()
loop.create_task(run_update_all())
So, I don't understand why, while update_all() is executing, all incoming requests are pending, waiting for the end of update_all(), instead of being handled asynchronously.
I found an answer thanks to the hint from larsks.
I did this:
def update_all():
    # Do a synchronous post request and processing that takes a long time
    ...

async def launch_async():
    loop = asyncio.get_event_loop()
    while True:
        await asyncio.sleep(120)
        loop.run_in_executor(None, update_all)

asyncio.create_task(launch_async())
With that code I'm able to launch a synchronous function every X seconds without blocking the main thread of FastAPI :D
I hope that will help other people in the same situation as me.
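On Python 3.9+ the same idea can be written with asyncio.to_thread, which submits the call to the default executor under the hood. A minimal sketch, assuming update_all is the blocking function from the question:

import asyncio

async def launch_async():
    while True:
        await asyncio.sleep(120)
        # Runs update_all in a worker thread; unlike the fire-and-forget
        # run_in_executor call above, this waits for it to finish before
        # sleeping again, but FastAPI keeps serving requests meanwhile.
        await asyncio.to_thread(update_all)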

Why can't I get the output of "I get the source"?

I am trying to build a server to handle long-running task jobs, so I made a global variable tasks so that a request can return quickly by only putting the task info into tasks, and I am using threading to build a function that handles the long-running task job.
However, I can't see the change to tasks in test(). Why does this happen?
import time
import threading
from collections import OrderedDict

tasks = OrderedDict()

def request():
    # network gross
    # ...
    global tasks
    tasks['zdx'] = 2

def test():
    print('test running')
    while True:
        if tasks:
            task = tasks.popitem()
            print('I get the source!')
            # very long time resolve task
        time.sleep(1)

def init():
    threading.Thread(target=test, daemon=True).start()

init()
time.sleep(3)
request()
You may want to review what daemon=True does for a thread. Effectively, right as you call request() to put an entry into tasks, your program exits and the thread is terminated (since it has daemon=True set) before it finishes sleeping, so it never gets a chance to find out whether anything is in tasks and never gets to run. To correct for this, putting a time.sleep(3) after request() at the end will give the loop in the thread more than enough time to finish sleeping and process the check.
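A minimal sketch of that fix, keeping everything else in the question unchanged:

init()
time.sleep(3)
request()
time.sleep(3)  # keep the main thread alive so the daemon thread can pop the task and print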

Best way to respond quickly to a variable that is changed by another thread

I have written a program to consume data that I am sending to it via UDP packets every 10 milliseconds. I am using separate threads because it can take a variable amount of time to run the logic that processes the data, and if more than 10 ms have elapsed I just want it to process the most recently received datagram.

I am currently running a while loop that checks every millisecond for a new quote via time.sleep(0.001). I just learned that this time.sleep() can actually take up to 16 milliseconds on Windows Server 2019, and it is delaying everything. I could just use pass instead of time.sleep, but that ends up using too much CPU (I am running multiple instances of the program).

Is there a way I can have the program pause and just wait for maindata.newquote == True before proceeding? The trick is that I would like it to respond very quickly (in less than a millisecond) rather than waiting for the next Windows timer interrupt.
class maindata:
    newquote = False
    quote = ''

def startquotesUDP(maindata, myaddress, port):
    UDPServerSocket = socket(family=AF_INET, type=SOCK_DGRAM)
    UDPServerSocket.bind((myaddress, port))
    while True:
        bytesAddressPair = UDPServerSocket.recvfrom(bufferSize)
        # parse raw data
        maindata.quote = parsed_data
        maindata.newquote = True

threading.Thread(target=startquotesUDP, args=(maindata, address, port)).start()

while True:
    if maindata.newquote == False:
        time.sleep(0.001)  # This is what I want to improve
    else:
        # process maindata.quote
        maindata.newquote = False
My answer is the same as @balmy above, but I wouldn't even bother with a semaphore.
The producer just writes to a queue
while True:
    result = ...
    queue.put(result)
    # sleep as necessary
The receiver can receive every result by doing
while True:
    result = queue.get()
    # handle result
If you prefer only to see the most recent result sent by the producer, in case it has sent multiple results since the last time you looked, then:
while True:
    result = queue.get()
    while not queue.empty():
        result = queue.get()
    # handle result
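Put together with the question's UDP reader, a sketch of that pattern might look like the following (parsing details omitted; bufferSize, address, port and parsed_data are the question's own placeholders):

import queue
import threading
from socket import socket, AF_INET, SOCK_DGRAM

quotes = queue.Queue()

def startquotesUDP(myaddress, port):
    UDPServerSocket = socket(family=AF_INET, type=SOCK_DGRAM)
    UDPServerSocket.bind((myaddress, port))
    while True:
        bytesAddressPair = UDPServerSocket.recvfrom(bufferSize)
        # parse raw data into parsed_data as in the question
        quotes.put(parsed_data)

threading.Thread(target=startquotesUDP, args=(address, port), daemon=True).start()

while True:
    quote = quotes.get()           # blocks until a quote arrives, no polling loop
    while not quotes.empty():      # drop stale quotes, keep only the newest one
        quote = quotes.get()
    # process quote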

multithreading spawn new process when worker has finished

I would like to define a pool of n workers and have each execute tasks held in a RabbitMQ queue. When a task finishes (fails or succeeds) I want the worker to execute another task from the queue.
I can see in the docs how to spawn a pool of workers and have them all wait for their siblings to complete. I would like something different though: I would like to have a buffer of n tasks, where when one worker finishes it adds another task to the buffer (so no more than n tasks are in the buffer). I'm having difficulty searching for this in the docs.
For context, my non-multithreading code is this:
while True:
    message = get_frame_from_queue()  # get message from rabbit mq
    do_task(message.body)  # body defines urls to download file
    acknowledge_complete(message)  # tell rabbitmq the message is acknowledged
At this stage my "multithreading" implementation will look like this:
@receives('ask_for_a_job')
def get_a_task():
    # this function is executed when the `ask_for_a_job` signal is fired
    message = get_frame_from_queue()
    do_task(message)

def do_task(task_info):
    try:
        # do stuff
        ...
    finally:
        # once the "worker" has finished, start another
        fire_signal('ask_for_a_job')

# start the "workers"
for i in range(5):
    fire_signal('ask_for_a_job')
I don't want to reinvent the wheel. Is there a more built-in way to achieve this?
Note that get_frame_from_queue is not thread safe.
You should be able to have each subprocess/thread consume directly from the queue, and then within each thread, simply process from the queue exactly as you would synchronously.
from threading import Thread

def do_task(msg):
    # Do stuff here
    ...

def consume():
    while True:
        message = get_frame_from_queue()
        do_task(message.body)
        acknowledge_complete(message)

if __name__ == "__main__":
    threads = []
    for i in range(5):
        t = Thread(target=consume)
        t.start()
        threads.append(t)
This way, you'll always have N messages from the queue being processed simultaneously, without any need for signaling to occur between threads.
The only "gotcha" here is the thread-safety of the rabbitmq library you're using. Depending on how it's implemented, you may need a separate connection per thread, or possibly one connection with a channel per thread, etc.
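For example, if the library wants one connection per thread, threading.local keeps them separate; make_connection() below is a hypothetical factory standing in for however your RabbitMQ client opens a connection:

import threading

_local = threading.local()

def get_channel():
    # Each thread lazily opens its own connection/channel the first time
    # it asks for one; other threads never share it.
    if not hasattr(_local, "channel"):
        _local.connection = make_connection()         # hypothetical factory
        _local.channel = _local.connection.channel()  # assumes a pika-style API
    return _local.channel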
One solution is to leverage the multiprocessing.Pool object. Use an outer loop to get N items from RabbitMQ. Feed the items to the Pool, waiting until the entire batch is done. Then loop through the batch, acknowledging each message. Lastly continue the outer loop.
source
import multiprocessing

def worker(word):
    return bool(word == 'whiskey')

messages = ['syrup', 'whiskey', 'bitters']
BATCHSIZE = 2

pool = multiprocessing.Pool(BATCHSIZE)
while messages:
    # take first few messages, one per worker
    batch, messages = messages[:BATCHSIZE], messages[BATCHSIZE:]
    print('BATCH:', end=' ')
    for res in pool.imap_unordered(worker, batch):
        print(res, end=' ')
    print()
    # TODO: acknowledge msgs in 'batch'
output
BATCH: False True
BATCH: False
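Adapted to the question's helpers (get_frame_from_queue, do_task, acknowledge_complete), the outer loop the answer describes could look roughly like this; do_task must be a picklable, module-level function for multiprocessing, and messages are only pulled and acknowledged on the main process since get_frame_from_queue is not thread safe:

import multiprocessing

BATCHSIZE = 5

def run():
    pool = multiprocessing.Pool(BATCHSIZE)
    while True:
        # pull one batch of messages on the main process only
        batch = [get_frame_from_queue() for _ in range(BATCHSIZE)]
        # process the bodies in parallel and wait for the whole batch
        for _ in pool.imap_unordered(do_task, [m.body for m in batch]):
            pass
        # acknowledge each message only after the batch is done
        for message in batch:
            acknowledge_complete(message)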

How do I feed an infinite generator to eventlet (or gevent)?

The docs of both eventlet and gevent have several examples of how to asynchronously spawn IO tasks and get the results later.
But so far, in all the examples where a value should be returned from the async call, I always find a blocking call after all the calls to spawn(): either join(), joinall(), wait(), or waitall().
This assumes that calling the functions that use IO is immediate and we can jump right to the point where we are waiting for the results.
But in my case I want to get the jobs from a generator that can be slow and/or arbitrarily large, or even infinite.
I obviously can't do this:
pile = eventlet.GreenPile(pool)
for url in mybiggenerator():
    pile.spawn(fetch_title, url)
titles = '\n'.join(pile)
because mybiggenerator() can take a long time before it is exhausted. So I have to start consuming the results while I am still spawning async calls.
This is probably usually done with recourse to queues, but I'm not really sure how. Say I create a queue to hold jobs, push a bunch of jobs from a greenlet called P, and pop them from another greenlet C.
When in C, if I find that the queue is empty, how do I know whether P has pushed every job it had to push or whether it is just in the middle of an iteration?
Alternatively, eventlet allows me to loop through a pile to get the return values, but can I start doing this without having spawned all the jobs I have to spawn? How? This would be a simpler alternative.
You don't need any pool or pile by default. They're just convenient wrappers to implement a particular strategy. First you should get an idea of how exactly your code must work under all circumstances, that is: when and why you start another greenthread, and when and why you wait for something.
When you have some answers to these questions and doubts about others, ask away. In the meanwhile, here's a prototype that processes an infinite "generator" (actually a queue).
import eventlet
import eventlet.queue
import eventlet.semaphore

queue = eventlet.queue.Queue(10000)
wait = eventlet.semaphore.CappedSemaphore(1000)

def fetch(url):
    # httplib2.Http().request
    # or requests.get
    # or urllib.urlopen
    # or whatever API you like
    return response

def crawl(url):
    with wait:
        response = fetch(url)
        links = parse(response)  # parse() extracts the links from the response
        for url in links:
            queue.put(url)

def spawn_crawl_next():
    try:
        url = queue.get(block=False)
    except eventlet.queue.Empty:
        return False
    # use another CappedSemaphore here to limit number of outstanding connections
    eventlet.spawn(crawl, url)
    return True

def crawler():
    while True:
        if spawn_crawl_next():
            continue
        while wait.balance != 0:
            eventlet.sleep(1)
        # if the last spawned `crawl` enqueued more links -- process them
        if not spawn_crawl_next():
            break

def main():
    queue.put('http://initial-url')
    crawler()
Re: "concurrent.futures from Python 3 does not really apply to the eventlet or gevent part":
in fact, eventlet can be combined with concurrent.futures to deploy the ThreadPoolExecutor as a GreenThread executor.
See: https://github.com/zopefiend/green-concurrent.futures-with-eventlet/commit/aed3b9f17ac27eeaf8c56210e0c8e4aff2ecbdb5
I had the same problem, and it was super difficult to find any answers.
I think I managed to get something working by having a consumer running on a separate thread and using Event for synchronization. It seems to work fine.
The only caveat is that you have to be careful with monkey-patching: if you monkey-patch the threading facilities, this will probably not work.
import gevent
import gevent.queue
import threading
import time

q = gevent.queue.JoinableQueue()
queue_not_empty = threading.Event()

def run_task(task):
    print(f"Started task {task} # {time.time()}")
    # Use whatever has been monkey-patched with gevent here
    gevent.sleep(1)
    print(f"Finished task {task} # {time.time()}")

def consumer():
    while True:
        print("Waiting for item in queue")
        queue_not_empty.wait()
        try:
            task = q.get()
            print(f"Dequed task {task} for consumption # {time.time()}")
        except gevent.exceptions.LoopExit:
            queue_not_empty.clear()
            continue
        try:
            gevent.spawn(run_task, task)
        finally:
            q.task_done()
        gevent.sleep(0)  # Kickstart task

def enqueue(item):
    q.put(item)
    queue_not_empty.set()

# Run consumer on separate thread
consumer_thread = threading.Thread(target=consumer, daemon=True)
consumer_thread.start()

# Add some tasks
for i in range(5):
    enqueue(i)

time.sleep(2)
Output:
Waiting for item in queue
Dequed task 0 for consumption # 1643232632.0220542
Started task 0 # 1643232632.0222237
Waiting for item in queue
Dequed task 1 for consumption # 1643232632.0222733
Started task 1 # 1643232632.0222948
Waiting for item in queue
Dequed task 2 for consumption # 1643232632.022315
Started task 2 # 1643232632.02233
Waiting for item in queue
Dequed task 3 for consumption # 1643232632.0223525
Started task 3 # 1643232632.0223687
Waiting for item in queue
Dequed task 4 for consumption # 1643232632.022386
Started task 4 # 1643232632.0224123
Waiting for item in queue
Finished task 0 # 1643232633.0235817
Finished task 1 # 1643232633.0236874
Finished task 2 # 1643232633.0237293
Finished task 3 # 1643232633.0237558
Finished task 4 # 1643232633.0237799
Waiting for item in queue
With the new concurrent.futures module in Py3k, I would say (assuming that the processing you want to do is actually something more complex than join):
with concurrent.futures.ThreadPoolExecutor(max_workers=foo) as wp:
    res = [wp.submit(fetchtitle, url) for url in mybiggenerator()]
    ans = '\n'.join(a.result() for a in concurrent.futures.as_completed(res))
This will allow you to start processing results before all of your fetchtitle calls complete. However, it will require you to exhaust mybiggenerator before you continue -- it's not clear how you want to get around this, unless you want to set some max_urls parameter or similar. That would still be something you could do with your original implementation, though.
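If the goal is precisely to avoid exhausting mybiggenerator up front, one way (not from the answer above) is to cap the number of outstanding futures and collect results as they finish; max_pending is an assumed tuning parameter and fetchtitle is the fetch function from the answer:

import concurrent.futures

def fetch_all_titles(urls, max_workers=20, max_pending=100):
    titles = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as wp:
        pending = set()
        for url in urls:  # urls may be a slow or very large generator
            pending.add(wp.submit(fetchtitle, url))
            if len(pending) >= max_pending:
                # wait for at least one fetch to finish before submitting more
                done, pending = concurrent.futures.wait(
                    pending, return_when=concurrent.futures.FIRST_COMPLETED)
                titles.extend(f.result() for f in done)
        for f in concurrent.futures.as_completed(pending):
            titles.append(f.result())
    return titles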
