Asyncio performance with synchronous code - python

I created the following test to check the performance of running synchronous code inside an async function.
The return_random function stands in for something like writing a log, dumping or loading JSON, validating input/output data, calling other functions, etc.
The count_sync and count_async variables are used to skip the overhead of opening and closing the event loop, and measure only the time spent inside the function.
This part of the code simply calls the synchronous function count times:
import timeit
from time import time
from random import random

count = 100
run_numbers = 100000

count_sync = 0

def return_random():
    return random()

def test():
    global count_sync
    start = time()
    for _ in range(count):
        return_random()
    count_sync += time() - start

total_sync = timeit.timeit('test()', globals=globals(),
                           number=run_numbers)
The same code, but now return_random is an asynchronous function:
import asyncio
import uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

count_async = 0

async def return_random_async():
    return random()

async def test_async():
    global count_async
    start = time()
    for _ in range(count):
        await return_random_async()
    count_async += time() - start

total_async = timeit.timeit('asyncio.run(test_async())', globals=globals(), number=run_numbers)
After running the code with different numbers of function calls and timeit repetitions, I got the following results:
RUNNING run_numbers: 1000. CALL FUNCTIONS count: 1000
total sync:  0.12023316
total async: 0.48369559500000003
inside def sync:  0.11995530128479004
inside def async: 0.24073457717895508

RUNNING run_numbers: 100000. CALL FUNCTIONS count: 100
total sync:  1.422697458
total async: 25.452165134999998 (!!!)
inside def sync:  1.3965537548065186
inside def async: 2.8397130966186523
In every run the synchronous function was more than 2 times faster.
Does this mean that it is better to run synchronous code in plain (non-async) functions?
And that it is preferable not to use a lot of async functions?

You should use async functions only when you really need them. Examples: asynchronous HTTP libraries like aiohttp, asynchronous drivers like motor_asyncio for MongoDB, etc. In other cases it's better to run synchronous code in plain functions, because async functions carry overhead that you don't need.

Related

Call 4 methods at once in Python 3

I want to call 4 methods at once so they run in parallel in Python. These methods make HTTP calls and do some basic operations like verifying the response. I want to call them at once so the total time taken is less: if each method takes ~20 min to run, I want all 4 methods to return in ~20 min, not 20*4 = 80 min.
It is important to note that the 4 methods I'm trying to run in parallel are async functions. When I tried using ThreadPoolExecutor to run the 4 methods in parallel, I didn't see much difference in the time taken.
Example code - edited from @tomerar's comment below:
from concurrent.futures import ThreadPoolExecutor

async def foo_1():
    print("foo_1")

async def foo_2():
    print("foo_2")

async def foo_3():
    print("foo_3")

async def foo_4():
    print("foo_4")

with ThreadPoolExecutor() as executor:
    for foo in [await foo_1, await foo_2, await foo_3, await foo_4]:
        executor.submit(foo)
Looking for suggestions
You can use ThreadPoolExecutor from concurrent.futures with plain (non-async) functions:

from concurrent.futures import ThreadPoolExecutor

def foo_1():
    print("foo_1")

def foo_2():
    print("foo_2")

def foo_3():
    print("foo_3")

def foo_4():
    print("foo_4")

with ThreadPoolExecutor() as executor:
    for foo in [foo_1, foo_2, foo_3, foo_4]:
        executor.submit(foo)
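Since the functions in the question are already async, a minimal sketch that runs them concurrently on the event loop itself, with no thread pool, would use asyncio.gather (the fetch/main names and the asyncio.sleep standing in for the HTTP calls are illustrative, not from the question):

```python
import asyncio

async def fetch(name):
    # Simulate an HTTP call with a non-blocking sleep.
    await asyncio.sleep(0.1)
    return f"{name} done"

async def main():
    # All four coroutines wait concurrently, so the total
    # time is roughly one call's duration, not four.
    return await asyncio.gather(fetch("foo_1"), fetch("foo_2"),
                                fetch("foo_3"), fetch("foo_4"))

results = asyncio.run(main())
print(results)
```

gather preserves argument order in its result list, which makes matching responses back to requests straightforward.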
You can use multiprocessing in Python. It's quite simple:

from multiprocessing import Pool

pool = Pool()
result1 = pool.apply_async(solve1, [A])  # evaluate "solve1(A)" asynchronously
result2 = pool.apply_async(solve2, [B])  # evaluate "solve2(B)" asynchronously
answer1 = result1.get(timeout=10)
answer2 = result2.get(timeout=10)

You can see full details in the multiprocessing documentation.

Python Threading Timer - activate function every X seconds

Is there any simple way to make a thread fire the function every X seconds, to display some data?

def send_data():
    data = "Message from client"
    socket.sendall(data.encode())

write_thread = threading.Thread(target=send_data)  # pass the function itself, not its result
write_thread.start()
You could try the ischedule module - it provides very straightforward syntax for scheduling any given function.
Here's an example straight from the GitHub page:
from ischedule import run_loop, schedule
@schedule(interval=0.1)
def task():
    print("Performing a task")

run_loop(return_after=1)

The return_after param in run_loop() is an optional timeout.
Also, in case you're unfamiliar, the @ syntax is a Python decorator.
A simple way would be this:

import time

while True:
    task()
    time.sleep(1)
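If the periodic call should not block the main thread, a minimal sketch using threading.Timer that reschedules itself after each run (the every/tick names are illustrative) looks like:

```python
import threading
import time

def every(interval, func):
    """Call func immediately, then every `interval` seconds on a timer thread."""
    def wrapper():
        func()
        timer = threading.Timer(interval, wrapper)
        timer.daemon = True  # don't keep the process alive because of the timer
        timer.start()
    wrapper()

counter = {"ticks": 0}

def tick():
    counter["ticks"] += 1

every(0.01, tick)
time.sleep(0.05)       # main thread keeps running; ticks accumulate in the background
print(counter["ticks"])
```

Note that threading.Timer starts a fresh thread per interval; for heavier scheduling needs, a dedicated module like ischedule or sched is a better fit.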

How to properly use asyncio in a multi-producer-consumer flow that involves writing to a file or a .gzip file?

I am implementing a Python module that takes a tuple of three lists (x, y, val) and subsamples them according to a given ratio. Am I doing it the right way?
Do I write to the disk asynchronously?
Could I have many producers and consumers such that all of them generate and write data to the same output file?
When I compare this code to a naive single-threaded implementation, they perform similarly with respect to runtime.
import bisect
import numpy as np
import gzip
import asyncio

class SignalPDF:
    def __init__(self, inputSignal):
        self.x = inputSignal[0][:]
        self.y = inputSignal[1][:]
        self.vals = inputSignal[2][:]
        self.valCumsum = np.cumsum(self.vals)
        self.totalSum = np.sum(self.vals)
        self.N = len(self.vals)

class SignalSampler:
    def __init__(self, inputSignal, ratio=1.0):
        self.signalPDF = SignalPDF(inputSignal)
        self.Q = asyncio.Queue()
        self.ratio = float(ratio)
        self.N = int(self.signalPDF.N / self.ratio)
        self.sampledN = 0

    async def randRead(self):
        while self.sampledN < self.N:
            i = np.random.randint(self.signalPDF.totalSum, size=1, dtype=np.uint64)[0]
            self.sampledN += 1
            cell = bisect.bisect(self.signalPDF.valCumsum, i)
            yield (self.signalPDF.x[cell], self.signalPDF.y[cell], int(self.signalPDF.vals[cell]))

    async def readShortFormattedLine(self):
        async for read in self.randRead():
            x, y, val = read
            yield '{0} {1} {2}'.format(x, y, val)

    async def populateQueue(self):
        async for i in self.readShortFormattedLine():
            await self.Q.put(i)
        await self.Q.put(None)

    async def handleGzip(self, filePath):
        with gzip.open(filePath, 'wt') as f:
            while True:
                item = await self.Q.get()
                if item is None:
                    break
                f.write('{0}\n'.format(item))
                f.flush()

    async def handleFile(self, filePath):
        with open(filePath, 'w+') as f:
            while True:
                item = await self.Q.get()
                if item is None:
                    break
                f.write('{0}\n'.format(item))
                f.flush()

def main(gzip, outputFile):
    x = []; y = []; val = []
    for i in range(100):
        for j in range(100):
            x.append(i)
            y.append(j)
            val.append(np.random.randint(0, 250))
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    mixer = SignalSampler(inputSignal=[x, y, val], ratio=2.0)
    futures = []
    if gzip:
        futures = [mixer.handleGzip(outputFile), mixer.populateQueue()]
    else:
        futures = [mixer.handleFile(outputFile), mixer.populateQueue()]
    tasks = asyncio.wait(futures, loop=loop)
    results = loop.run_until_complete(tasks)
    loop.close()

main(gzip=False, outputFile='/tmp/a.txt')
main(gzip=True, outputFile='/tmp/a.txt.gz')
How asyncio works
Let's consider a task of making two web requests.
Synchronous version:
Send request 1
Wait for the answer for 1 sec.
Send request 2
Wait for the answer for 1 sec.
Both requests finished in 2 sec.
Asynchronous version:
Send request 1
Instead of waiting, immediately send request 2
Wait for the answers for over 1 sec.
Both requests finished in 1 sec.
asyncio allows you to write a program that actually works like the second, asynchronous version, while your code looks very similar to the (intuitive) first one.
Note an important thing here: the only reason the asynchronous version is faster is that it starts another concurrent operation immediately instead of waiting for the first one to fully finish. It has nothing to do with threads; asyncio works in a single main thread.
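The two-request scenario above can be sketched with asyncio.sleep standing in for the network waits (the request/main names are illustrative):

```python
import asyncio
import time

async def request(n):
    # Simulated network wait of 1 second per request.
    await asyncio.sleep(1)
    return n

async def main():
    start = time.monotonic()
    # Both "requests" wait concurrently, so the total is ~1s, not ~2s.
    results = await asyncio.gather(request(1), request(2))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
print(results, round(elapsed, 1))
```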
What about disk I/O?
Can your hardware read/write two files in parallel?
If you have one physical HDD then probably not: it has one physical head that can read/write a single piece of data at a time. An asynchronous approach won't help you then.
The situation may differ if you have multiple disks, although I have no idea whether the OS/asyncio can handle working with multiple disks in parallel (probably not).
Let's presume that you expect your hardware and OS to support multiple disk I/O. It will probably only work when you use multiple threads or processes for the operations:
The aiofiles module uses threads to work with files - you can give it a try
To work with processes, you can combine ProcessPoolExecutor and asyncio using run_in_executor as shown here
There's also some chance that using processes or even threads will increase disk I/O purely due to parallelizing related CPU-bound operations, but I have no idea if that's the case here or how beneficial it can be (probably not much compared to the disk I/O itself).
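As a minimal sketch of the run_in_executor approach (the file path, write_lines helper, and payload are illustrative), blocking file writes can be pushed off the event loop like this:

```python
import asyncio
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_lines(path, lines):
    # Ordinary blocking file I/O, executed in a worker thread.
    with open(path, 'w') as f:
        for line in lines:
            f.write(line + '\n')

async def main(path):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # The coroutine suspends here without blocking other tasks
        # while the thread does the actual write.
        await loop.run_in_executor(pool, write_lines, path, ['a', 'b', 'c'])

path = os.path.join(tempfile.gettempdir(), 'executor_demo.txt')
asyncio.run(main(path))
print(open(path).read())
```

Whether this beats a plain synchronous write depends entirely on whether there is other useful work for the event loop to do while the thread is busy.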

Coroutine as background job in Jupyter notebook

I have a coroutine which I'd like to run as a "background job" in a Jupyter notebook. I've seen ways to accomplish this using threading, but I'm wondering whether it's also possible to hook into the notebook's event loop.
For example, say I have the following class:
import asyncio

class Counter:
    def __init__(self):
        self.counter = 0

    async def run(self):
        while True:
            self.counter += 1
            await asyncio.sleep(1.0)

t = Counter()
and I'd like to execute the run method (which loops indefinitely), while still being able to check the t.counter variable at any point. Any ideas?
The following basically does what I want, I think, but it does use a separate thread. However, I can still use the async primitives.
def run_loop():
    loop = asyncio.new_event_loop()
    run_loop.loop = loop
    asyncio.set_event_loop(loop)
    task = loop.create_task(t.run())
    loop.run_until_complete(task)

from IPython.lib import backgroundjobs as bg
jobs = bg.BackgroundJobManager()
jobs.new('run_loop()')
loop = run_loop.loop  # to access the loop outside
I think your code works perfectly, you just need to create a task to wrap the coroutine, i.e.:

import asyncio

class Counter:
    def __init__(self):
        self.counter = 0

    async def run(self):
        while True:
            self.counter += 1
            await asyncio.sleep(1.0)

t = Counter()
asyncio.create_task(t.run())

Wait 10 seconds and check t.counter; you should get:
> 10
There's a simplified version of what Mark proposed:
from IPython.lib import backgroundjobs as bg
jobs = bg.BackgroundJobManager()
jobs.new(asyncio.get_event_loop().run_forever)
If you need to, you can access the loop with asyncio.get_event_loop().

Asynchronous method call in Python?

I was wondering if there's any library for asynchronous method calls in Python. It would be great if you could do something like
@async
def longComputation():
    <code>

token = longComputation()
token.registerCallback(callback_function)
# alternative, polling
while not token.finished():
    doSomethingElse()
    if token.finished():
        result = token.result()

Or to call a non-async routine asynchronously:

def longComputation():
    <code>

token = asynccall(longComputation())
It would be great to have a more refined strategy native to the language core. Has this been considered?
Something like:
import threading
thr = threading.Thread(target=foo, args=(), kwargs={})
thr.start() # Will run "foo"
....
thr.is_alive() # Will return whether foo is running currently
....
thr.join() # Will wait till "foo" is done
See the documentation at https://docs.python.org/library/threading.html for more details.
You can use the multiprocessing module added in Python 2.6. You can use pools of processes and then get results asynchronously with:
apply_async(func[, args[, kwds[, callback]]])
E.g.:
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=1)  # Start a worker process.
    result = pool.apply_async(f, [10], callback=callback)  # Evaluate "f(10)" asynchronously, calling callback when finished.
This is only one alternative. This module provides lots of facilities to achieve what you want. Also it will be really easy to make a decorator from this.
As of Python 3.4 you can use generator-based coroutines for async functions, and as of Python 3.5 the dedicated async/await syntax.
import asyncio
import datetime
Generator-based syntax:

@asyncio.coroutine
def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        yield from asyncio.sleep(1)

loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
New async/await syntax:

async def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
It's not in the language core, but a very mature library that does what you want is Twisted. It introduces the Deferred object, which you can attach callbacks or error handlers ("errbacks") to. A Deferred is basically a "promise" that a function will have a result eventually.
You can implement a decorator to make your functions asynchronous, though that's a bit tricky. The multiprocessing module is full of little quirks and seemingly arbitrary restrictions – all the more reason to encapsulate it behind a friendly interface, though. (Since async is a reserved keyword in Python 3.7+, the decorator is named async_call here.)

from inspect import getmodule
from multiprocessing import Pool

def async_call(decorated):
    r'''Wraps a top-level function around an asynchronous dispatcher.

    When the decorated function is called, a task is submitted to a
    process pool, and a future object is returned, providing access to an
    eventual return value.

    The future object has a blocking get() method to access the task
    result: it will return immediately if the job is already done, or block
    until it completes.

    This decorator won't work on methods, due to limitations in Python's
    pickling machinery (in principle methods could be made pickleable, but
    good luck on that).
    '''
    # Keeps the original function visible from the module global namespace,
    # under a name consistent with its __name__ attribute. This is necessary
    # for the multiprocessing pickling machinery to work properly.
    module = getmodule(decorated)
    decorated.__name__ += '_original'
    setattr(module, decorated.__name__, decorated)

    def send(*args, **opts):
        return async_call.pool.apply_async(decorated, args, opts)

    return send

The code below illustrates usage of the decorator:

@async_call
def printsum(uid, values):
    summed = 0
    for value in values:
        summed += value
    print("Worker %i: sum value is %i" % (uid, summed))
    return (uid, summed)

if __name__ == '__main__':
    from random import sample

    # The process pool must be created inside __main__.
    async_call.pool = Pool(4)

    p = range(0, 1000)
    results = []
    for i in range(4):
        result = printsum(i, sample(p, 100))
        results.append(result)

    for result in results:
        print("Worker %i: sum value is %i" % result.get())

In a real-world case I would elaborate a bit more on the decorator, providing some way to turn it off for debugging (while keeping the future interface in place), or maybe a facility for dealing with exceptions; but I think this demonstrates the principle well enough.
Just:

import threading, time

def f():
    print("f started")
    time.sleep(3)
    print("f finished")

threading.Thread(target=f).start()
My solution is:

import threading

class TimeoutError(RuntimeError):
    pass

class AsyncCall(object):
    def __init__(self, fnc, callback=None):
        self.Callable = fnc
        self.Callback = callback

    def __call__(self, *args, **kwargs):
        self.Thread = threading.Thread(target=self.run, name=self.Callable.__name__, args=args, kwargs=kwargs)
        self.Thread.start()
        return self

    def wait(self, timeout=None):
        self.Thread.join(timeout)
        if self.Thread.is_alive():
            raise TimeoutError()
        else:
            return self.Result

    def run(self, *args, **kwargs):
        self.Result = self.Callable(*args, **kwargs)
        if self.Callback:
            self.Callback(self.Result)

class AsyncMethod(object):
    def __init__(self, fnc, callback=None):
        self.Callable = fnc
        self.Callback = callback

    def __call__(self, *args, **kwargs):
        return AsyncCall(self.Callable, self.Callback)(*args, **kwargs)

def Async(fnc=None, callback=None):
    if fnc is None:
        def AddAsyncCallback(fnc):
            return AsyncMethod(fnc, callback)
        return AddAsyncCallback
    else:
        return AsyncMethod(fnc, callback)

And it works exactly as requested:

@Async
def fnc():
    pass
You could use eventlet. It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.
Here's an example of a super minimal crawler:
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
        "https://wiki.secondlife.com/w/images/secondlife.jpg",
        "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
    print "got body", len(body)
Something like this works for me; you can then call the function, and it will dispatch itself onto a new thread.

import time
from _thread import start_new_thread  # the 'thread' module was renamed '_thread' in Python 3

def dowork(asynchronous=True):
    if asynchronous:
        args = (False,)  # note the trailing comma: args must be a tuple
        start_new_thread(dowork, args)  # Call itself on a new thread.
    else:
        while True:
            # do something...
            time.sleep(60)  # sleep for a minute
You can use concurrent.futures (added in Python 3.2).
import time
from concurrent.futures import ThreadPoolExecutor

def long_computation(duration):
    for x in range(0, duration):
        print(x)
        time.sleep(1)
    return duration * 2

print('Use polling')
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(long_computation, 5)
    while not future.done():
        print('waiting...')
        time.sleep(0.5)
    print(future.result())

print('Use callback')
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_computation, 5)
future.add_done_callback(lambda f: print(f.result()))

print('waiting for callback')
executor.shutdown(False)  # non-blocking
print('shutdown invoked')
The newer way to run asyncio code, in Python 3.7 and later, is to use asyncio.run() instead of creating a loop, calling loop.run_until_complete() and closing it yourself:

import asyncio
import datetime

async def display_date(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...", datetime.datetime.now())
        await asyncio.sleep(1)
        if loop.time() > end_time:
            print("Done.")
            break

asyncio.run(display_date(5))
Is there any reason not to use threads? You can use the threading.Thread class.
Instead of a finished() function, use is_alive(). A result() function could join() the thread and retrieve the result. And, if you can, override the run() and __init__ methods to call the function specified in the constructor and save the return value somewhere on the instance.
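A minimal sketch of that idea (the ResultThread class and its attribute names are illustrative) could look like:

```python
import threading

class ResultThread(threading.Thread):
    """Thread that runs a function and stores its return value."""

    def __init__(self, func, *args, **kwargs):
        super().__init__()
        self._func = func
        self._args = args
        self._kwargs = kwargs
        self.result = None

    def run(self):
        # Save the return value on the instance for later retrieval.
        self.result = self._func(*self._args, **self._kwargs)

    def get_result(self):
        self.join()  # block until the function is done
        return self.result

thr = ResultThread(pow, 2, 10)
thr.start()
print(thr.get_result())  # 1024
```

For most new code, concurrent.futures.ThreadPoolExecutor gives you the same future-like interface without writing the class yourself.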
The native Python way for asynchronous calls in 2021 with Python 3.9, suitable also for the Jupyter / IPython kernel:
Camabeh's answer is the way to go since Python 3.3:
async def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
This will work in Jupyter Notebook / Jupyter Lab but throws an error:
RuntimeError: This event loop is already running
Due to IPython's use of event loops, we need something called nested asynchronous loops, which is not yet implemented in Python. Luckily there is nest_asyncio to deal with the issue. All you need to do is:

!pip install nest_asyncio  # use ! within Jupyter Notebook, else pip install in shell
import nest_asyncio
nest_asyncio.apply()

(Based on this thread)
Only when you call loop.close() does it throw another error, as it probably refers to IPython's main loop:
RuntimeError: Cannot close a running event loop
I'll update this answer as soon as someone answers this GitHub issue.
You can use a process. If you want to run it forever (like networking), use a while loop in your function:

from multiprocessing import Process

def foo():
    while True:
        pass  # Do something

p = Process(target=foo)
p.start()

If you just want to run it one time, do it like this:

from multiprocessing import Process

def foo():
    pass  # Do something

p = Process(target=foo)
p.start()
p.join()
