I am coding a framework, where the framework will call a user-supplied function.
I want to allow the user-supplied function to be any of the following:
a plain function
a function returning asyncio.Future
an asyncio.coroutine
That is, the user function can be either synchronous or asynchronous, and the framework does not know in advance, but needs to cope with all variants.
Twisted has defer.maybeDeferred for this. What would be the asyncio way?
I have something like the following (full code here):
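import asyncio
import types

def maybe_async(value):
    # Generators (old-style coroutines) and Futures are passed through untouched
    if isinstance(value, types.GeneratorType) or \
            isinstance(value, asyncio.futures.Future):
        return value
    else:
        # Wrap a plain return value in an already-resolved Future
        future = asyncio.Future()
        future.set_result(value)
        return future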
and then call the user supplied function f like this in the framework:
res = yield from maybe_async(f(x))
This wraps any plain function return value into a Future - always. And I am wary of the performance or other impacts of this.
Is the above the "recommended" way?
Also, the "inline" version of the above code does not have this overhead. How could I achieve the best of both: no overhead for the "plain" case, but no code duplication for checking for async returns all over the framework?
To sum up, there seem to be two (main) options:
Idiom 1:
res = f(x)
if yields(res):
    res = yield from res
where
def yields(value):
    return isinstance(value, asyncio.futures.Future) or inspect.isgenerator(value)
or
Idiom 2:
res = yield from asyncio.coroutine(f)(x)
Instead of isinstance(value, types.GeneratorType), you can write asyncio.iscoroutine(value):
https://docs.python.org/dev/library/asyncio-task.html#asyncio.iscoroutine
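On Python 3.5 and later, the same check can be written with async/await. The following is only a sketch and not part of the original answers: call_maybe_async and the three sample functions are hypothetical names, and it relies on inspect.isawaitable, which is true for coroutine objects, Futures and Tasks. Old generator-based coroutines are not covered by it.

```python
import asyncio
import inspect


async def call_maybe_async(f, *args, **kwargs):
    """Call f and await its result only if the result is awaitable."""
    result = f(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


def plain(x):
    # Ordinary synchronous function
    return x + 1


async def coro(x):
    # Native coroutine function
    await asyncio.sleep(0)
    return x + 2


def returns_future(x):
    # Synchronous function that hands back an already-resolved Future
    future = asyncio.get_running_loop().create_future()
    future.set_result(x + 3)
    return future


async def main():
    return [
        await call_maybe_async(plain, 1),
        await call_maybe_async(coro, 1),
        await call_maybe_async(returns_future, 1),
    ]


results = asyncio.run(main())
```

In this form the plain case pays only for one isawaitable check, and the check lives in a single helper instead of being repeated across the framework.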
I have written the following decorator:
def partializable(fn):
    def arg_partializer(*fixable_parameters):
        def partialized_fn(dynamic_arg):
            return fn(dynamic_arg, *fixable_parameters)
        return partialized_fn
    return arg_partializer
The purpose of this decorator is to break the function call into two calls. If I decorate the following:
@partializable
def my_fn(dyn, fix1, fix2):
    return dyn + fix1 + fix2
I then can do:
core_accepting_dynamic_argument = my_fn(my_fix_1, my_fix_2)
final_result = core_accepting_dynamic_argument(my_dyn)
My problem is that the now decorated my_fn exhibits the following signature: my_fn(*fixable_parameters)
I want it to be: my_fn(fix1, fix2)
How can I accomplish this? I probably have to use wraps or the decorator module, but I need to preserve only part of the original signature and I don't know if that's possible.
Taking inspiration from https://stackoverflow.com/a/33112180/9204395, it's possible to accomplish this by manually altering the signature of arg_partializer, since only the signature of fn is known in the relevant scope and can be handled with inspect.
from inspect import signature

def partializable(fn):
    def arg_partializer(*fixable_parameters):
        def partialized_fn(dynamic_arg):
            return fn(dynamic_arg, *fixable_parameters)
        return partialized_fn
    # Override signature
    sig = signature(fn)
    sig = sig.replace(parameters=tuple(sig.parameters.values())[1:])
    arg_partializer.__signature__ = sig
    return arg_partializer
This is not particularly elegant, but as I think about the problem I'm starting to suspect that this (or a conceptual equivalent) is the only possible way to pull this stunt. Feel free to contradict me.
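To check that the override has the intended effect, the decorator can be applied to the my_fn from the question and inspected. This is a small sketch that repeats the decorator so it runs on its own; the values 10, 20 and 1 are arbitrary.

```python
from inspect import signature


def partializable(fn):
    def arg_partializer(*fixable_parameters):
        def partialized_fn(dynamic_arg):
            return fn(dynamic_arg, *fixable_parameters)
        return partialized_fn
    # Advertise fn's signature minus its first (dynamic) parameter
    sig = signature(fn)
    arg_partializer.__signature__ = sig.replace(
        parameters=tuple(sig.parameters.values())[1:]
    )
    return arg_partializer


@partializable
def my_fn(dyn, fix1, fix2):
    return dyn + fix1 + fix2


reported = str(signature(my_fn))  # "(fix1, fix2)"
result = my_fn(10, 20)(1)         # dyn=1, fix1=10, fix2=20
```

Note that only the reported signature changes: arg_partializer still accepts any number of positional arguments at call time, so a wrong argument count surfaces later, when fn itself is called.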
I'm writing a library that uses Tornado Web's tornado.httpclient.AsyncHTTPClient to make requests, which gives my code an async interface of:
async def my_library_function():
    return await ...
I want to make this interface optionally serial if the user provides a kwarg, something like serial=True. Obviously, though, you can't call a function defined with the async keyword from a normal function without await. This would be ideal, though almost certainly impossible in the language at the moment:
async def here_we_go():
    result = await my_library_function()

result = my_library_function(serial=True)
I haven't been able to find anything online where someone has come up with a nice solution to this. I don't want to have to reimplement basically the same code without the awaits splattered throughout.
Is this something that can be solved or would it need support from the language?
Solution (though use Jesse's instead - explained below)
Jesse's solution below is pretty much what I'm going to go with. I did end up getting the interface I originally wanted by using a decorator. Something like this:
import asyncio
from functools import wraps

def serializable(f):
    @wraps(f)
    def wrapper(*args, asynchronous=False, **kwargs):
        if asynchronous:
            return f(*args, **kwargs)
        else:
            # Get the current event loop and run the coroutine to completion on it
            loop = asyncio.get_event_loop()
            return loop.run_until_complete(f(*args, **kwargs))
    return wrapper
This gives you this interface:
result = await my_library_function(asynchronous=True)
result = my_library_function(asynchronous=False)
I sanity-checked this on Python's async mailing list, and I was lucky enough to have Guido respond. He politely shot it down for this reason:
Code smell -- being able to call the same function both asynchronously
and synchronously is highly surprising. Also it violates the rule of
thumb that the value of an argument shouldn't affect the return type.
It's nice to know it's possible, even if it's not considered a great interface. Guido essentially suggested Jesse's answer: introduce the wrapping function as a helper util in the library instead of hiding it in a decorator.
When you want to call such a function synchronously, use run_until_complete:
asyncio.get_event_loop().run_until_complete(here_we_go())
Of course, if you do this often in your code, you should come up with an abbreviation for this statement, perhaps just:
def sync(fn, *args, **kwargs):
    return asyncio.get_event_loop().run_until_complete(fn(*args, **kwargs))
Then you could do:
result = sync(here_we_go)
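On Python 3.7 and later, asyncio.run can play the same role as the get_event_loop().run_until_complete pair: it creates a fresh event loop, runs the coroutine to completion and closes the loop. The sketch below assumes that substitution is acceptable for the library; the body of here_we_go is a stand-in for real work.

```python
import asyncio


def sync(fn, *args, **kwargs):
    """Run the coroutine function fn to completion and return its result."""
    return asyncio.run(fn(*args, **kwargs))


async def here_we_go():
    # Stand-in for awaiting the library's real async function
    await asyncio.sleep(0)
    return "done"


result = sync(here_we_go)
```

asyncio.run raises RuntimeError when called from inside an already running event loop, so a helper like this only suits genuinely synchronous callers.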
I have written several functions that run sequentially, each one taking as its input the output of the previous function. To run them, I have to write this line of code:
make_list(cleanup(get_text(get_page(URL))))
I find that ugly and inefficient. Is there a better way to do sequential function calls?
Really, this is the same as any case where you want to refactor commonly-used complex expressions or statements: just turn the expression or statement into a function. The fact that your expression happens to be a composition of function calls doesn't make any difference (but see below).
So, the obvious thing to do is to write a wrapper function that composes the functions together in one place, so everywhere else you can make a simple call to the wrapper:
def get_page_list(url):
    return make_list(cleanup(get_text(get_page(url))))

things = get_page_list(url)
stuff = get_page_list(another_url)
spam = get_page_list(eggs)
If you don't always call the exact same chain of functions, you can always factor out into the pieces that you frequently call. For example:
def get_clean_text(page):
    return cleanup(get_text(page))

def get_clean_page(url):
    return get_clean_text(get_page(url))
This refactoring also opens the door to making the code a bit more verbose but a lot easier to debug, since it only appears once instead of multiple times:
def get_page_list(url):
    page = get_page(url)
    text = get_text(page)
    cleantext = cleanup(text)
    return make_list(cleantext)
If you find yourself needing to do exactly this kind of refactoring of composed functions very often, you can always write a helper that generates the refactored functions. For example:
from functools import wraps

def compose1(*funcs):
    @wraps(funcs[0])
    def composed(arg):
        # Apply the functions right to left, like nested calls
        for func in reversed(funcs):
            arg = func(arg)
        return arg
    return composed

get_page_list = compose1(make_list, cleanup, get_text, get_page)
If you want a more complicated compose function (that, e.g., allows passing multiple args/return values around), it can get a bit complicated to design, so you might want to look around on PyPI and ActiveState for the various existing implementations.
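For the single-argument case, the same helper can also be expressed with functools.reduce. This is just an alternative sketch; compose and the three toy functions passed to it are illustrative and stand in for get_page, get_text and the rest.

```python
from functools import reduce


def compose(*funcs):
    """Return a function applying funcs right to left, like nested calls."""
    def composed(arg):
        return reduce(lambda acc, func: func(acc), reversed(funcs), arg)
    return composed


# Equivalent to str((lambda x: x + 1)(int("41")))
parse_increment_format = compose(str, lambda x: x + 1, int)
answer = parse_increment_format("41")
```

The explicit loop in compose1 and the reduce version do the same work; which one reads better is mostly a matter of taste.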
You could try something like this. I always like separating train wrecks (the book "Clean Code" calls these nested function calls train wrecks). This is easier to read and debug. Remember that you probably spend twice as long reading your code as writing it, so make it easier to read. You will thank yourself later.
page = get_page(URL)
page_text = get_text(page)
make_list(cleanup(page_text))

# you can also encapsulate that into its own function
def build_page_list_from_url(url):
    # use the url parameter rather than the global URL
    page = get_page(url)
    page_text = get_text(page)
    return make_list(cleanup(page_text))
Options:
Refactor: implement this series of function calls as one, aptly-named method.
Look into decorators. They're syntactic sugar for 'chaining' functions in this way. E.g. implement cleanup and make_list as decorators, then decorate get_text with them.
Compose the functions. See code in this answer.
You could shorten constructs like that with something like the following:
class ChainCalls(object):
    def __init__(self, *funcs):
        self.funcs = funcs
    def __call__(self, *args, **kwargs):
        # The last function receives the original arguments
        result = self.funcs[-1](*args, **kwargs)
        # The remaining functions are applied right to left
        for func in self.funcs[-2::-1]:
            result = func(result)
        return result

def make_list(arg): return 'make_list(%s)' % arg
def cleanup(arg): return 'cleanup(%s)' % arg
def get_text(arg): return 'get_text(%s)' % arg
def get_page(arg): return 'get_page(%r)' % arg

mychain = ChainCalls(make_list, cleanup, get_text, get_page)
print( mychain('http://is.gd') )
Output:
make_list(cleanup(get_text(get_page('http://is.gd'))))
I have used defer.inlineCallbacks in my code as I find it much easier to read and debug than using addCallbacks.
I am using PB and I have hit a problem when returning data to the client. The data is about 18Mb in size and I get a failed BananaError because of the length of the string being returned.
What I want to do is to write a generator so I can just keep calling the function and return some of the data each time the function is called.
How would I write that with inlineCallbacks already being used? Is it actually possible, or should I return a value instead? Would something like the following work?
@defer.inlineCallbacks
def getLatestVersions(self):
    returnlist = []
    try:
        latest_versions = yield self.cur.runQuery("""SELECT id, filename,path,attributes ,MAX(version) ,deleted ,snapshot , modified, size, hash,
        chunk_table, added, isDir, isSymlink, enchash from files group by filename, path""")
    except:
        logger.exception("problem querying latest versions")
    for result in latest_versions:
        returnlist.append(result)
        if len(returnlist) >= 10:
            yield returnlist
            returnlist = []
    yield returnlist
A generator function decorated with inlineCallbacks returns a Deferred - not a generator. This is always the case. You can never return a generator from a function decorated with inlineCallbacks.
See the pager classes in twisted.spread.util for ideas about another approach you can take.
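Independently of Twisted, the batching that the question attempts can live in an ordinary generator that is applied to the rows once the Deferred has fired. The following is a minimal sketch in plain Python: chunked and the batch size of 10 are illustrative, and nothing in it is Twisted or PB API.

```python
def chunked(rows, size):
    """Yield successive lists of at most size items taken from rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch
        yield batch


# 25 fake rows stand in for the query result
batches = list(chunked(range(25), 10))
```

How each batch is then sent to the client is a separate concern, which is where the pager classes mentioned above come in.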
I have a python function that has a deterministic result. It takes a long time to run and generates a large output:
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
I modify time_consuming_function from time to time, but I would like to avoid having it run again while it's unchanged. [time_consuming_function only depends on functions that are immutable for the purposes considered here; i.e. it might have functions from Python libraries but not from other pieces of my code that I'd change.] The solution that suggests itself to me is to cache the output and also cache some "hash" of the function. If the hash changes, the function will have been modified, and we have to re-generate the output.
Is this possible or ridiculous?
Updated: based on the answers, it looks like what I want to do is to "memoize" time_consuming_function, except instead of (or in addition to) arguments passed into an invariant function, I want to account for a function that itself will change.
If I understand your problem, I think I'd tackle it like this. It's a touch evil, but I think it's more reliable and on-point than the other solutions I see here.
import inspect
import functools
import json

def memoize_zeroadic_function_to_disk(memo_filename):
    def decorator(f):
        try:
            with open(memo_filename, 'r') as fp:
                cache = json.load(fp)
        except IOError:
            # file doesn't exist yet
            cache = {}
        # The function's source text is the cache key, so editing f invalidates the entry
        source = inspect.getsource(f)
        @functools.wraps(f)
        def wrapper():
            if source not in cache:
                cache[source] = f()
                with open(memo_filename, 'w') as fp:
                    json.dump(cache, fp)
            return cache[source]
        return wrapper
    return decorator

@memoize_zeroadic_function_to_disk(...SOME PATH HERE...)
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
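The decorator above keys the cache on the full source text. The question instead asks about caching some "hash" of the function, and one rough way to get one is to digest the compiled code object. The sketch below is an assumption-laden illustration, not part of the answer: code_fingerprint, v1 and v2 are made-up names, and the digest also changes when the bytecode format changes, for example after a Python upgrade.

```python
import hashlib


def code_fingerprint(func):
    """Return a hex digest that changes when the compiled body of func changes."""
    code = func.__code__
    # Nested code objects have a repr containing a memory address, so skip them
    plain_consts = [c for c in code.co_consts if not hasattr(c, "co_code")]
    payload = code.co_code + repr((plain_consts, code.co_names)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def v1():
    return sum(range(10))


def v2():
    return sum(range(11))


same = code_fingerprint(v1) == code_fingerprint(v1)
different = code_fingerprint(v1) != code_fingerprint(v2)
```

Unlike the source text, this digest ignores comments and formatting, and it does not notice changes inside helper functions that the function calls.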
Rather than putting the function in a string, I would put the function in its own file. Call it time_consuming.py, for example. It would look something like this:
import os

def time_consuming_method():
    # your existing method here
    ...

# Is the cached data older than this file?
if (not os.path.exists(data_file_name)
        or os.stat(data_file_name).st_mtime < os.stat(__file__).st_mtime):
    data = time_consuming_method()
    save_data(data_file_name, data)
else:
    data = load_data(data_file_name)

# redefine method
def time_consuming_method():
    return data
While testing the infrastructure for this to work, I'd comment out the slow parts. Make a simple function that just returns 0, get all of the save/load stuff working to your satisfaction, then put the slow bits back in.
There are two parts to this. The first is memoization and serialization of your lookup table, which should be straightforward with any Python serialization library. The second is deleting the serialized lookup table when the source code changes. Perhaps this is being overthought into some fancy solution. Presumably when you change the code you check it in somewhere? Why not add a hook to your check-in routine that deletes your serialized table? Or, if this is not research data and is in production, make it part of your release process: if the revision number of your file (put this function in its own file) has changed, your release script deletes the serialized lookup table.
So, here is a really neat trick using decorators:
def memoize(f):
    cache = {}
    def result(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return result
With the above, you can then use:
@memoize
def myfunc(x,y,z):
    # Some really long running computation
    ...
When you invoke myfunc, you will actually be invoking the memoized version of it. Pretty neat, huh? Whenever you want to redefine your function, simply use "@memoize" again, or explicitly write:
myfunc = memoize(new_definition_for_myfunc)
Edit
I didn't realize that you wanted to cache between multiple runs. In that case, you can do the following:
import os
import os.path
import pickle  # cPickle on Python 2

class MemoizedFunction(object):
    def __init__(self, f):
        self.function = f
        # hash(f) is not stable across interpreter runs, so name the file after the function
        self.filename = f.__name__ + ".cache"
        self.cache = {}
        if os.path.exists(self.filename):
            with open(self.filename, 'rb') as file:
                self.cache = pickle.load(file)
    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.function(*args)
        return self.cache[args]
    def __del__(self):
        # Persist the cache when the wrapper is garbage collected
        with open(self.filename, 'wb') as file:
            pickle.dump(self.cache, file, pickle.HIGHEST_PROTOCOL)

def memoize(f):
    return MemoizedFunction(f)
What you describe is effectively memoization. Most common functions can be memoized by defining a decorator.
An (overly simplified) example:
def memoized(f):
    cache = {}
    def memo(*args):
        if args in cache:
            return cache[args]
        else:
            ret = f(*args)
            cache[args] = ret
            return ret
    return memo

@memoized
def time_consuming_method():
    # lots_of_computing_time to come up with the_result
    return the_result
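For this in-process case, the standard library already ships such a decorator, functools.lru_cache. A short sketch follows; slow_square and call_count are made-up names used only to show that the body runs once per distinct argument.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=None)
def slow_square(x):
    """Pretend to be expensive; count how often the body actually runs."""
    global call_count
    call_count += 1
    return x * x


first = slow_square(4)   # computed
second = slow_square(4)  # served from the cache
```

Like the hand-written decorator, this cache lives only as long as the process and requires hashable arguments.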
Edit:
From Mike Graham's comment and the OP's update, it is now clear that values need to be cached over different runs of the program. This can be done by using some form of persistent storage for the cache (e.g. something as simple as Pickle or a plain text file, a full-blown database, or anything in between). The choice of method depends on what the OP needs. Several other answers already give solutions to this, so I'm not going to repeat them here.