I have numerous tornado.web.RequestHandler classes that check for authorized access using id and access-key secure cookies. I access MongoDB asynchronously with inline callbacks using gen.Task. I am having trouble figuring out a way to factor out this repetitive code because of its asynchronicity. How can I do this?
class MyHandler(RequestHandler):
    @tornado.web.asynchronous
    @gen.engine
    def get(self):
        id = self.get_secure_cookie('id', None)
        accesskey = self.get_secure_cookie('accesskey', None)
        if not id or not accesskey:
            self.redirect('/a_public_area')
            return
        try:
            # check that the cookie is a well-formed bson ObjectId before querying mongodb
            bson.objectid.ObjectId(id)
        except bson.errors.InvalidId:
            self.redirect('/a_public_area')
            return
        found_id, error = yield gen.Task(asyncmong_client_inst.collection.find_one,
                                         {'_id': id, 'accesskey': accesskey}, fields={'_id': 1})
        if error['error']:
            raise HTTPError(500)
        if not found_id[0]:
            self.redirect('/a_public_area')
            return
        # real business code follows
I would like to factor the above into a function that yields perhaps an HTTP status code.
Tornado has the decorator @tornado.web.authenticated. Let's use it.
class BaseHandler(RequestHandler):
    def get_login_url(self):
        return u"/a_public_area"

    @gen.engine  # not sure about this step
    def get_current_user(self):
        id = self.get_secure_cookie('id', None)
        accesskey = self.get_secure_cookie('accesskey', None)
        if not id or not accesskey:
            return False
        # are you sure you need this?
        try:
            # check that the cookie is a well-formed bson ObjectId
            bson.objectid.ObjectId(id)
        except bson.errors.InvalidId:
            return False
        # I believe you don't need asynchronous mongo on the auth query,
        # so if this doesn't work, replace it with a synchronous call
        found_id, error = yield gen.Task(asyncmong_client_inst.collection.find_one,
                                         {'_id': id, 'accesskey': accesskey}, fields={'_id': 1})
        if error['error']:
            raise HTTPError(500)
        if not found_id[0]:
            return False
        return found_id
class MyHandler(BaseHandler):
    @tornado.web.asynchronous
    @tornado.web.authenticated
    @gen.engine
    def get(self):
        # real business code follows
Using gen everywhere is not good practice; it can turn the code into a big pile of spaghetti. Think about it.
Perhaps a decorator would help (not tested or anything, just some ideas):
def sanitize(fn):
    def _sanitize(self, *args, **kwargs):
        id = self.get_secure_cookie('id', None)
        accesskey = self.get_secure_cookie('accesskey', None)
        if not id or not accesskey:
            self.redirect('/a_public_area')
            return
        try:
            # check that the cookie is a well-formed bson ObjectId
            bson.objectid.ObjectId(id)
        except bson.errors.InvalidId:
            self.redirect('/a_public_area')
            return
        return fn(self, *args, **kwargs)
    return _sanitize
I don't know if you can make check_errors work with the business logic, but maybe:
def check_errors(fn):
    def _check_errors(self, *args, **kwargs):
        found_id, error = fn(self, *args, **kwargs)
        if error['error']:
            raise HTTPError(500)
        if not found_id[0]:
            self.redirect('/a_public_area')
            return
    return _check_errors
then
class MyHandler(RequestHandler):
    @tornado.web.asynchronous
    @gen.engine
    @sanitize
    @check_errors  # ..O.o decorators
    def get(self):
        found_id, error = yield gen.Task(asyncmong_client_inst.collection.find_one,
                                         {'_id': id, 'accesskey': accesskey}, fields={'_id': 1})
        return found_id, error
I'd like to address this general problem with gen.Task, which is that factoring out code is either impossible or extremely clumsy.
You can only write "yield gen.Task(...)" inside the get() or post() method. If you want get() to call another function foo() and do the work there, you can't, unless you write everything as a generator and chain the generators together in some unwieldy way. As your project gets bigger, this becomes a huge problem.
This is a much better alternative: https://github.com/mopub/greenlet-tornado
We used this to convert a large synchronous codebase to Tornado, with almost no changes.
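For reference, here is roughly what a handler looks like with that library, based on its README; I am quoting the decorator and helper names (greenlet_asynchronous, greenlet_fetch) from memory, so double-check them against the project before relying on this sketch:

from greenlet_tornado import greenlet_asynchronous, greenlet_fetch

class MyHandler(tornado.web.RequestHandler):
    @greenlet_asynchronous
    def get(self):
        # greenlet_fetch suspends only this greenlet, not the IOLoop, until the
        # response arrives, so checks can live in ordinary helper functions.
        response = greenlet_fetch('http://example.com/auth/check')
        self.write(response.body)
        self.finish()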
In a view, I accept JSON keys and values in request.body. I plan to check for the existence of the required JSON keys in another function:
def checkJsonKey(form, *args):
    for key in enumerate(args):
        if key not in form:
            return HttpResponse(status=400)  # <--
Instead of doing checking on the returned value, can this function directly return response and terminate this view function?
In my view function,
form = json.loads(request.body)
checkJsonKey(form,"user_preference","model_id", "filename")
Here's how I would handle this: use an exception and then catch it, rather than returning the response. This allows you to raise a number of error conditions from your function and handle them in a try/except block.
def checkJsonKey(form, *args):
    if not isinstance(form, dict):
        raise ValueError("Not a dict")
    # assuming you actually want to deal with a list of the args,
    # and not the tuple pairs produced by enumerate
    for key in list(args):
        # assuming that form is a dict
        if key not in form.keys():
            raise ValueError("Key not found")
    return True
This would be called as:
form = json.loads(request.body)
try:
    checkJsonKey(form, "user_preference", "model_id", "filename")
except ValueError as e:
    return HttpResponseBadRequest("%s" % e)
# rest of the code
If you wanted to get really fancy, you could define your own error class and specifically catch for that.
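For example, a minimal sketch of that fancier version; JsonKeyError is just a name made up here for illustration:

class JsonKeyError(Exception):
    """Raised when the request body is missing a required JSON key."""

def checkJsonKey(form, *args):
    if not isinstance(form, dict):
        raise JsonKeyError("Request body is not a JSON object")
    for key in args:
        if key not in form:
            raise JsonKeyError("Missing key: %s" % key)
    return True

# in the view:
try:
    checkJsonKey(form, "user_preference", "model_id", "filename")
except JsonKeyError as e:
    return HttpResponseBadRequest(str(e))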
Check if the value returned from checkJsonKey is None; if it is not, then it must have returned your response, and you can return that from the view function.
def my_view(request, *args, **kwargs):
    form = json.loads(request.body)
    response = checkJsonKey(form, *args)
    if response is not None:
        return response
It won't make much difference, but if you want another method: json.loads() returns a dictionary from the JSON string passed to it, and the existence of keys can easily be checked with the 'key in form' syntax. You can check the docs of the json module for details.
Try this:
def checkJsonKey(form, *args):
    for key in args:
        if key not in form:
            return False
    return True
def my_view(request, *args, **kwargs):
    form = json.loads(request.body)
    response = checkJsonKey(form, *args)
    if response:
        return HttpResponse(status=200)
    else:
        return HttpResponse(status=400)
I wanted to create proper post_create (also post_get and post_put) hooks, similar to the ones I had in the DB version of my app.
Unfortunately I can't use has_complete_key.
The problem is well known: the lack of is_saved in a model.
Right now I have implemented it like this:
class NdbStuff(HooksInterface):
    def __init__(self, *args, **kwds):
        super(NdbStuff, self).__init__(*args, **kwds)
        self._is_saved = False

    def _put_async(self, post_hooks=True, **ctx_options):
        """ Implementation of pre/post create hooks. """
        if not self._is_saved:
            self._pre_create_hook()
        fut = super(NdbStuff, self)._put_async(**ctx_options)
        if not self._is_saved:
            fut._immediate_callbacks.insert(
                0,
                (
                    self._post_create_hook,
                    [fut],
                    {},
                )
            )
            self._is_saved = True
        if post_hooks is False:
            fut._immediate_callbacks = []
        return fut
    put_async = _put_async

    @classmethod
    def _post_get_hook(cls, key, future):
        obj = future.get_result()
        if obj is not None:
            obj._is_saved = True
        cls._post_get(key, future)

    def _post_put_hook(self, future):
        if future.state == future.FINISHING:
            self._is_saved = True
        else:
            self._is_saved = False
        self._post_put(future)
Everything except the post_create hook seems to work.
The post_create hook is triggered every time I use put_async without retrieving the object first.
I would really appreciate a clue on how to trigger the post_create_hook only once after the object was created.
I am not sure why you are creating the NdbStuff class.
Anyway, if you are creating an instance of a class and you want to track _is_saved or something similar, use a factory to control creation and set the flag there; in this case it makes more sense to track something like _is_new.
class MyModel(ndb.Model):
    some_prop = ndb.StringProperty()

    def _pre_put_hook(self):
        if getattr(self, '_is_new', None):
            self._pre_create_hook()
        # do something

    def _pre_create_hook(self):
        # do something on first save
        log.info("First put for this object")

    def _post_create_hook(self, future):
        # do something
        pass

    def _post_put_hook(self, future):
        if getattr(self, '_is_new', None):
            self._post_create_hook(future)
            # Get rid of the flag on successful put,
            # in case you make some changes and save again.
            delattr(self, '_is_new')

    @classmethod
    def factory(cls, *args, **kwargs):
        new_obj = cls(*args, **kwargs)
        setattr(new_obj, '_is_new', True)
        return new_obj
Then
myobj = MyModel.factory(someargs)
myobj.put()
myobj.some_prop = 'test'
myobj.put()
This will call the _pre_create_hook on the first put, and not on the second.
Always create entities through the factory and the _pre_create_hook call will always be executed for new objects.
Does that make sense?
I'm a Python noob and I'm trying to solve my problems the 'Pythonic' way. I have a class whose __init__ method takes 6 parameters. I need to validate each param and throw/raise an Exception if any fails to validate.
Is this the right way?
class DefinitionRunner:
    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        self.canvasSize = canvasSize
        self.flightId = flightId
        self.domain = domain
        self.harPath = harPath
        self.definitionPath = definitionPath
        # ... bunch of validation checks ...
        # ... if fails, raise ValueError ...
If you want the variables to be settable independently of __init__, you could use properties to implement validations in separate methods.
They only work for new-style classes, though, so you need to define the class as class DefinitionRunner(object).
So for example,
@property
def canvasSize(self):
    return self._canvasSize

@canvasSize.setter
def canvasSize(self, value):
    # some validation here
    self._canvasSize = value
Broadly speaking, that looks like the way you'd do it. Though strictly speaking, you might as well do the validation before rather than after assignment, especially if assignment could be time- or resource-intensive. Also, style convention says not to align assignment blocks the way you are.
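For illustration, a minimal sketch of validating before assigning; the specific checks (a positive canvasSize, a non-empty domain) are made up here:

class DefinitionRunner:
    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        # validate first, so a half-initialized object is never created
        if canvasSize <= 0:
            raise ValueError("canvasSize must be positive")
        if not domain:
            raise ValueError("domain must not be empty")
        # ... remaining checks ...
        self.canvasSize = canvasSize
        self.flightId = flightId
        self.domain = domain
        self.definitionPath = definitionPath
        self.harPath = harPath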
I would do it the way you did, except for the validation: I would validate in a setter method and use that to set the attributes.
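Something like this, where set_domain is a hypothetical setter invented for the example:

class DefinitionRunner:
    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        self.set_domain(domain)
        # ... one setter call per remaining argument ...

    def set_domain(self, value):
        # the validation lives next to the attribute it guards
        if not value:
            raise ValueError("domain must not be empty")
        self.domain = value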
You could do something like this: make a validator for each type of input, and a helper function to run the validation:
def validate_and_assign(obj, items_d, validators):
    # validate all entries
    for key, validator in validators.items():
        if not validator(items_d[key]):
            raise ValueError("Validation for %s failed" % (key,))
    # set all entries
    for key, val in items_d.items():
        setattr(obj, key, val)
Which you'd use like this:
class DefinitionRunner:
    validators = {
        'canvasSize': canvasSize_validator,
        'flightId': flightId_validator,
        'domain': domain_validator,
        'definitionPath': definitionPath_validator,
        'harPath': harPath_validator,
    }

    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        validate_and_assign(self, {
            'canvasSize': canvasSize,
            'flightId': flightId,
            'domain': domain,
            'definitionPath': definitionPath,
            'harPath': harPath,
        }, DefinitionRunner.validators)
The validators might be the same function, of course, if the data type is the same.
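For instance, several keys could share one simple type check; is_nonempty_str below is just an illustrative name:

def is_nonempty_str(value):
    return isinstance(value, str) and len(value) > 0

validators = {
    'domain': is_nonempty_str,
    'definitionPath': is_nonempty_str,
    'harPath': is_nonempty_str,
    # canvasSize and flightId would get validators of their own
}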
I'm not sure if this is exactly "Pythonic", but I've defined a function decorator called require_type. (To be honest, I think I found it somewhere online.)
def require_type(my_arg, *valid_types):
    '''
    A simple decorator that performs type checking.

    @param my_arg: string indicating argument name
    @param valid_types: list of valid types
    '''
    def make_wrapper(func):
        if hasattr(func, 'wrapped_args'):
            wrapped = getattr(func, 'wrapped_args')
        else:
            body = func.func_code
            wrapped = list(body.co_varnames[:body.co_argcount])
        try:
            idx = wrapped.index(my_arg)
        except ValueError:
            raise NameError(my_arg)

        def wrapper(*args, **kwargs):
            def fail():
                all_types = ', '.join(str(typ) for typ in valid_types)
                raise TypeError('\'%s\' was type %s, expected to be in following list: %s'
                                % (my_arg, type(arg), all_types))
            if len(args) > idx:
                arg = args[idx]
                if not isinstance(arg, valid_types):
                    fail()
            else:
                if my_arg in kwargs:
                    arg = kwargs[my_arg]
                    if not isinstance(arg, valid_types):
                        fail()
            return func(*args, **kwargs)
        wrapper.wrapped_args = wrapped
        return wrapper
    return make_wrapper
Then, to use it:
class SomeObject(object):
    @require_type("prop1", str)
    @require_type("prop2", numpy.complex128)
    def __init__(self, prop1, prop2):
        pass
Objective:
Given something like:
stackoverflow.users['55562'].questions.unanswered()
I want it converted into the following:
http://api.stackoverflow.com/1.1/users/55562/questions/unanswered
I have been able to achieve that, using the following class:
class SO(object):
    def __init__(self, **kwargs):
        self.base_url = kwargs.pop('base_url', []) or 'http://api.stackoverflow.com/1.1'
        self.uriparts = kwargs.pop('uriparts', [])
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __getattr__(self, key):
        self.uriparts.append(key)
        return self.__class__(**self.__dict__)

    def __getitem__(self, key):
        return self.__getattr__(key)

    def __call__(self, **kwargs):
        return "%s/%s" % (self.base_url, "/".join(self.uriparts))

if __name__ == '__main__':
    print SO().abc.mno.ghi.jkl()
    print SO().abc.mno['ghi'].jkl()

#prints the following
http://api.stackoverflow.com/1.1/abc/mno/ghi/jkl
http://api.stackoverflow.com/1.1/abc/mno/ghi/jkl
Now my problem is I can't do something like:
stackoverflow = SO()
user1 = stackoverflow.users['55562']
user2 = stackoverflow.users['55462']
print user1.questions.unanswered
print user2.questions.unanswered
#prints the following
http://api.stackoverflow.com/1.1/users/55562/users/55462/questions/unanswered
http://api.stackoverflow.com/1.1/users/55562/users/55462/questions/unanswered/questions/unanswered
Essentially, the user1 and user2 refer to the same SO object, so it can't represent different users.
Any pointers on how to do that would be helpful, because this additional level of functionality would make the API far more interesting.
IMHO, when you create a new stackoverflow object, you need to separate its arguments from the old instance's attributes with a deep copy:
import copy
........
def __getattr__(self, key):
    new_dict = copy.deepcopy(self.__dict__)
    new_dict['uriparts'].append(key)
    return self.__class__(**new_dict)
....
If you want more flexibility on the URI parts, an abstraction is needed for a cleaner design. For example:
class SOURIParts(object):
    def __init__(self, so, uriparts, **kwargs):
        self.so = so
        self.uriparts = uriparts
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __getattr__(self, key):
        return SOURIParts(self.so, self.uriparts + [key])

    def __getitem__(self, key):
        return self.__getattr__(key)

    def __call__(self, **kwargs):
        return "%s/%s" % (self.so.base_url, "/".join(self.uriparts))


class SO(object):
    def __init__(self, base_url='http://api.stackoverflow.com/1.1'):
        self.base_url = base_url

    def __getattr__(self, key):
        return SOURIParts(self, [key])

    def __getitem__(self, key):
        return self.__getattr__(key)
I hope this helps.
You could override __getslice__ (Python 2.7) or __getitem__ (Python 3.x) and use a memoizing decorator so that if the slice you request (the user id) has already been looked up, it would use cached results; otherwise it could retrieve the results and populate the existing SO instance.
However, I think a more OO way to solve the problem is to make SO a pure lookup module that returns Stack Overflow user objects, which would then have the deeper-digging lookups for profile details. But that's just me.
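A rough sketch of that more OO direction; the SOUser class and its methods are invented here purely for illustration:

class SOUser(object):
    def __init__(self, base_url, user_id):
        self.base_url = base_url
        self.user_id = user_id

    def questions(self, state='unanswered'):
        # build the URL for this particular user only
        return "%s/users/%s/questions/%s" % (self.base_url, self.user_id, state)

class SO(object):
    def __init__(self, base_url='http://api.stackoverflow.com/1.1'):
        self.base_url = base_url

    def user(self, user_id):
        return SOUser(self.base_url, user_id)

# user1 = SO().user('55562') and user2 = SO().user('55462') are now
# independent objects, so their URLs no longer interfere.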
In essence, I want to put a variable on the stack that will be reachable by all calls below that point on the stack until the block exits. In Java I would solve this using a static thread local with support methods, which could then be accessed from methods.
Typical example: you get a request, and open a database connection. Until the request is complete, you want all code to use this database connection. After finishing and closing the request, you close the database connection.
What I need this for is a report generator. Each report consists of multiple parts, each part can rely on different calculations, and sometimes different parts rely in part on the same calculation. As I don't want to repeat heavy calculations, I need to cache them. My idea is to decorate methods with a cache decorator. The cache creates an id based on the method's name and module and its arguments, looks to see whether it has already been calculated in a stack variable, and executes the method if not.
I will try to clarify by showing my current implementation. What I want to do is simplify the code for those implementing calculations.
First, I have the central cache access object, which I call MathContext:
class MathContext(object):
    def __init__(self, fn):
        self.fn = fn
        self.cache = dict()

    def get(self, calc_config):
        calc_id = create_id(calc_config)
        if calc_id not in self.cache:
            self.cache[calc_id] = calc_config.execute(self)
        return self.cache[calc_id]
The fn argument is the filename the context is created in relation to, from which data can be read for the calculations.
Then we have the Calculation class:
class CalcBase(object):
def exec(self, math_context):
raise NotImplementedError
And here is a stupid Fibonacci example. None of the methods are actually recursive; they work on large sets of data instead, but it serves to demonstrate how you would depend on other calculations:
class Fibonacci(CalcBase):
    def __init__(self, n):
        self.n = n

    def execute(self, math_context):
        if self.n < 2:
            return 1
        a = math_context.get(Fibonacci(self.n - 1))
        b = math_context.get(Fibonacci(self.n - 2))
        return a + b
What I want Fibonacci to be instead, is just a decorated method:
@cache
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)
With the math_context example, when math_context goes out of scope, so do all its cached values. I want the same thing for the decorator, i.e. at point X, everything cached by @cache is dereferenced so it can be garbage collected.
I went ahead and made something that might just do what you want. It can be used as both a decorator and a context manager:
from __future__ import with_statement

try:
    import cPickle as pickle
except ImportError:
    import pickle


class cached(object):
    """Decorator/context manager for caching function call results.

    All results are cached in one dictionary that is shared by all cached
    functions.

    To use this as a decorator:

        @cached
        def function(...):
            ...

    The results returned by a decorated function are not cleared from the
    cache until decorated_function.clear_my_cache() or cached.clear_cache()
    is called.

    To use this as a context manager:

        with cached(function) as function:
            ...
            function(...)
            ...

    The function's return values will be cleared from the cache when the
    with block ends.

    To clear all cached results, call the cached.clear_cache() class method.
    """
    _CACHE = {}

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args, **kwds):
        key = self._cache_key(*args, **kwds)
        function_cache = self._CACHE.setdefault(self._fn, {})
        try:
            return function_cache[key]
        except KeyError:
            function_cache[key] = result = self._fn(*args, **kwds)
            return result

    def clear_my_cache(self):
        """Clear the cache for a decorated function."""
        try:
            del self._CACHE[self._fn]
        except KeyError:
            pass  # no cached results

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.clear_my_cache()

    def _cache_key(self, *args, **kwds):
        """Create a cache key for the given positional and keyword
        arguments. pickle.dumps() is used because there could be
        unhashable objects in the arguments, but passing them to
        pickle.dumps() will result in a string, which is always hashable.

        I used this to make the cached class as generic as possible.
        Depending on your requirements, other key generating techniques
        may be more efficient.
        """
        return pickle.dumps((args, sorted(kwds.items())), pickle.HIGHEST_PROTOCOL)

    @classmethod
    def clear_cache(cls):
        """Clear everything from all functions from the cache."""
        cls._CACHE = {}
if __name__ == '__main__':
    # used as a decorator
    @cached
    def fibonacci(n):
        print "calculating fibonacci(%d)" % n
        if n == 0:
            return 0
        if n == 1:
            return 1
        return fibonacci(n - 1) + fibonacci(n - 2)

    for n in xrange(10):
        print 'fibonacci(%d) = %d' % (n, fibonacci(n))

    def lucas(n):
        print "calculating lucas(%d)" % n
        if n == 0:
            return 2
        if n == 1:
            return 1
        return lucas(n - 1) + lucas(n - 2)

    # used as a context manager
    with cached(lucas) as lucas:
        for i in xrange(10):
            print 'lucas(%d) = %d' % (i, lucas(i))

    for n in xrange(9, -1, -1):
        print 'fibonacci(%d) = %d' % (n, fibonacci(n))

    cached.clear_cache()

    for n in xrange(9, -1, -1):
        print 'fibonacci(%d) = %d' % (n, fibonacci(n))
This question seems to be two questions:
a) sharing a db connection
b) caching/memoizing
(b) you have answered yourself. As for (a), I don't quite understand why you need to put it on the stack. You can do one of these:
- you can use a class, and the connection could be an attribute of it
- you can decorate all your functions so that they get a connection from a central location (see the sketch below)
- each function can explicitly use a global connection method
- you can create a connection and pass it around, or create a context object and pass the context around; the connection can be part of the context
- etc, etc
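To illustrate the second option, here is a minimal sketch of such a decorator; get_connection and release_connection are made-up placeholders for whatever your database driver actually provides:

def with_connection(fn):
    def wrapper(*args, **kwargs):
        conn = get_connection()        # obtained from one central place
        try:
            return fn(conn, *args, **kwargs)
        finally:
            release_connection(conn)   # or conn.close(), depending on the driver
    return wrapper

@with_connection
def load_report_rows(conn, report_id):
    # every decorated function receives the shared connection as its first argument
    return conn.execute("SELECT ...", report_id)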
You could use a global variable wrapped in a getter function:
connection = None

def getConnection():
    global connection
    if connection:
        return connection
    connection = createConnection()
    return connection
"you get a request, and open a database connection.... you close the database connection."
This is what objects are for. Create the connection object, pass it to other objects, and then close it when you're done. Globals are not appropriate. Simply pass the value around as a parameter to the other objects that are doing the work.
"Each report consist of multiple parts, each part can rely on different calculations, sometimes different parts relies in part on the same calculation.... I need to cache them"
This is what objects are for. Create a dictionary with useful calculation results and pass that around from report part to report part.
You don't need to mess with "stack variables", "static thread local" or anything like that.
Just pass ordinary variable arguments to ordinary method functions. You'll be a lot happier.
class MemoizedCalculation( object ):
    pass

class Fibonacci( MemoizedCalculation ):
    def __init__( self ):
        self.cache= { 0: 1, 1: 1 }
    def __call__( self, arg ):
        if arg not in self.cache:
            self.cache[arg]= self(arg-1) + self(arg-2)
        return self.cache[arg]

class MathContext( object ):
    def __init__( self ):
        self.fibonacci = Fibonacci()
You can use it like this
>>> mc= MathContext()
>>> mc.fibonacci( 4 )
5
You can define any number of calculations and fold them all into a single container object.
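For example, a second memoized calculation (Lucas numbers, chosen arbitrarily here) drops in alongside the first:

class Lucas( MemoizedCalculation ):
    def __init__( self ):
        self.cache= { 0: 2, 1: 1 }
    def __call__( self, arg ):
        if arg not in self.cache:
            self.cache[arg]= self(arg-1) + self(arg-2)
        return self.cache[arg]

class MathContext( object ):
    def __init__( self ):
        self.fibonacci = Fibonacci()
        self.lucas = Lucas()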
If you want, you can make the MathContext into a formal context manager so that it works with the with statement. Add these two methods to MathContext:
    def __enter__( self ):
        print "Initialize"
        return self

    def __exit__( self, type_, value, traceback ):
        print "Release"
Then you can do this.
with MathContext() as mc:
    print mc.fibonacci( 4 )
At the end of the with statement, you are guaranteed that the __exit__ method was called.