Async behaviour within a with statement - Python

I am building a wrapper for an API, in order to make it more accessible to our users. The user initialises the SomeAPI object and then has access to lots of class methods, as defined below.
One of the operations I wish to support is creating what we call an "instance".
Once the instance is no longer required, it should be deleted. Therefore I would use contextlib.contextmanager like so:
import contextlib

class SomeAPI:
    # Lots of methods
    ...
    ...

    def create_instance(self, some_id):
        # Create an instance for some_id
        payload = {"id": some_id}
        resp_url = ".../instance"
        # This specific line of code may take a long time (typically 60-90 seconds)
        resp = self.requests.post(resp_url, json=payload)
        return resp.json()["instance_id"]

    def delete_instance(self, instance_id):
        # Delete a specific instance
        resp_url = f".../instance/{instance_id}"
        resp = self.requests.delete(resp_url)
        return

    @contextlib.contextmanager
    def instance(self, some_id):
        instance_id = self.create_instance(some_id)
        try:
            yield instance_id
        finally:
            if instance_id:
                self.delete_instance(instance_id)
So our users can then write code like this:
some_api = SomeAPI()

# Necessary preprocessing - anywhere between 0-10 minutes
x = preprocess_1()
y = preprocess_2()

my_id = 1234

with some_api.instance(my_id):
    # Once the instance is created, do some stuff with it in here
    # Uses the preprocesses above
    some_api.do_other_class_method_1(x)
    some_api.do_other_class_method_2(y)

# Exited the with block - instance has been deleted
This works fine. The problem is that creating the instance always takes 60-90 seconds (the slow request flagged in create_instance), so if possible I would like to make this whole flow more efficient by:
1. starting the creation of the instance (using a with block),
2. only then starting the preprocessing (which, as noted, may take anywhere between 0 and 10 minutes),
3. once the preprocessing has completed, using its results with the instance.
This order of operations would save up to 60 seconds each time, if the preprocessing happens to take more than 60 seconds. Note that there is no guarantee that the preprocessing will be longer or shorter than the creation of the instance.
I am aware of the existence of contextlib.asynccontextmanager, but the whole async side of things ties a knot in my brain. I have no idea how to get the order of operations right, while also keeping the ability for the user to create and destroy the instance easily with a with statement.
Can anyone help?
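A minimal sketch of one possible approach, using a background thread instead of asyncio (it assumes the SomeAPI methods shown above; yielding a Future and the pending name are illustrative choices, not anything from the question, and the sketch is untested):
import contextlib
from concurrent.futures import ThreadPoolExecutor

class SomeAPI:
    ...

    @contextlib.contextmanager
    def instance(self, some_id):
        with ThreadPoolExecutor(max_workers=1) as pool:
            # Start the slow creation immediately, in a background thread.
            future = pool.submit(self.create_instance, some_id)
            try:
                # Hand the Future to the caller; .result() blocks only if
                # creation has not finished by the time it is needed.
                yield future
            finally:
                try:
                    instance_id = future.result()
                except Exception:
                    instance_id = None
                if instance_id:
                    self.delete_instance(instance_id)
The calling code then overlaps the preprocessing with the instance creation:
with some_api.instance(my_id) as pending:
    x = preprocess_1()               # runs while the instance is being created
    y = preprocess_2()
    instance_id = pending.result()   # waits only if creation is still running
    some_api.do_other_class_method_1(x)
    some_api.do_other_class_method_2(y)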

Related

How to share session values created outside the context in Flask?

How do I store something created by a thread in a session, so that I can access that value later in another request?
Here is a sample:
import threading

from flask import Flask, session, copy_current_request_context

app = Flask(__name__)

@app.route('/one')
def one():
    @copy_current_request_context
    def x():
        session['status'] = "done"

    t = threading.Thread(target=x)
    t.start()
    return "One"

@app.route('/two')
def two():
    status = session['status']
    return "Two: {}".format(status)
In the example above I store the 'status' from within the thread (I need to run the thread) inside the /one request, but later, say 5 seconds, I want to check that status in another request (/two).
Also, does @copy_current_request_context make a read-only (or read-and-discard-write) copy of the session/request?
The easiest, and in some ways best, answer is to use global variables, which are described completely Here.
But if your application is going to be scaled and you need this data to be shared with other instances, you might use Redis as a fast in-memory DB. More details
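A minimal sketch of the Redis idea, assuming a local Redis server and the redis-py client, and reusing the routes from the question:
import threading

import redis
from flask import Flask, copy_current_request_context

app = Flask(__name__)
r = redis.Redis(host='localhost', port=6379, db=0)  # assumed local Redis

@app.route('/one')
def one():
    @copy_current_request_context
    def x():
        r.set('status', 'done')   # visible to every worker/instance

    t = threading.Thread(target=x)
    t.start()
    return "One"

@app.route('/two')
def two():
    status = r.get('status')      # bytes, or None if /one has not finished yet
    return "Two: {}".format(status)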
By using the suggestion from @soroosh-khodami I was able to achieve what I wanted. Below is the code that can do that.
warehouse = {}  # This associative array will keep the data

@app.route('/one')
def one():
    global warehouse

    @copy_current_request_context
    def x():
        warehouse['status'] = "done"

    t = threading.Thread(target=x)
    t.start()
    return "One"

@app.route('/two')
def two():
    global warehouse
    status = warehouse['status']
    return "Two: {}".format(status)
Of course this is a naive implementation: the warehouse object is shared among all requests and sessions, so a protection mechanism should be in place (e.g. 1. store everything from a session under a session-specific key, 2. have a cleanup thread, etc.); a sketch of the first idea is shown below.
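A sketch of that first protection idea, keying the shared dict by a per-session token (the wh_key name and _session_key helper are made up for illustration; Flask's default session has no built-in id, and the cleanup thread is still left out):
import threading
import uuid

from flask import Flask, session, copy_current_request_context

app = Flask(__name__)
app.secret_key = 'change-me'      # required for session support

warehouse = {}                    # shared dict, keyed per "session"

def _session_key():
    # Flask's default session has no id, so keep our own token in it.
    if 'wh_key' not in session:
        session['wh_key'] = uuid.uuid4().hex
    return session['wh_key']

@app.route('/one')
def one():
    key = _session_key()          # read the session in the request thread

    @copy_current_request_context
    def x():
        warehouse.setdefault(key, {})['status'] = "done"

    t = threading.Thread(target=x)
    t.start()
    return "One"

@app.route('/two')
def two():
    status = warehouse.get(_session_key(), {}).get('status')
    return "Two: {}".format(status)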
Bonus: this also works in a non-dev environment (e.g. a Twisted server).
Yes, @copy_current_request_context makes a read-only copy of the context (as far as I have tested).

Passing an object created with SubFactory and LazyAttribute to a RelatedFactory in factory_boy

I am using factory.LazyAttribute within a SubFactory call to pass in an object, created in the factory_parent. This works fine.
But if I pass the object created to a RelatedFactory, LazyAttribute can no longer see the factory_parent and fails.
This works fine:
class OKFactory(factory.DjangoModelFactory):
    class Meta:
        model = Foo
        exclude = ['sub_object']

    sub_object = factory.SubFactory(SubObjectFactory)
    object = factory.SubFactory(
        ObjectFactory,
        sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))
The identical call to LazyAttribute fails here:
class ProblemFactory(OKFactory):
    class Meta:
        model = Foo
        exclude = ['sub_object', 'object']

    sub_object = factory.SubFactory(SubObjectFactory)
    object = factory.SubFactory(
        ObjectFactory,
        sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))
    another_object = factory.RelatedFactory(AnotherObjectFactory, 'foo', object=object)
The identical LazyAttribute call can no longer see factory_parent, and can only access AnotherObject values. LazyAttribute throws the error:
AttributeError: The parameter sub_object is unknown. Evaluated attributes are...[then lists all attributes of AnotherObjectFactory]
Is there a way round this?
I can't just put sub_object=sub_object into the ObjectFactory call, i.e.:
sub_object = factory.SubFactory(SubObjectFactory)
object = factory.SubFactory(ObjectFactory, sub_object=sub_object)
because if I then do:
object2 = factory.SubFactory(ObjectFactory, sub_object=sub_object)
a second sub_object is created, whereas I need both objects to refer to the same sub_object. I have tried SelfAttribute to no avail.
I think you can leverage the ability to override parameters passed in to the RelatedFactory to achieve what you want.
For example, given:
class MyFactory(OKFactory):
    object = factory.SubFactory(MyOtherFactory)
    related = factory.RelatedFactory(YetAnotherFactory)  # We want to pass object in here
If we knew what the value of object was going to be in advance, we could make it work with something like:
object = MyOtherFactory()
thing = MyFactory(object=object, related__param=object)
We can use this same naming convention to pass the object to the RelatedFactory within the main Factory:
class MyFactory(OKFactory):
    class Meta:
        exclude = ['object']

    object = factory.SubFactory(MyOtherFactory)
    related__param = factory.SelfAttribute('object')
    related__otherrelated__param = factory.LazyAttribute(
        lambda myobject: 'admin%d_%d' % (myobject.level, myobject.level - 1))
    related = factory.RelatedFactory(YetAnotherFactory)  # Will be called with {'param': object, 'otherrelated__param': 'admin1_2'}
I solved this by simply calling factories within @factory.post_generation. Strictly speaking this isn't a solution to the specific problem posed, but I explain below in detail why this ended up being a better architecture. @rhunwick's solution does genuinely pass a SubFactory(LazyAttribute('')) to RelatedFactory, but restrictions remained that meant it was not right for my situation.
We move the creation of sub_object and object from ProblemFactory to ObjectWithSubObjectsFactory (and remove the exclude clause), and add the following code to the end of ProblemFactory.
@factory.post_generation
def post(self, create, extracted, **kwargs):
    if not create:
        return  # No IDs, so wouldn't work anyway
    object = ObjectWithSubObjectsFactory()
    sub_object_ids_by_code = dict((sbj.name, sbj.id) for sbj in object.subobject_set.all())
    # self is the `Foo` Django object just created by the `ProblemFactory` that contains this code.
    for another_obj in self.anotherobject_set.all():
        if another_obj.name == 'age_in':
            another_obj.attribute_id = sub_object_ids_by_code['Age']
            another_obj.save()
        elif another_obj.name == 'income_in':
            another_obj.attribute_id = sub_object_ids_by_code['Income']
            another_obj.save()
So it seems RelatedFactory calls are executed before PostGeneration calls.
The naming in this question is easier to understand, so here is the same solution code for that sample problem:
The creation of dataset, column_1 and column_2 are moved into a new factory DatasetAnd2ColumnsFactory, and the code below is then added to the end of FunctionToParameterSettingsFactory.
@factory.post_generation
def post(self, create, extracted, **kwargs):
    if not create:
        return
    dataset = DatasetAnd2ColumnsFactory()
    column_ids_by_name = dict((column.name, column.id) for column in dataset.column_set.all())
    # self is the `FunctionInstantiation` Django object just created by the `FunctionToParameterSettingsFactory` that contains this code.
    for parameter_setting in self.parametersetting_set.all():
        if parameter_setting.name == 'age_in':
            parameter_setting.column_id = column_ids_by_name['Age']
            parameter_setting.save()
        elif parameter_setting.name == 'income_in':
            parameter_setting.column_id = column_ids_by_name['Income']
            parameter_setting.save()
I then extended this approach passing in options to configure the factory, like this:
whatever = WhateverFactory(options__an_option=True, options__another_option=True)
Then this factory code detected the options and generated the test data required (note the method is renamed to options to match the prefix on the parameter names):
@factory.post_generation
def options(self, create, not_used, **kwargs):
    # The standard code as above
    if kwargs.get('an_option', None):
        ...  # code for custom option 'an_option'
    if kwargs.get('another_option', None):
        ...  # code for custom option 'another_option'
I then further extended this. Because my desired models contained self joins, my factory is recursive. So for a call such as:
whatever = WhateverFactory(options__an_option='xyz',
                           options__an_option_for_a_nested_whatever='abc')
Within @factory.post_generation I have:
class Meta:
    model = Whatever

# self is the top level object being generated
@factory.post_generation
def options(self, create, not_used, **kwargs):
    # This generates the nested object
    nested_object = WhateverFactory(
        options__an_option=kwargs.get('an_option_for_a_nested_whatever', None))
    # then join nested_object to self via the self join
    self.nested_whatever_id = nested_object.id
Some notes, which you do not need to read, on why I went with this option rather than @rhunwick's proper solution to my question above. There were two reasons.
The thing that stopped me experimenting with it was that the order of RelatedFactory and post-generation execution is not reliable; apparently unrelated factors affect it, presumably a consequence of lazy evaluation. I had errors where a set of factories would suddenly stop working for no apparent reason. Once it was because I renamed the variables the RelatedFactory instances were assigned to. This sounds ridiculous, but I tested it to death (and posted here): there is no doubt that renaming the variables reliably switched the order of RelatedFactory and post-generation execution. I still assumed this was some oversight on my behalf until it happened again for some other reason (which I never managed to diagnose).
Secondly, I found the declarative code confusing, inflexible and hard to refactor. It isn't straightforward to pass different configurations during instantiation so that the same factory can be used for different variations of test data, which meant I had to repeat code. object needs adding to a Factory Meta.exclude list; this sounds trivial, but when you have pages of code generating data it was a reliable source of errors. As a developer you would have to pass over several factories several times to understand the control flow. Generation code would be spread through the declarative body until you had exhausted those tricks, and then the rest would go in post-generation or become very convoluted. A common example for me is a triad of interdependent models (e.g. a parent-children category structure or dataset/attributes/entities) used as a foreign key of another triad of interdependent objects (e.g. models, parameter values, etc., referring to other models' parameter values). A few of these structures, especially if nested, quickly become unmanageable.
I realize it isn't really in the spirit of factory_boy, but putting everything into post-generation solved all these problems. I can pass in parameters, so the same single factory serves all my composite-model test data requirements and no code is repeated. The sequence of creation is immediately visible, obvious and completely reliable, rather than depending on confusing chains of inheritance and overriding and being subject to some bug. The interactions are obvious, so you don't need to digest the whole thing to add some functionality, and different areas of functionality are grouped in the post-generation if clauses. There's no need to exclude working variables, and you can refer to them for the duration of the factory code. The unit test code is simplified, because describing the functionality goes in parameter names rather than Factory class names: you create data with a call like WhateverFactory(options__create_xyz=True, options__create_abc=True, ...) rather than WhateverCreateXYZCreateABC...(). This keeps the division of responsibilities quite clean.

Delete Data store entries synchronously in Google App Engine

I use Python on GAE and try to delete an entry in the datastore using db.delete(model_obj). I assumed this operation is performed synchronously, since the documentation distinguishes between delete() and delete_async(), but when I read the source code of db, the delete method simply calls delete_async, which does not match what the documentation says :(
So is there any way to do the delete synchronously?
Here is the source code in db:
def delete_async(models, **kwargs):
    """Asynchronous version of delete one or more Model instances.

    Identical to db.delete() except returns an asynchronous object. Call
    get_result() on the return value to block on the call.
    """
    if isinstance(models, (basestring, Model, Key)):
        models = [models]
    else:
        try:
            models = iter(models)
        except TypeError:
            models = [models]
    keys = [_coerce_to_key(v) for v in models]
    return datastore.DeleteAsync(keys, **kwargs)


def delete(models, **kwargs):
    """Delete one or more Model instances.
    """
    delete_async(models, **kwargs).get_result()
EDIT: From a comment, this is the original misbehaving code:
def tearDown(self):
    print self.account
    db.delete(self.device)
    db.delete(self.account)
    print Account.get_by_email(self.email, case_sensitive=False)
The output of the two print statements is <Account object at 0x10d1827d0> and <Account object at 0x10d1825d0>. The two memory addresses are different, but they point to the same entity. If I put some latency after the delete (e.g. a for loop), the object fetched is None.
The code you show for delete calls delete_async, yes, but then it calls get_result on the returned asynchronous handle, which will block until the delete actually occurs. So, delete is synchronous.
The reason the sample code you show is returning an object is that you're probably running a query to fetch the account; I presume the email is not the db.Key of the account? Normal queries are not guaranteed to return updated results immediately. To avoid seeing stale data, you either need to use an ancestor query or look up the entity by key, both of which are strongly consistent.
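For illustration, a sketch against the old db API (account_key and email are assumed variables; Account is the model from the question):
# Strongly consistent: fetch by key; returns None once the delete has applied.
account = db.get(account_key)

# Eventually consistent: a normal (non-ancestor) query may still return the
# just-deleted entity for a short while after the delete.
account = Account.all().filter('email =', email).get()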

How can I, inside a transaction, read an entity that I just wrote?

I am using Python 2.7, GAE and the High Replication Datastore.
I am trying to perform a transaction that first writes an entity and then reads it, but the read never finds the entity. This is a test case I have:
class DemoTestCase(unittest.TestCase):
    def setUp(self):
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        self.policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=0)
        self.testbed.init_datastore_v3_stub(consistency_policy=self.policy)

    def tearDown(self):
        self.testbed.deactivate()

    def test1(self):
        db.run_in_transaction(self._write)
        db.run_in_transaction(self._read)

    def test2(self):
        db.run_in_transaction(self._write_read)

    def _write(self):
        self.root_key = db.Key.from_path("A", "root")
        a = A(a=1, parent=self.root_key)
        self.a_key = a.put()

    def _read(self):
        b = sample.read_a(self.a_key)
        self.assertEqual(1, b.a)
        self.assertEqual(1, A.all().ancestor(self.root_key).count(5))

    def _write_read(self):
        root_key = db.Key.from_path("A", "root")
        a = A(a=1, parent=root_key)
        a_key = a.put()
        b = sample.read_a(a_key)
        self.assertEqual(None, b)
        self.assertEqual(0, A.all().ancestor(root_key).count(5))
Both testcases are passing now.
Test1 is running a transaction which performs a write. Then it's running a second transaction that performs two reads, one by key and one by ancestor-query. Reads work just fine in this case.
Test2 runs exactly the same code as test1, but this time everything runs inside the same transaction. As you can see, reading the entity by key returns None, and doing an ancestor query returns 0 hits.
My question is: how can I, inside a transaction, read an entity that I just wrote? Or is this not possible?
Thanks.
You can't. All datastore reads inside the transaction show a snapshot of the datastore when the transaction started. Writes don't show up.
Theoretically, you shouldn't have to read, since you will have an instance of every entity you write. Use that instance.
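For the test above, that means _write_read keeps using the in-memory instance instead of reading it back; a sketch:
def _write_read(self):
    root_key = db.Key.from_path("A", "root")
    a = A(a=1, parent=root_key)
    a.put()
    # Work with the in-memory instance `a`; a get or query here would only
    # see the snapshot taken when the transaction started.
    self.assertEqual(1, a.a)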
Well, sometimes it's really helpful to re-read. A business rule may be triggered by the entity update and will need to reload it. Business rules often are not aware of what triggered them and won't have immediate access to the new entity.
I don't know about Python, but in Java using Objectify, updates are made visible while in the transaction by the Objectify session (transaction) cache. If there's something like a session cache in the Python persistence framework you are using, that may be a solution.

Is my resource update method thread-safe if it also creates a resource of a different type?

The Tastypie documentation states that bundles keep Tastypie more thread-safe, but does not explain how and under what conditions. I have looked through the code, but I am not experienced enough to wrap my head around it.
I am prototyping a game that has a round object (for each round of play) and multiple states for each round (for each player's information for that round). Each player updates their own state with an answer to the round's word-phrase. I need a mechanism that lazily creates the next round of play if it doesn't already exist. I currently trigger that round creation when a player updates their state.
If multiple players update their state (see StateResource.obj_update()) at the same time, could their attempts to create the next round collide? I am thinking that this could happen if one obj_update call checks whether the next round exists and starts creating it before a different obj_update call finishes creating it. I would solve this with some type of mutex, but I'm not sure if that's necessary. I'm wondering if there is a Tastypie way to solve this.
My code is as follows:
# models.py
class Round(models.Model):
    game_uid = models.CharField(max_length=75)
    word = models.CharField(max_length=75)
    players = models.ManyToManyField(User)
    next_round = models.OneToOneField('self', null=True, blank=True)

class PlayerRoundState(models.Model):
    player = models.ForeignKey(User)
    round = models.ForeignKey(Round)
    answer = models.CharField(max_length=75)
# api.py
class RoundResource(ModelResource):
    players = fields.ManyToManyField(UserResource, attribute='players', full=False)
    states = fields.ManyToManyField('wordgame.api.StateResource',
                                    attribute='playerroundstate_set',
                                    full=True)
    . . .

    def obj_create(self, bundle, request=None, **kwargs):
        bundle = super(RoundResource, self).obj_create(bundle, request, **kwargs)
        bundle.obj.word = choice(words)  # Gets a random word from a list
        bundle.obj.round_number = 1
        bundle.obj.game_uid = bundle.obj.calc_guid()  # Creates a unique ID for the game
        bundle.obj.save()
        return bundle

class StateResource(ModelResource):
    player = fields.ForeignKey(UserResource, 'player', full=False)
    round = fields.ForeignKey(RoundResource, 'round')
    . . .

    def obj_update(self, bundle, request=None, skip_errors=False, **kwargs):
        bundle = super(StateResource, self).obj_update(bundle, request,
                                                       skip_errors, **kwargs)
        if bundle.obj.round.next_round is None:
            new_round = Round()
            new_round.word = choice(words)
            new_round.round_number = bundle.obj.round.round_number + 1
            new_round.game_uid = bundle.obj.round.game_uid
            new_round.save()
            for p in bundle.obj.round.players.all():
                new_round.players.add(p)
            new_round.save()
            bundle.obj.round.next_round = new_round
            bundle.obj.round.save()
        return bundle
I think this doesn't have much to do with Tastypie.
The problem you're describing relates to the ORM and the database. The issue is that both requests could create a new Round() under some circumstances (if they can be served in parallel and switch at any time, which is the case with gunicorn and gevent), and one of those rounds would become stale.
Consider the following situation:
First request arrives, retrieves the current round and "sees" that there is no "next" round. So it performs:
new_round = Round()
new_round.word = choice(words)
new_round.round_number = bundle.obj.round.round_number + 1
new_round.game_uid = bundle.obj.round.game_uid
new_round.save()
In the meantime a second request comes and (assuming it is possible in your setup) processing switches to that second request. It also retrieves the current round, and it also "sees" there is no next round, so it creates one too (a second object for the same logical round).
Then the processing switches back to the first request which performs:
for p in bundle.obj.round.players.all():
    new_round.players.add(p)
new_round.save()
bundle.obj.round.next_round = new_round
bundle.obj.round.save()
So now there "is" the next round. First request is processed and everything looks nice.
But the second request has yet to finish, and it performs the very same operation, overwriting the current round object.
The result is that you have one stale instance (the Round created by the first request) and that the first group of players uses a different Round than the second group.
This leads to an inconsistent state in the database, so in this case your resource update method is NOT thread-safe.
One solution would be to use select_for_update when retrieving the current round from the database. See the Django Docs. If you used that, the second and subsequent requests would wait until you modify the current round in the first request, and only then retrieve it from the database. The result would be that they would already "see" the next round and not attempt to create it. Of course, you'd have to make sure that the whole update constitutes one transaction.
The way to "use" it would be to override the obj_get() method in the StateResource resource and, instead of:
base_object_list = self.get_object_list(request).filter(**kwargs)
use (haven't tested):
base_object_list = self.get_object_list(request).select_for_update().filter(**kwargs)
Of course this is not the only solution, but the others would probably involve redesigning your application, so this might be the least involved.
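For completeness, a sketch of how the single-transaction requirement and the row lock could sit inside obj_update instead (assumes Django 1.6+ for transaction.atomic and a database that supports SELECT ... FOR UPDATE; this is an alternative placement to the obj_get() override above, not tested):
from django.db import transaction

class StateResource(ModelResource):
    player = fields.ForeignKey(UserResource, 'player', full=False)
    round = fields.ForeignKey(RoundResource, 'round')

    def obj_update(self, bundle, request=None, skip_errors=False, **kwargs):
        with transaction.atomic():
            bundle = super(StateResource, self).obj_update(bundle, request,
                                                           skip_errors, **kwargs)
            # Re-fetch the round with a row lock; a concurrent request blocks
            # here until this transaction commits, so only one request will
            # see next_round as None and create it.
            current_round = Round.objects.select_for_update().get(pk=bundle.obj.round_id)
            if current_round.next_round is None:
                # ... create and attach the next round exactly as in the question ...
                pass
        return bundle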
