How do I store something created by a thread, in a session, so I can access that value later in another request?
Here is a sample:
@app.route('/one')
def one():
    @copy_current_request_context
    def x():
        session['status'] = "done"
    t = threading.Thread(target=x)
    t.start()
    return "One"

@app.route('/two')
def two():
    status = session['status']
    return "Two: {}".format(status)
In the example above I store the 'status' from within the thread (I do need to run the thread) inside the /one request, but later, say 5 seconds afterwards, I want to check that status in another request (/two).
Also, does @copy_current_request_context make a read-only (or read-and-discard-write) copy of the session/request?
The easiest and, in some ways, best answer is to use global variables, which have been described completely here.
But if your application is going to be scaled out and you need this data to be shared with other instances, you can use Redis as a fast in-memory DB. More details here.
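As a minimal sketch of the Redis approach (assuming a local Redis server and the redis-py client; the key name task_status is made up for illustration):
import threading
import redis

r = redis.Redis(host='localhost', port=6379)

@app.route('/one')
def one():
    def x():
        r.set('task_status', 'done')  # visible to every instance of the app
    threading.Thread(target=x).start()
    return "One"

@app.route('/two')
def two():
    status = r.get('task_status')  # bytes, or None if not set yet
    return "Two: {}".format(status)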
By using the suggestion from @soroosh-khodami I was able to achieve what I wanted. Below is the code that does that.
warehouse = {}  # This dictionary will keep the data

@app.route('/one')
def one():
    global warehouse
    @copy_current_request_context
    def x():
        warehouse['status'] = "done"
    t = threading.Thread(target=x)
    t.start()
    return "One"

@app.route('/two')
def two():
    global warehouse
    status = warehouse['status']
    return "Two: {}".format(status)
Of course this is a naive implementation - the warehouse object is shared among all requests and sessions - so a protection mechanism should be in place (e.g. 1. store everything belonging to a session under a session-specific key, as sketched below; 2. have a cleanup thread; etc.).
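For example, idea 1 could be sketched like this (the task_id key and the per-session layout of warehouse are made up for illustration):
import uuid

@app.route('/one')
def one():
    task_id = session.get('task_id') or str(uuid.uuid4())
    session['task_id'] = task_id  # the key travels with the session

    @copy_current_request_context
    def x():
        warehouse[task_id] = "done"  # scoped per session, not global

    threading.Thread(target=x).start()
    return "One"

@app.route('/two')
def two():
    status = warehouse.get(session.get('task_id'), "pending")
    return "Two: {}".format(status)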
Bonus: this also works in a non-development environment (e.g. under a Twisted server).
Yes, @copy_current_request_context makes a read-only copy of the context (as far as I have tested).
I am building a wrapper for an API, in order to make it more accessible to our users. The user initialises the SomeAPI object and then has access to lots of class methods, as defined below.
One of the operations I wish to support is creating what we call an "instance".
Once the instance is no longer required, it should be deleted. Therefore I would use contextlib.contextmanager like so:
class SomeAPI:
    # Lots of methods
    ...
    ...

    def create_instance(self, some_id):
        # Create an instance for some_id
        payload = {"id": some_id}
        resp_url = ".../instance"
        # This specific line of code may take a long time
        resp = self.requests.post(resp_url, json=payload)
        return resp.json()["instance_id"]

    def delete_instance(self, instance_id):
        # Delete a specific instance
        resp_url = f".../instance/{instance_id}"
        resp = self.requests.delete(resp_url)
        return

    @contextlib.contextmanager
    def instance(self, some_id):
        instance_id = self.create_instance(some_id)
        try:
            yield instance_id
        finally:
            if instance_id:
                self.delete_instance(instance_id)
So then our users can write code like this:
some_api = SomeAPI()
# Necessary preprocessing - anywhere between 0-10 minutes
x = preprocess_1()
y = preprocess_2()
my_id = 1234
with some_api.instance(my_id):
    # Once the instance is created, do some stuff with it in here
    # Uses the preprocesses above
    some_api.do_other_class_method_1(x)
    some_api.do_other_class_method_2(y)
# Exited the with block - instance has been deleted
This works fine. The problem is that creating the instance always takes 60-90 seconds (the slow call is commented in the create_instance method), so if possible I would like to make this whole flow more efficient by:
Starting the process of creating the instance (using a with block)
Only then, start the preprocessing (as commented, may take anywhere between 0-10 mins)
Once the preprocessing has been completed, use that with the instance
This order of operations would save up to 60 seconds each time, if the preprocessing happens to take more than 60 seconds. Note that there is no guarantee that the preprocessing will be longer or shorter than the creation of the instance.
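For illustration, this is roughly the ordering I am after, sketched with a plain thread pool rather than async (I am not sure this is the right approach, and it loses the with-statement ergonomics):
import concurrent.futures

some_api = SomeAPI()
my_id = 1234

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    # kick off the slow instance creation first...
    future_id = pool.submit(some_api.create_instance, my_id)
    # ...and run the preprocessing while it is being created
    x = preprocess_1()
    y = preprocess_2()
    instance_id = future_id.result()  # block until the instance exists
    try:
        some_api.do_other_class_method_1(x)
        some_api.do_other_class_method_2(y)
    finally:
        some_api.delete_instance(instance_id)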
I am aware of the existence of contextlib.asynccontextmanager, but the whole async side of things ties a knot in my brain. I have no idea how to get the order of operations right while also maintaining the ability for the user to create and destroy the instance easily using a with statement.
Can anyone help?
I have a Python script in my webapp2 application in Google App Engine:
x = 0

class MyHandler(webapp2.RequestHandler):
    def get(self):
        global x
        x = x + 1
        print x
With each refresh of the page (or a new user connecting), the count increments. Python does not start a new process for each request (though I expected it to). How can I handle scenarios where I want a global variable that persists only for the lifetime of the request? Can I use an instance variable, and how exactly?
The behaviour you're seeing is expected. New instances are not started for every request.
Use the request object, the environ, or a thread-local variable to store information that you want to be accessible anywhere in your code for the life of the request. (The environ is recreated for each request, so it's safe.)
See "Is threading.local() a safe way to store variables for a single request in Google AppEngine?" for a discussion of using thread-local storage.
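As a minimal sketch of the thread-local approach (with the caveat that you must reset the value at the start of each request, since instance threads are reused across requests):
import threading

_request_store = threading.local()  # hypothetical per-thread namespace

class MyHandler(webapp2.RequestHandler):
    def get(self):
        _request_store.count = 0  # reset: the thread is reused across requests
        _request_store.count += 1
        self.response.write(_request_store.count)  # always 1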
Here is an example that uses the request object's environ to store information for the life of a request. All of this code must be inside your handler. All of the parts are documented in the webapp2 docs. By the way, I don't use webapp2, so this isn't tested. (I use pyramid/bobo and this model for request-level caching.)
class MyHandler(webapp2.RequestHandler):
    def get(self):
        req = webapp2.get_request()
        # you already have the request in self; this shows how to get a
        # request object anywhere in your code.
        key = "Some Key"
        if req:
            # getting a stored value from the environ of the request (see WebOb docs)
            someval = req.environ.get(key, None)
            if someval:
                # do something
                pass
        # and setting
        if req:
            req.environ[key] = 'some value'
Doing it this way has the limitation that the value stored in the environ must be a string.
Read the WebOb documentation on how to store arbitrary values in the request object: http://docs.webob.org/en/stable/reference.html#ad-hoc-attributes
req.some_attr = 'blah blah blah'  # stored under 'webob.adhoc_attrs' in the environ
req.environ['webob.adhoc_attrs']
{'some_attr': 'blah blah blah'}
Also, if you have a read of the webapp2 Request object docs, there is a registry that you can use to store information: http://webapp-improved.appspot.com/api/webapp2.html#webapp2.Request
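For example (a sketch; the registry is an ordinary dict that lives as long as the request object, and current_user is a made-up key):
req = webapp2.get_request()
req.registry['current_user'] = some_user_object  # any value, not just strings
# later, anywhere in the same request:
user = webapp2.get_request().registry.get('current_user')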
Note: any variable you define outside of a request handler is essentially cached, available for the lifetime of the instance. This is where you are going wrong.
To understand how and why app-level caching works, and why your first attempt doesn't do what you want, have a look at https://cloud.google.com/appengine/docs/python/requests#Python_App_caching
I use Python on GAE and am trying to delete an entry in the datastore using db.delete(model_obj). I supposed this operation is performed synchronously, since the documentation distinguishes between delete() and delete_async(), but when I read the source code of db, the delete method simply calls delete_async, which does not match what the documentation says :(
So is there any way to do the delete in a synchronous flow?
Here is the source code in db:
def delete_async(models, **kwargs):
    """Asynchronous version of delete one or more Model instances.

    Identical to db.delete() except returns an asynchronous object. Call
    get_result() on the return value to block on the call.
    """
    if isinstance(models, (basestring, Model, Key)):
        models = [models]
    else:
        try:
            models = iter(models)
        except TypeError:
            models = [models]
    keys = [_coerce_to_key(v) for v in models]
    return datastore.DeleteAsync(keys, **kwargs)

def delete(models, **kwargs):
    """Delete one or more Model instances.
    """
    delete_async(models, **kwargs).get_result()
EDIT: From a comment, this is the original misbehaving code:
def tearDown(self):
    print self.account
    db.delete(self.device)
    db.delete(self.account)
    print Account.get_by_email(self.email, case_sensitive=False)
The output of the two print statements is <Account object at 0x10d1827d0> and <Account object at 0x10d1825d0>. The two memory addresses are different, but they point to the same entity. If I add some latency after the delete, such as a for loop, the fetched object is None.
The code you show for delete calls delete_async, yes, but then it calls get_result on the returned asynchronous handle, which will block until the delete actually occurs. So, delete is synchronous.
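In other words, given the source above, db.delete boils down to exactly these two steps:
rpc = db.delete_async(self.account)  # issues the RPC, returns immediately
rpc.get_result()                     # blocks until the delete has been applied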
The reason the sample code you show is returning an object is that you're probably running a query to fetch the account; I presume the email is not the db.Key of the account? Normal queries are not guaranteed to return updated results immediately. To avoid seeing stale data, you either need to use an ancestor query or look up the entity by key, both of which are strongly consistent.
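For illustration (a sketch; account_key and root_key stand in for whatever your code actually has):
# Eventually consistent: a normal query may still see the deleted entity
stale = Account.all().filter('email =', self.email).get()

# Strongly consistent: get by key
fresh = db.get(account_key)  # None once the delete has been applied

# Strongly consistent: ancestor query
fresh = Account.all().ancestor(root_key).get()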
I am using Python 2.7, GAE and the High Replication datastore.
I am trying to perform a transaction that first writes an entity and then reads it, but the read never finds the entity. This is a test case I have:
class DemoTestCase(unittest.TestCase):
    def setUp(self):
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        self.policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(probability=0)
        self.testbed.init_datastore_v3_stub(consistency_policy=self.policy)

    def tearDown(self):
        self.testbed.deactivate()

    def test1(self):
        db.run_in_transaction(self._write)
        db.run_in_transaction(self._read)

    def test2(self):
        db.run_in_transaction(self._write_read)

    def _write(self):
        self.root_key = db.Key.from_path("A", "root")
        a = A(a=1, parent=self.root_key)
        self.a_key = a.put()

    def _read(self):
        b = sample.read_a(self.a_key)
        self.assertEqual(1, b.a)
        self.assertEqual(1, A.all().ancestor(self.root_key).count(5))

    def _write_read(self):
        root_key = db.Key.from_path("A", "root")
        a = A(a=1, parent=root_key)
        a_key = a.put()
        b = sample.read_a(a_key)
        self.assertEqual(None, b)
        self.assertEqual(0, A.all().ancestor(root_key).count(5))
Both test cases pass as written.
Test1 runs a transaction that performs a write. It then runs a second transaction that performs two reads, one by key and one by ancestor query. Reads work just fine in this case.
Test2 runs exactly the same code as test1, but this time everything runs inside a single transaction. As you can see, reading the entity by key returns None, and the ancestor query returns 0 hits.
My question is: how can I, inside a transaction, read an entity that I just wrote? Or is this not possible?
Thanks.
You can't. All datastore reads inside the transaction show a snapshot of the datastore when the transaction started. Writes don't show up.
Theoretically, you shouldn't have to read, since you will have an instance of every entity you write. Use that instance.
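In the test above, that would look something like this (a sketch):
def _write_read(self):
    root_key = db.Key.from_path("A", "root")
    a = A(a=1, parent=root_key)
    a.put()
    # use the in-memory instance rather than re-reading inside the transaction
    self.assertEqual(1, a.a)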
Well, sometimes it's really helpful to re-read. A business rule may be triggered by the entity update and need to reload it. Business rules often are not aware of what triggered them and won't have immediate access to the new entity.
I don't know about Python, but in Java using Objectify, updates are made visible while in the transaction by Objectify's session (transaction) cache. If there is something like a session cache in the Python persistence framework you are using, that may be a solution.
I'm making a server that lets clients upload and download data for different models. Is there an elegant way to handle the requests?
More precisely, I don't want to do something like this:
app = webapp.WSGIApplication([
    ('/my_upload_and_download_url/ModelA/(.*)', MyRequestHandlerForA),
    ('/my_upload_and_download_url/ModelB/(.*)', MyRequestHandlerForB),
    ('/my_upload_and_download_url/ModelC/(.*)', MyRequestHandlerForC),
])
run_wsgi_app(app)
since what I do inside the handlers would all be the same. For example:
class MyRequestHandlerForX(webapp.RequestHandler):
    def get(self, key=None):
        # return the instance with the designated key
        pass

    def post(self, key=None):
        # create/get the model instance
        # iterate through the property list of the instance and set the values
        pass
The only difference among the handlers is which model they create an instance of. The URLs are alike, and the handlers are almost the same.
I checked this post about redirecting requests to other handlers, and I've also read about ways to create an instance from a class name, but I don't think either of them is a good solution.
Does anyone have a good solution?
P.S. This is my first post here. If there is anything inappropriate, please tell me. Thanks.
How you do this depends largely on the details of your code in the request handler. You can do a fairly generic one like this:
class ModelHandler(webapp.RequestHandler):
    def get(self, kind, key):
        model = db.class_for_kind(kind)
        instance = model.get(key)
        # Do something with the instance - eg, print it out

    def post(self, kind, key):
        model = db.class_for_kind(kind)
        instance = model.create_from_request(self.request)

application = webapp.WSGIApplication([
    ('/foo/([^/]+)/([^/]+)', ModelHandler),
])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()
This assumes you define a 'create_from_request' class method on each model class. You probably don't want to do it exactly this way, as it tightly couples the model definitions with the forms used to input them; instead, you probably want to store a mapping of kind name or class to handler function, or do your forms and creation entirely automatically by reflecting on the properties of the class (see the sketch below). Since you haven't specified what it is about doing this you're unsure of, it's hard to be more specific.
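As a sketch of the reflection idea (assumption-laden: it treats every property value as a string taken from a same-named form field, and CreateFromRequestMixin is a made-up name):
class CreateFromRequestMixin(object):
    @classmethod
    def create_from_request(cls, request):
        instance = cls()
        # db.Model.properties() maps property names to Property objects
        for name in cls.properties():
            value = request.get(name, None)
            if value is not None:
                setattr(instance, name, value)
        instance.put()
        return instance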
Also note the inclusion of a main() and other boilerplate above; while it will work the way you've pasted it, adding a main is substantially more efficient, as it allows the App Engine runtime to avoid having to evaluate your module on every request.
In your case I'd probably just have everything hit the same URL path and put the specifics in the GET parameters, like /my_upload_and_download_url?model=modelA.
You can also use webapp2 (http://webapp-improved.appspot.com/guide/app.html) which has a bunch of url routing support.
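For example, webapp2 routes can capture the kind as a named variable, so a single route covers all models (a sketch, reusing the generic ModelHandler idea from above):
import webapp2

app = webapp2.WSGIApplication([
    webapp2.Route('/my_upload_and_download_url/<kind>/<key:.*>', handler=ModelHandler),
])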
You could parse the URL path and do a lookup, like this:
import urlparse

model_lookup = {'ModelA': ModelA, 'ModelB': ModelB, 'ModelC': ModelC}

class MyRequestHandler(webapp.RequestHandler):
    def get(self, key=None):  # the (.*) group from the route is passed in here
        url = urlparse.urlparse(self.request.uri)
        # the second path segment is the model name, e.g. 'ModelA'
        path_model = url.path.split('/')[2]
        model = model_lookup[path_model]
        ...
which allows you to use the same handler class for each path:
app = webapp.WSGIApplication([
    ('/my_upload_and_download_url/ModelA/(.*)', MyRequestHandler),
    ('/my_upload_and_download_url/ModelB/(.*)', MyRequestHandler),
    ('/my_upload_and_download_url/ModelC/(.*)', MyRequestHandler),
])
run_wsgi_app(app)