I am trying to save a cache dictionary in my Flask application.
As far as I understand it, the application context, and in particular the flask.g object, should be used for this.
Setup:
import flask as f
app = f.Flask(__name__)
Now if I do:
with app.app_context():
    f.g.foo = "bar"
    print f.g.foo
It prints bar.
Continuing with the following:
with app.app_context():
    print f.g.foo
AttributeError: '_AppCtxGlobals' object has no attribute 'foo'
I don't understand this, and the docs are not helping at all. If I read them correctly, the state should have been preserved.
Another idea I had was to simply use module-wide variables:
cache = {}

def some_function():
    cache['foo'] = "bar"
But it seems like these get reset with every request.
How do I do this correctly?
Edit: Flask 0.10.1
Based on your question, I think you're confused about the definition of "global".
In a stock Flask setup, you have a Flask server with multiple threads and potentially multiple processes handling requests. Suppose you had a stock global variable like "itemlist = []", and you wanted to keep adding to it in every request - say, every time someone made a POST request to an endpoint. This is totally possible in theory and practice. It's also a really bad idea.
The problem is that you can't easily control which threads and processes "win" - the list could end up in a really wonky order, or get corrupted entirely. So now you need to talk about locks, mutexes, and other synchronization primitives. This is hard and annoying.
You should keep the webserver itself as stateless as possible. Each request should be totally independent and not share any state in the server. Instead, use a database or caching layer which will handle the state for you. This seems more complicated but is actually simpler in practice. Check out SQLite, for example; it's pretty simple.
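To make that concrete, here is a minimal sketch of the counter idea backed by SQLite instead of a module-level variable; the file name, table, and route are made up for illustration:

import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = 'state.db'  # hypothetical database file

def init_db():
    con = sqlite3.connect(DB_PATH)
    with con:  # commits on success
        con.execute("CREATE TABLE IF NOT EXISTS counters "
                    "(name TEXT PRIMARY KEY, value INTEGER)")
        con.execute("INSERT OR IGNORE INTO counters VALUES ('foo', 0)")
    con.close()

@app.route("/increment")
def increment():
    # each request opens its own connection; SQLite serializes writes,
    # so concurrent workers cannot clobber each other's updates
    con = sqlite3.connect(DB_PATH)
    with con:
        con.execute("UPDATE counters SET value = value + 1 WHERE name = 'foo'")
        value = con.execute("SELECT value FROM counters WHERE name = 'foo'").fetchone()[0]
    con.close()
    return jsonify(value)

if __name__ == '__main__':
    init_db()
    app.run()

Because the state lives in the database file rather than in any one process, every worker sees the same counter.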
To address the flask.g object: that is a global object on a per-request basis.
http://flask.pocoo.org/docs/api/#flask.g
It's "wiped clean" between requests and cannot be used to share state between them.
I've done something similar to your "module-wide variables" idea in a Flask server that I use to integrate two pieces of software, where I know I will only ever have one simultaneous "user" (the sending software).
My app.py looks like this:
from flask import Flask
from flask.json import jsonify

app = Flask(__name__)
cache = {}

@app.route("/create")
def create():
    cache['foo'] = 0
    return jsonify(cache['foo'])

@app.route("/increment")
def increment():
    cache['foo'] = cache['foo'] + 1
    return jsonify(cache['foo'])

@app.route("/read")
def read():
    return jsonify(cache['foo'])

if __name__ == '__main__':
    app.run()
You can test it like this:
import requests
print(requests.get('http://127.0.0.1:5000/create').json())
print(requests.get('http://127.0.0.1:5000/increment').json())
print(requests.get('http://127.0.0.1:5000/increment').json())
print(requests.get('http://127.0.0.1:5000/read').json())
print(requests.get('http://127.0.0.1:5000/increment').json())
print(requests.get('http://127.0.0.1:5000/create').json())
print(requests.get('http://127.0.0.1:5000/read').json())
Outputs:
0
1
2
2
3
0
0
Use with caution, as I expect this not to behave properly in a multi-user web server environment.
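If you do run something like this with multiple threads in a single process, the minimum safety measure would be a lock around mutations. A sketch, assuming the same app as above; note that a lock does nothing across separate worker processes:

import threading
from flask import Flask
from flask.json import jsonify

app = Flask(__name__)
cache = {}
cache_lock = threading.Lock()

@app.route("/increment")
def increment():
    # the lock serializes threads within ONE process only;
    # separate worker processes each get their own copy of cache
    with cache_lock:
        cache['foo'] = cache.get('foo', 0) + 1
        value = cache['foo']
    return jsonify(value)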
This line:
with app.app_context():
    f.g.foo = "bar"
Since you are using the with keyword, once this block is executed, the __exit__ method of the AppContext class is called. See this. So foo is popped once the block is done. That's why you don't have it available again. You can instead try:
ctx = app.app_context()
ctx.push()
f.g.foo = 'bar'
Until you call the following, g.foo should be available
ctx.pop()
I am however not sure if you want to use this for the purpose of caching.
Related
I have a Django application (using uWSGI and nginx, and atomic views) with a view that creates new items of a model in the DB (Postgres). Before creating anything, the view checks whether the record already exists in the DB, something like:
...
try:
    newfile = DataFile.objects.get(md5=request.POST['md5'])
except DataFile.DoesNotExist:
    newfile = DataFile.objects.create(md5=request.POST['md5'], filename=request.POST['filename'])
return JsonResponse({'file_id': newfile.pk})
I noticed sometimes this doesn't work, and I get duplicates in the DB (which is easily solved with a unique constraint). I'm not sure why this happens, whether it is caching or a race condition, but I'd like to at least cover the behaviour with a test in the Django test framework. However, I do not know how to simulate two parallel requests. Is there a way to fire the next request without waiting for the first, built into the framework, or should one use multiprocessing or similar for this?
I suggest you use an async event loop to trigger two near-simultaneous requests.
Example:
import asyncio
from asgiref.sync import sync_to_async

async def test_case(request):
    try:
        # the ORM is synchronous, so wrap it when calling from async code
        newfile = await sync_to_async(DataFile.objects.get)(md5=request.POST['md5'])
    except DataFile.DoesNotExist:
        newfile = await sync_to_async(DataFile.objects.create)(
            md5=request.POST['md5'], filename=request.POST['filename'])
    return JsonResponse({'file_id': newfile.pk})

async def simult(request):
    # gather schedules both coroutines concurrently; awaiting them
    # one after the other would just run them in sequence
    t_case_0, t_case_1 = await asyncio.gather(test_case(request),
                                              test_case(request))

asyncio.run(simult(request))
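If you want two genuinely parallel requests rather than interleaved coroutines, another option is a thread pool against a live test server. A sketch, assuming the view is routed at /upload/ and the requests package is installed; LiveServerTestCase runs a real server in a background thread, so the two POSTs below genuinely race:

from concurrent.futures import ThreadPoolExecutor

import requests
from django.test import LiveServerTestCase

class DuplicateRaceTest(LiveServerTestCase):
    def test_parallel_create(self):
        payload = {'md5': 'd41d8cd98f00b204e9800998ecf8427e',
                   'filename': 'a.bin'}
        url = self.live_server_url + '/upload/'
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(requests.post, url, data=payload)
                       for _ in range(2)]
            ids = [fut.result().json()['file_id'] for fut in futures]
        # with the unique constraint in place, both requests should
        # resolve to the same row
        self.assertEqual(ids[0], ids[1])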
I'm new to both Flask and Python. I've got an application I'm working on to hold weather data, and I'm allowing both GET and POST commands into my Flask application. Unfortunately, the automated calls to my API are not always coming back with the proper results. I'm currently storing my data in a global variable; when a POST command is called, the new data is appended to my existing data. Unfortunately, sometimes when the GET is called, it does not receive the most up-to-date version of my global data variable. I believe the change is not being passed up from the POST function to the global variable before the GET is called, because if I rerun the GET the proper result comes back.
weatherData = []  # filled with data read from CSV on initialization

class FullHistory(Resource):
    def get(self):
        ret = []
        for row in weatherData:
            val = row['DATE']
            ret.append({"DATE": str(val)})
        return ret

    def post(self):
        global weatherData
        newWeatherData = weatherData
        args = parser.parse_args()
        newVal = int(args['DATE'])
        newWeatherData.append({'DATE': int(args['DATE']), 'TMAX': float(args['TMAX']), 'TMIN': float(args['TMIN'])})
        weatherData = newWeatherData
        #time.sleep(5)
        return {"DATE": str(newVal)}, 201

class SelectHistory(Resource):
    def get(self, date_id):
        val = int(date_id)
        bVal = False
        #time.sleep(5)
        global weatherData
        for row in weatherData:
            if row['DATE'] == val:
                wd = row
                bVal = True
                break
        if bVal:
            return {"DATE": str(wd['DATE']), "TMAX": float(wd['TMAX']), "TMIN": float(wd['TMIN'])}
        else:
            return "HTTP Error code 404", 404

    def delete(self, date_id):
        val = int(date_id)
        wdIter = None
        for row in weatherData:
            if row['DATE'] == val:
                wdIter = row
                break
        if wdIter is not None:
            weatherData.remove(wdIter)
            return {"DATE": str(val)}, 204
        else:
            return "HTTP Error code 404", 404
Is there any way I can ensure that my global variable is up to date, or make my API wait to return until I'm sure the update has been passed along? This was supposed to be a simple application. I would really rather not have to learn how to use threads in Python just yet. I've made sure that my client's GET request does not start until after the POST has given a response. I know one workaround is to use sleep to delay my responses, but I would rather understand why my update isn't occurring immediately in the first place.
I believe your problem is the application context. As stated here:
The application context is created and destroyed as necessary. It never moves between threads and it will not be shared between requests. As such it is the perfect place to store database connection information and other things. The internal stack object is called flask._app_ctx_stack. Extensions are free to store additional information on the topmost level, assuming they pick a sufficiently unique name and should put their information there, instead of on the flask.g object which is reserved for user code.
Though it says you can store data at the "topmost level," that is not reliable, and if you scale your project out to multiple worker processes with uWSGI, for instance, you'll need persistence to share data between threads regardless. You should be using a database, redis, or at the very least updating your .csv file each time you mutate your data.
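As an illustration, a minimal sketch of pushing the shared state into redis so every worker process sees the same data; it assumes a local redis server and the redis-py package, and the key names are made up:

import redis
from flask import Flask, jsonify

app = Flask(__name__)
r = redis.Redis(host='localhost', port=6379, db=0)

@app.route("/append/<date>")
def append(date):
    # every uWSGI worker talks to the same redis instance,
    # so all of them see the same list
    r.rpush('weather:dates', date)
    return jsonify(r.llen('weather:dates'))

@app.route("/dates")
def dates():
    return jsonify([d.decode() for d in r.lrange('weather:dates', 0, -1)])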
I'm trying to access variables that are being passed from the client (iOS; Swift) to the server on a Flask-SocketIO connection, on the connect action. Let me explain: when you want to handle an arbitrary custom action, you have something like this on the server, which includes a callback (see data in the code below):
@socketio.on('custom action', namespace='/mynamespace')
def handle_custom_action(data):
    print data
There are some preset actions (like connect), and apparently connect does not get passed any data when its handler is called, so the client cannot send any data on the connect action:
@socketio.on('connect', namespace='/mynamespace')
def handle_connection(data):
    print data # nothing gets printed
I looked into the code a bit deeper and found this. The definition of the on function is:
def on(self, message, namespace=None):
And then within that function (I'm omitting a bit of code to get to the point):
if message == 'connect':
    ret = handler()
else:
    ret = handler(*args)
I could be wrong, but it appears that this code explicitly does not pass anything to the handler on connect, and I'm not sure why. I've found some evidence that this is possible in node.js (I will update this with proper links when I find them), so I'm wondering why it isn't possible in the Flask-SocketIO library, or whether I'm just misunderstanding what I'm looking at (and if so, how to get those parameters).
Thanks!
Update:
I did find a way to access the connection parameters, but it doesn't seem like the "right" way. I'm using the global request object and splitting the GET parameters / query string that come through on the request:
data = dict(item.split("=") for item in request.event["args"][0]["QUERY_STRING"].split("&"))
Or as two lines:
data = request.event["args"][0]["QUERY_STRING"]
data = dict(item.split("=") for item in data.split("&"))
Flask-SocketIO adds event, which contains a dictionary with keys message and args, and within args is the QUERY_STRING, which I then split and turn into a dictionary. This works fine, but it doesn't necessarily answer the original question as to why there is no callback.
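A slightly more robust version of the same trick is to let the standard library parse the query string instead of hand-splitting it; this is only a sketch, still relying on the request.event attribute that Flask-SocketIO adds:

from urlparse import parse_qs  # on Python 3: from urllib.parse import parse_qs
from flask import request

def connect_params():
    qs = request.event["args"][0]["QUERY_STRING"]
    # parse_qs handles url-encoding and values that contain '='
    return {key: values[0] for key, values in parse_qs(qs).items()}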
Here is an example of the iOS connection params being passed:
let connectParams = SocketIOClientOption.connectParams(["user_id" : Int(user.userId)!, "connection_id" : self.socketConnectionId])
self.socket = SocketIOClient(socketURL: URL(string: "http://www.myurl.com")!, config: [.nsp("/namespace"), .forceWebsockets(true), .forceNew(true), connectParams])
I have Python/Django code hosted at dotCloud and Red Hat OpenShift. To handle different users, I use a token and save it in a dictionary. But when I get the value from the dict, it sometimes throws a KeyError.
import threading

thread_queue = {}

def download(request):
    dl_val = request.POST["input1"]
    client_token = str(request.POST["pagecookie"])
    # save client token as key and thread object as value in dictionary
    thread_queue[client_token] = DownloadThread(dl_val, client_token)
    thread_queue[client_token].start()
    return render_to_response("progress.html",
                              {"dl_val": dl_val, "token": client_token})
The code below is executed at one-second intervals via a JavaScript XMLHttpRequest to the server. It checks a variable inside another thread and returns the value to the user's page.
def downloadProgress(request, token):
    # sometimes I use this to check the content of the dict
    #resp = HttpResponse("thread_queue = "+str(thread_queue))
    #return resp
    prog, total = thread_queue[str(token)].getValue() # problematic line !
    if prog == 0:
        # prevent division by zero
        return HttpResponse("0")
    percent = float(prog) / float(total)
    percent = round(percent*100, 2)
    if percent >= 100:
        try:
            f_name = thread_queue[token].getFileName()[1]
        except:
            downloadProgress(request, token)
        resp = HttpResponse('<a href="http://'+request.META['HTTP_HOST']+
                            '/dl/'+token+'/">'+f_name+'</a><br />')
        return resp
    else:
        return HttpResponse(str(percent))
After testing for several days, it sometimes returns:
thread_queue = {}
It sometimes succeeds:
thread_queue = {'wFVdMDF9a2qSQCAXi7za': <DownloadThread(...)>, 'EVukb7QdNdDgCf2ZtVSw': <DownloadThread(...)>, 'C7pkqYRvRadTfEce5j2b': <DownloadThread(...)>, '2xPFhR6wm9bs9BEQNfdd': <DownloadThread(...)>}
I never get this result when I'm running Django locally via manage.py runserver and accessing it with Google Chrome, but when I upload it to dotCloud or OpenShift, it always gives the above problem.
My questions:
How can I solve this problem?
Do dotCloud and OpenShift limit their Python CPU usage?
Or is the problem inside the Python dictionary?
Thank you.
dotCloud has 4 worker processes by default for the Python service. When you run the dev server locally, you are only running one process. As @martijn said, your issue is related to the fact that your dict isn't going to be shared between these processes.
To fix this issue, you could use something like redis or memcached to store this information instead. If you need a more long-term storage solution, then a database is probably better suited.
dotCloud does not limit CPU usage; the CPU is shared among others on the same host and allows bursting, but in the end everyone gets the same amount of CPU.
Looking at your code, you should check to make sure there is a value in the dict before you access it, or at a minimum surround the code with a try/except block to handle the case when the data isn't there.
str_token = str(token)
if str_token in thread_queue:
    prog, total = thread_queue[str_token].getValue() # problematic line !
else:
    # value isn't there, do something else
    pass
Presumably dotCloud and OpenShift run multiple processes of your code; the dict is not going to be shared between these processes.
Note that this also means the extra processes will not have access to your extra thread either.
Use an external database for this kind of information instead. For long-running asynchronous jobs like these you also need to run them in a separate worker process. Look at Celery for an all-in-one solution for asynchronous job handling, for example.
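For example, here is a rough sketch of the same download-progress idea on top of Celery; the broker URL, the task body, and the view wiring are all assumptions for illustration:

# tasks.py - assumes a redis broker and result backend
from celery import Celery

celery_app = Celery('downloads',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task(bind=True)
def download_file(self, dl_val):
    total = 100  # hypothetical amount of work
    for prog in range(total):
        # ... download one chunk here ...
        self.update_state(state='PROGRESS',
                          meta={'prog': prog, 'total': total})
    return {'filename': 'result.dat'}

# views.py - progress is looked up by task id, not in a module-level dict,
# so it works no matter which worker process serves the request
from celery.result import AsyncResult
from django.http import HttpResponse
from tasks import celery_app, download_file

def download(request):
    result = download_file.delay(request.POST["input1"])
    return HttpResponse(result.id)  # the client polls with this id

def download_progress(request, task_id):
    result = AsyncResult(task_id, app=celery_app)
    if result.state == 'PROGRESS':
        meta = result.info
        return HttpResponse(str(round(100.0 * meta['prog'] / meta['total'], 2)))
    return HttpResponse(result.state)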
With the new release of GAE 1.5.0, we now have an easy way to do async datastore calls. Are we required to call get_result() after calling put_async()?
For example, if I have a model called MyLogData, can I just call:
put_async(MyLogData(text="My Text"))
right before my handler returns without calling the matching get_result()?
Does GAE automatically block on any pending calls before sending the result to the client?
Note that I don't really care to handle error conditions. i.e. I don't mind if some of these puts fail.
I don't think there is any sure way to know if get_result() is required unless someone on the GAE team verifies this, but I think it's not needed. Here is how I tested it.
I wrote a simple handler:
class DB_TempTestModel(db.Model):
    data = db.BlobProperty()

class MyHandler(webapp.RequestHandler):
    def get(self):
        starttime = datetime.datetime.now()
        lots_of_data = ' '*500000
        if self.request.get('a') == '1':
            db.put(DB_TempTestModel(data=lots_of_data))
            db.put(DB_TempTestModel(data=lots_of_data))
            db.put(DB_TempTestModel(data=lots_of_data))
            db.put(DB_TempTestModel(data=lots_of_data))
        if self.request.get('a') == '2':
            db.put_async(DB_TempTestModel(data=lots_of_data))
            db.put_async(DB_TempTestModel(data=lots_of_data))
            db.put_async(DB_TempTestModel(data=lots_of_data))
            db.put_async(DB_TempTestModel(data=lots_of_data))
        self.response.out.write(str(datetime.datetime.now()-starttime))
I ran it a bunch of times on a High Replication Application.
The data was always there, making me believe that unless there is a failure on the datastore side of things (unlikely), it's gonna be written.
Here's the interesting part. When the data was written with put_async() (?a=2), the time to process the request was on average about 2 to 3 times shorter than with put() (?a=1) (not a very scientific test, just eyeballing it).
But the cpu_ms and api_cpu_ms were the same for both ?a=1 and ?a=2.
From the logs:
ms=440 cpu_ms=627 api_cpu_ms=580 cpm_usd=0.036244
vs
ms=149 cpu_ms=627 api_cpu_ms=580 cpm_usd=0.036244
On the client side, looking at the network latency of the requests, it showed the same results, i.e. '?a=2' requests were at least 2 times faster. Definitely a win on the client side... but it seems to not have any gain on the server side.
Anyone on the GAE team care to comment?
db.put_async works fine without get_result when deployed (fire-and-forget style), but locally it won't take effect until get_result is called (more context).
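If you do want failures surfaced in the handler, the conservative pattern is to start the write early, overlap other work with it, and block just before returning. A sketch using the question's model:

from google.appengine.ext import db

class MyLogData(db.Model):
    text = db.StringProperty()

def log_it(text):
    rpc = db.put_async(MyLogData(text=text))
    # ... unrelated work overlaps with the datastore write ...
    rpc.get_result()  # blocks until the write is confirmed, raising on failure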
I dunno, but this works:
import datetime
from google.appengine.api import urlfetch

def main():
    rpc = urlfetch.create_rpc()
    urlfetch.make_fetch_call(rpc, "some://artificially/slow.url")
    print "Content-type: text/plain"
    print
    print str(datetime.datetime.now())

if __name__ == '__main__':
    main()
The remote URL sleeps 3 seconds and then sends me an email. The App Engine handler returns immediately, and the remote URL completes as expected. Since both services abstract the same underlying RPC framework, I would guess the datastore behaves similarly.
Good question, though. Perhaps Nick or another Googler can answer definitively.