How to simulate two parallel requests in django testing framework - python

I have a Django application (using uWSGI and nginx, and atomic views) with a view that creates new items of a model in the DB (postgres). Before creating anything the view checks if the record doesn't already exist in the DB, something like:
...
try:
    newfile = DataFile.objects.get(md5=request.POST['md5'])
except DataFile.DoesNotExist:
    newfile = DataFile.objects.create(md5=request.POST['md5'],
                                      filename=request.POST['filename'])
return JsonResponse({'file_id': newfile.pk})
I noticed sometimes this doesn't work, and I get duplicates in the DB (which is easily solved with a unique constraint). I'm not sure why this happens (caching or a race condition, perhaps), but I'd like to at least cover the behaviour with a test in the Django test framework. However, I do not know how to simulate two parallel requests. Is there a way to fire the next request without waiting for the first, built into the framework, or should one use multiprocessing or similar for this?

I suggest you use an async event loop to trigger two near-simultaneous requests. Note that awaiting the two calls one after the other would run them sequentially; to actually overlap them you need asyncio.gather(), and because the ORM is synchronous, each call has to run in its own thread (asgiref's sync_to_async with thread_sensitive=False does this).
Example:
import asyncio
from asgiref.sync import sync_to_async

def test_case(request):
    try:
        newfile = DataFile.objects.get(md5=request.POST['md5'])
    except DataFile.DoesNotExist:
        newfile = DataFile.objects.create(md5=request.POST['md5'],
                                          filename=request.POST['filename'])
    return JsonResponse({'file_id': newfile.pk})

async def simult(request):
    # gather() runs both calls concurrently, each in its own worker thread
    t_case_0, t_case_1 = await asyncio.gather(
        sync_to_async(test_case, thread_sensitive=False)(request),
        sync_to_async(test_case, thread_sensitive=False)(request),
    )

asyncio.run(simult(request))
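Alternatively, if you want this inside the Django test framework itself, plain threads with the test client work too. A rough sketch, with the URL and POST fields as placeholders; note it needs TransactionTestCase (so both threads see committed data) and a database that allows concurrent connections (e.g. Postgres rather than in-memory SQLite):
import threading
from django.db import connection
from django.test import Client, TransactionTestCase

class ParallelCreateTest(TransactionTestCase):
    def _post(self, results, i):
        try:
            resp = Client().post('/upload/', {'md5': 'abc123', 'filename': 'f.dat'})
            results[i] = resp.json()['file_id']
        finally:
            connection.close()  # each thread opened its own DB connection

    def test_parallel_create(self):
        results = {}
        threads = [threading.Thread(target=self._post, args=(results, i))
                   for i in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # If the race occurs, the two requests created two different rows
        self.assertEqual(results[0], results[1])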

Related

Best approach to multiple websocket client connections in Python?

I appreciate that the question I am about to ask is rather broad but, as a newcomer to Python, I am struggling to find the [best] way of doing something which would be trivial in, say, Node.js, and pretty trivial in other environments such as C#.
Let's say that there is a warehouse full of stuff. And let's say that there is a websocket interface onto that warehouse with two characteristics: on client connection it pumps out a full list of the warehouse's current inventory, and it then follows that up with further streaming updates when the inventory changes.
The web is full of examples of how, in Python, you connect to the warehouse and respond to changes in its state. But...
What if I want to connect to two warehouses and do something based on the combined information retrieved separately from each one? And what if I want to do things based on factors such as time, rather than solely being driven by inventory changes and incoming websocket messages?
In all the examples I've seen - and it's beginning to feel like hundreds - there is, somewhere, in some form, a run() or a run_forever() or a run_until_complete() etc. In other words, the I/O may be asynchronous, but there is always a massive blocking operation in the code, and always two fundamental assumptions which don't fit my case: that there will only be one websocket connection, and that all processing will be driven by events sent out by the [single] websocket server.
It's very unclear to me whether the answer to my question is some sort of use of multiple event loops, or of multiple threads, or something else.
To date, experimenting with Python has felt rather like being on the penthouse floor, admiring the quirky but undeniably elegant decor. But then you get in the elevator, press the button marked "parallelism" or "concurrency", and the elevator goes into freefall, eventually depositing you in a basement filled with some pretty ugly and steaming pipes.
... Returning from flowery metaphors back to the technical, the key thing I'm struggling with is the Python equivalent of, say, Node.js code which could be as trivially simple as the following example [left inelegant for simplicity]:
var aggregateState = { ... some sort of representation of combined state ... };
var socket1 = new WebSocket("wss://warehouse1");
socket1.on("message", OnUpdateFromWarehouse);
var socket2 = new WebSocket("wss://warehouse2");
socket2.on("message", OnUpdateFromWarehouse);
function OnUpdateFromWarehouse(message)
{
... Take the information and use it to update aggregate state from both warehouses ...
}
Answering my own question, in the hope that it may help other Python newcomers... asyncio seems to be the way to go (though there are gotchas such as the alarming ease with which you can deadlock the event loop).
Assuming the use of an asyncio-friendly websocket module such as websockets, what seems to work is a framework along the following lines - shorn, for simplicity, of logic such as reconnects. (The premise remains a warehouse which sends an initial list of its full inventory, and then sends updates to that initial state.)
import asyncio
from websockets import connect, ConnectionClosed

class Warehouse:
    def __init__(self, warehouse_url):
        self.warehouse_url = warehouse_url
        self.inventory = {}  # Some description of the warehouse's inventory

    async def destroy(self):
        if self.websocket.open:
            await self.websocket.close()  # Terminates any recv() in wait_for_incoming()
        await self.incoming_message_task  # keep asyncio happy by awaiting the "background" task

    async def start(self):
        try:
            # Connect to the warehouse
            self.websocket = await connect(self.warehouse_url)
            # Get its initial message which describes its full state
            initial_inventory = await self.websocket.recv()
            # Store the initial inventory
            self.process_initial_inventory(initial_inventory)
            # Set up a "background" task for further streaming reads of the web socket
            self.incoming_message_task = asyncio.create_task(self.wait_for_incoming())
            # Done
            return True
        except Exception:
            # Connection failed (or some unexpected error)
            return False

    async def wait_for_incoming(self):
        while self.websocket.open:
            try:
                update_message = await self.websocket.recv()
                asyncio.create_task(self.process_update_message(update_message))
            except ConnectionClosed:
                # Socket closure; the loop condition goes False and we fall out
                pass

    def process_initial_inventory(self, initial_inventory_message):
        ...  # Process initial_inventory_message into self.inventory

    async def process_update_message(self, update_message):
        # Merge update_message into self.inventory, and fire some sort of
        # event so that the object's creator can detect the change. There
        # seems to be no consensus about what is a pythonic way of
        # implementing events, so I'll declare that - potentially trivial -
        # element as out-of-scope.
        ...
After completing the initial connection logic, one key thing is setting up a "background" task which repeatedly reads further update messages coming in over the websocket. The code above doesn't include any firing of events, but there are all sorts of ways in which process_update_message() can/could do this (many of them trivially simple), allowing the object's creator to deal with notifications whenever and however it sees fit. The streaming messages will continue to be received, and any events will continue to be fired, for as long as the object's creator continues to play nicely with asyncio and to participate in co-operative multitasking.
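As one of those trivially simple options, the object could invoke a callback supplied by its creator; the on_change attribute in this sketch is invented for illustration:
# Sketch: a plain callback as the "event". The creator sets
# warehouse.on_change to an async callable (on_change is an invented name).
async def process_update_message(self, update_message):
    ...  # merge update_message into self.inventory as before
    if self.on_change is not None:
        await self.on_change(self)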
With that in place, a connection can be established along the following lines:
async def main():
    warehouse1 = Warehouse("wss://warehouse1")
    if await warehouse1.start():
        # Connection succeeded. Update messages will now be processed
        # in the "background" provided that other users of the event loop
        # yield in some way
        ...
    else:
        ...  # Connection failed

asyncio.run(main())
Multiple warehouses can be initiated in several ways, including doing a create_task(warehouse.start()) on each one and then doing a gather on the tasks to ensure/check that they're all okay.
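For instance, a sketch of that create_task()/gather() approach (the URLs are placeholders):
async def main():
    warehouses = [Warehouse(url) for url in ("wss://warehouse1", "wss://warehouse2")]
    start_tasks = [asyncio.create_task(w.start()) for w in warehouses]
    results = await asyncio.gather(*start_tasks)
    if not all(results):
        ...  # at least one connection failed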
When it's time to quit, to keep asyncio happy, and to stop it complaining about orphaned tasks, and to allow everything to shut down nicely, it's necessary to call destroy() on each warehouse.
But there's one common element which this doesn't cover. Extending the original premise above, let's say that the warehouse also accepts requests from our websocket client, such as "ship X to Y". The success/failure responses to these requests will come in alongside the general update messages; it generally won't be possible to guarantee that the first recv() after the send() of a request will be the response to that request. This complicates process_update_message().
The best answer I've found may or may not be considered "pythonic" because it uses a Future in a way which is strongly analogous to a TaskCompletionSource in .NET.
Let's invent a couple of implementation details; any real-world scenario is likely to look something like this:
We can supply a request_id when submitting an instruction to the warehouse
The success/failure response from the warehouse repeats the request_id back to us (thus also distinguishing command-response messages from inventory-update messages)
The first step is to have a dictionary which maps the ID of pending, in-progress requests to Future objects:
def __init__(self, warehouse_url):
    ...
    self.pending_requests = {}
The definition of a coroutine which sends a request then looks something like this:
async def send_request(self, some_request_definition):
    # Allocate a unique ID for the request
    request_id = <some unique request id>
    # Create a Future for the pending request
    request_future = asyncio.Future()
    # Store the map of the ID -> Future in the dictionary of pending requests
    self.pending_requests[request_id] = request_future
    # Build a request message to send to the server, somehow including the request_id
    request_msg = <some request definition, including the request_id>
    # Send the message
    await self.websocket.send(request_msg)
    # Wait for the future to complete - we're now asynchronously awaiting
    # activity in a separate function
    await asyncio.wait_for(request_future, timeout=None)
    # Return the result of the Future as the return value of send_request()
    return request_future.result()
A caller can create a request and wait for its asynchronous response using something like the following:
some_result = await warehouse.send_request(<some request def>)
The key to making this all work is then to modify and extend process_update_message() to do the following:
Distinguish between request responses versus inventory updates
For the former, extract the request ID (which our invented scenario says gets repeated back to us)
Look up the pending Future for the request
Do a set_result() on it (whose value can be anything depending on what the server's response says). This releases send_request() and causes the await from it to be resolved.
For example:
async def process_update_message(self, update_message):
    if <some test that update_message is a request response>:
        request_id = <extract the request ID repeated back in update_message>
        # Get the Future for this request ID
        request_future = self.pending_requests[request_id]
        # Create some sort of return value for send_request() based on the response
        return_value = <some result of the request>
        # Complete the Future, causing send_request() to return
        request_future.set_result(return_value)
    else:
        ...  # handle inventory updates as before
I've not used sockets with asyncio, but you're likely just looking for asyncio's open_connection. Note that it gives you a raw TCP stream rather than a websocket, so the address below is a (host, port) pair:
async def socket_activity(address, callback):
    reader, _ = await asyncio.open_connection(*address)
    while True:
        message = await reader.read(4096)  # read() with no limit would block until EOF
        if not message:  # empty bytes on EOF
            break  # connection was closed
        await callback(message)
Then add these to the event loop
tasks = []  # keeping a reference prevents these from being garbage collected
for address in [("warehouse1", 443), ("warehouse2", 443)]:
    tasks.append(asyncio.create_task(
        socket_activity(address, callback)
    ))
# return tasks # or work with them
If you want to wait in a coroutine until N operations are complete, you can use .gather()
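For example:
# wait until every socket task has finished (or raised)
await asyncio.gather(*tasks)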
Alternatively, you may find Tornado does everything you want and more (I based my Answer off this one)
Tornado websocket client: how to async on_message? (coroutine was never awaited)

How to save and edit server rendering data?

I am using a Flask server with Python.
I have an integer pics_to_show. Every time a request is received, the user receives the pics_to_show integer, and pics_to_show gets decremented by 1.
pics_to_show is an integer that's shared by all website users. I could make a database to save it, but I want something simpler and more flexible. Is there any OTHER way to save this integer?
I made a class that saves such variables in a JSON file.
import json

class GlobalState:
    def __init__(self, path_to_file):
        self.path = path_to_file
        try:
            open(path_to_file)
        except FileNotFoundError:
            f = open(path_to_file, 'x')
            f.write('{}')

    def __getitem__(self, key):
        file = self.load_file()
        data = json.loads(file.read())
        return data[key]

    def __setitem__(self, key, value):
        file = self.load_file()
        data = json.loads(file.read())
        data[key] = value
        json.dump(data, open(self.path, 'w+'), indent=4)

    def load_file(self):
        return open(self.path, 'r+')
The class is over-simplified, of course. I initialize an instance in __init__.py and import it into all the route files (I am using Blueprints).
My application is threaded, so this class might not work, since multiple users would be editing the data at the same time. Does anybody have another solution?
Note:
The g variable would not work, since the data is shared across users, not requests.
Also, what if I want to increment such a variable every week? Would it be thread-safe to run a separate Python script to keep track of the date, or to check the date on each request to the server?
You will definitely end up with inconsistent state: without a locking mechanism between reads and writes you will have race conditions, so you will lose some increments.
You are also not closing the files you open; do that enough times and it will crash the application.
One more piece of advice: you do not want to write your own state-management software (a database); it is very, very difficult to get right.
I think in your situation the best solution is to use SQLite, as it is a library that you call from your app; there is no additional server.
"I could make a database to save it, but I want something simpler and flexible"
In a multi-threaded app you cannot go simpler than SQLite (if you want your app to be correct, that is).
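For illustration, a minimal sketch of the counter kept in SQLite (the table and column names are invented for the example):
import sqlite3

def init_db(path='state.db'):
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER)")
    con.execute("INSERT OR IGNORE INTO counters VALUES ('pics_to_show', 100)")
    con.commit()
    con.close()

def take_pics_to_show(path='state.db'):
    # Read and decrement in one immediate transaction so that two
    # concurrent requests cannot both see the same value
    con = sqlite3.connect(path, isolation_level='IMMEDIATE')
    try:
        with con:
            (value,) = con.execute(
                "SELECT value FROM counters WHERE name = 'pics_to_show'").fetchone()
            con.execute("UPDATE counters SET value = value - 1 WHERE name = 'pics_to_show'")
        return value
    finally:
        con.close()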
If you do not like SQL then there are some simpler options:
zodb http://www.zodb.org/en/latest/guide/transactions-and-threading.html
pickleDB https://github.com/patx/pickledb
python shelve, but you will need to use file system locks (see the sketch below)
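A sketch of the shelve option with a file system lock (POSIX-only, since it uses fcntl; the names are invented for the example):
import fcntl
import shelve

def decrement_pics_to_show(path='state'):
    # The flock is released automatically when lockfile is closed
    with open(path + '.lock', 'w') as lockfile:
        fcntl.flock(lockfile, fcntl.LOCK_EX)  # blocks until we hold the lock
        with shelve.open(path) as db:
            db['pics_to_show'] = db.get('pics_to_show', 0) - 1
            return db['pics_to_show']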

Twisted multiple concurrent or async streams

I'm writing an application in Python using the twisted.web framework to stream video using HTML5.
The videos are served via static.File('pathtovideo').render_GET().
The problem is that only one video can be streamed at a time, as it ties up the entire process.
Is there any way to make the streaming async or non-blocking, whichever term is appropriate here?
I tried using deferToThread but that still tied up the process.
This is the class I'm currently using, where Movie is an ORM table and mid is just an id for an arbitrary row.
class MovieStream(Resource):
    isLeaf = True

    def __init__(self, mid):
        Resource.__init__(self)
        self.mid = mid

    def render_GET(self, request):
        movie = Movie.get(Movie.id == self.mid)
        if movie:
            deferred = deferToThread(self._start_stream, path=movie.source, request=request)
            deferred.addCallback(self._finish_stream, request)
            return NOT_DONE_YET
        else:
            return NoResource()

    def _start_stream(self, path, request):
        stream = File(path)
        return stream.render_GET(request)

    def _finish_stream(self, ret, request):
        request.finish()
The part of this code that looks like it blocks is actually the Movie.get call.
It is incorrect to call _start_stream with deferToThread because _start_stream uses Twisted APIs (File and whatever File.render_GET uses) and it is illegal to use Twisted APIs except in the reactor thread (in other words, it is illegal to use them in a function you call with deferToThread).
Fortunately you can just delete the use of deferToThread to fix that bug.
To fix the problem that Movie.get blocks you'll need to find a way to access your database asynchronously. Perhaps using deferToThread(Movie.get, Movie.id == self.mid) - if the database library that implements Movie.get is thread-safe, that is.
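Putting the two fixes together, render_GET might look something like this sketch (still assuming Movie.get is thread-safe; only the blocking lookup leaves the reactor thread):
from twisted.internet.threads import deferToThread
from twisted.web.server import NOT_DONE_YET
from twisted.web.static import File

def render_GET(self, request):
    d = deferToThread(Movie.get, Movie.id == self.mid)
    def stream_it(movie):
        # Callbacks run in the reactor thread, so Twisted APIs are safe here;
        # static.File's rendering finishes the request itself when done
        File(movie.source).render(request)
    d.addCallback(stream_it)
    return NOT_DONE_YET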
For what it's worth, you can also avoid the render_GET hijinx by moving your database lookup logic earlier in the resource traversal hierarchy.
For example, I imagine your URLs look something like /foo/bar/<movie id>. In this case, the resource at /foo/bar gets asked for <movie id> children. If you implement that lookup like this:
from twisted.internet.threads import deferToThread
from twisted.web.resource import Resource
from twisted.web.util import DeferredResource

class MovieContainer(Resource):
    def getChild(self, movieIdentifier, request):
        condition = (Movie.id == movieIdentifier)
        getting = deferToThread(Movie.get, condition)
        return DeferredResource(getting)
(assuming here that Movie.get is thread-safe) then you'll essentially be done.
Resource traversal will conclude with the object constructed by DeferredResource(getting), and when that object is rendered it will take care of waiting for getting to have a result (for the Deferred to "fire", in the lingo) and of calling the right method on it, e.g. render_GET, to produce a response for the request.

Preserving global state in a flask application [duplicate]

This question already has answers here:
Are global variables thread-safe in Flask? How do I share data between requests? (4 answers)
Closed 2 years ago.
I am trying to save a cache dictionary in my flask application.
As far as I understand it, the Application Context, in particular the flask.g object should be used for this.
Setup:
import flask as f
app = f.Flask(__name__)
Now if I do:
with app.app_context():
    f.g.foo = "bar"
    print f.g.foo
It prints bar.
Continuing with the following:
with app.app_context():
    print f.g.foo
AttributeError: '_AppCtxGlobals' object has no attribute 'foo'
I don’t understand it and the docs are not helping at all. If I read them correctly the state should have been preserved.
Another idea I had was to simply use module-wide variables:
cache = {}

def some_function():
    cache['foo'] = "bar"
But it seems like these get reset with every request.
How to do this correctly?
Edit: Flask 10.1
Based on your question, I think you're confused about the definition of "global".
In a stock Flask setup, you have a Flask server with multiple threads and potentially multiple processes handling requests. Suppose you had a stock global variable like "itemlist = []", and you wanted to keep adding to it in every request - say, every time someone made a POST request to an endpoint. This is totally possible in theory and practice. It's also a really bad idea.
The problem is that you can't easily control which threads and processes "win" - the list could end up in a really wonky order, or get corrupted entirely. So now you need to talk about locks, mutexes, and other primitives. This is hard and annoying.
You should keep the webserver itself as stateless as possible. Each request should be totally independent and not share any state in the server. Instead, use a database or caching layer which will handle the state for you. This seems more complicated but is actually simpler in practice. Check out SQLite, for example; it's pretty simple.
To address the 'flask.g' object, that is a global object on a per request basis.
http://flask.pocoo.org/docs/api/#flask.g
It's "wiped clean" between requests and cannot be used to share state between them.
I've done something similar to your "module-wide variables" idea in a flask server that I use to integrate two pieces of software, where I know I will only ever have one simultaneous "user" (being the sender software).
My app.py looks like this:
from flask import Flask
from flask.json import jsonify

app = Flask(__name__)
cache = {}

@app.route("/create")
def create():
    cache['foo'] = 0
    return jsonify(cache['foo'])

@app.route("/increment")
def increment():
    cache['foo'] = cache['foo'] + 1
    return jsonify(cache['foo'])

@app.route("/read")
def read():
    return jsonify(cache['foo'])

if __name__ == '__main__':
    app.run()
You can test it like this:
import requests
print(requests.get('http://127.0.0.1:5000/create').json())
print(requests.get('http://127.0.0.1:5000/increment').json())
print(requests.get('http://127.0.0.1:5000/increment').json())
print(requests.get('http://127.0.0.1:5000/read').json())
print(requests.get('http://127.0.0.1:5000/increment').json())
print(requests.get('http://127.0.0.1:5000/create').json())
print(requests.get('http://127.0.0.1:5000/read').json())
Outputs:
0
1
2
2
3
0
0
Use with caution, as I expect this not to behave properly in a multi-user web server environment.
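If you do try something like this with concurrent requests, the minimal hedge is a lock around the shared dict; a sketch (it only helps within a single process, not across multiple workers):
import threading

cache = {}
cache_lock = threading.Lock()

@app.route("/increment")
def increment():
    with cache_lock:
        cache['foo'] = cache['foo'] + 1
        value = cache['foo']
    return jsonify(value)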
This line:
with app.app_context():
    f.g.foo = "bar"
Since you are using the "with" keyword, once this block is exited it calls the __exit__ method of the AppContext class. See this. So 'foo' is popped once the block is done. That's why you don't have it available again. You can instead try:
ctx = app.app_context()
ctx.push()
f.g.foo = 'bar'
Until you call the following, g.foo should be available
ctx.pop()
I am however not sure if you want to use this for the purpose of caching.

django/python : error when get value from dictionary

I have Python/Django code hosted at dotCloud and Red Hat OpenShift. To handle different users I use a token and save it in a dictionary. But when I get the value from the dict, it sometimes throws a KeyError.
import threading

thread_queue = {}

def download(request):
    dl_val = request.POST["input1"]
    client_token = str(request.POST["pagecookie"])
    # save client token as keys and thread object as value in dictionary
    thread_queue[client_token] = DownloadThread(dl_val, client_token)
    thread_queue[client_token].start()
    return render_to_response("progress.html",
                              {"dl_val": dl_val, "token": client_token})
The code below is executed at 1-second intervals via a JavaScript XMLHttpRequest to the server.
It checks a variable inside another thread and returns the value to the user's page.
def downloadProgress(request, token):
    # sometimes i use this for check the content of dict
    #resp = HttpResponse("thread_queue = "+str(thread_queue))
    #return resp
    prog, total = thread_queue[str(token)].getValue() # problematic line !
    if prog == 0:
        # prevent division by zero
        return HttpResponse("0")
    percent = float(prog) / float(total)
    percent = round(percent*100, 2)
    if percent >= 100:
        try:
            f_name = thread_queue[token].getFileName()[1]
        except:
            downloadProgress(request, token)
        resp = HttpResponse('<a href="http://'+request.META['HTTP_HOST']+
                            '/dl/'+token+'/">'+f_name+'</a><br />')
        return resp
    else:
        return HttpResponse(str(percent))
After testing for several days, it sometimes returns:
thread_queue = {}
It sometimes succeeds:
thread_queue = {'wFVdMDF9a2qSQCAXi7za': , 'EVukb7QdNdDgCf2ZtVSw': , 'C7pkqYRvRadTfEce5j2b': , '2xPFhR6wm9bs9BEQNfdd': }
I never get this result when I'm running Django locally via manage.py runserver and accessing it with Google Chrome, but when I upload it to dotCloud or OpenShift, it always gives the above problem.
My questions:
How can I solve this problem?
Do dotCloud and OpenShift limit their Python CPU usage?
Or is the problem inside the Python dictionary?
Thank you.
dotCloud has 4 worker processes by default for the Python service. When you run the dev server locally, you are only running one process. Like @martijn said, your issue is related to the fact that your dict isn't going to be shared between these processes.
To fix this issue, you could use something like redis or memcached to store this information instead. If you need a more long-term storage solution, then using a database is probably better suited.
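For example, a minimal sketch with the redis-py client (the key names are invented here):
import redis

r = redis.Redis()  # one shared store that every worker process can reach

# in download(): record progress under the client's token
r.set('progress:' + client_token, '0/100')

# in downloadProgress(): any worker process can read it back
value = r.get('progress:' + str(token))  # None if the key doesn't exist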
dotCloud does not limit CPU usage; the CPU is shared amongst others on the same host and allows bursting, but in the end everyone gets the same amount of CPU.
Looking at your code, you should check to make sure there is a value in the dict before you access it, or at a minimum surround the code with a try/except block to handle the case when the data isn't there.
str_token = str(token)
if str_token in thread_queue:
    prog, total = thread_queue[str_token].getValue() # problematic line !
else:
    # value isn't there, do something else
    pass
Presumably dotCloud and OpenShift run multiple processes of your code; the dict is not going to be shared between these processes.
Note that this also means the extra processes will not have access to your extra thread either.
Use an external database for this kind of information instead. For long-running asynchronous jobs like these you also need to run them in a separate worker process. Look at Celery for an all-in-one solution for asynchronous job handling, for example.
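As a rough sketch of the Celery shape (the broker URL and task body are placeholders):
from celery import Celery

app = Celery('tasks', broker='redis://localhost', backend='redis://localhost')

@app.task(bind=True)
def download_file(self, url):
    # ...download in chunks, reporting progress into the result backend...
    self.update_state(state='PROGRESS', meta={'prog': 0, 'total': 100})
A view would then start the job with download_file.delay(url) and poll its progress via download_file.AsyncResult(task_id).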
