I am working on the development of a system that collects data from REST servers and manipulates it.
One of the requirements is multiple and frequent API requests. We currently implement this in a somewhat synchronous way. I can easily implement this using threads but considering the system might need to support thousands of requests per second I think it would be wise to utilize Twisted's ability to efficiently implement the above. I have seen this blog post and the whole idea of deferred list seems to do the trick. But I am kind of stuck with how to structure my class (can't wrap my mind around how Twisted works).
Can you try to outline the structure of the class that will run the event-loop and will be able to get a list of URLs and headers and return a list of results after making the requests asynchronously?
Do you know of a better way of implementing this in python?
Sounds like you want to use a Twisted project called treq, which allows you to send requests to HTTP endpoints. It works a lot like requests. I recently helped a friend here in this thread. My answer there might be of some use to you. If you still need more help, just make a comment and I'll try my best to update this answer.
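If pulling in Twisted or treq isn't an option, the same "fire off a list of requests, collect a list of results" idea can be roughly sketched with the standard library's asyncio. To be clear, this is not treq's API, just an analogue of the deferred-list pattern. The names fetch_all, fake_fetch and max_concurrency are made up for illustration, and fake_fetch stands in for whatever real HTTP client call you end up using.

```python
import asyncio


async def fetch_all(requests, fetch, max_concurrency=100):
    """Run fetch(url, headers) for every (url, headers) pair concurrently.

    Results come back in the same order as `requests`; a failed request
    yields its exception object instead of aborting the whole batch.
    """
    limit = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url, headers):
        async with limit:  # cap the number of requests in flight
            return await fetch(url, headers)

    tasks = [fetch_one(url, headers) for url, headers in requests]
    return await asyncio.gather(*tasks, return_exceptions=True)


async def fake_fetch(url, headers):
    # Hypothetical stand-in for a real HTTP call, so the sketch runs offline.
    await asyncio.sleep(0.01)
    return {"url": url, "status": 200}


if __name__ == "__main__":
    batch = [("http://example.com/a", {}), ("http://example.com/b", {})]
    for result in asyncio.run(fetch_all(batch, fake_fetch)):
        print(result)
```

The semaphore is there because "thousands of requests per second" with no cap tends to exhaust sockets; the right limit depends on your servers, so treat 100 as a placeholder.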
I'm trying to create a simple python server that can handle multiple RPC calls at the same time. I would like to use twisted for the networking and spyne to handle the RPCs. I found a good example in the spyne github repo here, but when I make a call to say_hello_with_sleep using curl I get an error.
exceptions.AssertionError: It looks like this protocol is not async-compliant yet
This is the only one of the RPCs that doesn't seem to work, and it is the one that defines the type of nonblocking call I'm looking for.
The final RPCs that I need to implement will take around 40 sec to process before returning the response, and I'm honestly not sure if this is the best way to go about handling multiple requests at the same time.
Any help or direction would be greatly appreciated. Thanks!
This is fixed and will be released as part of Spyne 2.13.
You can use code from the master branch of http://github.com/arskom/spyne if you can't wait an indefinite amount of time until the release. Code only gets merged there if it passes all tests.
I'd like to do the following:
the queries on a django site (first server) are sent to a second server (for performance and security reasons)
the query is processed on the second server using sqlite
the python search function has to keep a lot of data in memory. a simple cgi would always have to reread data from disk which would further slow down the search process. so i guess i need some daemon to run on the second server.
the search process is slow and i'd like to send partial results back, and show them as they arrive.
this looks like a common task, but somehow i don't get it.
i tried Pyro first which exposes the search class (and then i needed a workaround to avoid sqlite threading issues). i managed to get the complete search results onto the first server, but only as a whole. i don't know how to "yield" the results one by one (as generators cannot be pickled), and i anyway wouldn't know how to write them one by one onto the search result page.
i may need some "push technology" says this thread: https://stackoverflow.com/a/5346075/1389074 talking about some different framework. but which?
i don't seem to search for the right terms. maybe someone can point me to some discussions or frameworks that address this task?
thanks a lot in advance!
You can use Tornado's WebSocket support. This will allow you to establish a two-way connection between the client and the server and return data as it comes. Tornado is an async framework built in Python.
I am looking for a simple (i.e., not one that requires me to setup a separate server to handle a messaging queue) way to do long-polling for a small web-interface that runs calculations and produces a graph. This is what my web-interface needs to do:
User requests a graph/data in a web-interface
Server runs some calculations.
While the server is running calculations, a small container is updated (likely via AJAX/jQuery) with the calculation progress (similar to what you'd do in a console with print, i.e. print 'calculating density function...')
Calculation finishes and graph is shown to user.
As the calculation is all done server-side, I'm not really sure how to easily set this up. Obviously I'll want to set up a REST API to handle the polling, which would be easy in Flask. However, I'm not sure how to retrieve the actual updates. An obvious, albeit complicated for this purpose, solution would be to set up a messaging queue and do some long polling. However, I'm not sure this is the right approach for something this simple.
Here are my questions:
Is there a way to do this using the file system? Performance isn't a huge issue. Can AJAX/jQuery find messages from a file? Save the progress to some .json file?
What about pickling? (I don't really know much about pickling, but maybe I could pickle a message dict and it could be read by an API that is handling the polling).
Is polling even the right approach? Is there a better or more common pattern to handle this?
I have a feeling I'm overcomplicating things as I know this kind of thing is common on the web. Quite often I see stuff happening and a little "loading.gif" image is running while some calculation is going on (for example, in Google Analytics).
Thanks for your help!
I've built several apps like this using just Flask and jQuery. Based on that experience, I'd say your plan is good.
Do not use the filesystem. You will run into JavaScript security issues/protections. In the unlikely event you find reasonable workarounds, you still wouldn't have anything portable or scalable. Instead, use a small local web serving framework, like Flask.
Do not pickle. Use JSON. It's the language of web apps and REST interfaces. jQuery and those nice jQuery-based plugins for drawing charts, graphs and such will expect JSON. It's easy to use, human-readable, and for small-scale apps, there's no reason to go any place else.
Long-polling is fine for what you want to accomplish. Pure HTTP-based apps have some limitations. And WebSockets and similar socket-ish layers like Socket.IO "are the future." But finding good, simple examples of the server-side implementation has, in my experience, been difficult. I've looked hard. There are plenty of examples that want you to set up Node.js, REDIS, and other pieces of middleware. But why should we have to set up two or three separate middleware servers? It's ludicrous. So long-polling on a simple, pure-Python web framework like Flask is the way to go IMO.
The code is a little more than a snippet, so rather than including it here, I've put a simplified example into a Mercurial repository on bitbucket that you can freely review, copy, or clone. There are three parts:
serve.py a Python/Flask based server
templates/index.html 98% HTML, 2% template file the Flask-based server will render as HTML
static/lpoll.js a jQuery-based client
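The repository code isn't reproduced here, and what follows is not that code. As a rough, framework-free sketch of the server-side half of the polling idea: the calculation appends progress messages to a shared, lock-protected log, and the polling endpoint returns whatever is new as JSON. ProgressLog, run_calculation and the "messages"/"next"/"done" keys are all invented names for illustration; in a Flask app, poll() would be called from a route and its return value sent back to the jQuery client.

```python
import json
import threading


class ProgressLog:
    """Thread-safe list of progress messages that a polling endpoint can read."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = []
        self._done = False

    def report(self, message):
        with self._lock:
            self._messages.append(message)

    def finish(self):
        with self._lock:
            self._done = True

    def poll(self, since=0):
        """Return a JSON string with the messages added after index `since`."""
        with self._lock:
            new = self._messages[since:]
            return json.dumps({
                "messages": new,
                "next": since + len(new),  # client sends this back next time
                "done": self._done,
            })


def run_calculation(log):
    # Hypothetical stand-in for the real server-side computation.
    log.report("calculating density function...")
    log.report("rendering graph...")
    log.finish()


if __name__ == "__main__":
    log = ProgressLog()
    worker = threading.Thread(target=run_calculation, args=(log,))
    worker.start()
    worker.join()
    print(log.poll(since=0))
```

The client keeps the "next" value from each reply and passes it as since on the following poll, so each message is delivered once and the client can stop polling when "done" is true.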
Long-polling was a reasonable work-around before simple, natural support for Web Sockets came to most browsers, and before it was easily integrated alongside Flask apps. But here in mid-2013, Web Socket support has come a long way.
Here is an example, similar to the one above, but integrating Flask and Web Sockets. It runs atop server components from gevent and gevent-websocket.
Note this example is not intended to be a Web Socket masterpiece. It retains a lot of the lpoll structure, to make them more easily comparable. But it immediately improves responsiveness, server overhead, and interactivity of the Web app.
Update for Python 3.7+
5 years since the original answer, WebSocket has become easier to implement. As of Python 3.7, asynchronous operations have matured into mainstream usefulness. Python web apps are the perfect use case. They can now use async just as JavaScript and Node.js have, leaving behind some of the quirks and complexities of "concurrency on the side." In particular, check out Quart. It retains Flask's API and compatibility with a number of Flask extensions, but is async-enabled. A key side-effect is that WebSocket connections can be gracefully handled side-by-side with HTTP connections. E.g.:
from quart import Quart, websocket

app = Quart(__name__)

@app.route('/')
async def hello():
    return 'hello'

@app.websocket('/ws')
async def ws():
    # Keep pushing a message to the client for as long as it stays connected.
    while True:
        await websocket.send('hello')

app.run()
Quart is just one of the many great reasons to upgrade to Python 3.7.
I was trying to create a polling script in python that starts when another python script starts and then keeps supplying data back to this script.
I can obviously write an infinite loop, but is that the right way to go about it? I might lose control over how the functions work and how many times a function should be called in an hour.
Edit:
What I am trying to accomplish is to poll the REST API of twitter and get new mentions and people who follow me. I obviously can't keep polling because I will run out of API requests per hour. Thus, the issue. This poller, will send the new mention and follower id/user to the main script that would be listening to any such update.
I highly suggest looking into Twisted, one of the most popular async frameworks using the reactor pattern.
The "infinite loop" you are looking for is really an application pattern that Twisted implements to respond to events asynchronously, and it almost never makes sense to roll your own.
Twisted is largely used for networking requirements, but it has a LoopingCall interface to set up the kind of functionality you require. Using the core Twisted deferred as your request model allows you to set up a long-polling server that can perform the kind of conditional network test you need. It can initially be a little intimidating, but once you understand the core components (Factories, Reactors, Protocols, etc.) that you need to inherit from, it becomes much easier to visualize your problem.
This also might be a good tutorial to start looking at the basics of the "push" model:
http://carloscarrasco.com/simple-http-pubsub-server-with-twisted.html
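To make the "how many times per hour" part concrete, here is a small sketch in plain Python. It is not Twisted code; with Twisted, LoopingCall would take the place of the while loop below. The function names, the 15-requests-per-hour budget and the two-calls-per-poll figure are assumptions for illustration only, so check the real limits for the API you're polling.

```python
import time


def polling_interval(max_requests_per_hour, calls_per_poll=1):
    """Seconds to wait between polls so the hourly budget is never exceeded."""
    return 3600.0 * calls_per_poll / max_requests_per_hour


def poll(fetch_updates, on_update, interval, max_polls=None, sleep=time.sleep):
    """Call fetch_updates() every `interval` seconds, passing each new item on.

    max_polls=None loops forever; a number makes the loop finite for testing.
    """
    seen = set()
    count = 0
    while max_polls is None or count < max_polls:
        for item in fetch_updates():
            if item not in seen:  # only forward items we have not sent before
                seen.add(item)
                on_update(item)
        count += 1
        sleep(interval)


if __name__ == "__main__":
    # Assumed budget: 15 requests/hour, two API calls (mentions + followers) per poll.
    print(polling_interval(15, calls_per_poll=2))
```

Here fetch_updates would wrap the REST calls and on_update would hand each new mention or follower to the main script, for example by putting it on a queue.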
I have been playing with other frameworks, such as NodeJS, lately.
I love the possibility to return a response, and still being able to do further operations.
e.g.
def view(request):
    do_something()
    return HttpResponse()
    do_more_stuff()  # not possible!!!
Maybe Django already offers a way to perform operations after returning a request, if that is the case that would be great.
Help would be very much appreciated! =D
Not out of the box, as you've already returned out of the method. You could use something like Celery, which would pass the do_more_stuff task onto a queue and then run do_more_stuff() outside of the HTTP request/response flow.
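To show the shape of that idea without a Celery setup, here is a toy version using only the standard library. It is not Celery's API: the queue lives in the same process and a daemon thread plays the worker, so queued work is lost if the process exits. The view signature and the results list are simplified for illustration.

```python
import queue
import threading
import traceback

tasks = queue.Queue()


def worker():
    """Run queued callables one at a time, outside the request/response flow."""
    while True:
        func, args = tasks.get()
        try:
            func(*args)
        except Exception:
            traceback.print_exc()  # keep the worker alive if one task fails
        finally:
            tasks.task_done()


threading.Thread(target=worker, daemon=True).start()

results = []


def do_more_stuff(request_id):
    # Hypothetical slow follow-up work.
    results.append(request_id)


def view(request_id):
    tasks.put((do_more_stuff, (request_id,)))  # schedule it, don't run it
    return "response for %s" % request_id      # returned immediately
```

Celery does the same hand-off, but through a message broker to separate worker processes, which is what makes it survive restarts and scale past one machine.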
Django lets you accomplish this with Signals, more information can be found here. (Please note, as I said in comments below, signals aren't non-blocking, but they do allow you to execute code after returning a response in a view.)
If you're looking into doing many, many asynchronous requests and need them to be non-blocking, you may want to check out Tornado.
Because you're returning from the function, do_more_stuff will never be called.
If you're looking at doing heavy lifting, queue something up before you return, as Ross suggests (+1 for Celery).
If, however, you're looking at returning some content, then doing something and returning more content to the user, streaming is probably what you're looking for. You can pass an iterator or a generator to HttpResponse, and it'll iterate and push out the content in a trickle fashion (on Django 1.5 and later, use StreamingHttpResponse for this, since a plain HttpResponse consumes the iterator up front). It feels a bit yuck, but if you're a generator rockstar you may be able to do enough in various states to accomplish what you want.
Or I guess you could simply redesign your page to use a lot of ajax to do what you need, including firing off events to django views, reading data from views, etc.
It kind of comes down to where the burden of async is going to sit: client, server or response.
I'm not that familiar with node.js yet, but it would be interesting to see the use case you're talking about.
EDIT: I did a little more looking into signals, and while they do occur in process, there is a built in signal for request_finished after the request has been handled by django, though it's more of a catchall than something specific.