I'm reading on the WSGI specification, and tried implementing a simple WSGI server from scratch, and tested it on a simple flask application. Currently it
opens a socket listener
passes each incoming connection to another thread to handle it
the handler parses the request, creates the environ, passes it to the flask app object, and returns the response.
Overall, it seems to work. Conceptually, what more does a real server, eg. gunicorn do? I'm asking in terms of the basic functionality, not in terms of supporting more features (eg. different protocols). What makes a server better, eg. why is gunicorn suitable for production, but wsgiref is not?
my 2c, is that getting something working is pretty easy, it's just that HTTP is such an old/complex standard it takes a lot of work to get all the edge cases working nicely
how well does it tolerate errors in the WSGI client code
HTTP 0.9, 1.0, 1.1 or 2/SPDY?
how do you handle malicious clients that send a byte every 10 seconds
the various Keep-Alive and Tranfer-Encoding variants seems to end up consuming a lot of code
does it do Content-Encoding as well
Related
I am running socket.io on an Apache server through Python Flask. We're integrating it into an iOS app (using the Socket.IO-Client-Swift library) and we're having a weird issue.
From the client side code in the app (written in Swift), I can view the actual connection log (client-side in XCode) and see the connection established from the client's IP and the requests being made. The client never receives the information back (or any information back; even when using a global event response handler) from the socket server.
I wrote a very simple test script in Javascript on an HTML page and sent requests that way and received the proper responses back. With that said, it seems to likely be an issue with iOS. I've found these articles (but none of them helped fix the problem):
https://github.com/nuclearace/Socket.IO-Client-Swift/issues/95
https://github.com/socketio/socket.io-client-swift/issues/359
My next thought is to extend the logging of socket.io to find out exact what data is being POSTed to the socket namespace. Is there a way to log exactly what data is coming into the server (bear in mind that the 'on' hook on the server side that I've set up is not getting any data; I've tried to log it from there but it doesn't appear to even get that far).
I found mod_dumpio for Linux to log all POST requests but I'm not sure how well it will play with multi-threading and a socket server.
Any ideas on how to get the exact data being posted so we can at least troubleshoot the syntax and make sure the data isn't being malformed when it's sent to the server?
Thanks!
Update
When testing locally, we got it working (it was a setting in the Swift code where the namespace wasn't being declared properly). This works fine now on localhost but we are having the exact same issues when emitting to the Apache server.
We are not using mod_wsgi (as far as I know; I'm relatively new to mod_wsgi, apologies for any ignorance). We used to have a .wsgi file that called the main app script to run but we had to change that because mod_wsgi is not compatible with Flask SocketIO (as stated in the uWSGI Web Server section here). The way I am running the script now is by using supervisord to run the .py file as a daemon (using that specifically so it will autostart in the event of a server crash).
Locally, it worked great once we installed the eventlet module through pip. When I ran pip freeze on my virtual environment on the server, eventlet was installed. I uninstalled and reinstalled it just to see if that cleared anything up and that did nothing. No other Python modules that are on my local copy seem to be something that would affect this.
One other thing to keep in mind is that in the function that initializes the app, we change the port to port 80:
socketio.run(app,host='0.0.0.0',port=80)
because we have other API functions that run through a domain that is pointing to the server in this app. I'm not sure if that would affect anything but it doesn't seem to matter on the local version.
I'm at a dead end again and am trying to find anything that could help. Thanks for your assistance!
Another Update
I'm not exactly sure what was happening yet but we went ahead and rewrote some of the code, making sure to pay extra special attention to the namespace declarations within each socket event on function. It's working fine now. As I get more details, I will post them here as I figure this will be something useful for other who have the same problem. This thread also has some really valuable information on how to go about debugging/logging these types of issues although we never actually fully figured out the answer to the original question.
I assume you have verified that Apache does get the POST requests. That should be your first test, if Apache does not log the POST requests coming from iOS, then you have a different kind of problem.
If you do get the POST requests, then you can add some custom code in the middleware used by Flask-SocketIO and print the request data forwarded by Apache's mod_wsgi. The this is in file flask_socketio/init.py. The relevant portion is this:
class _SocketIOMiddleware(socketio.Middleware):
# ...
def __call__(self, environ, start_response):
# log what you need from environ here
environ['flask.app'] = self.flask_app
return super(_SocketIOMiddleware, self).__call__(environ, start_response)
You can find out what's in environ in the WSGI specification. In particular, the body of the request is available in environ['wsgi.input'], which is a file-like object you read from.
Keep in mind that once you read the payload, this file will be consumed, so the WSGI server will not be able to read from it again. Seeking the file back to the position it was before the read may work on some WSGI implementations. A safer hack I've seen people do to avoid this problem is to read the whole payload into a buffer, then replace environ['wsgi.input'] with a brand new StringIO or BytesIO object.
Are you using flask-socketio on the server side? If you are, there is a lot of debugging available in the constructor.
socketio = SocketIO(app, async_mode=async_mode, logger=True, engineio_logger=True)
I initiate a request client-side, then I change my mind and call xhr.abort().
How does Django react to this? Does it terminate the thread somehow? If not, how do I get Django to stop wasting time trying to respond to the aborted request? How do I handle it gracefully?
Due to how http works and that you usually got a frontend in front of your django gunicorn app processes (or uswgi etc), your http cancel request is buffered by nginx. The gunicorns don't get a signal, they just finish processing and then output whatever to the http socket. But if that socket is closed it will have an error (which is caught as a closed connection and move one).
So it's easy to DOS a server if you can find a way to spawn many of these requests.
But to answer your question it depends on the backend, with gunicorn it will keep going until the timeout.
Just think of the Web as a platform for building easy-to-use, distributed, loosely couple systems, with no guarantee about the availability of resources as 404 status code suggests.
I think that creating tightly coupled solutions such as your idea is going against web principles and usage of REST. xhr.abort() is client side programming, it's completely different from server side. It's a bad practice trying to tighten client side technology to server side internal behavior.
Not only this is a waste of resources, but also there is no guarantee on processing status of the request by web server. It may lead to data inconsistency too.
If your request generates no server-side side effects for which the client
can be held responsible. It is better just to ignore it, since these kind of requests does not change server state & the response is usually cached for better performance.
If your request could cause changes in server state or data, for the sake of data consistency you can check whether the changes have taken effect or not using an API. In case of affection try to rollback using another API.
EDIT:Question Updated. Thanks Slott.
I have a TCP Server in Python.
It is a server with asynchronous behaviour. .
The message format is Binary Data.
Currently I have a python client that interacts with the code.
What I want to be able to do eventually implement a Web based Front End to this client.
I just wanted to know , what should be correct design for such an application.
Start with any WSGI-based web server. werkzeug is a choice.
The Asynchronous TCP/IP is a seriously complicated problem. HTTP is synchronous. So using the synchronous web server presenting some asynchronous data is always a problem. Always.
The best you can do is to buffer things and have two processes in your web application.
TCP/IP process that collects data from the remove server and buffers it in a file (or files) somewhere.
WSGI web process which handles GET/POST processing.
GET requests will fetch some or all of the buffer and display it.
POST requests will send a message to the TCP/IP server.
For Web-based, talk HTTP. Use JSON or XML as data formats.
Be standards-compliant and make use of the vast number of libraries out there. Don't reinvent the wheel. This way you have less headaches in the long run.
if you need to maintain a connection to a backend server across multiple HTTP requests, Twisted's HTTP server is an ideal choice, since it's built to manage multiple connections easily.
The title may be a bit vague, but here's my goal: I have a frontend webserver which takes incoming HTTP requests, does some preprocessing on them, and then passes the requests off to my real webserver to get the HTTP response, which is then passed back to the client.
Currently, my frontend is built off of BaseHTTPServer.HTTPServer and the backend is CherryPy.
So the question is: Is there a way to take these HTTP requests / client connections and insert them into a CherryPy server to get the HTTP response? One obvious solution is to run an instance of the CherryPy backend on a local port or using UNIX domain sockets, and then the frontend webserver establishes a connection with the backend and relays any requests/responses. Obviously, this isn't ideal due to the overhead.
What I'd really like is for the CherryPy backend to not bind to any port, but just sit there waiting for the frontend to pass the client's socket (as well as the modified HTTP Request info), at which point it does its normal CherryPy magic and returns the request directly to the client.
I've been perusing the CherryPy source to find some way to accomplish this, and currently am attempting to modify wsgiserver.CherryPyWSGIServer, but it's getting pretty hairy and is probably not the best approach.
Is your main app a wsgi application? If so, you could write some middleware that wraps around it and does all the request wrangling before passing on to the main application.
If this this is possible it would avoid you having to run two webservers and all the problems you are encountering.
Answered the Upgrade question at Handling HTTP/1.1 Upgrade requests in CherryPy. Not sure if that addresses this one or not.
How should I implement reverse AJAX when building a chat application in Django? I've looked at Django-Orbited, and from my understanding, this puts a comet server in front of the HTTP server. This seems fine if I'm just running the Django development server, but how does this work when I start running the application from mod_wsgi? How does having the orbited server handling every request scale? Is this the correct approach?
I've looked at another approach (long polling) that seems like it would work, although I'm not sure what all would be involved. Would the client request a page that would live in its own thread, so as not to block the rest of the application? Would it even block? Wouldn't the script requested by the client have to continuously poll for information?
Which of the approaches is more proper? Which is more portable, scalable, sane, etc? Are there other good approaches to this (aside from the client polling for messages) that I have overlooked?
How about using the awesome nginx push module?
Have take a look at Tornado?
Using WSGI for comet/long-polling apps is not a good choice because don't support non-blocking requests.
The Nginx Push Stream Module provides a simple HTTP interface for both the server and the client.
The Nginx HTTP Push Module is similar, but seems to no longer be maintained.