Accessing HTTP upload data before upload completion - Python

Is there any way to access a file being uploaded over HTTP from a CGI script before the upload finishes? For example, say a 10 megabyte file is being uploaded and is exactly 10% done, meaning the server has 1 megabyte of data. Is it possible to read that 1 megabyte of data without waiting for the upload to finish?
My understanding of HTTP uploads is that the server won't call the CGI script handling the upload until all of the data has been received, but I'm hoping there's some way around that. I'm using Python to handle CGI requests, if that makes any difference.
Thanks in advance for any help.

CGI is the specification of the communication between the web server and an external application, and it does not allow for this.
In fact, most web servers won't do anything with an upload until it finishes. There's no reason you couldn't write or modify a server to expose the partial data (or maybe find one that does, though I don't know which it would be), but you're still not going to do it via CGI.
http://www.ietf.org/rfc/rfc3875
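
As an illustration of the non-CGI route, here is a minimal sketch using Python's standard http.server: the handler reads the raw socket stream, so the body can be consumed in chunks while the client is still sending it. process_chunk is a hypothetical placeholder, and a Content-Length header is assumed:

from http.server import BaseHTTPRequestHandler, HTTPServer

def process_chunk(chunk):
    # Hypothetical placeholder: do something with the partial upload data.
    print('got %d bytes' % len(chunk))

class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the body incrementally as it arrives on the socket,
        # instead of waiting for the whole upload to finish.
        remaining = int(self.headers['Content-Length'])
        while remaining > 0:
            chunk = self.rfile.read(min(4096, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
            process_chunk(chunk)
        self.send_response(200)
        self.end_headers()

HTTPServer(('', 8000), UploadHandler).serve_forever()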

Related

Django/Python - Serial line concurrency

I'm currently working on a gateway with embedded Linux and a web server. The goal of the gateway is to retrieve data from electrical devices over an RS485/Modbus line and to display it on a server.
I'm using Nginx and Django, and the web front-end is delivered as "static" files. A JavaScript file repeatedly makes AJAX calls that send CGI requests to Nginx. These CGI requests are answered with JSON responses thanks to Django. The responses are mostly data that has been read from the appropriate Modbus device.
The exact path is the following:
Randomly timed CGI call -> urls.py -> ModbusCGI.py (which imports another script, ModbusComm.py) -> ModbusComm.py creates a Modbus client and immediately tries to read with it.
Alongside that, I wanted to implement a datalogger to store data in a DB at regular intervals. I made a script that also imports ModbusComm.py, but it doesn't work: sometimes multiple Modbus frames are sent at the same time (the datalogger and the CGI scripts call the same function in ModbusComm.py at the same time), which results in an error.
I'm sure this problem would also occur if there were a lot of users on the server (CGI requests sent at the same time). Or would it? (Is there already a queuing system for CGI requests? I'm a bit lost.)
So my goal is a queuing system that can handle calls from several Python scripts: make them wait while it's not their turn, call a function with the right arguments when it is their turn (actually using the Modbus line), and send the response back to the calling Python script so it can generate the JSON response.
I really don't know how to achieve that, and I'm sure there are better way to do this.
If I'm not clear enough, don't hesitate to make me aware of it :)
I had the same problem when I had to allow multiple processes to read some Modbus (and not only Modbus) data through a serial port. I ended up with a standalone process (a "serial port server") that works with the serial port exclusively. All other processes talk to that port through the standalone process via some inter-process communication mechanism (we used Unix sockets).
This way, when an application wants to read a Modbus register, it connects to the "serial port server", sends its request, and receives the response. All the actual serial port communication is done by the "serial port server" sequentially, to ensure consistency.
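
For illustration, a minimal sketch of such a "serial port server" over a Unix socket could look like the following; SOCK_PATH and read_modbus are placeholders for the real setup:

import os
import socket

SOCK_PATH = '/tmp/serial_server.sock'  # placeholder socket path

def read_modbus(request):
    # Placeholder: perform the actual Modbus read on the serial line.
    return b'response'

if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(5)

while True:
    # One connection at a time keeps all serial traffic strictly sequential.
    conn, _ = server.accept()
    request = conn.recv(1024)
    conn.sendall(read_modbus(request))
    conn.close()

Each caller (the Django view, the datalogger) connects to SOCK_PATH, sends its request, and blocks until the response comes back, so concurrent callers are naturally queued on the listening socket.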

Serve dynamic data to many clients

I am writing a client-server type application. The server side gathers constantly changing data from other hardware and then needs to pass it to multiple clients (say about 10) for display. The server data gathering program will be written in Python 3.4 and run on Debian. The clients will be built with VB Winforms on .net framework 4 running on Windows.
I had the idea to run a lightweight web server on the server-side and use system.net.webclient.downloadstring calls on the client side to receive it. This is so that all the multi-threading async stuff is done for me by the web server.
Questions:
Does this seem like a good approach?
Having my data gathering program write a text file for the web server to serve seems unnecessary. Is there a way to have the data in memory and have the server just serve that so there is no disk file intermediary? Setting up a ramdisk was one solution I thought of but this seems like overkill.
How will the web server deal with the data being frequently updated, say, once a second? Do web servers deal with this elegantly, or is there a chance the file will be served whilst it is being written to?
Thanks.
1) I am not very familiar with Python, but for the .NET application you will likely want to push change notifications to it rather than pull; system.net.webclient.downloadstring is a request (a pull). As I am not a Python developer I cannot assist with the server side.
3) Because you are requesting the data, it is possible to get read/write errors when the file is updated and read at the same time. Even if that does not happen, your data may be out of date as soon as you read it. That can be acceptable; it just depends on how critical your data is.
This is why I would push notifications rather than pull. Done correctly, this keeps the data in sync and avoids some timing issues.
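
On question 2, one way to skip the disk intermediary is to keep the latest reading in memory and serve it straight from there. Here is a minimal sketch using only the standard library; the gather() loop is a stand-in for the real data-gathering code:

import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

latest = {'value': None}
lock = threading.Lock()

def gather():
    # Stand-in for the real data-gathering loop, updating once a second.
    while True:
        with lock:
            latest['value'] = time.time()
        time.sleep(1)

class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Snapshot the in-memory data under the lock, then serve it.
        with lock:
            body = json.dumps(latest).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

threading.Thread(target=gather, daemon=True).start()
HTTPServer(('', 8000), DataHandler).serve_forever()

Because each response is built from a locked snapshot, a client can never see a half-written update, which also addresses the "served whilst being written" concern.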

Bottle server not responding while calculating

I have a Bottle server running on port 8080, using the "gevent" server. I use this server to support some simple "server-sent events".
My question is probably related to not knowing exactly how my setup works. I hope someone can take the time to elaborate on this.
All routes and file serving work great, but I have an issue when accessing one specific route, "/get_data". It gathers data from the web as well as from some internal data sources, and the gathering takes about 30 minutes. While this process is running, I am not able to access any other routes on the server, e.g. "/" or "/login". Once the process is finished, everything works again and the database is updated with the gathered information.
I tried replacing the gathering algorithms with a simple time.sleep(60), and while the timer was active, I was still able to access other routes just fine.
This leads to my two questions:
Why am I not able to access the server while this process is running? Is the port blocked (by the web reads the gathering performs), or does it have something to do with threading?
What would be the best way to run a demanding, long-running process on my server? Preferably I would like to trigger it from my web app, but I have also thought about putting it in a separate Python file and running it locally on the server in a separate Python instance. The process runs at most once per day, maybe as seldom as once per week.
This happens because WSGI handles each request/response synchronously.
You can use gunicorn to run your application; it will handle multiple requests and responses concurrently. Or you can use the other methods described on the Bottle website:
Primer to Asynchronous Applications
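
For the second question, a common approach is to kick the long job off in the background so the route returns immediately and the other routes stay responsive. A rough sketch is below; gather_data is a stub for the real ~30 minute step, and under the gevent server you would likely use gevent.spawn instead of a thread:

import threading
from bottle import Bottle, run

app = Bottle()
status = {'running': False}

def gather_data():
    # Stub for the real ~30 minute gathering step.
    pass

def long_job():
    try:
        gather_data()
    finally:
        status['running'] = False

@app.route('/get_data')
def get_data():
    # Start the job and return at once instead of blocking the worker.
    if not status['running']:
        status['running'] = True
        threading.Thread(target=long_job, daemon=True).start()
    return {'started': True}

run(app, host='0.0.0.0', port=8080)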

Downloading files in background with Python

I have a working web application in Python that downloads a file onto the web server upon a user's request. This works fine for small downloads, but when the user requests a larger file, the connection times out. So I think I need to process the download in the background, but I'm not sure which tool is most suitable. Celery seems right, but I don't really want the download to be queued (it must start immediately). What would you suggest?
The timeout duration is up to you; you could just make it longer.
Anyway, there are plenty of Flash or AJAX uploaders out there; AFAIK there is nothing you can do purely server-side.
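
If Celery's queueing is the only objection, a plain background thread starts the download immediately. A minimal sketch, where url and dest are illustrative parameters:

import threading
import urllib.request

def download(url, dest):
    # Runs in the background; the web request can return right away.
    urllib.request.urlretrieve(url, dest)

def start_download(url, dest):
    thread = threading.Thread(target=download, args=(url, dest), daemon=True)
    thread.start()  # begins downloading immediately, no queue involved
    return thread

# e.g. start_download('http://example.com/big.iso', '/tmp/big.iso')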

Handle the HTTP PUT method in Python WSGI

I currently have a bash application that, among other things, uses cURL to upload a file to a web application with the PUT method. I am attempting to duplicate the web application as the client (bash) portion is GPL but the web portion is not. I also cannot alter the client application as it auto-updates itself from the developers' website.
I have found multitudes of information on how to handle the HTTP POST method with WSGI, CherryPy, Twisted, and practically every way of having Python scripts working on the WWW. However, I can't find a single thing about the PUT method. Does anyone know how to process a PUT request with WSGI, or is there some other framework with PUT functionality that I am missing?
As I understand it, you will just want to read the stream environ['wsgi.input'], because a PUT request will send the entire contents of the PUT as the body of the request.
I am not aware of any encoding issues you will have to deal with (other than the fact that it is binary).
Some time ago, I wrote a simple set of PHP scripts to exchange huge files with another server on a LAN. We started with POST, but quickly ran out of memory on the larger files. So we switched to PUT, where the PHP script could take its good time looping through php://input 4096 bytes at a time (or whatever). It works great.
Here is the PHP code:
$f1 = fopen('php://input', 'rb');  // the raw request body
$f2 = fopen($FilePath, 'wb');      // the destination file
while($data = fread($f1, 4096))
{
    fwrite($f2, $data);
}
fclose($f1);
fclose($f2);
From my experience in handling multipart/form-data in WSGI with POST, I have little doubt that you can handle a PUT by just reading the input stream.
The Python code should look something like this:
output = open('/tmp/input', 'wb')
while True:
    buf = environ['wsgi.input'].read(4096)
    if len(buf) == 0:  # an empty read means the body is finished
        break
    output.write(buf)
output.close()
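
Putting it together, here is a self-contained sketch of a WSGI app that streams a PUT body to disk, runnable with the standard library's wsgiref server; the /tmp/input path is just an example. Bounding the reads by CONTENT_LENGTH matters, because some servers block if you read past the end of the body:

from wsgiref.simple_server import make_server

def application(environ, start_response):
    if environ['REQUEST_METHOD'] != 'PUT':
        start_response('405 Method Not Allowed', [('Allow', 'PUT')])
        return [b'']
    remaining = int(environ.get('CONTENT_LENGTH') or 0)
    with open('/tmp/input', 'wb') as output:
        # Stream the body to disk in chunks, never reading past
        # CONTENT_LENGTH, since wsgi.input may block at the end.
        while remaining > 0:
            buf = environ['wsgi.input'].read(min(4096, remaining))
            if not buf:
                break
            output.write(buf)
            remaining -= len(buf)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'stored\n']

make_server('', 8000, application).serve_forever()

You can exercise it with curl -T somefile http://localhost:8000/ to mirror what the bash client does.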
