Streaming file upload using bottle (or flask or similar)

Streaming file upload using bottle (or flask or similar) - python

I have a REST frontend written using Python/Bottle which handles file uploads, usually large ones. The API is wirtten in such a way that:
The client sends PUT with the file as a payload. Among other things, it sends Date and Authorization headers. This is a security measure against replay attacks -- the request is singed with a temporary key, using target url, the date and several other things
Now the problem. The server accepts the request if the supplied date is in given datetime window of 15 minutes. If the upload takes long enough time, it will be longer than the allowed time delta. Now, the request authorization handling is done using decorator on bottle view method. However, bottle won't start the dispatch process unless the upload is finished, so the validation fails on longer uploads.
My question is: is there a way to explain to bottle or WSGI to handle the request immediately and stream the upload as it goes? This would be useful for me for other reasons as well. Or any other solutions? As I am writing this, WSGI middleware comes to mind, but still, I'd like external insight.
I would be willing to switch to Flask, or even other Python frameworks, as the REST frontend is quite lightweight.
Thank you

I recommend splitting the incoming file into smaller-sized chunks on the frontend. I'm doing this to implement a pause/resume function for large file uploads in a Flask application.
Using Sebastian Tschan's jquery plugin, you can implement chunking by specifying a maxChunkSize when initializing the plugin, as in:
$('#file-select').fileupload({
url: '/uploads/',
sequentialUploads: true,
done: function (e, data) {
console.log("uploaded: " + data.files[0].name)
},
maxChunkSize: 1000000 // 1 MB
});
Now the client will send multiple requests when uploading large files. And your server-side code can use the Content-Range header to patch the original large file back together. For a Flask application, the view might look something like:
# Upload files
#app.route('/uploads/', methods=['POST'])
def results():
files = request.files
# assuming only one file is passed in the request
key = files.keys()[0]
value = files[key] # this is a Werkzeug FileStorage object
filename = value.filename
if 'Content-Range' in request.headers:
# extract starting byte from Content-Range header string
range_str = request.headers['Content-Range']
start_bytes = int(range_str.split(' ')[1].split('-')[0])
# append chunk to the file on disk, or create new
with open(filename, 'a') as f:
f.seek(start_bytes)
f.write(value.stream.read())
else:
# this is not a chunked request, so just save the whole file
value.save(filename)
# send response with appropriate mime type header
return jsonify({"name": value.filename,
"size": os.path.getsize(filename),
"url": 'uploads/' + value.filename,
"thumbnail_url": None,
"delete_url": None,
"delete_type": None,})
For your particular application, you will just have to make sure that the correct auth headers are still sent with each request.
Hope this helps! I was struggling with this problem for a while ;)

When using plupload solution might be like this one:
$("#uploader").plupload({
// General settings
runtimes : 'html5,flash,silverlight,html4',
url : "/uploads/",
// Maximum file size
max_file_size : '20mb',
chunk_size: '128kb',
// Specify what files to browse for
filters : [
{title : "Image files", extensions : "jpg,gif,png"},
],
// Enable ability to drag'n'drop files onto the widget (currently only HTML5 supports that)
dragdrop: true,
// Views to activate
views: {
list: true,
thumbs: true, // Show thumbs
active: 'thumbs'
},
// Flash settings
flash_swf_url : '/static/js/plupload-2.1.2/js/plupload/js/Moxie.swf',
// Silverlight settings
silverlight_xap_url : '/static/js/plupload-2.1.2/js/plupload/js/Moxie.xap'
});
And your flask-python code in such case would be similar to this:
from werkzeug import secure_filename
# Upload files
#app.route('/uploads/', methods=['POST'])
def results():
content = request.files['file'].read()
filename = secure_filename(request.values['name'])
with open(filename, 'ab+') as fp:
fp.write(content)
# send response with appropriate mime type header
return jsonify({
"name": filename,
"size": os.path.getsize(filename),
"url": 'uploads/' + filename,})
Plupload always sends chunks in exactly same order, from first to last, so you do not have to bother with seek or anything like that.

Related

FastAPI: Transfering large files (200MB images) fast [duplicate]

I am trying to upload a large file (≥3GB) to my FastAPI server, without loading the entire file into memory, as my server has only 2GB of free memory.
Server side:
async def uploadfiles(upload_file: UploadFile = File(...):
Client side:
m = MultipartEncoder(fields = {"upload_file":open(file_name,'rb')})
prefix = "http://xxx:5000"
url = "{}/v1/uploadfiles".format(prefix)
try:
req = requests.post(
url,
data=m,
verify=False,
)
which returns:
HTTP 422 {"detail":[{"loc":["body","upload_file"],"msg":"field required","type":"value_error.missing"}]}
I am not sure what MultipartEncoder actually sends to the server, so that the request does not match. Any ideas?

With requests-toolbelt library, you have to pass the filename as well, when declaring the field for upload_file, as well as set the Content-Type header—which is the main reason for the error you get, as you are sending the request without setting the Content-Type header to multipart/form-data, followed by the necessary boundary string—as shown in the documentation. Example:
filename = 'my_file.txt'
m = MultipartEncoder(fields={'upload_file': (filename, open(filename, 'rb'))})
r = requests.post(url, data=m, headers={'Content-Type': m.content_type})
print(r.request.headers) # confirm that the 'Content-Type' header has been set
However, I wouldn't recommend using a library (i.e., requests-toolbelt) that hasn't provided a new release for over three years now. I would suggest using Python requests instead, as demonstrated in this answer and that answer (also see Streaming Uploads and Chunk-Encoded Requests), or, preferably, use the HTTPX library, which supports async requests (if you had to send multiple requests simultaneously), as well as streaming File uploads by default, meaning that only one chunk at a time will be loaded into memory (see the documentation). Examples are given below.
Option 1 (Fast) - Upload File and Form data using .stream()
As previously explained in detail in this answer, when you declare an UploadFile object, FastAPI/Starlette, under the hood, uses a SpooledTemporaryFile with the max_size attribute set to 1MB, meaning that the file data is spooled in memory until the file size exceeds the max_size, at which point the contents are written to disk; more specifically, to a temporary file on your OS's temporary directory—see this answer on how to find/change the default temporary directory—that you later need to read the data from, using the .read() method. Hence, this whole process makes uploading file quite slow; especially, if it is a large file (as you'll see in Option 2 below later on).
To avoid that and speed up the process, as the linked answer above suggested, one can access the request body as a stream. As per Starlette documentation, if you use the .stream() method, the (request) byte chunks are provided without storing the entire body to memory (and later to a temporary file, if the body size exceeds 1MB). This method allows you to read and process the byte chunks as they arrive. The below takes the suggested solution a step further, by using the streaming-form-data library, which provides a Python parser for parsing streaming multipart/form-data input chunks. This means that not only you can upload Form data along with File(s), but you also don't have to wait for the entire request body to be received, in order to start parsing the data. The way it's done is that you initialise the main parser class (passing the HTTP request headers that help to determine the input Content-Type, and hence, the boundary string used to separate each body part in the multipart payload, etc.), and associate one of the Target classes to define what should be done with a field when it has been extracted out of the request body. For instance, FileTarget would stream the data to a file on disk, whereas ValueTarget would hold the data in memory (this class can be used for either Form or File data as well, if you don't need the file(s) saved to the disk). It is also possible to define your own custom Target classes. I have to mention that streaming-form-data library does not currently support async calls to I/O operations, meaning that the writing of chunks happens synchronously (within a def function). Though, as the endpoint below uses .stream() (which is an async function), it will give up control for other tasks/requests to run on the event loop, while waiting for data to become available from the stream. You could also run the function for parsing the received data in a separate thread and await it, using Starlette's run_in_threadpool()—e.g., await run_in_threadpool(parser.data_received, chunk)—which is used by FastAPI internally when you call the async methods of UploadFile, as shown here. For more details on def vs async def, please have a look at this answer.
You can also perform certain validation tasks, e.g., ensuring that the input size is not exceeding a certain value. This can be done using the MaxSizeValidator. However, as this would only be applied to the fields you defined—and hence, it wouldn't prevent a malicious user from sending extremely large request body, which could result in consuming server resources in a way that the application may end up crashing—the below incorporates a custom MaxBodySizeValidator class that is used to make sure that the request body size is not exceeding a pre-defined value. The both validators desribed above solve the problem of limiting upload file (as well as the entire request body) size in a likely better way than the one desribed here, which uses UploadFile, and hence, the file needs to be entirely received and saved to the temporary directory, before performing the check (not to mention that the approach does not take into account the request body size at all)—using as ASGI middleware such as this would be an alternative solution for limiting the request body. Also, in case you are using Gunicorn with Uvicorn, you can also define limits with regards to, for example, the number of HTTP header fields in a request, the size of an HTTP request header field, and so on (see the documentation). Similar limits can be applied when using reverse proxy servers, such as Nginx (which also allows you to set the maximum request body size using the client_max_body_size directive).
A few notes for the example below. Since it uses the Request object directly, and not UploadFile and Form objects, the endpoint won't be properly documented in the auto-generated docs at /docs (if that's important for your app at all). This also means that you have to perform some checks yourself, such as whether the required fields for the endpoint were received or not, and if they were in the expected format. For instance, for the data field, you could check whether data.value is empty or not (empty would mean that the user has either not included that field in the multipart/form-data, or sent an empty value), as well as if isinstance(data.value, str). As for the file(s), you can check whether file_.multipart_filename is not empty; however, since a filename could likely not be included in the Content-Disposition by some user, you may also want to check if the file exists in the filesystem, using os.path.isfile(filepath) (Note: you need to make sure there is no pre-existing file with the same name in that specified location; otherwise, the aforementioned function would always return True, even when the user did not send the file).
Regarding the applied size limits, the MAX_REQUEST_BODY_SIZE below must be larger than MAX_FILE_SIZE (plus all the Form values size) you expcect to receive, as the raw request body (that you get from using the .stream() method) includes a few more bytes for the --boundary and Content-Disposition header for each of the fields in the body. Hence, you should add a few more bytes, depending on the Form values and the number of files you expect to receive (hence the MAX_FILE_SIZE + 1024 below).
app.py
from fastapi import FastAPI, Request, HTTPException, status
from streaming_form_data import StreamingFormDataParser
from streaming_form_data.targets import FileTarget, ValueTarget
from streaming_form_data.validators import MaxSizeValidator
import streaming_form_data
from starlette.requests import ClientDisconnect
import os
MAX_FILE_SIZE = 1024 * 1024 * 1024 * 4 # = 4GB
MAX_REQUEST_BODY_SIZE = MAX_FILE_SIZE + 1024
app = FastAPI()
class MaxBodySizeException(Exception):
def __init__(self, body_len: str):
self.body_len = body_len
class MaxBodySizeValidator:
def __init__(self, max_size: int):
self.body_len = 0
self.max_size = max_size
def __call__(self, chunk: bytes):
self.body_len += len(chunk)
if self.body_len > self.max_size:
raise MaxBodySizeException(body_len=self.body_len)
#app.post('/upload')
async def upload(request: Request):
body_validator = MaxBodySizeValidator(MAX_REQUEST_BODY_SIZE)
filename = request.headers.get('Filename')
if not filename:
raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
detail='Filename header is missing')
try:
filepath = os.path.join('./', os.path.basename(filename))
file_ = FileTarget(filepath, validator=MaxSizeValidator(MAX_FILE_SIZE))
data = ValueTarget()
parser = StreamingFormDataParser(headers=request.headers)
parser.register('file', file_)
parser.register('data', data)
async for chunk in request.stream():
body_validator(chunk)
parser.data_received(chunk)
except ClientDisconnect:
print("Client Disconnected")
except MaxBodySizeException as e:
raise HTTPException(status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f'Maximum request body size limit ({MAX_REQUEST_BODY_SIZE} bytes) exceeded ({e.body_len} bytes read)')
except streaming_form_data.validators.ValidationError:
raise HTTPException(status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f'Maximum file size limit ({MAX_FILE_SIZE} bytes) exceeded')
except Exception:
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail='There was an error uploading the file')
if not file_.multipart_filename:
raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail='File is missing')
print(data.value.decode())
print(file_.multipart_filename)
return {"message": f"Successfuly uploaded {filename}"}
As mentioned earlier, to upload the data (on client side), you can use the HTTPX library, which supports streaming file uploads by default, and thus allows you to send large streams/files without loading them entirely into memory. You can pass additional Form data as well, using the data argument. Below, a custom header, i.e., Filename, is used to pass the filename to the server, so that the server instantiates the FileTarget class with that name (you could use the X- prefix for custom headers, if you wish; however, it is not officially recommended anymore).
To upload multiple files, use a header for each file (or, use random names on server side, and once the file has been fully uploaded, you can optionally rename it using the file_.multipart_filename attribute), pass a list of files, as described in the documentation (Note: use a different field name for each file, so that they won't overlap when parsing them on server side, e.g., files = [('file', open('bigFile.zip', 'rb')),('file_2', open('bigFile2.zip', 'rb'))], and finally, define the Target classes on server side accordingly.
test.py
import httpx
import time
url ='http://127.0.0.1:8000/upload'
files = {'file': open('bigFile.zip', 'rb')}
headers={'Filename': 'bigFile.zip'}
data = {'data': 'Hello World!'}
with httpx.Client() as client:
start = time.time()
r = client.post(url, data=data, files=files, headers=headers)
end = time.time()
print(f'Time elapsed: {end - start}s')
print(r.status_code, r.json(), sep=' ')
Upload both File and JSON body
In case you would like to upload both file(s) and JSON instead of Form data, you can use the approach described in Method 3 of this answer, thus also saving you from performing manual checks on the received Form fields, as explained earlier (see the linked answer for more details). To do that, make the following changes in the code above.
app.py
#...
from fastapi import Form
from pydantic import BaseModel, ValidationError
from typing import Optional
from fastapi.encoders import jsonable_encoder
class Base(BaseModel):
name: str
point: Optional[float] = None
is_accepted: Optional[bool] = False
def checker(data: str = Form(...)):
try:
model = Base.parse_raw(data)
except ValidationError as e:
raise HTTPException(detail=jsonable_encoder(e.errors()), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY)
return model
#...
#app.post('/upload')
async def upload(request: Request):
#...
# place this after the try-except block
model = checker(data.value.decode())
print(model.dict())
test.py
#...
import json
data = {'data': json.dumps({"name": "foo", "point": 0.13, "is_accepted": False})}
#...
Option 2 (Slow) - Upload File and Form data using UploadFile and Form
If you would like to use a normal def endpoint instead, see this answer.
app.py
from fastapi import FastAPI, File, UploadFile, Form, HTTPException, status
import aiofiles
import os
CHUNK_SIZE = 1024 * 1024 # adjust the chunk size as desired
app = FastAPI()
#app.post("/upload")
async def upload(file: UploadFile = File(...), data: str = Form(...)):
try:
filepath = os.path.join('./', os.path.basename(file.filename))
async with aiofiles.open(filepath, 'wb') as f:
while chunk := await file.read(CHUNK_SIZE):
await f.write(chunk)
except Exception:
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail='There was an error uploading the file')
finally:
await file.close()
return {"message": f"Successfuly uploaded {file.filename}"}
As mentioned earlier, using this option would take longer for the file upload to complete, and as HTTPX uses a default timeout of 5 seconds, you will most likely get a ReadTimeout exception (as the server will need some time to read the SpooledTemporaryFile in chunks and write the contents to a permanent location on the disk). Thus, you can configure the timeout (see the Timeout class in the source code too), and more specifically, the read timeout, which "specifies the maximum duration to wait for a chunk of data to be received (for example, a chunk of the response body)". If set to None instead of some positive numerical value, there will be no timeout on read.
test.py
import httpx
import time
url ='http://127.0.0.1:8000/upload'
files = {'file': open('bigFile.zip', 'rb')}
headers={'Filename': 'bigFile.zip'}
data = {'data': 'Hello World!'}
timeout = httpx.Timeout(None, read=180.0)
with httpx.Client(timeout=timeout) as client:
start = time.time()
r = client.post(url, data=data, files=files, headers=headers)
end = time.time()
print(f'Time elapsed: {end - start}s')
print(r.status_code, r.json(), sep=' ')

Send file via JSON instead of uploading to server, Django

I have an app that currently allows a user to upload a file and it saves the file on the web server. My client has now decided to use a third party cloud hosting service for their file storage needs. The company has their own API for doing CRUD operations on their server, so I wrote a script to test their API and it sends a file as a base64 encoded JSON payload to the API. The script works fine but now I'm stuck on how exactly how I should implement this functionality into Django.
json_testing.py
import base64
import json
import requests
import magic
filename = 'test.txt'
# Open file and read file and encode it as a base64 string
with open(filename, "rb") as test_file:
encoded_string = base64.b64encode(test_file.read())
# Get MIME type using magic module
mime = magic.Magic(mime=True)
mime_type = mime.from_file(filename)
# Concatenate MIME type and encoded string with string data
# Use .decode() on byte data for mime_type and encoded string
file_string = 'data:%s;base64,%s' % (mime_type.decode(), encoded_string.decode())
payload = {
"client_id": 1,
"file": file_string
}
headers = {
"token": "AuthTokenGoesHere",
"content-type": "application/json",
}
request = requests.post('https://api.website.com/api/files/', json=payload, headers=headers)
print(request.json())
models.py
def upload_location(instance, filename):
return '%s/documents/%s' % (instance.user.username, filename)
class Document(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL)
category = models.ForeignKey(Category, on_delete=models.CASCADE)
file = models.FileField(upload_to=upload_location)
def __str__(self):
return self.filename()
def filename(self):
return os.path.basename(self.file.name)
So to reiterate, when a user uploads a file, instead of storing the file somewhere on the web server, I want to base64 encode the file so I can send the file as a JSON payload. Any ideas on what would be the best way to approach this?

The simplest way I can put this is that I want to avoid saving the
file to the web server entirely. I just want to encode the file, send
it as a payload, and discard it, if that's possible.
From the django docs:
Upload Handlers
When a user uploads a file, Django passes off the file data to an
upload handler – a small class that handles file data as it gets
uploaded. Upload handlers are initially defined in the
FILE_UPLOAD_HANDLERS setting, which defaults to:
["django.core.files.uploadhandler.MemoryFileUploadHandler",
"django.core.files.uploadhandler.TemporaryFileUploadHandler"]
Together
MemoryFileUploadHandler and TemporaryFileUploadHandler provide
Django’s default file upload behavior of reading small files into
memory and large ones onto disk.
You can write custom handlers that customize how Django handles files.
You could, for example, use custom handlers to enforce user-level
quotas, compress data on the fly, render progress bars, and even send
data to another storage location directly without storing it locally.
See Writing custom upload handlers for details on how you can
customize or completely replace upload behavior.
Contrary thoughts:
I think you should consider sticking with the default file upload handlers because they keep someone from uploading a file that will overwhelm the server's memory.
Where uploaded data is stored
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django
will hold the entire contents of the upload in memory. This means that
saving the file involves only a read from memory and a write to disk
and thus is very fast.
However, if an uploaded file is too large, Django will write the
uploaded file to a temporary file stored in your system’s temporary
directory. On a Unix-like platform this means you can expect Django to
generate a file called something like /tmp/tmpzfp6I6.upload. If an
upload is large enough, you can watch this file grow in size as Django
streams the data onto disk.
These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable
defaults” which can be customized as described in the next section.
request.FILES info:
#forms.py:
from django import forms
class UploadFileForm(forms.Form):
title = forms.CharField(max_length=50)
json_file = forms.FileField()
A view handling this form will receive the file data in request.FILES,
which is a dictionary containing a key for each FileField (or
ImageField, or other FileField subclass) in the form. So the data from
the above form would be accessible as request.FILES[‘json_file’].
Note that request.FILES will only contain data if the request method
was POST and the <form> that posted the request has the attribute
enctype="multipart/form-data". Otherwise, request.FILES will be empty.
HttpRequest.FILES
A dictionary-like object containing all uploaded files. Each key in
FILES is the name from the <input type="file" name="" />. Each value
in FILES is an UploadedFile.
Upload Handlers
When a user uploads a file, Django passes off the file data to an
upload handler – a small class that handles file data as it gets
uploaded. Upload handlers are initially defined in the
FILE_UPLOAD_HANDLERS setting, which defaults to:
["django.core.files.uploadhandler.MemoryFileUploadHandler",
"django.core.files.uploadhandler.TemporaryFileUploadHandler"]
The source code for TemporaryFileUploadHandler contains this:
lass TemporaryFileUploadHandler(FileUploadHandler):
"""
Upload handler that streams data into a temporary file.
"""
...
...
def new_file(self, *args, **kwargs):
"""
Create the file object to append to as data is coming in.
"""
...
self.file = TemporaryUploadedFile(....) #<***HERE
And the source code for TemporaryUploadedFile contains this:
class TemporaryUploadedFile(UploadedFile):
"""
A file uploaded to a temporary location (i.e. stream-to-disk).
"""
def __init__(self, name, content_type, size, charset, content_type_extra=None):
...
file = tempfile.NamedTemporaryFile(suffix='.upload') #<***HERE
And the python tempfile docs say this:
tempfile.NamedTemporaryFile(...., delete=True)
...
If delete is true (the default), the file is deleted as soon as it is closed.
Similarly, the other of the two default file upload handlers, MemoryFileUploadHandler, creates a file of type BytesIO:
A stream implementation using an in-memory bytes buffer. It inherits
BufferedIOBase. The buffer is discarded when the close() method is
called.
Therefore, all you have to do is close request.FILES[“field_name”] to erase the file (whether the file contents are stored in memory or on disk in the /tmp file directory), e.g.:
uploaded_file = request.FILES[“json_file”]
file_contents = uploaded_file.read()
#Send file_contents to other server here.
uploaded_file.close() #erases file
If for some reason you don't want django to write to the server's /tmp directory at all, then you'll need to write a custom file upload handler to reject uploaded files that are too large.

How do you get Google App Engine to gunzip during download?

I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
resource = str(urllib.unquote(resource))
blob_info = blobstore.BlobInfo.get(resource)
self.response.headers['Content-Encoding'] = str('gzip')
# self.response.headers['Content-type'] = str('application/x-gzip')
self.response.headers['Content-type'] = str(blob_info.content_type)
self.response.headers['Content-Length'] = str(blob_info.size)
cd = 'attachment; filename=%s' % (blob_info.filename)
self.response.headers['Content-Disposition'] = str(cd)
self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
self.response.headers['Pragma'] = str(' public')
self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped. The file size of the downloaded data match the .gz file size on the server. Also, I can confirm this by manually gunzipping the downloaded file. I am trying to avoid the manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non Google) server, I could accomplish the automatic gunzip by setting same response headers; albeit, my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates HTML 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
resource = str(urllib.unquote(resource))
blob_info = blobstore.BlobInfo.get(resource)
file = '/gs/mydatabucket/%s' % blob_info.filename
print file
self.response.headers['Content-Encoding'] = str('gzip')
self.response.headers['Content-Type'] = str('application/x-gzip')
# self.response.headers['Content-Length'] = str(blob_info.size)
cd = 'filename=%s' % (file)
self.response.headers['Content-Disposition'] = str(cd)
self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
self.response.headers['Pragma'] = str(' public')
self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848 bytes file to the client before the server terminate and issued a 500 error. When I issue 'file' on the partially downloaded file from the command line, Mac OS seems to correctly identify the file format as 'SQLite 3.x database' file. Any idea of why the 500 error on the server? How can I fix the problem?

You should first check to see if your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob as is with the proper content-encoding and content-type headers, otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!)
You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
You may use GCS as easily (if not more easily) as you use the blobstore in AppEngine and it seems to be the preferred storage layer to use going forward. I say this because the File API has been deprecated which made blobstore interaction easier and great efforts and advancements have been made to the GCS libraries making the API similar to the base python file interaction API
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and GAE delivering your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct them to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like to have application logic dictate access, you should keep the ACL to the objects private and can use GCS Signed URLs to create short lived URLs to use when doing a 302 redirect.
Its worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure proper gzip compressing and content-encoding meta data is used.
class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
filename = str(urllib.unquote(resource))
url = 'https://storage.googleapis.com/mybucket/' + filename
self.redirect(url)

GAE already serves everything using gzip if the client supports it.
So I think what's happening after your update is that the browser expects there to be more of the file, but GAE thinks it's already at the end of the file since it's already gzipped. That's why you get the 500.
(if that makes sense)
Anyway, since GAE already handles compression for you, the easiest way is probably to put non compressed files in GCS and let the Google infrastructure handle the compression automatically for you when you serve them.

using python tkinter for html file i/o

I have a web desktop app which writes and reads to local file system using XPCOM filepicker and it works flawlessly in firefox12. However later firefox versions (particularly current one v17) completely disables use of xpcom file functions.
I thought of passing file requests to python's tkinter on a server on the local machine. I can open the tkinter filepicker in IDLE and from a .py or .cgi file, but how to get the file dialog to appear back in the calling html page of the app? I need that interactivity without leaving the app's page. Any ideas much appreciated.

Depending on the complexity and frequency of I/O, you could just post the parameters to a python CGI script and return the result in a JSON object
First, in JS:
function doIO (data) {
var request = new XMLHttpRequest();
request.onreadystatechange = function () {
if (request.readyStatus == 4) {
if (request.status == 200) {
alert('Success!');
var data = JSON.parse(xmlhttp.responseText);
// do something with response data
}
else { alert('Failure'!); }
}
};
request.open("POST", python_cgi, true);
request.send(data);
}
In python you would need to implement a CGI script to parse the data, figure out the requested I/O and perform it. Your python script could be something like the following:
import cgi, json, sys
form = cgi.FieldStorage()
command = form.getFirst('command')
if command == "filePicker":
# run tkinter file picker
# build json response object
elif commoand == "myIoCommand":
# do some I/O
# build json response object
print "Status: 200"
print "Content-Type: application/json"
json.dump(response, sys.stdout)
See for example Google Maps' JSON responses if you need some inspiration forming your json response object.
If you need more frequent/complex I/O, maybe what you would have to do is set up a Python server with mirrored state to your application via smaller, more frequent AJAX calls. You can use a framework to make a RESTful app, or you can roll your own by subclassing from BaseHTTPServer.

GAE - how to use blobstore stub in testbed?

My code goes like this:
self.testbed.init_blobstore_stub()
upload_url = blobstore.create_upload_url('/image')
upload_url = re.sub('^http://testbed\.example\.com', '', upload_url)
response = self.testapp.post(upload_url, params={
'shopid': id,
'description': 'JLo',
}, upload_files=[('file', imgPath)])
self.assertEqual(response.status_int, 200)
how come it shows 404 error? For some reasons the upload path does not seem to exist at all.

You can't do this. I think the problem is that webtest (which I assume is where self.testapp came from) doesn't work well with testbed blobstore functionality. You can find some info at this question.
My solution was to override unittest.TestCase and add the following methods:
def create_blob(self, contents, mime_type):
"Since uploading blobs doesn't work in testing, create them this way."
fn = files.blobstore.create(mime_type = mime_type,
_blobinfo_uploaded_filename = "foo.blt")
with files.open(fn, 'a') as f:
f.write(contents)
files.finalize(fn)
return files.blobstore.get_blob_key(fn)
def get_blob(self, key):
return self.blobstore_stub.storage.OpenBlob(key).read()
You will also need the solution here.
For my tests where I would normally do a get or post to a blobstore handler, I instead call one of the two methods above. It is a bit hacky but it works.
Another solution I am considering is to use Selenium's HtmlUnit driver. This would require the dev server to be running but should allow full testing of blobstore and also javascript (as a side benefit).

I think Kekito is right, you cannot POST to the upload_url directly.
But if you want to test the BlobstoreUploadHandler, you can fake the POST request it would normally received from the blobstore in the following way. Assuming your handler is at /handler :
import email
...
def test_upload(self):
blob_key = 'abcd'
# The blobstore upload handler receives a multipart form request
# containing uploaded files. But instead of containing the actual
# content, the files contain an 'email' message that has some meta
# information about the file. They also contain a blob-key that is
# the key to get the blob from the blobstore
# see blobstore._get_upload_content
m = email.message.Message()
m.add_header('Content-Type', 'image/png')
m.add_header('Content-Length', '100')
m.add_header('X-AppEngine-Upload-Creation', '2014-03-02 23:04:05.123456')
# This needs to be valie base64 encoded
m.add_header('content-md5', 'd74682ee47c3fffd5dcd749f840fcdd4')
payload = m.as_string()
# The blob-key in the Content-type is important
params = [('file', webtest.forms.Upload('test.png', payload,
'image/png; blob-key='+blob_key))]
self.testapp.post('/handler', params, content_type='blob-key')
I figured that out by digging into the blobstore code. The important bit is that the POST request that the blobstore sends to the UploadHandler doesn't contain the file content. Instead, it contains an "email message" (well, informations encoded like in an email) with metadata about the file (content-type, content-length, upload time and md5). It also contains a blob-key that can be used to retrieve the file from the blobstore.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Streaming file upload using bottle (or flask or similar) - python

Related

FastAPI: Transfering large files (200MB images) fast [duplicate]

Send file via JSON instead of uploading to server, Django

How do you get Google App Engine to gunzip during download?

using python tkinter for html file i/o

GAE - how to use blobstore stub in testbed?

Categories

Resources