Send file via JSON instead of uploading to server, Django - python

I have an app that currently allows a user to upload a file and saves the file on the web server. My client has now decided to use a third-party cloud hosting service for their file storage needs. The company has its own API for doing CRUD operations on their server, so I wrote a script to test their API; it sends a file as a base64-encoded JSON payload to the API. The script works fine, but now I'm stuck on exactly how I should implement this functionality in Django.
json_testing.py
import base64
import json
import requests
import magic

filename = 'test.txt'

# Open file and read file and encode it as a base64 string
with open(filename, "rb") as test_file:
    encoded_string = base64.b64encode(test_file.read())

# Get MIME type using magic module
mime = magic.Magic(mime=True)
mime_type = mime.from_file(filename)

# Concatenate MIME type and encoded string with string data
# Use .decode() on byte data for mime_type and encoded string
file_string = 'data:%s;base64,%s' % (mime_type.decode(), encoded_string.decode())

payload = {
    "client_id": 1,
    "file": file_string
}

headers = {
    "token": "AuthTokenGoesHere",
    "content-type": "application/json",
}

request = requests.post('https://api.website.com/api/files/', json=payload, headers=headers)
print(request.json())
models.py
import os

from django.conf import settings
from django.db import models


def upload_location(instance, filename):
    return '%s/documents/%s' % (instance.user.username, filename)


class Document(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    file = models.FileField(upload_to=upload_location)

    def __str__(self):
        return self.filename()

    def filename(self):
        return os.path.basename(self.file.name)
So to reiterate, when a user uploads a file, instead of storing the file somewhere on the web server, I want to base64 encode the file so I can send the file as a JSON payload. Any ideas on what would be the best way to approach this?

The simplest way I can put this is that I want to avoid saving the
file to the web server entirely. I just want to encode the file, send
it as a payload, and discard it, if that's possible.
From the Django docs:
Upload Handlers
When a user uploads a file, Django passes off the file data to an
upload handler – a small class that handles file data as it gets
uploaded. Upload handlers are initially defined in the
FILE_UPLOAD_HANDLERS setting, which defaults to:
["django.core.files.uploadhandler.MemoryFileUploadHandler",
"django.core.files.uploadhandler.TemporaryFileUploadHandler"]
Together
MemoryFileUploadHandler and TemporaryFileUploadHandler provide
Django’s default file upload behavior of reading small files into
memory and large ones onto disk.
You can write custom handlers that customize how Django handles files.
You could, for example, use custom handlers to enforce user-level
quotas, compress data on the fly, render progress bars, and even send
data to another storage location directly without storing it locally.
See Writing custom upload handlers for details on how you can
customize or completely replace upload behavior.
Contrary thoughts:
I think you should consider sticking with the default file upload handlers because they keep someone from uploading a file that will overwhelm the server's memory.
Where uploaded data is stored
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django
will hold the entire contents of the upload in memory. This means that
saving the file involves only a read from memory and a write to disk
and thus is very fast.
However, if an uploaded file is too large, Django will write the
uploaded file to a temporary file stored in your system’s temporary
directory. On a Unix-like platform this means you can expect Django to
generate a file called something like /tmp/tmpzfp6I6.upload. If an
upload is large enough, you can watch this file grow in size as Django
streams the data onto disk.
These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable
defaults” which can be customized as described in the next section.
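If you do want to adjust those defaults, the relevant settings are documented in Django; a minimal sketch, with purely illustrative values:
#settings.py:
# Uploads up to this many bytes stay in memory; larger ones stream to disk.
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440  # 2.5 MB, Django's default

# Optionally point temporary upload files somewhere other than the system
# temp directory (e.g. /tmp on Unix-like platforms).
# FILE_UPLOAD_TEMP_DIR = '/var/tmp/django_uploads'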
request.FILES info:
#forms.py:
from django import forms

class UploadFileForm(forms.Form):
    title = forms.CharField(max_length=50)
    json_file = forms.FileField()
A view handling this form will receive the file data in request.FILES,
which is a dictionary containing a key for each FileField (or
ImageField, or other FileField subclass) in the form. So the data from
the above form would be accessible as request.FILES['json_file'].
Note that request.FILES will only contain data if the request method
was POST and the <form> that posted the request has the attribute
enctype="multipart/form-data". Otherwise, request.FILES will be empty.
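For completeness, a minimal view sketch for that form, following the standard Django pattern (the template name, redirect target, and handle_uploaded_file helper are placeholders, not part of the original question):
#views.py:
from django.http import HttpResponseRedirect
from django.shortcuts import render

from .forms import UploadFileForm

def upload_file(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            # The upload arrives as an UploadedFile object in request.FILES
            handle_uploaded_file(request.FILES['json_file'])
            return HttpResponseRedirect('/success/')
    else:
        form = UploadFileForm()
    return render(request, 'upload.html', {'form': form})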
HttpRequest.FILES
A dictionary-like object containing all uploaded files. Each key in
FILES is the name from the <input type="file" name="" />. Each value
in FILES is an UploadedFile.
Looking more closely at the two default upload handlers, MemoryFileUploadHandler and TemporaryFileUploadHandler:
The source code for TemporaryFileUploadHandler contains this:
class TemporaryFileUploadHandler(FileUploadHandler):
    """
    Upload handler that streams data into a temporary file.
    """
    ...
    ...
    def new_file(self, *args, **kwargs):
        """
        Create the file object to append to as data is coming in.
        """
        ...
        self.file = TemporaryUploadedFile(....)  # <***HERE
And the source code for TemporaryUploadedFile contains this:
class TemporaryUploadedFile(UploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, content_type_extra=None):
        ...
        file = tempfile.NamedTemporaryFile(suffix='.upload')  # <***HERE
And the python tempfile docs say this:
tempfile.NamedTemporaryFile(...., delete=True)
...
If delete is true (the default), the file is deleted as soon as it is closed.
Similarly, the other of the two default file upload handlers, MemoryFileUploadHandler, creates a file of type BytesIO:
A stream implementation using an in-memory bytes buffer. It inherits
BufferedIOBase. The buffer is discarded when the close() method is
called.
Therefore, all you have to do is close request.FILES["field_name"] to erase the file (whether the file contents are stored in memory or on disk in the /tmp file directory), e.g.:
uploaded_file = request.FILES["json_file"]
file_contents = uploaded_file.read()
# Send file_contents to the other server here.
uploaded_file.close()  # erases the file
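Tying that back to the original question, here is a rough sketch of a view that base64-encodes the upload and forwards it to the third-party API without ever saving it locally. The API URL, token, and payload fields are copied from the question's test script, so treat them as assumptions about that API; form validation is omitted for brevity:
#views.py:
import base64
import requests
from django.http import JsonResponse

def forward_upload(request):
    uploaded_file = request.FILES['json_file']
    # content_type is supplied by the browser, so no python-magic needed here
    mime_type = uploaded_file.content_type
    encoded = base64.b64encode(uploaded_file.read()).decode()
    payload = {
        'client_id': 1,
        'file': 'data:%s;base64,%s' % (mime_type, encoded),
    }
    headers = {'token': 'AuthTokenGoesHere'}
    api_response = requests.post('https://api.website.com/api/files/',
                                 json=payload, headers=headers)
    uploaded_file.close()  # discards the in-memory buffer or the /tmp file
    return JsonResponse(api_response.json(), safe=False)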
If for some reason you don't want django to write to the server's /tmp directory at all, then you'll need to write a custom file upload handler to reject uploaded files that are too large.
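A rough sketch of what such a handler could look like, assuming a size limit you pick yourself (the class name and limit are mine, not from the Django docs); it would be listed ahead of the defaults in FILE_UPLOAD_HANDLERS:
from django.core.files.uploadhandler import FileUploadHandler, StopUpload

class RejectLargeUploadsHandler(FileUploadHandler):
    """Abort any upload larger than MAX_BYTES before it can grow on disk."""
    MAX_BYTES = 10 * 1024 * 1024  # illustrative 10 MB limit

    def receive_data_chunk(self, raw_data, start):
        if start + len(raw_data) > self.MAX_BYTES:
            raise StopUpload(connection_reset=True)
        return raw_data  # hand the chunk on to the next handler in the chain

    def file_complete(self, file_size):
        return None  # let the default handlers produce the file object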

Related

Generating a file for download without storing a copy locally on the server

I am creating a web application with Flask which converts uploaded files into Base64 and stores the result in a separate file that the user can then download.
This can be quite demanding in terms of storage, since the converted file has to be saved before the user can download it. Is it possible to create this file and push it to the user for download without having to store it locally?
You can create files on the server and return them for download using a Response object.
As an example, returning a text file (taken from a real life web app):
@app.route('/account/security/2fa_backup_codes.txt')
@login_required
def backup_codes():
    codes = g.user.tfa_backup.replace(',', '\r\n')
    generator = (cell for row in codes for cell in row)
    return Response(generator, mimetype='text/plain',
                    headers={'Content-Disposition':
                             'attachment;filename=2fa_backup_codes.txt'})
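Applied to the question, a sketch that base64-encodes the uploaded file and streams it straight back as a download, with nothing written to disk (the route, field name, and output filename are placeholders):
import base64
from flask import Flask, Response, request

app = Flask(__name__)

@app.route('/convert', methods=['POST'])
def convert():
    uploaded = request.files['file']  # werkzeug FileStorage object
    encoded = base64.b64encode(uploaded.read())
    return Response(encoded, mimetype='text/plain',
                    headers={'Content-Disposition':
                             'attachment;filename=converted_base64.txt'})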

How do you get Google App Engine to gunzip during download?

I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.response.headers['Content-Encoding'] = str('gzip')
        # self.response.headers['Content-type'] = str('application/x-gzip')
        self.response.headers['Content-type'] = str(blob_info.content_type)
        self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'attachment; filename=%s' % (blob_info.filename)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped. The file size of the downloaded data matches the .gz file size on the server. Also, I can confirm this by manually gunzipping the downloaded file. I am trying to avoid the manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non-Google) server, I could accomplish the automatic gunzip by setting the same response headers, albeit my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates an HTTP 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        file = '/gs/mydatabucket/%s' % blob_info.filename
        print file
        self.response.headers['Content-Encoding'] = str('gzip')
        self.response.headers['Content-Type'] = str('application/x-gzip')
        # self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'filename=%s' % (file)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848-byte file to the client before the server terminates and issues a 500 error. When I run 'file' on the partially downloaded data from the command line, Mac OS correctly identifies the format as an 'SQLite 3.x database' file. Any idea why the server returns a 500 error? How can I fix the problem?
You should first check to see if your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob as is with the proper content-encoding and content-type headers, otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!)
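As a rough sketch of that idea (this is my reading of the approach, not tested code; it assumes webapp2-style handlers, a single-member gzip blob, and that BlobReader plus gzip.GzipFile is an acceptable way to decompress on the server):
import gzip
import urllib
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        blob_info = blobstore.BlobInfo.get(str(urllib.unquote(resource)))
        if 'gzip' in self.request.headers.get('Accept-Encoding', ''):
            # The client can decompress: pass the gzipped blob through as-is.
            self.response.headers['Content-Encoding'] = 'gzip'
            self.send_blob(blob_info)
        else:
            # The client cannot: decompress on the server before sending.
            self.response.headers['Content-Type'] = str(blob_info.content_type)
            reader = blobstore.BlobReader(blob_info.key())
            self.response.out.write(gzip.GzipFile(fileobj=reader).read())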
You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
You may use GCS as easily (if not more easily) as you use the blobstore in App Engine, and it seems to be the preferred storage layer going forward. I say this because the Files API, which made blobstore interaction easier, has been deprecated, and great effort has gone into the GCS libraries, making their API similar to Python's base file interface.
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and GAE delivering your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct them to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like to have application logic dictate access, you should keep the ACL to the objects private and can use GCS Signed URLs to create short lived URLs to use when doing a 302 redirect.
It's worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure the proper gzip compression and content-encoding metadata are used.
class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        filename = str(urllib.unquote(resource))
        url = 'https://storage.googleapis.com/mybucket/' + filename
        self.redirect(url)
GAE already serves everything using gzip if the client supports it.
So I think what's happening after your update is that the browser expects there to be more of the file, but GAE thinks it's already at the end of the file since it's already gzipped. That's why you get the 500.
(if that makes sense)
Anyway, since GAE already handles compression for you, the easiest way is probably to put non compressed files in GCS and let the Google infrastructure handle the compression automatically for you when you serve them.

GAE - how to use blobstore stub in testbed?

My code goes like this:
self.testbed.init_blobstore_stub()
upload_url = blobstore.create_upload_url('/image')
upload_url = re.sub('^http://testbed\.example\.com', '', upload_url)
response = self.testapp.post(upload_url, params={
    'shopid': id,
    'description': 'JLo',
}, upload_files=[('file', imgPath)])
self.assertEqual(response.status_int, 200)
How come it shows a 404 error? For some reason the upload path does not seem to exist at all.
You can't do this. I think the problem is that webtest (which I assume is where self.testapp came from) doesn't work well with testbed blobstore functionality. You can find some info at this question.
My solution was to override unittest.TestCase and add the following methods:
def create_blob(self, contents, mime_type):
    "Since uploading blobs doesn't work in testing, create them this way."
    fn = files.blobstore.create(mime_type=mime_type,
                                _blobinfo_uploaded_filename="foo.blt")
    with files.open(fn, 'a') as f:
        f.write(contents)
    files.finalize(fn)
    return files.blobstore.get_blob_key(fn)

def get_blob(self, key):
    return self.blobstore_stub.storage.OpenBlob(key).read()
You will also need the solution here.
For my tests where I would normally do a get or post to a blobstore handler, I instead call one of the two methods above. It is a bit hacky but it works.
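For illustration, a hypothetical test using those two helpers (the handler path, the fake contents, and the assertions are placeholders):
def test_serve_blob(self):
    # Create a blob directly through the files API instead of POSTing an upload
    key = self.create_blob('fake image bytes', 'image/png')
    response = self.testapp.get('/serve/%s' % key)
    self.assertEqual(response.status_int, 200)
    self.assertEqual(self.get_blob(key), 'fake image bytes')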
Another solution I am considering is to use Selenium's HtmlUnit driver. This would require the dev server to be running but should allow full testing of blobstore and also javascript (as a side benefit).
I think Kekito is right, you cannot POST to the upload_url directly.
But if you want to test the BlobstoreUploadHandler, you can fake the POST request it would normally receive from the blobstore in the following way. Assuming your handler is at /handler:
import email
...

def test_upload(self):
    blob_key = 'abcd'
    # The blobstore upload handler receives a multipart form request
    # containing uploaded files. But instead of containing the actual
    # content, the files contain an 'email' message that has some meta
    # information about the file. They also contain a blob-key that is
    # the key to get the blob from the blobstore
    # see blobstore._get_upload_content
    m = email.message.Message()
    m.add_header('Content-Type', 'image/png')
    m.add_header('Content-Length', '100')
    m.add_header('X-AppEngine-Upload-Creation', '2014-03-02 23:04:05.123456')
    # This needs to be valid base64 encoded
    m.add_header('content-md5', 'd74682ee47c3fffd5dcd749f840fcdd4')
    payload = m.as_string()
    # The blob-key in the Content-Type is important
    params = [('file', webtest.forms.Upload('test.png', payload,
                                            'image/png; blob-key=' + blob_key))]
    self.testapp.post('/handler', params, content_type='blob-key')
I figured that out by digging into the blobstore code. The important bit is that the POST request that the blobstore sends to the UploadHandler doesn't contain the file content. Instead, it contains an "email message" (well, information encoded like in an email) with metadata about the file (content-type, content-length, upload time and md5). It also contains a blob-key that can be used to retrieve the file from the blobstore.

Read local file from Google App with Python

I'm trying to upload files to the blobstore in my Google App without using a form, but I'm stuck at how to get the app to read my local file. I'm pretty new to Python and Google Apps, but after some cutting and pasting I've ended up with this:
import webapp2
import urllib
import os
from google.appengine.api import files
from poster.encode import multipart_encode

class Upload(webapp2.RequestHandler):
    def get(self):
        # Create the file in blobstore
        file_name = files.blobstore.create(mime_type='application/octet-stream')
        # Get the local file path from an url param
        file_path = self.request.get('file')
        # Read the file
        file = open(file_path, "rb")
        datagen, headers = multipart_encode({"file": file})
        data = str().join(datagen)  # this is supposedly memory intense and slow
        # Open the blobstore file and write to it
        with files.open(file_name, 'a') as f:
            f.write(data)
        # Finalize the file. Do this before attempting to read it.
        files.finalize(file_name)
        # Get the file's blob key
        blob_key = files.blobstore.get_blob_key(file_name)
The problem now is that I don't really know how to get hold of the local file.
You can't read from the local file system from within the application itself; you will need to use an HTTP POST to send the file to the app.
You can certainly do this from within another application - you just need to create the MIME multipart message with the file content and POST it to your app; the sending application will have to build that HTTP request manually. You should have a read on how to create a MIME multipart message using C#.
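Whatever language the sending application is written in, the idea is the same. In Python, for instance, a sketch using the requests library (the URL, path, and field name are placeholders):
import requests

# POST a local file to the app's upload handler as multipart/form-data
with open('/path/to/local/file.bin', 'rb') as f:
    response = requests.post('https://your-app.appspot.com/upload',
                             files={'file': f})
print(response.status_code)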

How to receive an uploaded file to store it using Blobstore API

I have server-side code to process uploaded binary files:
class UploadHandler(webapp.RequestHandler):
    def post(self):
        file_name = files.blobstore.create(mime_type='application/octet-stream')
        with files.open(file_name, 'a') as f:
            f.write('data')
        files.finalize(file_name)
        blob_key = files.blobstore.get_blob_key(file_name)
It's the code from the examples, so it doesn't actually process any uploaded files; it just creates a new Blobstore entity and writes some data to it. From the client side, I have this part of the code that actually sends the file to the server:
var xhr = new XMLHttpRequest();
xhr.open("post", "/upload", true);
xhr.setRequestHeader("Content-Type", "multipart/form-data");
xhr.setRequestHeader("X-File-Name", file.fileName);
xhr.setRequestHeader("X-File-Size", file.fileSize);
xhr.setRequestHeader("X-File-Type", file.type);
xhr.send(file);
In FireBug I see it uploads the file to the server and the server code creates a file as it is supposed to. The thing I can't figure out is how to connect these two parts so that the server-side code can receive the uploaded file as a stream. I don't use forms, so I can't get the file with something like upload_files = self.get_uploads('file'). How do I retrieve the file on the server side?
UPDATE: I have found an answer in the GAE documentation about webapp request handlers. I need to use something like uploaded_file = self.request.body to get the file stream. Then I just use f.write(uploaded_file) to save it. It seems to work for me. Please share your thoughts on whether this is a good approach.
Should be something like this:
class UploadHandler(webapp.RequestHandler):
    def post(self):
        mime_type = self.request.headers['X-File-Type']
        name = self.request.headers['X-File-Name']
        file_name = files.blobstore.create(mime_type=mime_type,
                                           _blobinfo_uploaded_filename=name)
        with files.open(file_name, 'a') as f:
            f.write(self.request.body)
        files.finalize(file_name)
        blob_key = files.blobstore.get_blob_key(file_name)
Your custom headers and body can be pulled from the WebOb Request object. Note that you don't need to inherit from BlobstoreUploadHandler since you're not using an HTML upload form.
