Extract file name from an API call - Python

I am trying to upload a file via API Gateway to a Lambda.
I can get the file data with the code below:
import base64
import boto3

s3 = boto3.client("s3")
# decode the base64-encoded form-data body into bytes
post_data = base64.b64decode(event["body"])
but I am unclear on how to get the file name from here.
I can get this to work if I use a separate field for the file name, but I would prefer to detect it automatically.
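One possible approach (a sketch of mine, not from the thread): with multipart/form-data, each part's Content-Disposition header carries the original filename, so you can prepend the request's Content-Type header to the decoded body and let the stdlib email parser split the parts:
import base64
from email.parser import BytesParser
from email.policy import default

def lambda_handler(event, context):
    # The multipart boundary lives in the Content-Type header.
    headers = {k.lower(): v for k, v in event["headers"].items()}
    content_type = headers["content-type"]
    # Assumes API Gateway base64-encoded the body (binary media types enabled).
    body = base64.b64decode(event["body"])
    # Prepend the header so the email parser sees a complete MIME message.
    msg = BytesParser(policy=default).parsebytes(
        b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    )
    for part in msg.iter_parts():
        filename = part.get_filename()  # parsed from Content-Disposition
        if filename:
            file_bytes = part.get_payload(decode=True)
            # e.g. s3.put_object(Bucket="my-bucket", Key=filename, Body=file_bytes)
Here "my-bucket" is a placeholder; the handler name and event shape follow the standard Lambda proxy integration.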

Related

Python Django s3 error "file must be encoded before hashing"

We send a file via API.
When the file is saved locally or on the same EC2 instance, everything works fine and we get the response from the API.
When the file is saved on AWS S3, we get the error 'Unicode-objects must be encoded before hashing'.
This is the code that works to open and send the file from the local device, but not when getting the file from S3:
my_file = self.original_file.open(mode='rb').read()
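A guess at the cause (my note, not from the thread): some remote storage backends hand back text rather than bytes regardless of the 'rb' mode, and hashing functions such as those in hashlib only accept bytes. Coercing the content to bytes before hashing should sidestep the error:
import hashlib

self.original_file.open(mode='rb')
data = self.original_file.read()
if isinstance(data, str):
    # The storage backend returned text; hashing requires bytes.
    data = data.encode('utf-8')
digest = hashlib.md5(data).hexdigest()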

Save .xlsx file to Azure blob storage

I have a Django application and form which accepts from a user an Excel (.xlsx) and a CSV (.csv) file. I need to save both files to Azure Blob Storage. I found it trivial to handle the .csv file, but the same code fails when attempting to upload an .xlsx file:
import os

from azure.storage.blob import BlobServiceClient

# This code executes successfully when saving a CSV to blob storage
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_csv_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_csv_file'))

# This code fails when saving xlsx to blob storage
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
However, I've been unable to figure out how to save the .xlsx file. I assumed, perhaps somewhat naively, that I could pass the .xlsx file as-is (like the .csv example above), but I get the error:
ClientAuthenticationError at /mypage/create/
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I found this SO answer about the above error, but there's no consensus at all on what the error means, and I've been unable to progress much further from that link. However, there was some discussion about sending the data to Azure Blob Storage as a byte stream. Is this a possible way forward? I should note here that, ideally, I need to process the files in memory, as my app is deployed within App Service (my understanding is that I don't have access to a file system in which to create and manipulate files).
I have also learned that .xlsx files are compressed, so do I need to first decompress the file and then send it as a byte stream? If so, does anyone have experience with this who could point me in the right direction?
Storage account connection string:
STORAGE_CONN_STRING=DefaultEndpointsProtocol=https;AccountName=REDACTED;AccountKey=REDACTED;EndpointSuffix=core.windows.net
Did you try something like the below?
import os
import uuid

from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
container_name = "my-container-name"

# Create a local directory to hold blob data
local_path = "./data"
os.makedirs(local_path, exist_ok=True)

# Create a file in the local data directory to upload and download
local_file_name = str(uuid.uuid4()) + ".xlsx"
upload_file_path = os.path.join(local_path, local_file_name)

# Write text to the file (note: this writes plain text, not a valid workbook;
# substitute the real .xlsx bytes in practice)
with open(upload_file_path, 'w') as file:
    file.write("Hello, World!")

# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)

# Upload the created file
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data)
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
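A side note (mine, not from the thread): the quickstart round-trips through a local file, but upload_blob also accepts bytes or any file-like object, so if you need to stay in memory on App Service, a sketch like this avoids the file system entirely:
import io
import os

from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob="report.xlsx")

# xlsx_bytes is assumed to hold the raw bytes of the uploaded workbook,
# e.g. request.FILES['xlsx_file'].read() in a Django view.
blob_client.upload_blob(io.BytesIO(xlsx_bytes), overwrite=True)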
For reasons I don't fully understand (comments welcome for an explanation!), I can successfully save a .xlsx file to Azure Blob Storage with:
self.request.FILES['name_of_form_field_for_xlsx_file']
I suspect there's a difference in how csv vs. xlsx files are handled between request.FILES and form.cleaned_data.get() in Django, resulting in an authentication error as per the original question.
The full code to save a .csv and then a .xlsx is (note this is within a FormView):
import os

from azure.storage.blob import BlobServiceClient

# Set connection string
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))

# Upload an xlsx file (use .name so the blob name is a string, not the file object)
blob_client = blob_service_client.get_blob_client(container="my-container", blob=self.request.FILES['xlsx_file'].name)
blob_client.upload_blob(self.request.FILES['xlsx_file'])

# Upload a CSV file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=form.cleaned_data.get('csv_file').name)
blob_client.upload_blob(form.cleaned_data.get('csv_file'))
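For what it's worth, a variant I'd expect to behave identically for both file types (my sketch, same field names as above) pulls the name and bytes out explicitly, removing any dependence on how Django wraps the upload:
uploaded = self.request.FILES['xlsx_file']
blob_client = blob_service_client.get_blob_client(container="my-container", blob=uploaded.name)
blob_client.upload_blob(uploaded.read(), overwrite=True)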

Uploading and receiving JSON data and a file in Python 3 via Postman

I was a bit curious about how I can send JSON data and files via Postman and receive the JSON data and the same file in my Flask application.
Is there a convenient way to send files, or shall I save the file in
another route, generate a URL, and pass it in the request JSON? Or
shall I directly send the file and save it in my server's file system?
If I do so, can I fetch the file from the server?
I would appreciate any help.
Code:
import os

from flask import request
from werkzeug.utils import secure_filename

class Test(Resource):
    def post(self):
        # keys = request.json.keys()  # only works for application/json requests
        dat = request.form['request']  # the JSON metadata arrives as a form field
        uploaded_file = request.files['file_path']
        # Create the folder when setting up your app
        os.makedirs(os.path.join(app.instance_path, 'htmlfi'), exist_ok=True)
        # Save the file once (saving the same stream twice leaves the second copy empty)
        uploaded_file.save(os.path.join(app.instance_path, 'htmlfi', secure_filename(uploaded_file.filename)))
        print(dat)
        return "done"

api.add_resource(Test, '/Test_data')
I am able to get the data, but it is not JSON; it is manageable, though. Is it efficient to send the file directly and save it in the file system, or is it better to use Google Cloud Storage, since I am using GCP? I was thinking about server load.
It is also hectic to check for valid keys. With a JSON body I just have to write
if "keys" not in request.json.keys():
which makes my work easier, but with the form-data approach I have to check things like request.form['request'][0] for the id key, and so on.
You can build the data in your Python code; you don't have to send a .json file to your server. If you are using a dictionary, convert it to JSON and send it in your request's body. You will see the data in Postman. If you want to save it as a .json file, you can take the data and write it out on the server side.
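To make that concrete, here is a sketch (my addition, assuming the /Test_data endpoint above): send the JSON and the file together in one multipart request, then parse the JSON back out of the form field on the Flask side:
import json

import requests

payload = {"id": 1, "name": "example"}  # hypothetical metadata
files = {
    # (filename, content, content-type) tuples; a None filename marks a plain form field
    "request": (None, json.dumps(payload), "application/json"),
    "file_path": ("report.html", open("report.html", "rb")),
}
response = requests.post("http://localhost:5000/Test_data", files=files)

# Server side, inside post() above:
#     data = json.loads(request.form['request'])  # back to a dict with real keys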

Get content_type from Google Cloud file

I have two API endpoints, one that takes a file from an HTTP request and uploads it to a Google Cloud bucket using the Python API, and another that downloads it again. In the first view, I get the file content type from the HTTP request and upload it to the bucket, setting that metadata:
from os import path

from google.cloud import storage

file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_obj.read(),  # the original's undefined `file_text`; read the upload's bytes
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the file metadata I set earlier (content_type)? It's not available on this blob object, since a new one was instantiated, but the bucket still holds the file.
You should try
blob = bucket.get_blob(blob_name)
blob.content_type
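A note on why this works (my addition): bucket.blob(name) only builds a local reference without making an API call, so its metadata fields are empty, whereas bucket.get_blob(name) performs a GET and populates them. Calling reload() on an existing Blob does the same:
blob = bucket.get_blob(filename)  # fetches metadata from the API
print(blob.content_type)

# Equivalent: build the reference locally, then fetch metadata explicitly.
blob = bucket.blob(filename)
blob.reload()
print(blob.content_type)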

Send file via JSON instead of uploading to server, Django

I have an app that currently allows a user to upload a file and it saves the file on the web server. My client has now decided to use a third party cloud hosting service for their file storage needs. The company has their own API for doing CRUD operations on their server, so I wrote a script to test their API and it sends a file as a base64 encoded JSON payload to the API. The script works fine but now I'm stuck on how exactly how I should implement this functionality into Django.
json_testing.py
import base64

import magic
import requests

filename = 'test.txt'

# Open the file and encode its contents as a base64 string
with open(filename, "rb") as test_file:
    encoded_string = base64.b64encode(test_file.read())

# Get the MIME type using the magic module
mime = magic.Magic(mime=True)
mime_type = mime.from_file(filename)
if isinstance(mime_type, bytes):  # older python-magic versions return bytes
    mime_type = mime_type.decode()

# Concatenate the MIME type and the encoded string into a data URI
# (.decode() turns the base64 bytes into str)
file_string = 'data:%s;base64,%s' % (mime_type, encoded_string.decode())

payload = {
    "client_id": 1,
    "file": file_string
}
headers = {
    "token": "AuthTokenGoesHere",
    "content-type": "application/json",
}

response = requests.post('https://api.website.com/api/files/', json=payload, headers=headers)
print(response.json())
models.py
import os

from django.conf import settings
from django.db import models

def upload_location(instance, filename):
    return '%s/documents/%s' % (instance.user.username, filename)

class Document(models.Model):
    # on_delete is required for ForeignKey in Django 2.0+
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    file = models.FileField(upload_to=upload_location)

    def __str__(self):
        return self.filename()

    def filename(self):
        return os.path.basename(self.file.name)
So to reiterate, when a user uploads a file, instead of storing the file somewhere on the web server, I want to base64 encode the file so I can send the file as a JSON payload. Any ideas on what would be the best way to approach this?
The simplest way I can put this is that I want to avoid saving the
file to the web server entirely. I just want to encode the file, send
it as a payload, and discard it, if that's possible.
From the django docs:
Upload Handlers
When a user uploads a file, Django passes off the file data to an
upload handler – a small class that handles file data as it gets
uploaded. Upload handlers are initially defined in the
FILE_UPLOAD_HANDLERS setting, which defaults to:
["django.core.files.uploadhandler.MemoryFileUploadHandler",
"django.core.files.uploadhandler.TemporaryFileUploadHandler"]
Together
MemoryFileUploadHandler and TemporaryFileUploadHandler provide
Django’s default file upload behavior of reading small files into
memory and large ones onto disk.
You can write custom handlers that customize how Django handles files.
You could, for example, use custom handlers to enforce user-level
quotas, compress data on the fly, render progress bars, and even send
data to another storage location directly without storing it locally.
See Writing custom upload handlers for details on how you can
customize or completely replace upload behavior.
Contrary thoughts:
I think you should consider sticking with the default file upload handlers because they keep someone from uploading a file that will overwhelm the server's memory.
Where uploaded data is stored
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django
will hold the entire contents of the upload in memory. This means that
saving the file involves only a read from memory and a write to disk
and thus is very fast.
However, if an uploaded file is too large, Django will write the
uploaded file to a temporary file stored in your system’s temporary
directory. On a Unix-like platform this means you can expect Django to
generate a file called something like /tmp/tmpzfp6I6.upload. If an
upload is large enough, you can watch this file grow in size as Django
streams the data onto disk.
These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable
defaults” which can be customized as described in the next section.
request.FILES info:
#forms.py:
from django import forms
class UploadFileForm(forms.Form):
title = forms.CharField(max_length=50)
json_file = forms.FileField()
A view handling this form will receive the file data in request.FILES,
which is a dictionary containing a key for each FileField (or
ImageField, or other FileField subclass) in the form. So the data from
the above form would be accessible as request.FILES[‘json_file’].
Note that request.FILES will only contain data if the request method
was POST and the <form> that posted the request has the attribute
enctype="multipart/form-data". Otherwise, request.FILES will be empty.
HttpRequest.FILES
A dictionary-like object containing all uploaded files. Each key in
FILES is the name from the <input type="file" name="" />. Each value
in FILES is an UploadedFile.
As quoted above, the two default upload handlers are MemoryFileUploadHandler and TemporaryFileUploadHandler.
The source code for TemporaryFileUploadHandler contains this:
class TemporaryFileUploadHandler(FileUploadHandler):
    """
    Upload handler that streams data into a temporary file.
    """
    ...
    def new_file(self, *args, **kwargs):
        """
        Create the file object to append to as data is coming in.
        """
        ...
        self.file = TemporaryUploadedFile(....)  # <***HERE
And the source code for TemporaryUploadedFile contains this:
class TemporaryUploadedFile(UploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, content_type_extra=None):
        ...
        file = tempfile.NamedTemporaryFile(suffix='.upload')  # <***HERE
And the python tempfile docs say this:
tempfile.NamedTemporaryFile(...., delete=True)
...
If delete is true (the default), the file is deleted as soon as it is closed.
Similarly, the other of the two default file upload handlers, MemoryFileUploadHandler, creates a file of type BytesIO:
A stream implementation using an in-memory bytes buffer. It inherits
BufferedIOBase. The buffer is discarded when the close() method is
called.
Therefore, all you have to do is close request.FILES["field_name"] to erase the file (whether the file contents are stored in memory or on disk in the /tmp directory), e.g.:
uploaded_file = request.FILES["json_file"]
file_contents = uploaded_file.read()
# Send file_contents to the other server here.
uploaded_file.close()  # erases the file
If for some reason you don't want django to write to the server's /tmp directory at all, then you'll need to write a custom file upload handler to reject uploaded files that are too large.
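Pulling this together, a minimal sketch of a view (my addition, reusing the question's json_testing.py values and the UploadFileForm above) that encodes the upload, forwards it, and closes it without persisting anything:
import base64

import requests
from django.http import JsonResponse

def forward_upload(request):
    form = UploadFileForm(request.POST, request.FILES)
    if not form.is_valid():
        return JsonResponse({"errors": form.errors.get_json_data()}, status=400)
    uploaded_file = request.FILES['json_file']
    encoded = base64.b64encode(uploaded_file.read()).decode()
    file_string = 'data:%s;base64,%s' % (uploaded_file.content_type, encoded)
    resp = requests.post(
        'https://api.website.com/api/files/',  # the question's example endpoint
        json={"client_id": 1, "file": file_string},
        headers={"token": "AuthTokenGoesHere", "content-type": "application/json"},
    )
    uploaded_file.close()  # discards the temp file or in-memory buffer
    return JsonResponse(resp.json())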
