Flask Application File Upload - Error while getting contents of file - python

I am developing a Flask application that uploads a file to an IBM Bluemix Cloudant DB. I need to save the contents of the file as a key-value pair in Cloudant.
If I try to save a text file, it reads the content correctly. For other types of files it does not work.
The following is my Flask REST API code:
@app.route('/upload', methods=['POST'])
def upload_file():
    file_to_upload = request.files['file_upload']
    response = CloudantDB().upload_file_to_db(file_to_upload)
The upload function under CloudantDB is as shown below:
    file_name = file.filename
    uploaded_file_content = file.read()
    data = {
        'file_name': file_name,
        'file_contents': uploaded_file_content,
        'version': version
    }
    my_doc = self.database.create_document(data)
I know the error occurs because "uploaded_file_content" is in a different format for other file types (e.g. PDFs, JPGs, etc.).
Is there any way I can overcome this?
Thanks!

The difference is that text files contain ordinary text whereas JPG, PNG etc. contain binary data.
Binary data should be uploaded as an attachment with a mime type, and you need to base64 encode the data. You don't show what create_document() is doing, but it's unlikely that it is able to treat binary data as an attachment. This might fix it for you:
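from base64 import b64encode

# b64encode() returns bytes; decode to str so the document can be JSON-serialized
uploaded_file_content = b64encode(file.read()).decode('ascii')
data = {
    'file_name': file_name,
    'version': version,
    '_attachments': {
        file_name: {
            # CouchDB/Cloudant inline attachments use the key 'content_type';
            # set it to the actual MIME type of the uploaded file
            'content_type': 'image/png',
            'data': uploaded_file_content
        }
    }
}
my_doc = self.database.create_document(data)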
It should also be possible with your current code to simply base64-encode the file content and upload it. So that you know what type of data is stored when you later retrieve it, you will need to add another key-value pair to store the MIME type, as the attachment entry above does.
Attachments have the advantage that they can be individually addressed, read, deleted, and updated without affecting the containing document, so you are probably better off using them.
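As a rough illustration of the attachment approach, the sketch below builds such a document with only the standard library. The helper name build_attachment_doc and the sample payload are hypothetical, the MIME type is guessed from the file name rather than hardcoded, and the 'content_type' key assumes the CouchDB-style inline attachment format; the resulting dict would still need to be passed to whatever create_document() does in your code.

```python
import json
import mimetypes
from base64 import b64decode, b64encode


def build_attachment_doc(file_name, content, version):
    """Return a JSON-serializable document with `content` as an inline attachment."""
    # Guess the MIME type from the file name; fall back to a generic binary type.
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type is None:
        mime_type = 'application/octet-stream'
    return {
        'file_name': file_name,
        'version': version,
        '_attachments': {
            file_name: {
                'content_type': mime_type,
                # b64encode() returns bytes, so decode to str for JSON
                'data': b64encode(content).decode('ascii'),
            }
        },
    }


# Hypothetical binary payload standing in for file.read()
doc = build_attachment_doc('photo.png', b'\x89PNG\r\n\x1a\n', 1)
serialized = json.dumps(doc)  # would raise TypeError if raw bytes were present
```

Guessing the type from the file name is only one option; in a Flask handler the uploaded file object also reports the MIME type sent by the client.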

Related

File corrupted when using send_file() from flask, data from pymongo gridfs

My English is not good, so the title may look weird.
I'm using Flask to build a website that can store files, with MongoDB as the database.
The file upload and document insert functions work without problems. The weird thing is that the file sent by Flask's send_file() is truncated for no apparent reason. Here's my code:
from flask import ..., send_file, ...
import pymongo
import gridfs
#...
@app.route("/record/download/<record_id>")
def api_softwares_record_download(record_id):
    try:
        #...
        file = files_gridfs.find_one({"_id": record_id})
        file_ext = filetype.guess_extension(file.read(2048))
        filename = "{}-{}{}".format(
            app["name"],
            record["version"],
            ".{}".format(file_ext) if file_ext else "",
        )
        response = send_file(file, as_attachment=True, attachment_filename=filename)
        return response
    except ...
The original image file, for example, is 553 KB, but the response body is only 549.61 KB and the image is broken. If I instead write the file directly to disk:
#...
with open('test.png', 'wb+') as file:
    file.write(files_gridfs.find_one({"_id": record_id}).read())
The image file size is 553 KB and the image is readable.
When I compare the two files with VS Code's text editor, I found that the correct file starts with �PNG, but the corrupted file starts with �ϟ8���>�L�y
[Image: searching for the corrupted file's leading bytes inside the correct file]
I also tried using a Blob object and downloading it from the browser. No difference.
Is there anything wrong with my code, or did I misuse send_file()? Or should I use flask_pymongo?
Interestingly, I have since found what was wrong with my code.
This is how I solved it:
...file.read(2048)
file.seek(0)
...
file.read(2048)
file.seek(0)
...
response = send_file(file, ...)
return response
And here's why:
For various reasons, I use filetype to detect the file's extension and MIME type, so I pass 2048 bytes to filetype for each detection.
file_ext = filetype.guess_extension(file.read(2048))
file_mime = filetype.guess_mime(file.read(2048)) #this line wasn't copied in my question. My fault.
I have just learned from the pymongo API documentation that Python (or pymongo or gridfs; this was completely new to me) reads a file using a cursor. When I checked the cursor's position with file.tell(), it returned 4096. So when send_file() calls file.read() again, reading starts 4096 bytes past the start of the file. 549 + 4 = 553, which explains the problem.
Finally, I reset the cursor to position 0 after every read() operation, and the correct file is returned.
I hope this helps if you make the same mistake I did.
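The cursor behaviour described above is not specific to GridFS; it can be reproduced with any file-like object. The sketch below uses io.BytesIO as a stand-in for the GridFS file (an assumption made so the example runs without a database), and the 10,000-byte payload is made up.

```python
import io

# Hypothetical 10,000-byte payload standing in for a GridFS file
payload = bytes(range(256)) * 39 + bytes(16)
stream = io.BytesIO(payload)

# Two detection reads of 2048 bytes each advance the cursor to 4096
stream.read(2048)
stream.read(2048)
position_after_sniffing = stream.tell()

# Reading now returns only the remainder: the first 4096 bytes are lost
truncated = stream.read()

# Rewinding before handing the stream on restores the full content
stream.seek(0)
complete = stream.read()
```

Reading the first 2048 bytes once into a variable and passing that to both detection calls would also avoid the second read, though a rewind is still needed before sending the file.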

Pyrebase sending profile pic in json

I am creating user accounts using Pyrebase's auth.create_user_with_email_and_password. Then I store the users' data in the Firebase Realtime Database with db.child("users").push(data), where data = {"name": name, "email": email, "password": password, "picture": picture}.
Here picture is: picture = request.files['file']
But I am not able to send the picture along with the user's other data, because an image cannot be sent in a JSON object.
How can we upload the picture? There are some solutions for uploading image data, but I want to send it along with the other data so that it gets stored with the other attributes of the user.
I would suggest two approaches you could use:
Solution 1:
You could store your images in the file system in some directory (e.g. img_dir), renaming each one to ensure the name is unique; I usually add a timestamp, as in myimage_20210101.jpg. You can then store this name in a database. While generating the JSON, you fetch this file name, build a complete URL (http://myurl.com/img_dir/myimage_20210101.jpg), and insert it into the JSON.
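A minimal sketch of Solution 1, assuming a timestamp suffix is enough to keep names unique. The helper names unique_image_name and image_url are hypothetical, and actually saving the file to img_dir and storing the name in the database are left out.

```python
from datetime import datetime
from pathlib import PurePosixPath


def unique_image_name(original_name, now):
    """Append a timestamp to the file stem so repeated uploads do not collide."""
    path = PurePosixPath(original_name)
    stamp = now.strftime('%Y%m%d%H%M%S')
    return '{}_{}{}'.format(path.stem, stamp, path.suffix)


def image_url(base_url, img_dir, stored_name):
    """Build the public URL that goes into the JSON in place of the image."""
    return '{}/{}/{}'.format(base_url.rstrip('/'), img_dir.strip('/'), stored_name)


stored_name = unique_image_name('myimage.jpg', datetime(2021, 1, 1, 12, 30, 0))
user_record = {
    'name': 'example user',
    'picture': image_url('http://myurl.com', 'img_dir', stored_name),
}
```

A timestamp with one-second resolution can still collide under concurrent uploads, so a random suffix may be a safer choice in practice.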
Solution 2:
Encode the image with Base64 and decode it when fetching.
Remember to convert the result to a string first by calling .decode(), because base64.b64encode returns bytes, not a string, and bytes cannot be JSON-serialized.
import base64
encoded_ = base64.b64encode(img_file.read()).decode('utf-8')
How do you save a Base64-encoded image?
my_picture = '............'
You'll need to decode the picture from Base64 first, and then save it to a file:
import base64
# Separate the metadata from the image data
head, data = my_picture.split(',', 1)
# Decode the image data
plain_image = base64.b64decode(data)
# Write the image to a file
with open('image.jpg', 'wb') as f:
    f.write(plain_image)
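The encoding and decoding halves above can be combined into one round trip. This is only a sketch: the image bytes and the user record are made up, and the "data:image/jpeg;base64," header is an assumption about how the stored string is prefixed, matching the split(',', 1) step above.

```python
import base64
import json

# Hypothetical image bytes standing in for img_file.read()
image_bytes = b'\xff\xd8\xff\xe0' + bytes(range(32))

# Encode: bytes -> Base64 bytes -> str, so the value can live in JSON
record = {
    'name': 'example user',
    'picture': 'data:image/jpeg;base64,' + base64.b64encode(image_bytes).decode('utf-8'),
}
payload = json.dumps(record)

# Decode: strip the "data:...;base64," header, then restore the bytes
head, data = json.loads(payload)['picture'].split(',', 1)
restored = base64.b64decode(data)
```

Base64 inflates the data by roughly a third, which is worth keeping in mind for large pictures.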

Python Read Binary vs NodeJS Read Binary

I have a method in a Python REST service that expects the binary contents of an image file, which it extracts and saves to a file that can later be opened as a valid image:
@require_http_methods(['POST'])
def my_method(request):
    image_data = request.FILES['image']
    fs.save('image.png', image_data)
If I send a request to this service via a Python script, it works fine:
import requests
image_file = '......image.png'
image_data = open(image_file, 'rb')
requests.post('http://127.0.0.1:8080/...', files = dict(image = image_data))
If, however, I use NodeJS to send a request to the Python service, it doesn't work properly:
import { readFileSync } from 'fs';
import FormData from 'form-data';
import fetch from 'node-fetch';
const IMAGE_DATA = readFileSync('......image.png', { encoding: 'binary' });
const FORM_DATA = new FormData();
FORM_DATA.append('image', IMAGE_DATA, { filename: 'image.png' });
fetch('http://127.0.0.1:8080/...', { method: 'POST', body: FORM_DATA });
Opening the image files saved by the Python service for both these requests in VSCode as binary data makes the reason clear: Encoding/Charset differences between the two clients.
The window on the right is the PNG file sent by Python script and saved by the Python service.
The window on the left is the PNG file sent by NodeJS script and saved by the Python service.
Is there a way to fix this in NodeJS?
EDIT
Upon further testing, I found the following:
const IMAGE_DATA_1 = readFileSync('imageSource.png', { encoding: 'binary' });
writeFileSync('imageDest1.png', IMAGE_DATA_1);
// THE FILE imageDest1.png CONTAINS BAD DATA AND CANNOT BE OPENED IN A GRAPHICS PROGRAM
const IMAGE_DATA_2 = readFileSync('imageSource.png');
writeFileSync('imageDest2.png', IMAGE_DATA_2);
// THE FILE imageDest2.png CONTAINS VALID DATA AND CAN BE OPENED IN A GRAPHICS PROGRAM
So the solution was to not use { encoding: 'binary' } flags. However, I would've thought that image files contain binary data and should be opened with binary encoding. So why is this causing issues?
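One likely explanation: in Node.js, 'binary' is an alias for the 'latin1' string encoding, so readFileSync with that option returns a string rather than a Buffer, and writing or sending that string re-encodes it as UTF-8 by default. The sketch below reproduces the same effect in Python on the PNG signature; it illustrates the mechanism and is not the Node.js code itself.

```python
# The eight-byte PNG signature; the first byte is above 0x7F
png_signature = b'\x89PNG\r\n\x1a\n'

# Step 1: reading with a one-byte-per-character encoding yields text, not bytes
as_text = png_signature.decode('latin-1')

# Step 2: writing that text back with the UTF-8 default changes the bytes
rewritten = as_text.encode('utf-8')

# Encoding with latin-1 again would have preserved the data
preserved = as_text.encode('latin-1')
```

Every byte above 0x7F becomes two bytes after the UTF-8 step, which would explain why the file grows and no longer opens. Omitting the encoding option keeps the data as raw bytes from start to finish.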

Copy text with formatting from a ArcGIS map server query to a text file, while maintaining formatting, using Python 3?

I am trying to automate a process by which I can query an ArcGIS map server, take the resulting text and save it as a .json file.
The map server can be queried through an API.
api = "https://csg.esri-southafrica.com/server/rest/services/CSGSearch/MapServer/2/query?where=1%3D1&text=&objectIds=&time=&geometry=2053965%2C-4019103%2C2054056%2C-4019169+&geometryType=esriGeometryEnvelope&inSR=3857&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=3857&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&queryByDistance=&returnExtentsOnly=false&datumTransformation=&parameterValues=&rangeValues=&f=pjson"
This URL is not that important but when I run this in a browser it gives this response:
{
  "displayFieldName": "SP_NAME",
  "fieldAliases": {
    "OBJECTID": "OBJECTID",
    "GID": "Geometry Identifier",
    "PRCL_KEY": "26 Digit Code",
    "PRCL_TYPE": "Parcel Type",
    "LSTATUS": "Legal Status",
    "WSTATUS": "Work Status",
    "GEOM_AREA": "Geometry Area",
    "COMMENTS": "Comments",
    "TAG_X": "Longitude",
etc. etc. etc.
If I copy this text to Notepad and save it as "any_file.json", I can then load it in QGIS and save it as a shapefile.
I have been using the following code to try to achieve this:
import requests
mainapi = "https://csg.esri-southafrica.com/server/rest/services/CSGSearch/MapServer/2/query?where=1%3D1&text=&objectIds=&time=&geometry=2053965%2C-4019103%2C2054056%2C-4019169+&geometryType=esriGeometryEnvelope&inSR=3857&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=3857&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&queryByDistance=&returnExtentsOnly=false&datumTransformation=&parameterValues=&rangeValues=&f=pjson"
r = str(requests.get(mainapi).json())
#Write response to json text file
with open("csg_erven.json", "w") as f:
    f.write(r)
The results in the .json file are not formatted; they appear like this:
{'displayFieldName': 'SP_NAME', 'fieldAliases': {'OBJECTID': 'OBJECTID', 'GID': 'Geometry Identifier', 'PRCL_KEY':
I am new to coding in general, but I am assuming the formatting is crucial here. How can I copy the text with formatting? Is it an encoding issue?
When I manually copy the formatted text from a browser it works fine, but the single line text does not work.
Any help would be greatly appreciated.
It looks like there's a mix-up in how you're saving the file. Calling str() on the parsed response produces Python's dict representation (note the single quotes in your output), which is not valid JSON. Use the json library instead of converting to a string and saving that as text:
import json
with open('csg_erven.json', 'w') as f:
    json.dump(requests.get(mainapi).json(), f)
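To see the difference without querying the server, the sketch below compares str() and json.dumps() on a small made-up excerpt of the response. The indent=2 argument is an optional addition that reproduces the line-by-line layout seen in the browser; it is not required for the file to be valid JSON.

```python
import json

# Hypothetical excerpt of the parsed response
parsed = {'displayFieldName': 'SP_NAME', 'fieldAliases': {'OBJECTID': 'OBJECTID'}}

as_repr = str(parsed)                   # Python repr: single-quoted keys and values
as_json = json.dumps(parsed, indent=2)  # valid JSON, pretty-printed

# Check whether each form can be parsed back as JSON
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False
```

The same indent argument can be passed to json.dump() when writing directly to a file.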

How do I upload multiple files using the Flask test_client?

How can I use the Flask test_client to upload multiple files to one API endpoint?
I'm trying to use the Flask test_client to upload multiple files to a web service that accepts multiple files to combine them into one large file.
My controller looks like this:
@app.route("/combine/file", methods=["POST"])
@flask_login.login_required
def combine_files():
    user = flask_login.current_user
    combined_file_name = request.form.get("file_name")
    # Store file locally
    file_infos = []
    for file_data in request.files.getlist('file[]'):
        # Get the content of the file
        file_temp_path = "/tmp/{}-request.csv".format(file_id)
        file_data.save(file_temp_path)
        # Create a namedtuple with information about the file
        FileInfo = namedtuple("FileInfo", ["id", "name", "path"])
        file_infos.append(
            FileInfo(
                id=file_id,
                name=file_data.filename,
                path=file_temp_path
            )
        )
    ...
My test code looks like this:
def test_combine_file(get_project_files):
    project = get_project_files["project"]
    r = web_client.post(
        "/combine/file",
        content_type='multipart/form-data',
        buffered=True,
        follow_redirects=True,
        data={
            "project_id": project.project_id,
            "file_name": "API Test Combined File",
            "file": [
                (open("data/CC-Th0-MolsPerCell.csv", "rb"), "CC-Th0-MolsPerCell.csv"),
                (open("data/CC-Th1-MolsPerCell.csv", "rb"), "CC-Th1-MolsPerCell.csv")
            ]})
    response_data = json.loads(r.data)
    assert "status" in response_data
    assert response_data["status"] == "OK"
However, I can't get the test_client to actually upload both files. With more than one file specified, the file_data is empty when the API code loops. I have tried my own ImmutableDict with two "file" entries, a list of file tuples, a tuple of file tuples, anything I could think of.
What is the API to specify multiple files for upload in the Flask test_client? I can't find this anywhere on the web! :(
The test client takes a list of file objects (as returned by open()), so this is the testing utility I use:
def multi_file_upload(test_client, src_file_paths, dest_folder):
    files = []
    try:
        files = [open(fpath, 'rb') for fpath in src_file_paths]
        return test_client.post('/api/upload/', data={
            'files': files,
            'dest': dest_folder
        })
    finally:
        for fp in files:
            fp.close()
I think if you drop your tuples (but keep the open() calls), your code might work.
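The open-then-close-in-finally pattern in the utility above can also be written with contextlib.ExitStack from the standard library. The sketch below shows only the file handling: the two CSV files are created on the fly as hypothetical inputs, and the actual test_client.post(...) call is replaced by a plain read so the example runs without Flask.

```python
import os
import tempfile
from contextlib import ExitStack

# Hypothetical input files created only so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
src_file_paths = []
for name, content in [('a.csv', b'x,y\n1,2\n'), ('b.csv', b'x,y\n3,4\n')]:
    path = os.path.join(tmp_dir, name)
    with open(path, 'wb') as f:
        f.write(content)
    src_file_paths.append(path)

with ExitStack() as stack:
    # Each file is registered with the stack and closed when the block exits,
    # even if opening a later file or the request itself raises
    files = [stack.enter_context(open(p, 'rb')) for p in src_file_paths]
    # A real test would pass `files` to test_client.post(...) here
    contents = [fp.read() for fp in files]

all_closed = all(fp.closed for fp in files)
```

Compared with the try/finally version, this also closes the files that were already opened if a later open() call fails partway through the list.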
You can just send a data object with your files named however you want:
test_client.post('/api/upload',
                 data={'title': 'upload sample',
                       'file1': (io.BytesIO(b'get something'), 'file1'),
                       'file2': (io.BytesIO(b'forthright'), 'file2')},
                 content_type='multipart/form-data')
Another way of doing this with test_client, if you want to explicitly name your file uploads (my use case was two CSVs, but it could be anything), is like this:
resp = test_client.post(
    '/data_upload_api',  # flask route
    file_upload_one=[open(FILE_PATH, 'rb')],
    file_upload_two=[open(FILE_PATH_2, 'rb')]
)
Using this syntax, these files would be accessible as:
request.files['file_upload_one'] # etc.
