how to read a file content from tornado http filebody?

how to read a file content from tornado http filebody? - python

My file is like this, but I can't exec the content correctly. I've spent my whole afternoon on this, and still so confused. The main reason is that I don't know what does that [file_obj[0]['body']] looks like.
here is part of my code
# user_file content
"uid = 'h123456789'"
"data = [something]"
# end of user_file
# code piece
file_obj = req.request.files.get('user_file', None)
for i in file_obj[0]['body']:
i.strip('\n') # I tried comment out this line, still can't work
exec(i)
# I failed
Can you tell me what does the user_file conentent would looks like in the file_obj body? So that I can figure out the solution maybe. I submitted it with http form to tornado.
Really thanks.

Maybe this will help.
#first file object in request.
file1 = self.request.files['file1'][0]
#where the file content actually placed.
content = file1['body']
#split content into lines, unix line terminals assumed.
lines = content.split(b'\n')
for l in lines:
#after decoding into strings, you're free to execute them.
try:
exec(l.decode())
except:
pass

Related

File corrupted when using send_file() from flask, data from pymongo gridfs

Well my English is not good, and the title may looks weird.
Anyway, I'm now using flask to build a website that can store files, and mongodb is the database.
The file upload, document insert functions have no problems, the weird thing is that the file sent from flask send_file() was truncated for no reasons. Here's my code
from flask import ..., send_file, ...
import pymongo
import gridfs
#...
#app.route("/record/download/<record_id>")
def api_softwares_record_download(record_id):
try:
#...
file = files_gridfs.find_one({"_id": record_id})
file_ext = filetype.guess_extension(file.read(2048))
filename = "{}-{}{}".format(
app["name"],
record["version"],
".{}".format(file_ext) if file_ext else "",
)
response = send_file(file, as_attachment=True, attachment_filename=filename)
return response
except ...
The original image file, for example, is 553KB. But the response body returns 549.61KB, and the image was broken. But if I just directly write the file to my disk
#...
with open('test.png', 'wb+') as file:
file.write(files_gridfs.find_one({"_id": record_id}).read())
The image file size is 553KB and the image is readable.
When I compare the two files with VS Code's text editor, I found that the correct file starts with �PNG, but the corrupted file starts with �ϟ8���>�L�y
search the corrupted file head in the correct file
And I tried to use Blob object and download it from the browser. No difference.
Is there any wrong with my code or I misused send_file()? Or should I use flask_pymongo?

And it's interesting that I have found what is wrong with my code.
This is how I solved it
...file.read(2048)
file.seek(0)
...
file.read(2048)
file.seek(0)
...
response = send_file(file, ...)
return response
And here's why:
For some reasons, I use filetype to detect the file's extension name and mime type, so I sent 2048B to filetype for detection.
file_ext = filetype.guess_extension(file.read(2048))
file_mime = filetype.guess_mime(file.read(2048)) #this line wasn't copied in my question. My fault.
And I have just learned from the pymongo API that python (or pymongo or gridfs, completely unknown to this before) reads file by using a cursor. When I try to find the cursor's position using file.seek(), it returns 4096. So when I call file.read() again in send_file(), the cursor reads from 4096B away to the file head. 549+4=553, and here's the problem.
Finally I set the cursor to position 0 after every read() operation, and it returns the correct file.
Hope this can help if you made the same mistake just like me.

Django download a BinaryField as a CSV file

I am working with a legacy project and we need to implement a Django Admin that helps download a csv report that was stored as a BinaryField.
The model is something like this:
class MyModel(models.Model):
csv_report = models.BinaryField(blank=True,null=True)
Everything seems to being stored as expected but I have no clue how to decode the field back to a csv file for later use.
I am using something like these (as an admin action on MyModelAdmin class)
class MyModelAdmin(admin.ModelAdmin):
...
...
actions = ["download_file",]
def download_file(self, request,queryset):
# just getting one for testing
contents = queryset[0].csv_report
encoded_data = base64.b64encode(contents).decode()
with open("report.csv", "wb") as binary_file:
# Write bytes to file
decoded_image_data = base64.decodebytes(encoded_data)
binary_file.write(decoded_image_data)
response = HttpResponse(encoded_data)
response['Content-Disposition'] = 'attachment; filename=report.csv'
return response
download_file.short_description = "file"
But all I download is a scrambled csv file. I don't seem to understand if it is a problem of the format I am using to decode (.decode('utf-8') does nothing either )
PD:
I know it is a bad practice to use BinaryField for this. But requirements are requirements. Nothing to do about it.
EDIT:
As #TimRoberts pointed out, encoding and then decoding is REALLY silly :$. I've changed the method like so:
def download_file(self, request,queryset):
# print(self,request)
contents = queryset[0].csv_report
# print(type(contents))
encoded_data = base64.b64decode(contents)
with open("my_file.csv", "wb") as binary_file:
binary_file.write(encoded_data)
response = HttpResponse(encoded_data)
response['Content-Disposition'] = 'attachment; filename=blob.csv'
return response
download_file.short_description = "file"
Still I am getting a csv file with something like this:

A big fat case of the old RTFM: I was getting carried away by the all base64.. Obviously I didn't have any idea of what I was doing.
After tampering with the shell and reading the docs, I just changed my method to:
def download_file(self, request,queryset):
**contents = bytes(queryset[0].csv_report)**
response = HttpResponse(contents)
response['ContentDisposition']='attachment;filename=report.csv'
return response
Note that I was scrambling the data on purpose by doing the encoded_data = base64.b64decode(contents) stuff. I just needed to apply bytes on my BinaryField and voilá

How does rfile.read() work?

I'm sending a text file with a string in a python script via POST to my server:
fo = open('data'.txt','a')
fo.write("hi, this is my testing data")
fo.close()
with open('data.txt', 'rb') as f:
r = requests.post("http://XXX.XX.X.X", data = {'data.txt':f})
f.close()
And receiving and handling it here in my server handler script, built off an example found online:
def do_POST(self):
data = self.rfile.read(int(self.headers.getheader('Content-Length')))
empty = [data]
with open('processing.txt', 'wb') as file:
for item in empty:
file.write("%s\n" % item)
file.close()
self._set_headers()
self.wfile.write("<html><body><h1>POST!</h1></body></html>")
My question is, how does:
self.rfile.read(int(self.headers.getheader('Content-Length')))
take the length of my data (an integer, # of bytes/characters) and read my file? I am confused how it knows what my data contains. What is going on behind the scenes with HTTP?
It outputs data.txt=hi%2C+this+is+my+testing+data
to my processing.txt, but I am expecting "hi this is my testing data"
I tried but failed to find documentation for what exactly rfile.read() does, and if simply finding that answers my question I'd appreciate it, and I could just delete this question.

Your client code snippet reads contents from the file data.txt and makes a POST request to your server with data structured as a key-value pair. The data sent to your server in this case is one key data.txt with the corresponding value being the contents of the file.
Your server code snippet reads the entire HTTP Request body and dumps it into a file. The key-value pair structured and sent from the client comes in a format that can be decoded by Python's built in library urlparse.
Here is a solution that could work:
def do_POST(self):
length = int(self.headers.getheader('content-length'))
field_data = self.rfile.read(length)
fields = urlparse.parse_qs(field_data)
This snippet of code was shamefully borrowed from: https://stackoverflow.com/a/31363982/705471
If you'd like to extract the contents of your text file back, adding the following line to the above snippet could help:
data_file = fields["data.txt"]
To learn more about how such information is encoded for the purposes of HTTP, read more at: https://en.wikipedia.org/wiki/Percent-encoding

Python - How to read the content of an URL twice?

I am using 'urllib.request.urlopen' to read the content of an HTML page. Afterwards, I want to print the content to my local file and then do a certain operation (e.g. constuct a parser on that page e.g. BeautifulSoup).
The problem
After reading the content for the first time (and writing it into a file), I can't read the content for the second time in order to do something with it (e.g. construct a parser on it). It is just empty and I can't move the cursor(seek(0)) back to the beginning.
import urllib.request
response = urllib.request.urlopen("http://finance.yahoo.com")
file = open( "myTestFile.html", "w")
file.write( response.read() ) # Tried responce.readlines(), but that did not help me
#Tried: response.seek() but that did not work
print( response.read() ) # Actually, I want something done here... e.g. construct a parser:
# BeautifulSoup(response).
# Anyway this is an empty result
file.close()
How can I fix it?
Thank you very much!

You can not read the response twice. But you can easily reuse the saved content:
content = response.read()
file.write(content)
print(content)

Python O365 send email with HTML file

I'm using O365 for Python.
Sending an email and building the body my using the setBodyHTML() function. However at the present I need to write the actual HTML code inside the function. I don't want to do that. I want to just have python look at an HTML file I saved somewhere and send an email using that file as the body. Is that possible? Or am I confined to copy/pasting my HTML into that function? I'm using office365 for business. Thanks.
In other words instead of this: msg.setBodyHTML("<h3>Hello</h3>") I want to be able to do this: msg.setBodyHTML("C:\somemsg.html")

I guess you can assign the file content to a variable first, i.e.:
file = open('C:/somemsg.html', 'r')
content = file.read()
file.close()
msg.setBodyHTML(content)

You can do this via a simple reading of that file into a string, which you then can pass to the setBodyHTML function.
Here's a quick function example that will do the trick:
def load_html_from_file(path):
contents = ""
with open(path, 'r') as f:
contents = f.read()
return contents
Later, you can do something along the lines of
msg.setBodyHTML(load_html_from_file("C:\somemsg.html"))
or
html_contents = load_html_from_file("C:\somemsg.html")
msg.setBodyHTML(html_contents)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

how to read a file content from tornado http filebody? - python

Related

File corrupted when using send_file() from flask, data from pymongo gridfs

Django download a BinaryField as a CSV file

How does rfile.read() work?

Python - How to read the content of an URL twice?

Python O365 send email with HTML file

Categories

Resources