How to save an encoded pdf in a zip file using django? - python

I've read some posts about this problem, but most of them didn't help in my case. I'm trying to save an encoded PDF in a zip file (I'm using the DocRaptor API for the PDF generation, which returns the encoded PDF).
def toZip(request, ...):
    ...
    response = docraptor_api_call() #api call to generate pdf (encoded pdf)
    with open('creation.pdf', 'wb') as f:
        f.write(response)
    #decode pdf
    with open(f.name, 'rb') as pdf:
        # this will download the pdf to the user
        # doc = HttpResponse(pdf.read(), content_type='application/pdf')
        # doc['Content-Disposition'] = "attachment; filename=filename.pdf"
        # return doc
        zip_io = io.BytesIO()
        # create zipFile
        zf = zipfile.ZipFile(zip_io, mode='w')
        # write PDF in ZIP ?
        save_zf = zf.write(pdf.read())
        # save zip to FileField
        zip = ZipStore.objects.create(zip=save_zf)
When I run the code above, I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 43: character maps to <undefined>
I don't really understand what I'm doing wrong or how I should fix it. Any suggestions?

You've got an error in the way you're calling zf.write. You should be using writestr:
# ZipFile.write takes the path of the file to add, not the bytes to be written.
# With writestr, f.name is the name of the file in the zip archive. So if I passed
# in "foo.txt", "1", I'd get a file named `foo.txt` after decompressing, and its
# contents would be 1
zf.writestr(f.name, pdf.read())
This method does not appear to return anything, so you'll need to change this: zip = ZipStore.objects.create(zip=save_zf) probably to:
zip = ZipStore.objects.create(zip=zip_io)
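
For reference, here is a minimal sketch of the writestr approach using only the standard library. The helper name build_zip and the placeholder PDF bytes are made up for illustration. One thing worth noting: the archive has to be closed (here via the with block) before its bytes are used, because that is when zipfile writes the central directory.

```python
import io
import zipfile


def build_zip(files):
    """Pack {archive_name: bytes} into a zip archive held in memory."""
    buffer = io.BytesIO()
    # Leaving the with block closes the archive, which writes the central
    # directory; without that step the zip is incomplete.
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


# Stand-in for the bytes returned by the PDF API call (hypothetical content).
pdf_bytes = b"%PDF-1.4 placeholder content"
zip_bytes = build_zip({"creation.pdf": pdf_bytes})

# Assumption: in Django, bytes like these are usually wrapped before being
# assigned to a FileField, e.g. ContentFile(zip_bytes, name="archive.zip")
# from django.core.files.base.
```

Whether a raw BytesIO can be passed straight to the model depends on how the field is set up, so treat the ContentFile remark as a starting point rather than a confirmed fix.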

Related

Read and upload to GitHub non UTF-8 file Python

I have code that uploads an SQLite3 file to GitHub (using the PyGithub module).
import github
with open('server.db', 'r') as file:
    content = file.read()
g = github.Github('token')
repo = g.get_user().get_repo("my-repo")
file = repo.get_contents("server.db")
repo.update_file("server.db", "Python Upload", content, file.sha, branch="main")
If you open this file in a text editor, you will see bytes that are not valid UTF-8, since it is a database file. I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 99: invalid continuation byte
How can I fix it?
Maybe I can upload the file to GitHub as binary rather than text, like a PNG?
I use this:
f = open("yourtextfile", encoding="utf8")
contents = get_blob_content(repo, branch="main",path_name="yourfile")
repo.delete_file("yourfile", "test title", contents.sha)
repo.create_file("yourfile", "test title", f.read())
together with this helper function:
def get_blob_content(repo, branch, path_name):
    # first get the branch reference
    ref = repo.get_git_ref(f'heads/{branch}')
    # then get the tree
    tree = repo.get_git_tree(ref.object.sha, recursive='/' in path_name).tree
    # look for path in tree
    sha = [x.sha for x in tree if x.path == path_name]
    if not sha:
        # well, not found..
        return None
    # we have sha
    return repo.get_git_blob(sha[0])
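
The UnicodeDecodeError in the question comes from opening the database in text mode ('r'), which makes Python decode it as UTF-8. A minimal standard-library sketch of the alternative is below: read the file as bytes and base64-encode it, which is the form GitHub's contents REST API expects for file content. The helper name read_as_base64 and the sample bytes are made up for illustration. Whether PyGithub's update_file accepts raw bytes directly depends on the installed version, so check its documentation rather than taking that for granted.

```python
import base64
import os
import tempfile


def read_as_base64(path):
    """Read a file as raw bytes and return them base64-encoded as ASCII text."""
    # "rb" returns bytes and never attempts a UTF-8 decode.
    with open(path, "rb") as fh:
        raw = fh.read()
    return base64.b64encode(raw).decode("ascii")


# Demo with a throwaway file containing bytes that are not valid UTF-8.
sample = b"SQLite format 3\x00\xd9\xff\x00binary payload"
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
    tmp.write(sample)

try:
    encoded = read_as_base64(tmp.name)
    # Decoding restores the original bytes exactly.
    restored = base64.b64decode(encoded)
finally:
    os.remove(tmp.name)
```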

failed: Network error while downloading Excel file generated by jupyter notebook

My Jupyter notebook saves a DataFrame (with styles) to an Excel file. I then created a link to download this Excel file:
df=df.to_excel('ABC.xlsx', index=True)
filename ='ABC.xlsx'
file_link = "<a href='{href}' download='ABC.xlsx'> Download ABC.xlsx</a>"
html = HTML(file_link.format(href=filename))
display(html)
But when I click the link "Download ABC.xlsx", I get "Failed: Network error".
By contrast, it works fine when I download a CSV file the same way.
Here is the CSV code. It includes some base64 encoding, without which the CSV download does not work either:
def func(df,title="Download csv file",filename="ABC.csv"):
    csv=df.to_csv(index=True)
    b64 =base64.b64encode(csv.encode())
    payload=b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)
I tried adapting this function for the Excel file:
def func(df,title="Download excel file",filename="ABC.xlsx"):
    xls=df.to_excel("xyz.xlsx",index=True)
    b64 =base64.b64encode(xls.encode())
    payload=b64.decode()
    html = "{title}"
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)
The Excel version gives this error: 'NoneType' object has no attribute 'encode'
In your CSV code, you use csv=df.to_csv(index=True). According to the docs:
If path_or_buf is None, returns the resulting csv format as a string.
Otherwise returns None.
Here you didn't specify path_or_buf, so the return value is the CSV content. This is why you can download the CSV.
The to_excel docs, on the other hand, don't mention any return value, so xls is None and your payload doesn't contain anything at all.
To solve this, you can open the file you wrote and read it back as a base64 string:
def file_to_base64(file):
    #file should be the actual file name you wrote
    with open(file, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string.decode()
Then replace the two lines
b64 =base64.b64encode(xls.encode())
payload=b64.decode()
with:
payload = file_to_base64(file)
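
Putting the pieces together, a download link for binary content could be built roughly like this. It is only a sketch: make_download_link and the placeholder bytes are made up for illustration, and the MIME type is the one registered for .xlsx workbooks. The bytes would normally come from reading the written .xlsx file in "rb" mode.

```python
import base64
import html

# MIME type registered for .xlsx workbooks.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def make_download_link(data, filename, title, mime=XLSX_MIME):
    """Return an HTML anchor that embeds `data` (bytes) as a base64 data URI."""
    payload = base64.b64encode(data).decode("ascii")
    return '<a download="{name}" href="data:{mime};base64,{payload}">{title}</a>'.format(
        name=html.escape(filename, quote=True),
        mime=mime,
        payload=payload,
        title=html.escape(title),
    )


# Placeholder bytes; in the notebook these would come from reading the
# .xlsx file in "rb" mode after df.to_excel(...) has written it.
link = make_download_link(b"PK\x03\x04 demo", "ABC.xlsx", "Download ABC.xlsx")
```

In a notebook, the returned string can be wrapped in IPython.display.HTML(link) and displayed as in the question.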

Create a pdf file, write in it and return its byte stream with PyMuPDF

Using PyMuPDF, I need to create a PDF file, write some text into it, and return its byte stream.
This is the code I have, but it uses the filesystem to create and save the file:
import fitz
path = "PyMuPDF_test.pdf"
doc = fitz.open()
page = doc.newPage()
where = fitz.Point(50, 100)
page.insertText(where, "PDF created with PyMuPDF", fontsize=50)
doc.save(path) # Here I'm saving to the filesystem
with open(path, "rb") as file:
    return io.BytesIO(file.read()).getvalue()
Is there a way I can create a PDF file, write some text in it, and return its byte stream without using the filesystem?
While checking the documentation for save(), I found write(), which returns the document directly as bytes:
import fitz
#path = "PyMuPDF_test.pdf"
doc = fitz.open()
page = doc.newPage()
where = fitz.Point(50, 100)
page.insertText(where, "PDF created with PyMuPDF", fontsize=50)
print(doc.write())

Django 1.7: serve a pdf -file (UnicodeDecodeError)

I'm trying to serve a PDF file with Django 1.7, and this is basically the code that "should" work. It certainly works if I change the content_type to 'text' and download a .tex file with it, but when I try it with a binary file, I get "UnicodeDecodeError at /path/to/file/filename.pdf
'utf-8' codec can't decode byte 0xd0 in position 10: invalid continuation byte"
def download(request, file_name):
    file = open('path/to/file/{}'.format(file_name), 'r')
    response = HttpResponse(file, content_type='application/pdf')
    response['Content-Disposition'] = "attachment; filename={}".format(file_name)
    return response
So basically, if I understand correctly, it's trying to serve the file as a UTF-8 encoded text file, instead of a binary file. I've tried to change the content_type to 'application/octet-stream' with similar results. What am I missing?
Try opening the file using binary mode:
file = open('path/to/file/{}'.format(file_name), 'rb')
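
To see why the mode matters, here is a small standalone sketch (standard library only, with made-up sample bytes standing in for a real PDF). Text mode decodes the content while reading and fails on bytes that are not valid UTF-8; binary mode returns the bytes untouched, which is what the response needs.

```python
import os
import tempfile

# Bytes that are not valid UTF-8, standing in for real PDF content.
pdf_like = b"%PDF-1.4\n\xd0\xff\xfe binary data"

with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(pdf_like)

try:
    # Text mode decodes while reading, so invalid byte sequences raise.
    try:
        with open(tmp.name, "r", encoding="utf-8") as fh:
            fh.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode hands back the bytes untouched.
    with open(tmp.name, "rb") as fh:
        binary_content = fh.read()
finally:
    os.remove(tmp.name)
```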

Write files to disk with python 3.x

Using BottlePy, I use the following code to upload a file and write it to disk :
upload = request.files.get('upload')
raw = upload.file.read()
filename = upload.filename
with open(filename, 'w') as f:
    f.write(raw)
return "You uploaded %s (%d bytes)." % (filename, len(raw))
It returns the correct number of bytes every single time.
The upload works fine for files like .txt, .php, .css ...
But it results in a corrupted file for other types like .jpg, .png, .pdf, .xls ...
I tried changing the open() call:
with open(filename, 'wb') as f:
It raises the following error:
TypeError('must be bytes or buffer, not str',)
I guess it's an issue related to binary files?
Is there something to install on top of Python to make uploads work for any file type?
Update
Just to be sure, as pointed out by @thkang, I tried this using the dev version of BottlePy and the built-in method .save()
upload = request.files.get('upload')
upload.save(upload.filename)
It raises the exact same exception:
TypeError('must be bytes or buffer, not str',)
Update 2
Here is the final code, which "works" (it no longer raises TypeError('must be bytes or buffer, not str',)):
upload = request.files.get('upload')
raw = upload.file.read().encode()
filename = upload.filename
with open(filename, 'wb') as f:
    f.write(raw)
Unfortunately, the result is the same: every .txt file works fine, but other files like .jpg, .pdf ... are corrupted.
I've also noticed that the corrupted files are larger than the originals (before upload).
This binary handling must be the issue with Python 3.x.
Note:
I use Python 3.1.3
I use BottlePy 0.11.6 (raw bottle.py file, no 2to3 on it or anything)
Try this:
upload = request.files.get('upload')
with open(upload.file, "rb") as f1:
    raw = f1.read()
filename = upload.filename
with open(filename, 'wb') as f:
    f.write(raw)
return "You uploaded %s (%d bytes)." % (filename, len(raw))
Update
Try value:
# Get a cgi.FieldStorage object
upload = request.files.get('upload')
# Get the data
raw = upload.value
# Write to file
filename = upload.filename
with open(filename, 'wb') as f:
    f.write(raw)
return "You uploaded %s (%d bytes)." % (filename, len(raw))
Update 2
See this thread; it seems to do the same thing you are trying to do:
# Test if the file was uploaded
if fileitem.filename:
    # strip leading path from file name to avoid directory traversal attacks
    fn = os.path.basename(fileitem.filename)
    open('files/' + fn, 'wb').write(fileitem.file.read())
    message = 'The file "' + fn + '" was uploaded successfully'
else:
    message = 'No file was uploaded'
In Python 3.x all strings are Unicode, so you need to convert the result of the read() call used in this file upload code.
Here read() returns a Unicode string as well, which you can convert back into the proper bytes with encode().
Use the code from my original question, and replace the line
raw = upload.file.read()
with
raw = upload.file.read().encode('ISO-8859-1')
That's all ;)
Further reading: http://python3porting.com/problems.html
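
A short sketch of why the codec matters, assuming (as the fix above implies) that the uploaded bytes were decoded as ISO-8859-1 somewhere along the way. Encoding with the same codec restores the bytes exactly, while the default UTF-8 encoding expands every value above 127 to two bytes, which would explain why the corrupted files came out larger.

```python
# Every possible byte value, standing in for the content of a binary upload.
original = bytes(range(256))

# Assumption: the framework decoded the upload as ISO-8859-1 (Latin-1), which
# maps each byte 0-255 to the code point with the same number.
as_text = original.decode("iso-8859-1")

# Encoding with the same codec restores the original bytes exactly.
round_tripped = as_text.encode("iso-8859-1")

# Encoding as UTF-8 instead (the default for str.encode) turns every code
# point above 127 into two bytes, so the result is larger and corrupted.
utf8_encoded = as_text.encode("utf-8")

print(len(original), len(round_tripped), len(utf8_encoded))  # 256 256 384
```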
