Django the powerpoint generated using python-pptx library has error message


I use python-pptx v0.6.2 to generate a PowerPoint file. I read an existing PowerPoint file into a BytesIO, make some modifications, and save it. I can download the file successfully, and I'm sure the content is written into the file. But when I open it, PowerPoint pops up an error message, "PowerPoint found a problem with content in foo.pptx. PowerPoint can attempt to repair the presentation.", and I have to click the "Repair" button, after which the presentation is displayed in "repaired" mode. My Python version is 3.5.2 and my Django version is 1.10. Below is my code:
with open('foo.pptx', 'rb') as f:
    source_stream = BytesIO(f.read())
prs = Presentation(source_stream)
first_slide = prs.slides[0]
title = first_slide.shapes.title
subtitle = first_slide.placeholders[1]
title.text = 'Title'
subtitle.text = "Subtitle"
response = HttpResponse(content_type='application/vnd.ms-powerpoint')
response['Content-Disposition'] = 'attachment; filename="sample.pptx"'
prs.save(source_stream)
ppt = source_stream.getvalue()
source_stream.close()
response.write(ppt)
return response
Any help is appreciated, thanks in advance!

It looks like you've got problems with the IO.
The first three lines (the with block and the Presentation(source_stream) call) can be replaced by:
prs = Presentation('foo.pptx')
Placing the file into a memory-based stream just uses unnecessary resources.
On the writing side, you're writing to that original (unnecessary) stream, which is dicey. I suspect that, because you didn't seek(0), you're appending onto the end of it. Reusing the stream is also conceptually more complicated.
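The following stdlib-only sketch shows why reusing the input stream is risky. A .pptx file is a ZIP package. As far as I can tell, python-pptx's save() writes a new ZIP archive starting at the stream's current position, without truncating what is already there. The sketch uses zipfile directly as a stand-in for prs.save(); write_package is a hypothetical helper. For illustration, it assumes that loading left the position at the end of the stream. The real position after Presentation(source_stream) is not something to rely on.

```python
import io
import zipfile


def write_package(stream):
    """Hypothetical stand-in for prs.save(stream): write a small ZIP archive."""
    with zipfile.ZipFile(stream, "w") as zf:
        zf.writestr("ppt/slides/slide1.xml", "<p:sld/>")


# The bytes of the "existing" file, like reading foo.pptx into memory.
original = io.BytesIO()
write_package(original)
source = original.getvalue()

# Reusing the input stream. Assumption: loading left the position at the end.
reused = io.BytesIO(source)
reused.seek(0, io.SEEK_END)
write_package(reused)          # the new archive is written after the old bytes
reused_bytes = reused.getvalue()

# A fresh buffer holds only the newly written archive.
fresh = io.BytesIO()
write_package(fresh)
fresh_bytes = fresh.getvalue()

print(len(source), len(reused_bytes), len(fresh_bytes))
```

The reused stream ends up holding the old package followed by the new one, which would plausibly explain PowerPoint's repair prompt. Python's zipfile may still open such a file, so it is not a reliable validity check.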
If you use a fresh BytesIO buffer for the save I think you'll get the proper behavior. It's also better practice because it decouples the open, modify, and save, which you can then factor into separate methods later.
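Here is a minimal sketch of the fresh-buffer approach. It assumes prs is a python-pptx Presentation, whose save() accepts a file-like object. presentation_to_bytes and PPTX_CONTENT_TYPE are hypothetical names. The MIME type shown is the standard one for .pptx files; application/vnd.ms-powerpoint, used in the question, belongs to the older .ppt format.

```python
import io

# Standard MIME type for .pptx files (the .ppt one is application/vnd.ms-powerpoint).
PPTX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.presentationml.presentation"
)


def presentation_to_bytes(prs):
    """Save a presentation into a new, empty buffer and return its bytes.

    `prs` is assumed to be a python-pptx Presentation, or any object whose
    save() method accepts a writable binary stream.
    """
    buffer = io.BytesIO()  # fresh buffer: no old bytes for the save to land after
    prs.save(buffer)
    return buffer.getvalue()
```

In the view, you could pass the result straight to Django, e.g. HttpResponse(presentation_to_bytes(prs), content_type=PPTX_CONTENT_TYPE), with the same Content-Disposition header as before.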
If you eliminate the first BytesIO you should just need the one for the save in order to get the .pptx "file" into the HTTP response.

Related

Is it possible to generate PDF with StreamingHttpResponse as it's possible to do so with CSV for large dataset?

I have a large dataset that I have to generate CSV and PDF for. With CSV, I use this guide: https://docs.djangoproject.com/en/3.1/howto/outputting-csv/
import csv
from django.http import StreamingHttpResponse
class Echo:
"""An object that implements just the write method of the file-like
interface.
"""
def write(self, value):
"""Write the value by returning it, instead of storing in a buffer."""
return value
def some_streaming_csv_view(request):
"""A view that streams a large CSV file."""
# Generate a sequence of rows. The range is based on the maximum number of
# rows that can be handled by a single sheet in most spreadsheet
# applications.
rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
pseudo_buffer = Echo()
writer = csv.writer(pseudo_buffer)
response = StreamingHttpResponse((writer.writerow(row) for row in rows),
content_type="text/csv")
response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
return response
It works great. However, I can't find anything similar for PDF. Is it possible? I use render_to_pdf with a template to generate the PDF.

Think of CSV as a fruit salad. You can slice bananas into a big pot, add some grapefruit, some pineapple, ... and then split the whole thing into individual portions that you bring to the table (that is: you generate your CSV file, and then you send it to the client). But you could also make individual portions directly: cut some banana slices into a small bowl, add some grapefruit, some pineapple, ... bring this small bowl to the table, and repeat the process for the other portions (that is: you generate your CSV file and send it to the client piece by piece as you generate it).
Well if CSV is a fruit salad, then PDF is a cake. You have to mix all your ingredients and put it in the oven. This means you can't bring a slice of the cake to the table until you have baked the whole cake. Likewise, you can't start sending your PDF file to the client until it's entirely generated.
So, to answer your question, this (response = StreamingHttpResponse((writer.writerow(row) for row in rows), content_type="text/csv")) can't be done for PDF.
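The CSV line quoted above works because each row is finished text as soon as it is formatted. This stdlib-only sketch reuses the Echo trick from the question to show that. stream_csv_rows is a hypothetical helper name. Each value the generator yields is already a complete CSV line, so it could be sent before the next row even exists.

```python
import csv


class Echo:
    """File-like object whose write() just returns the value it is given."""

    def write(self, value):
        return value


def stream_csv_rows(rows):
    """Yield one CSV-formatted line per row, one "portion" at a time."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns whatever the underlying write() returns,
        # so each call hands back one finished line instead of buffering it.
        yield writer.writerow(row)


chunks = stream_csv_rows(["Row {}".format(i), str(i)] for i in range(3))
first = next(chunks)  # ready before the remaining rows are formatted
print(repr(first))
```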
However, once your file is generated, you can stream it to the client using FileResponse as mentioned in other answers.
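Sending a finished file in pieces is straightforward. As far as I know, Django's FileResponse reads the file-like object in blocks (its block_size attribute, 4096 bytes by default). This stdlib sketch shows the same idea; iter_chunks is a hypothetical name, and the "PDF" bytes are placeholders. Unlike the CSV case, the bytes are only split up after generation has finished.

```python
import io


def iter_chunks(filelike, block_size=4096):
    """Yield successive blocks from an already-complete file-like object."""
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    return iter(lambda: filelike.read(block_size), b"")


# Placeholder for a finished PDF: the whole document exists before sending starts.
pdf_buffer = io.BytesIO()
pdf_buffer.write(b"%PDF-1.4\n" + b"x" * 10000 + b"\n%%EOF\n")
pdf_buffer.seek(0)  # rewind after writing, otherwise the first read() returns b""

sizes = [len(block) for block in iter_chunks(pdf_buffer)]
print(sizes)
```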
If your issue is that the generation of the PDF takes too much time (and might trigger a timeout error for instance), here are some things to consider:
Try to optimize the speed of your generation algorithm
Generate the file in the background before the client requests it and store it in your storage system. You might want to use a cronjob or celery to trigger the generation of the PDF without blocking the HTTP request.
Use websockets to send the file to the client as soon as it is ready to be downloaded (see django-channels)

Have you tried FileResponse?
Something like this should work; it is basically what you can find in the Django docs:
import io
from django.http import FileResponse
from reportlab.pdfgen import canvas
def stream_pdf(request):
    buffer = io.BytesIO()
    p = canvas.Canvas(buffer)
    p.drawString(10, 10, "Hello world.")
    p.showPage()
    p.save()
    buffer.seek(0)  # rewind to the start so FileResponse sends the whole PDF
    return FileResponse(buffer, as_attachment=True, filename='helloworld.pdf')

I had a similar situation: I was able to "generate and stream download" CSV, JSON and XML files, and I wanted to do the same with Excel (.xlsx) files.
Unfortunately, I couldn't, but along the way I found a few things:
CSV, JSON and XML files are text files with a plain textual representation. PDF or Excel (or similar) files, on the other hand, are built with a specific binary format and metadata.
The binary data of PDF and similar documents is written to the IO buffer only when specific methods are called (for example, the showPage() and save() methods of ReportLab; source: the Django docs).
If we inspect the file stream, PDF and Excel files require specialized applications (e.g. a PDF reader or a browser) to view or read the data, whereas for CSV and JSON a simple text editor is enough.
So I concluded that "on-the-fly generation of a file with a streaming download" (I'm not sure what the correct technical term is) is not possible for all file types, only for some text-oriented ones.
Note: This is my limited experience, which may be wrong.
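One way to make "written only when specific methods are called" concrete: an .xlsx file (like .docx and .pptx) is a ZIP package. A ZIP reader starts from the central directory at the very end of the file, and that directory is only written when the archive is closed. This stdlib sketch uses zipfile as a stand-in for an Excel writer library; the member name is illustrative only.

```python
import io
import zipfile

# Signature that opens a ZIP "end of central directory" record.
END_OF_CENTRAL_DIR = b"PK\x05\x06"

buffer = io.BytesIO()
archive = zipfile.ZipFile(buffer, "w")
archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
# The member's bytes are already in the buffer, but the index a reader needs is not.
complete_before_close = END_OF_CENTRAL_DIR in buffer.getvalue()

archive.close()  # the central directory and end record are written only now
complete_after_close = END_OF_CENTRAL_DIR in buffer.getvalue()
print(complete_before_close, complete_after_close)
```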

The page you linked to also links to a page on creating and sending PDF files dynamically using ReportLab:
import io
from django.http import FileResponse
from reportlab.pdfgen import canvas
def some_view(request):
    # Create a file-like buffer to receive PDF data.
    buffer = io.BytesIO()
    # Create the PDF object, using the buffer as its "file."
    p = canvas.Canvas(buffer)
    # Draw things on the PDF. Here's where the PDF generation happens.
    # See the ReportLab documentation for the full list of functionality.
    p.drawString(100, 100, "Hello world.")
    # Close the PDF object cleanly, and we're done.
    p.showPage()
    p.save()
    # FileResponse sets the Content-Disposition header so that browsers
    # present the option to save the file.
    buffer.seek(0)
    return FileResponse(buffer, as_attachment=True, filename='hello.pdf')
Here's a link to the ReportLab API documentation. It's rather lengthy and comes as a single-page PDF that is annoying to navigate, but it should get you up and running and let you format your PDFs the way you want.

Extracting GIF image from URL in Python (Only returns string?)

Firstly, there are plenty of related questions about extracting GIFs from a URL, but the majority of them use languages other than Python. Secondly, a Google search turns up many examples of how to do this using requests and a parser like lxml or BeautifulSoup. However, I think my problem is specific to this URL, and I cannot quite figure out why the image in question does not have a specific URL attached to it (http://cactus.nci.nih.gov/chemical/structure/3-Methylamino-1-%28thien-2-yl%29-propane-1-ol/image)
This is what I have tried
molecule_name = "3-Methylamino-1-(thien-2-yl)-propane-1-ol"
molecule = urllib.pathname2url(molecule_name)
response = requests.get("http://cactus.nci.nih.gov/chemical/structure/"+ molecule+"/image")
response.encoding = 'ISO-8859-1'
print type(response.content)
and I just get back a string that says GIF87au. I know it has something to do with GIF being a binary format, but I can't quite work out how to download the GIF file on that particular page using the script.
Furthermore, if I do manage to download the GIF file, what are the best modules for making tables (CSV or Excel style) with the GIF files embedded, for example, in the last column?

As far as I can tell your code is working for me.
molecule_name = "3-Methylamino-1-(thien-2-yl)-propane-1-ol"
molecule = urllib.pathname2url(molecule_name)
response = requests.get("http://cactus.nci.nih.gov/chemical/structure/"+molecule+"/image")
response.encoding = 'ISO-8859-1'
print len(response.content)
It outputs "1080".
As for the second task, putting it into a document: I would use xlsxwriter like this:
import xlsxwriter
# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('molecules.xlsx')
worksheet = workbook.add_worksheet()
# Input data
worksheet.write(0, 0, "My molecule") # A1 == 0, 0
worksheet.insert_image('B1', 'molecule1234.png')
workbook.close()
See http://xlsxwriter.readthedocs.org/index.html
You will have to convert the .gif into a .png because, at the time of writing, xlsxwriter did not support GIFs (as jmcnamara pointed out). See "How to change gif file to png file using python pil" for how to do that with PIL.
You can display the GIF in many ways. I would just save it to a file and use some other software. If you want to view it programmatically, you can use Tkinter, for instance, as in "Play Animations in GIF with Tkinter".

Using Tablib Library with Web2py

I've been trying for a while to make tablib work with web2py without luck. The code is delivering a .xls file as expected, but it's corrupted and empty.
import tablib
data = []
headers = ('first_name', 'last_name')
data = tablib.Dataset(*data, headers=headers)
data.append(('John', 'Adams'))
data.append(('George', 'Washington'))
response.headers['Content-Type']= 'application/vnd.ms-excel;charset=utf-8'
response.headers['Content-disposition']='attachment; filename=test.xls'
response.write(data.xls, escape=False)
Any ideas??
Thanks!

Per the web2py book, response.write is documented as serving
"to write text into the output page body"
(my emphasis on text). data.xls is not text, it's binary data! To verify that this is indeed the cause of your problem, try using data.csv instead. That should work, since it is text.
I believe you'll need to use response.stream instead to send binary data as your response (or as an attachment).
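A quick way to see why data.xls and data.csv behave differently: the legacy .xls format is a binary OLE2 compound document whose first bytes are not valid UTF-8, whereas CSV output is ordinary text. This stdlib sketch checks just that, using is_valid_text, a hypothetical helper. It does not touch web2py itself.

```python
# First bytes of a legacy .xls file (OLE2 compound document signature).
XLS_SIGNATURE = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def is_valid_text(data, encoding="utf-8"):
    """Return True if data decodes cleanly as text in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_text(XLS_SIGNATURE))
print(is_valid_text(b"first_name,last_name\r\nJohn,Adams\r\n"))
```

In the controller, the response.stream route would presumably mean wrapping the bytes in a file-like object, e.g. io.BytesIO(data.xls), before passing it in. Check the web2py book for the exact arguments response.stream accepts.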

send data from blobstore as email attachment in GAE

Why isn't the code below working? The email is received, and the file comes through with the correct filename (it's a .png file). But when I try to open the file, it doesn't open correctly (Windows Gallery reports that it can't open this photo or video and that the file may be unsupported, damaged or corrupted).
When I download the file using a subclass of blobstore_handlers.BlobstoreDownloadHandler (basically the exact handler from the GAE docs), and the same blob key, everything works fine and Windows reads the image.
One more bit of info - the binary files from the download and the email appear very similar, but have a slightly different length.
Anyone got any ideas on how I can get email attachments sending from GAE blobstore? There are similar questions on S/O, suggesting other people have had this issue, but there don't appear to be any conclusions.
from google.appengine.api import mail
from google.appengine.ext import blobstore
def send_forum_post_notification():
    blob_reader = blobstore.BlobReader('my_blobstore_key')
    blob_info = blobstore.BlobInfo.get('my_blobstore_key')
    value = blob_reader.read()
    mail.send_mail(
        sender='my.email#address.com',
        to='my.email#address.com',
        subject='this is the subject',
        body='hi',
        reply_to='my.email#address.com',
        attachments=[(blob_info.filename, value)]
    )
send_forum_post_notification()

I do not understand why you use a tuple for the attachment. I use :
message = mail.EmailMessage(sender = ......
message.attachments = [blob_info.filename,blob_reader.read()]

I found that this code doesn't work on dev_appserver but does work when pushed to production.

I ran into a similar problem using the blobstore on a Python Google App Engine application. My application handles PDF files instead of images, but I was also seeing a "the file may be unsupported, damaged or corrupted" error using code similar to your code shown above.
Try approaching the problem this way: Call open() on the BlobInfo object before reading the binary stream. Replace this line:
value = blob_reader.read()
... with these two lines:
bstream = blob_info.open()
value = bstream.read()
Then you can remove this line, too:
blob_reader = blobstore.BlobReader('my_blobstore_key')
... since bstream above will be of type BlobReader.
Relevant documentation from Google is located here:
https://cloud.google.com/appengine/docs/python/blobstore/blobinfoclass#BlobInfo_filename

Python/Django - Can I create multiple pdf file-like objects, zip them and send as attachment?

I'm using Django to create a web app where some parameters are input and plots are created. I want a link that downloads ALL the plots in a zip file. To do this, I'm writing a view that creates all the plots (I've already written views that create each of the single plots and display them), then zips them up and returns the zip file as the response.
One way I could do this is to create each plot, save it as a pdf file to disk, and then at the end, zip them all up as the response. However, I'd like to sidestep the saving to disk if that's possible?
Cheers.

This is what worked for me, going by Krzysiek's suggestion of using StringIO. Here canvas is a canvas object created by matplotlib.
import io
import zipfile
from django.http import HttpResponse
#Create the file-like objects from canvases
#(on Python 3, io.BytesIO replaces StringIO.StringIO for binary data such as PDFs)
file_like_1 = io.BytesIO()
file_like_2 = io.BytesIO()
#... etc...
canvas_1.print_pdf(file_like_1)
canvas_2.print_pdf(file_like_2)
#...etc....
#NOW create the zipfile
#(current Django takes content_type; the old mimetype argument was removed in 1.7)
response = HttpResponse(content_type='application/zip')
response['Content-Disposition'] = 'attachment; filename=all_plots.zip'
buff = io.BytesIO()
archive = zipfile.ZipFile(buff, 'w', zipfile.ZIP_DEFLATED)
archive.writestr('plot_1.pdf', file_like_1.getvalue())
archive.writestr('plot_2.pdf', file_like_2.getvalue())
#..etc...
archive.close()
ret_zip = buff.getvalue()
buff.close()
response.write(ret_zip)
return response
The zipping part of all of this was taken from https://code.djangoproject.com/wiki/CookBookDynamicZip
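You can pull the zipping part into a small helper so it works with any number of plots. The sketch below uses io.BytesIO since ZIP data is binary; zip_in_memory is a hypothetical name.

```python
import io
import zipfile


def zip_in_memory(files):
    """Build a ZIP archive in memory from a {name: bytes} mapping; return its bytes."""
    buffer = io.BytesIO()
    # ZIP_DEFLATED compresses each member. The with-block writes the central
    # directory when it exits, so getvalue() must be called afterwards.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

In the view above, that would be zip_in_memory({'plot_1.pdf': file_like_1.getvalue(), 'plot_2.pdf': file_like_2.getvalue()}), with the result written to an HttpResponse whose content_type is 'application/zip'.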

Look at the StringIO python module. It implements file behavior on in-memory strings.
