PDF files downloaded with Python cannot be opened in Acrobat - python

I have a little python script that I am using to download a whole bunch of PDF files for archiving. The problem I have is that when I download the files, they appear correctly under the correct title, but they are the wrong size and they can't be opened by Acrobat, which fails with an error message saying Out of memory or Insufficient data for an image or some other arbitrary Acrobat error. Viewing the content of the page in a text editor looks a bit like a PDF document, by which I mean it is incomprehensible in general but with a few fragments of text and markup, including PDF identifiers.
The code to download the file is this:
def download_file(file_id):
    folder_path = ".\\pdf_files\\"
    file_download = "http://myserver/documentimages.asp?SERVICE_ID=RETRIEVE_IMAGE&documentKey="
    file_content = urllib.urlopen(file_download + file_id, proxies={})
    file_local = open(folder_path + file_id + '.pdf', 'w')
    file_local.write(file_content.read())
    file_content.close()
    file_local.close()
If the same file is downloaded through a browser it looks fine, but is also larger on the disk. I am guessing that the problem might be to do with the encoding of the file when it is saved?

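You need to open the output file in binary mode. In text mode on Windows, Python translates newline bytes as it writes, which corrupts binary data such as a PDF:
file_local = open(folder_path + file_id + '.pdf', 'wb')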
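As a minimal sketch of the same idea, assuming Python 3 (the question uses Python 2's urllib), the hypothetical helper save_binary below writes bytes in binary mode and checks that they round-trip unchanged. The payload is made-up sample data, not a real PDF.

```python
import os
import tempfile


def save_binary(path, data):
    """Write raw bytes to path without any newline translation."""
    with open(path, "wb") as fh:
        fh.write(data)


# Made-up payload containing bytes that text mode could alter on Windows.
payload = b"%PDF-1.4\n\x00\x01\r\n%%EOF\n"

path = os.path.join(tempfile.mkdtemp(), "sample.pdf")
save_binary(path, payload)

# Read the file back in binary mode and compare with what was written.
with open(path, "rb") as fh:
    round_trip = fh.read()

print(round_trip == payload)
```

If the size on disk differs from the number of bytes received, that is a quick hint that some translation happened on the way to the file.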

Related

Excel file corrupted after using zipfile write method in Python

I'm developing a Flask application where I want the user to download a set of files from the server. To achieve this objective, I use the zipfile module in order to zip all the files and then send this compressed file to the user.
Here is my code:
@app.route("/get_my_files/<file_name>", methods=["GET"])
def get_my_files(file_name):
    file_exts = [".csv", ".xlsx", ".json"]
    file_path = "./user_files/" + file_name
    # Create the compressed file.
    memory_file = io.BytesIO()
    with zipfile.ZipFile(memory_file, 'w') as zf:
        for ext in file_exts:
            #data = zipfile.ZipInfo(individualFile)
            data = zipfile.ZipInfo("resultado" + ext)
            data.date_time = time.localtime(time.time())[:6]
            data.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(data, open(file_path + ext, "r").read())
    memory_file.seek(0)
    # , encoding="ISO-8859-1"
    # Delete files.
    for ext in file_exts:
        os.remove(file_path + ext)
    # Return zip file to client.
    return send_file(
        memory_file,
        mimetype="zip",
        attachment_filename='resultados.zip',
        as_attachment=True,
        cache_timeout=0
    )
Unfortunately, once I decompress the zip file, the Excel file is getting corrupted (CSV and JSON file can be read and opened without problems). I have tried several different types of encoding when writing the zip file, however I haven't been able to find a solution.
What is the problem and how can I do this correctly?
You opened the files in text mode, which happens to work for JSON and CSV but not for a binary format like Excel (.xlsx):
open(file_path + ext, "r")
Open them in binary mode instead, using "rb" (read binary).
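A minimal sketch of the corrected zipping step, independent of Flask, might look like the following. The helper name zip_files_in_memory and its {archive_name: path} argument are assumptions made for illustration; they are not part of the question's code.

```python
import io
import zipfile


def zip_files_in_memory(paths_by_name):
    """Return a BytesIO holding a zip of the given {archive_name: path} files."""
    memory_file = io.BytesIO()
    with zipfile.ZipFile(memory_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for archive_name, path in paths_by_name.items():
            # "rb" returns bytes, so binary formats such as .xlsx survive intact.
            with open(path, "rb") as fh:
                zf.writestr(archive_name, fh.read())
    # Rewind so a caller can read the archive from the start.
    memory_file.seek(0)
    return memory_file
```

Reading every file as bytes also works for the CSV and JSON files, so there is no need to treat the extensions differently.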

How do I scrape the text from an FTP server using Python?

I am looking to extract all the information from this page:
Text data in FTP
I understand that the requests library does not work for FTP, so I have resorted to using ftplib.
However, the documentation only seems to cover downloading files in directories. How do I download this file, which has no "file type"?
Thanks in advance.
If you want to download a text file's contents to memory, without using a temporary file, use retrlines:
contents = ""
def collectLines(s):
    global contents
    contents += s + "\n"
ftp.retrlines("RETR " + filename, collectLines)
Or collect the lines in a list:
lines = []
ftp.retrlines("RETR " + filename, lines.append)
For binary files, see Read a file in buffer from FTP python.
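For the binary case, a minimal sketch could collect the blocks in an io.BytesIO buffer using ftplib's retrbinary. The helper name fetch_bytes is hypothetical, and ftp is assumed to be an already connected and logged-in ftplib.FTP object.

```python
import io


def fetch_bytes(ftp, filename):
    """Download filename over an open ftplib.FTP connection into memory."""
    buffer = io.BytesIO()
    # retrbinary calls the callback once per received block of bytes.
    ftp.retrbinary("RETR " + filename, buffer.write)
    return buffer.getvalue()
```

The result is a bytes object; if the file is known to be text, it can be decoded afterwards with the appropriate encoding.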

App Engine - download files from Cloud Storage

I am using Python 2.7 and Reportlab to create .pdf files for display/print in my app engine system. I am using ndb.Model to store the data if that matters.
I am able to produce the equivalent of a bank statement for a single client on-line. That is; the user clicks the on-screen 'pdf' button and the .pdf statement appears on screen in a new tab, exactly as it should.
I am using the following code to save .pdf files to Google Cloud Storage successfully
buffer = StringIO.StringIO()
self.p = canvas.Canvas(buffer, pagesize=portrait(A4))
self.p.setLineWidth(0.5)
try:
    # create .pdf of .csv data here
finally:
    self.p.save()
pdfout = buffer.getvalue()
buffer.close()
filename = getgcsbucket() + '/InvestorStatement.pdf'
write_retry_params = gcs.RetryParams(backoff_factor=1.1)
try:
    gcs_file = gcs.open(filename,
                        'w',
                        content_type='application/pdf',
                        retry_params=write_retry_params)
    gcs_file.write(pdfout)
except:
    logging.error(traceback.format_exc())
finally:
    gcs_file.close()
I am using the following code to create a list of all files for display on-screen, it shows all the files stored above.
allfiles = []
bucket_name = getgcsbucket()
rfiles = gcs.listbucket(bucket_name)
for rfile in rfiles:
    allfiles.append(rfile.filename)
return allfiles
My screen (html) shows rows of ([Delete] and Filename). When the user clicks the [Delete] button, the following delete code snippet works (filename is /bucket/filename, complete)
filename = self.request.get('filename')
try:
    gcs.delete(filename)
except gcs.NotFoundError:
    pass
My question: given that I have a list of files on-screen, I want the user to click on a filename and have that file downloaded to their computer. In Google's Chrome browser, this would result in the file being downloaded, with its name displayed at the bottom left of the screen.
One other point, the above example is for .pdf files. I will also have to show .csv files in the list and would like them to be downloaded as well. I only want the files to be downloaded, no display is required.
So, I would like a snippet like ...
filename = self.request.get('filename')
try:
    gcs.downloadtousercomputer(filename) ???
except gcs.NotFoundError:
    pass
I think I have tried everything I can find both here and elsewhere. Sorry I have been so long-winded. Any hints for me?
To download a file instead of showing it in the browser, you need to add a header to your response:
self.response.headers["Content-Disposition"] = 'attachment; filename="%s"' % filename
You can specify the filename as shown above and it works for any file type.
One solution is to read the file from the bucket and write its contents to the response with the correct header:
import cloudstorage
...
def read_file(self, filename):
    bucket_name = "/your_bucket_name"
    file = bucket_name + '/' + filename
    # The with block closes the file, so no explicit close() is needed.
    with cloudstorage.open(file) as cloudstorage_file:
        self.response.headers["Content-Disposition"] = str('attachment;filename=' + filename)
        contents = cloudstorage_file.read()
    self.response.write(contents)
Here filename could be something you send as a GET parameter. It needs to be a file that exists in your bucket, or an exception will be raised.
You will find a sample here [1].
[1] https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/read-write-to-cloud-storage
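Since the question stores names as /bucket/filename, the header probably should carry only the final component. A minimal sketch of building the header value follows; the helper name attachment_header is hypothetical, and the escaping shown is an assumption about what a quoted filename needs, not something taken from the App Engine documentation.

```python
import posixpath


def attachment_header(object_path):
    """Build a Content-Disposition value from a '/bucket/name' style path."""
    # Keep only the final path component so the bucket name is not exposed.
    name = posixpath.basename(object_path)
    # Escape backslashes and double quotes so the quoted string stays valid.
    name = name.replace("\\", "\\\\").replace('"', '\\"')
    return 'attachment; filename="%s"' % name


print(attachment_header("/my_bucket/InvestorStatement.pdf"))
# prints: attachment; filename="InvestorStatement.pdf"
```

The returned string can be assigned to the response's Content-Disposition header and works the same way for .pdf and .csv files.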

Python: Crossplatform code to download a valid .zip file

I have a requirement to download and unzip a file from a website. Here is the code I'm using:
#!/usr/bin/python
#geoipFolder = r'/my/folder/path/ ' #Mac/Linux folder path
geoipFolder = r'D:\my\folder\path\ ' #Windows folder path
geoipFolder = geoipFolder[:-1] #workaround for Windows escaping trailing quote
geoipName = 'GeoIPCountryWhois'
geoipURL = 'http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip'
import urllib2
response = urllib2.urlopen(geoipURL)
f = open('%s.zip' % (geoipFolder+geoipName),"w")
f.write(repr(response.read()))
f.close()
import zipfile
zip = zipfile.ZipFile(r'%s.zip' % (geoipFolder+geoipName))
zip.extractall(r'%s' % geoipFolder)
This code works on Mac and Linux boxes, but not on Windows. There, the .zip file is written, but the script throws this error:
zipfile.BadZipfile: File is not a zip file
I can't unzip the file using Windows Explorer either. It says that:
The compressed (zipped) folder is empty.
However the file on disk is 6MB large.
Thoughts on what I'm doing wrong on Windows?
Thanks
Your zip file is corrupt on Windows because you're opening the file in write/text mode (line-terminator conversion trashes binary data):
f = open('%s.zip' % (geoipFolder+geoipName),"w")
You have to open it in write/binary mode, like this:
f = open('%s.zip' % (geoipFolder+geoipName),"wb")
(This still works on Linux, of course.)
To sum up, a more Pythonic way of doing it uses a with block (and removes repr):
with open('{}{}.zip'.format(geoipFolder,geoipName),"wb") as f:
    f.write(response.read())
EDIT: there is no need to write the file to disk. You can use io.BytesIO, since ZipFile accepts a file-like object as its first parameter.
import io
import zipfile
outbuf = io.BytesIO(response.read())
zip = zipfile.ZipFile(outbuf) # pass the in-memory file object: no disk write, no temp file
zip.extractall(r'%s' % geoipFolder)
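To try the in-memory approach without a network connection, here is a minimal sketch that builds a small archive in memory to stand in for the downloaded bytes. The helper name list_zip_members and the sample CSV row are made up for illustration.

```python
import io
import zipfile


def list_zip_members(zip_bytes):
    """Open a zip archive held in memory and return its member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()


# Build a small archive in memory to stand in for the downloaded bytes.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("GeoIPCountryWhois.csv", "1.0.0.0,1.0.0.255,AU\n")

print(list_zip_members(sample.getvalue()))
# prints: ['GeoIPCountryWhois.csv']
```

In the real script, the bytes returned by response.read() would take the place of sample.getvalue().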

Python: Serving files, All carriage returns lost in text file

I'm using the method described in the link https://stackoverflow.com/a/8601118/2497977
import os
import mimetypes
from django.core.servers.basehttp import FileWrapper
def download_file(request):
    the_file = '/some/file/name.png'
    filename = os.path.basename(the_file)
    response = HttpResponse(FileWrapper(open(the_file)),
                            content_type=mimetypes.guess_type(the_file)[0])
    response['Content-Length'] = os.path.getsize(the_file)
    response['Content-Disposition'] = "attachment; filename=%s" % filename
    return response
I initially collect data in a form. When it is submitted, I process the data to generate a "config" and write it out to a file. Then, if it is valid, I pass the file back to the user as a download.
It works great, except that in my situation the file is text, so when it is downloaded it arrives as a blob of text without CR/LF.
Any suggestions on how to address this?
Open the file in binary mode:
open(the_file, 'rb')
http://docs.python.org/2/library/functions.html#open
The default is to use text mode, which may convert '\n' characters to
a platform-specific representation on writing and back on reading.
Thus, when opening a binary file, you should append 'b' to the mode
value to open the file in binary mode, which will improve portability.
(Appending 'b' is useful even on systems that don’t treat binary and
text files differently, where it serves as documentation.)
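The effect can be reproduced with a minimal sketch, assuming Python 3 and its default newline handling (the quoted documentation is for Python 2, where the translation is platform-specific). The file name and contents are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "config.txt")

# Write Windows-style line endings as raw bytes.
with open(path, "wb") as fh:
    fh.write(b"a=1\r\nb=2\r\n")

# Text mode (Python 3 default newline handling) turns "\r\n" into "\n".
with open(path, "r") as fh:
    as_text = fh.read()

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as fh:
    as_bytes = fh.read()

print(repr(as_text))
print(repr(as_bytes))
```

Serving the file from a handle opened with 'rb' keeps the CR/LF pairs in the download, and it also keeps the byte count consistent with the Content-Length taken from os.path.getsize.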
