Is there a way to create a password-protected archive with gzip or zipfile?
Here is example code illustrating how to archive a file with no password protection:
import gzip, shutil

filepath = r'c:\my.log'

# gzip the file (gzip itself has no password support)
with gzip.GzipFile(filepath + ".gz", "w") as gz:
    with open(filepath, "rb") as with_open_file:
        shutil.copyfileobj(with_open_file, gz)
import zipfile
zf = zipfile.ZipFile(filepath + '.zip', 'w')
zf.write(filepath)
zf.close()
Python supports extracting password-protected zips (in Python 3 the pwd argument must be bytes):
zipfile.ZipFile('myarchive.zip').extractall(pwd=b'P4$$W0rd')
Sadly, it does not support creating them. You can either call an external tool like 7-Zip or use a third-party zlib wrapper.
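For example, a minimal sketch of shelling out to 7-Zip (assuming the 7z binary is installed and on PATH; file names reuse the example above):
import subprocess

# "a" adds files to an archive; -p sets the password (no space after -p).
# -mem=AES256 (optional) selects AES-256 instead of the default ZipCrypto.
subprocess.run(
    ['7z', 'a', '-pP4$$W0rd', '-mem=AES256', 'my.log.zip', r'c:\my.log'],
    check=True,
)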
Use gpg for encryption. That would be a separate wrapper around the archived and compressed data.
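A sketch of that approach via subprocess (assumes gpg is on PATH; newer gpg versions may additionally need --pinentry-mode loopback to accept --passphrase in batch mode):
import subprocess

# gpg --symmetric (-c) encrypts with a passphrase; --batch suppresses
# the interactive prompt. Produces my.log.gz.gpg alongside the input.
subprocess.run(
    ['gpg', '--batch', '--symmetric',
     '--passphrase', 'P4$$W0rd',
     '--output', 'my.log.gz.gpg', 'my.log.gz'],
    check=True,
)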
I have been struggling with this problem for a while but can't seem to find a solution for it. The situation is that I need to open a file in browser and after the user closes the file the file is removed from their machine. All I have is the binary data for that file. If it matters, the binary data comes from Google Storage using the download_as_string method.
After doing some research I found that the tempfile module would suit my needs, but I can't get the tempfile to open in browser because the file only exists in memory and not on the disk. Any suggestions on how to solve this?
This is my code so far:
import os
import tempfile
import webbrowser
# grabbing binary data earlier on
temp = tempfile.NamedTemporaryFile()
temp.name = "example.pdf"
temp.write(binary_data_obj)
temp.close()
webbrowser.open('file://' + os.path.realpath(temp.name))
When this is run, my computer gives me an error that says that the file cannot be opened since it is empty. I am on a Mac and am using Chrome if that is relevant.
You could try using a temporary directory instead:
import os
import tempfile
import webbrowser
# I used an existing pdf I had laying around as sample data
with open('c.pdf', 'rb') as fh:
    data = fh.read()

# Gives a temporary directory you have write permissions to.
# The directory and files within will be deleted when the with context exits.
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file_path = os.path.join(temp_dir, 'example.pdf')
    # write a normal file within the temp directory
    with open(temp_file_path, 'wb+') as fh:
        fh.write(data)
    webbrowser.open('file://' + temp_file_path)
This worked for me on Mac OS.
Is there a way to encrypt PDF files in Python?
One possibility is to zip the PDFs, but is there another?
Thanks for your help
regards
Felix
You can use PyPDF2:
from PyPDF2 import PdfFileReader, PdfFileWriter
with open("input.pdf", "rb") as in_file:
    input_pdf = PdfFileReader(in_file)

    output_pdf = PdfFileWriter()
    output_pdf.appendPagesFromReader(input_pdf)
    output_pdf.encrypt("password")

    with open("output.pdf", "wb") as out_file:
        output_pdf.write(out_file)
For more information, check out the PdfFileWriter docs.
pikepdf, which is Python's adaptation of QPDF, is by far the better option. It is especially helpful if you have a file with text in languages other than English.
import pikepdf

password = 'your_password'  # placeholder

pdf = pikepdf.Pdf.open('path/to/file.pdf')
pdf.save('output_filename.pdf',
         encryption=pikepdf.Encryption(owner=password, user=password, R=4))
# you can change R from 4 to 6 for 256-bit AES encryption
pdf.close()
You can also use PyPDF2:
import PyPDF2

pdfFile = open('input.pdf', 'rb')

# Create reader and writer objects
pdfReader = PyPDF2.PdfFileReader(pdfFile)
pdfWriter = PyPDF2.PdfFileWriter()

# Add all pages to the writer (the accepted answer results in blank pages)
for pageNum in range(pdfReader.numPages):
    pdfWriter.addPage(pdfReader.getPage(pageNum))

# Encrypt with your password
pdfWriter.encrypt('password')

# Write it to an output file (you can delete the unencrypted version now)
resultPdf = open('encrypted_output.pdf', 'wb')
pdfWriter.write(resultPdf)
resultPdf.close()
pdfFile.close()
Another option is Aspose.PDF Cloud SDK for Python, it is a rest API solution. You can use cloud storage of your choice from Amazon S3, DropBox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage, FTP Storage and Aspose Cloud Storage.
The cryptoAlgorithm parameter takes the following possible values:
RC4x40: RC4 with key length 40
RC4x128: RC4 with key length 128
AESx128: AES with key length 128
AESx256: AES with key length 256
import os
import base64
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
from shutil import copyfile
# Get Client key and Client ID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
temp_folder = "Temp"

# Upload the PDF file to storage
data_file = "C:/Temp/02_pages.pdf"
remote_name = "02_pages.pdf"
pdf_api.upload_file(remote_name, data_file)

out_path = "EncryptedPDF.pdf"
user_password_encoded = base64.b64encode(b'user $^Password!&')
owner_password_encoded = base64.b64encode(b'owner\//? $12^Password!&')

# Encrypt the PDF document
response = pdf_api.put_encrypt_document(temp_folder + '/' + out_path,
                                        user_password_encoded,
                                        owner_password_encoded,
                                        asposepdfcloud.models.CryptoAlgorithm.AESX128,
                                        file=remote_name)

# Download the PDF file from storage
response_download = pdf_api.download_file(temp_folder + '/' + out_path)
copyfile(response_download, 'C:/Temp/' + out_path)
print(response)
I would highly recommend the pyAesCrypt module.
It is based on the cryptography module, which is written partly in C.
The module is quite fast, especially on high-spec computers.
You can expect roughly a 12-second encryption of a 3 GB file on a higher-end machine, so it really is fast, though not the fastest option.
The one-liners for encryption and decryption are:
import pyAesCrypt
Encrypting:
pyAesCrypt.encryptFile(inputfile, outputfile, password, bufferSize)
Decrypting:
pyAesCrypt.decryptFile(inputfile, outputfile, password, bufferSize)
Since this is not the full explanation, I would recommend reading the documentation in full, as it is not very long.
You can find it here: https://pypi.org/project/pyAesCrypt/
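For completeness, a minimal sketch with the buffer size spelled out (the file names and the 64 KB buffer are arbitrary choices of mine):
import pyAesCrypt

bufferSize = 64 * 1024  # process the file in 64 KB chunks
password = 'P4$$W0rd'

# Encrypt, then decrypt back to a copy to verify the round trip
pyAesCrypt.encryptFile('data.bin', 'data.bin.aes', password, bufferSize)
pyAesCrypt.decryptFile('data.bin.aes', 'data_copy.bin', password, bufferSize)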
You can also use PyPDF2 with this project.
For example, put the PDF_Lock.py file into your project folder.
Then you can use:
import PDF_Lock
and when you want to protect a PDF file, use:
PDF_Lock.lock(YourPDFFilePath, YourProtectedPDFFilePath, Password)
I have a very simple script that uses urllib to retrieve a zip file and place it on my desktop. The zip file is only a couple of MB in size and doesn't take long to download. However, the script doesn't seem to finish; it just hangs. Is there a way to forcibly close the urlretrieve, or a better solution?
The URL points to a public FTP site. Is the FTP perhaps the cause?
I'm using Python 2.7.8.
import urllib

url = r'ftp://ftp.ngs.noaa.gov/pub/DS_ARCHIVE/ShapeFiles/IA.ZIP'
zip_path = r'C:\Users\***\Desktop\ngs.zip'
urllib.urlretrieve(url, zip_path)
Thanks in advance!
---Edit---
Was able to use ftplib to accomplish the task...
import os
from ftplib import FTP
import zipfile
ftp_site = 'ftp.ngs.noaa.gov'
ftp_file = 'IA.ZIP'
download_folder = '//folder to place file'
download_file = 'name of file'
download_path = os.path.join(download_folder, download_file)
# Download file from ftp
ftp = FTP(ftp_site)
ftp.login()
ftp.cwd('pub/DS_ARCHIVE/ShapeFiles')  # change directory
ftp.retrlines('LIST')  # show the files located in the directory
download = open(download_path, 'wb')
ftp.retrbinary('RETR ' + ftp_file, download.write)
ftp.quit()
download.close()

# Unzip if a .zip file was downloaded
with zipfile.ZipFile(download_path, "r") as z:
    z.extractall(download_folder)
urllib has very poor support for error handling and debugging; urllib2 is a much better choice. The urlretrieve equivalent in urllib2 is:
import urllib2

resp = urllib2.urlopen(im_url)
with open(sav_name, 'wb') as f:
    f.write(resp.read())
And the errors to catch are:
urllib2.URLError, urllib2.HTTPError, httplib.HTTPException
And you can also catch socket.error in case the network is down.
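Putting that together, a sketch of the error handling (Python 2, to match the question; im_url and sav_name are from the snippet above):
import socket
import httplib
import urllib2

try:
    resp = urllib2.urlopen(im_url)
    with open(sav_name, 'wb') as f:
        f.write(resp.read())
except urllib2.HTTPError as e:
    # HTTPError is a subclass of URLError, so catch it first
    print 'HTTP error: %s' % e
except urllib2.URLError as e:
    print 'URL error: %s' % e
except (httplib.HTTPException, socket.error) as e:
    print 'Connection error: %s' % e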
You can use the Python requests library with the requests-ftp module. It provides an easier API and handles exceptions better. See: https://pypi.python.org/pypi/requests-ftp and http://docs.python-requests.org/en/latest/
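A sketch of the usage, following the pattern from the requests-ftp PyPI page (the URL is the one from the question):
import requests
import requests_ftp

# requests-ftp patches requests.Session to understand ftp:// URLs
requests_ftp.monkeypatch_session()

s = requests.Session()
resp = s.get('ftp://ftp.ngs.noaa.gov/pub/DS_ARCHIVE/ShapeFiles/IA.ZIP')
with open('IA.ZIP', 'wb') as f:
    f.write(resp.content)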
I want to do the following in Python:
Take a binary (executable) file
Compress it with gzip (.gz extension)
Then give it a .jpg extension
Later, recover the original (without the .gz or .jpg extension)
The idea is to send binary files through Gmail SMTP, then retrieve them over IMAP and process them in their original form (step 1).
The python gzip and shutil libraries can do what you need.
To gzip the executable.
import gzip, shutil
src = open('executable', 'rb')
dest = gzip.open('executable.gz.jpg', 'wb')
shutil.copyfileobj(src, dest)
src.close()
dest.close()
And then to get the original back:
import gzip, shutil

src = gzip.open('executable.gz.jpg', 'rb')
dest = open('executable', 'wb')
shutil.copyfileobj(src, dest)
src.close()
dest.close()
That being said, gmail's MIME filters look at content, not extension, so it may still block the new file.
You can use gzip to compress the files and os.rename to change file names. In your case you could just use gzip and save it with a .jpg extension in the first place.
import gzip

# write the compressed file
with gzip.open('my_file.jpg', 'wb') as f:
    f.write(content)

# read it again
with gzip.open('my_file.jpg', 'rb') as f:
    content = f.read()
How to serve users a dynamically generated ZIP archive in Django?
I'm making a site, where users can choose any combination of available books and download them as ZIP archive. I'm worried that generating such archives for each request would slow my server down to a crawl. I have also heard that Django doesn't currently have a good solution for serving dynamically generated files.
The solution is as follows.
Use the Python module zipfile to create the zip archive, but pass a StringIO object as the file (the ZipFile constructor requires a file-like object). Add the files you want to compress. Then, in your Django application, return the content of the StringIO object in an HttpResponse with the mimetype set to application/x-zip-compressed (or at least application/octet-stream). If you want, you can set the content-disposition header, but this should not really be required.
But beware: creating zip archives on each request is a bad idea and may kill your server (not counting timeouts if the archives are large). A more performant approach is to cache the generated output somewhere in the filesystem and regenerate it only if the source files have changed. An even better idea is to prepare the archives in advance (e.g. by a cron job) and have your web server serve them as usual static files.
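To illustrate the caching idea, a sketch that rebuilds the archive only when a source file is newer than the cached zip (the paths and helper name are hypothetical):
import os
import zipfile

def cached_zip(zip_path, source_paths):
    # Rebuild only if the zip is missing or any source file is newer
    stale = (not os.path.exists(zip_path) or
             any(os.path.getmtime(p) > os.path.getmtime(zip_path)
                 for p in source_paths))
    if stale:
        with zipfile.ZipFile(zip_path, 'w') as zf:
            for p in source_paths:
                zf.write(p, os.path.basename(p))
    return zip_path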
Here's a Django view to do this:
import os
import zipfile
import StringIO
from django.http import HttpResponse
def getfiles(request):
    # Files (local path) to put in the .zip
    # FIXME: Change this (get paths from DB etc)
    filenames = ["/tmp/file1.txt", "/tmp/file2.txt"]

    # Folder name in ZIP archive which contains the above files
    # E.g [thearchive.zip]/somefiles/file2.txt
    # FIXME: Set this to something better
    zip_subdir = "somefiles"
    zip_filename = "%s.zip" % zip_subdir

    # Open StringIO to grab in-memory ZIP contents
    s = StringIO.StringIO()

    # The zip compressor
    zf = zipfile.ZipFile(s, "w")

    for fpath in filenames:
        # Calculate path for file in zip
        fdir, fname = os.path.split(fpath)
        zip_path = os.path.join(zip_subdir, fname)

        # Add file, at correct path
        zf.write(fpath, zip_path)

    # Must close zip for all contents to be written
    zf.close()

    # Grab ZIP file from in-memory, make response with correct MIME-type
    resp = HttpResponse(s.getvalue(), mimetype="application/x-zip-compressed")
    # ..and correct content-disposition
    resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename

    return resp
Many answers here suggest using a StringIO or BytesIO buffer. However, this is not needed, as HttpResponse is already a file-like object:
import zipfile

from django.http import HttpResponse

response = HttpResponse(content_type='application/zip')
zip_file = zipfile.ZipFile(response, 'w')
for filename in filenames:
    zip_file.write(filename)
response['Content-Disposition'] = 'attachment; filename={}'.format(zipfile_name)
return response
Note that you should not call zip_file.close() as the open "file" is response and we definitely don't want to close it.
I used Django 2.0 and Python 3.6.
import os
import zipfile
from io import BytesIO

from django.http import HttpResponse

def download_zip_file(request):
    filelist = ["path/to/file-11.txt", "path/to/file-22.txt"]

    byte_data = BytesIO()
    zip_file = zipfile.ZipFile(byte_data, "w")

    for file in filelist:
        filename = os.path.basename(os.path.normpath(file))
        zip_file.write(file, filename)
    zip_file.close()

    response = HttpResponse(byte_data.getvalue(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=files.zip'

    # Print the list of files in zip_file (for debugging)
    zip_file.printdir()

    return response
For Python 3 I use io.BytesIO to achieve this, since the Python 2 StringIO approach no longer works for binary data. Hope it helps.
import io
import zipfile

from django.http import HttpResponse

def my_downloadable_zip(request):
    zip_io = io.BytesIO()
    with zipfile.ZipFile(zip_io, mode='w', compression=zipfile.ZIP_DEFLATED) as backup_zip:
        # you can also iterate over a list of file name locations here
        backup_zip.write('file_name_loc_to_zip')
    response = HttpResponse(zip_io.getvalue(), content_type='application/x-zip-compressed')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'your_zipfilename' + ".zip"
    response['Content-Length'] = zip_io.tell()
    return response
Django doesn't directly handle the generation of dynamic content (specifically Zip files). That work would be done by Python's standard library. You can take a look at how to dynamically create a Zip file in Python here.
If you're worried about it slowing down your server you can cache the requests if you expect to have many of the same requests. You can use Django's cache framework to help you with that.
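For instance, a sketch using Django's low-level cache API (the key and timeout are arbitrary; build_zip stands in for whatever function produces the archive bytes):
from django.core.cache import cache

def get_zip_bytes(cache_key, build_zip):
    # Serve the cached archive if present; otherwise build and cache it
    data = cache.get(cache_key)
    if data is None:
        data = build_zip()
        cache.set(cache_key, data, timeout=60 * 60)  # cache for one hour
    return data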
Overall, zipping files can be CPU intensive but Django shouldn't be any slower than another Python web framework.
Shameless plug: you can use django-zipview for the same purpose.
After a pip install django-zipview:
from zipview.views import BaseZipView
from reviews import Review
class CommentsArchiveView(BaseZipView):
    """Download at once all comments for a review."""

    def get_files(self):
        document_key = self.kwargs.get('document_key')
        reviews = Review.objects \
            .filter(document__document_key=document_key) \
            .exclude(comments__isnull=True)

        return [review.comments.file for review in reviews if review.comments.name]
I suggest using a separate model for storing those temp zip files. You can create the zip on the fly, save it to the model with a FileField, and finally send the URL to the user.
Advantages:
Serving static zip files with the Django media mechanism (like usual uploads).
Ability to clean up stale zip files by regular cron script execution (which can use the date field from the zip file model).
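A minimal sketch of such a model (the names are just an example):
from django.db import models

class GeneratedZip(models.Model):
    # Served through the usual media mechanism, like any upload
    archive = models.FileField(upload_to='zips/')
    # Lets a cron job delete stale archives by age
    created = models.DateTimeField(auto_now_add=True)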
A lot of contributions were made to the topic already, but since I came across this thread when I first researched this problem, I thought I'd add my own two cents.
Integrating your own zip creation is probably not as robust and optimized as web-server-level solutions. At the same time, we're using Nginx and it doesn't come with a module out of the box.
You can, however, compile Nginx with the mod_zip module (see here for a docker image with the latest stable Nginx version, and an alpine base making it smaller than the default Nginx image). This adds the zip stream capabilities.
Then Django just needs to serve a list of files to zip, all done!
It is a little more reusable to use a library for this file list response, and django-zip-stream offers just that.
Sadly it never really worked for me, so I started a fork with fixes and improvements.
You can use it in a few lines:
import os

from django.conf import settings
from django_zip_stream.responses import FolderZipResponse

def download_view(request, name=""):
    path = settings.STATIC_ROOT
    path = os.path.join(path, name)
    return FolderZipResponse(path)
You need a way to have Nginx serve all files that you want to archive, but that's it.
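If you would rather skip the library, the underlying mod_zip protocol is simple: the Django view returns a plain-text file list plus an X-Archive-Files header, and Nginx streams the archive itself. A sketch (the CRC-32 column may be given as "-" to skip it; the paths and sizes are hypothetical):
from django.http import HttpResponse

def manifest_view(request):
    # Each line: <crc32> <size in bytes> <nginx location> <name inside zip>
    manifest = "\n".join([
        "- 1024 /static/file1.txt file1.txt",
        "- 2048 /static/file2.txt file2.txt",
    ])
    response = HttpResponse(manifest, content_type='text/plain')
    response['X-Archive-Files'] = 'zip'  # tells mod_zip to build the archive
    response['Content-Disposition'] = 'attachment; filename=archive.zip'
    return response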
Can't you just write a link to a "zip server" or whatnot? Why does the zip archive itself need to be served from Django? A 90's era CGI script to generate a zip and spit it to stdout is really all that's required here, at least as far as I can see.
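For what it's worth, a sketch of that CGI approach (Python 3; the file path is a placeholder):
#!/usr/bin/env python3
import io
import sys
import zipfile

# Build the archive in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.write('/tmp/file1.txt', 'file1.txt')

# Emit CGI headers, then the raw zip bytes on stdout
sys.stdout.write('Content-Type: application/zip\r\n')
sys.stdout.write('Content-Disposition: attachment; filename=books.zip\r\n\r\n')
sys.stdout.flush()
sys.stdout.buffer.write(buf.getvalue())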