Encrypt PDFs in Python

Is there a way to encrypt PDF files in Python?
One possibility is to zip the PDFs, but is there another?
Thanks for your help,
regards
Felix

You can use PyPDF2:
from PyPDF2 import PdfFileReader, PdfFileWriter

with open("input.pdf", "rb") as in_file:
    input_pdf = PdfFileReader(in_file)
    output_pdf = PdfFileWriter()
    output_pdf.appendPagesFromReader(input_pdf)
    output_pdf.encrypt("password")
    with open("output.pdf", "wb") as out_file:
        output_pdf.write(out_file)
For more information, check out the PdfFileWriter docs.
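To confirm the result, you can reopen the encrypted file and unlock it with the same password; a minimal sketch using PyPDF2's decrypt() (the file name and password are the ones from the example above):
from PyPDF2 import PdfFileReader

with open("output.pdf", "rb") as f:
    reader = PdfFileReader(f)
    print(reader.isEncrypted)      # True
    reader.decrypt("password")     # unlock with the user password
    print(reader.getPage(0).extractText())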

pikepdf, the Python binding to QPDF, is by far the better option. This is especially helpful if you have a file that contains text in languages other than English.
import pikepdf

password = "your-password"
pdf = pikepdf.Pdf.open("path/to/file.pdf")
pdf.save('output_filename.pdf',
         encryption=pikepdf.Encryption(owner=password, user=password, R=4))
# you can change R from 4 to 6 for 256-bit AES encryption
pdf.close()
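If you also want to restrict what the user may do with the opened document, pikepdf lets you pass an allow argument; a minimal sketch assuming pikepdf's Permissions class and 256-bit AES (R=6), with placeholder file names and passwords:
import pikepdf

with pikepdf.open("input.pdf") as pdf:
    # AES-256 (R=6); forbid text/image extraction, allow high-resolution printing
    pdf.save(
        "output_encrypted.pdf",
        encryption=pikepdf.Encryption(
            owner="owner-password",
            user="user-password",
            R=6,
            allow=pikepdf.Permissions(extract=False, print_highres=True),
        ),
    )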

You can use PyPDF2:
import PyPDF2

pdfFile = open('input.pdf', 'rb')
# Create reader and writer objects
pdfReader = PyPDF2.PdfFileReader(pdfFile)
pdfWriter = PyPDF2.PdfFileWriter()
# Add all pages to the writer (the accepted answer results in blank pages)
for pageNum in range(pdfReader.numPages):
    pdfWriter.addPage(pdfReader.getPage(pageNum))
# Encrypt with your password
pdfWriter.encrypt('password')
# Write it to an output file (you can delete the unencrypted version now)
resultPdf = open('encrypted_output.pdf', 'wb')
pdfWriter.write(resultPdf)
resultPdf.close()

Another option is Aspose.PDF Cloud SDK for Python, a REST API solution. You can use the cloud storage of your choice: Amazon S3, DropBox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage, FTP Storage or Aspose Cloud Storage.
The cryptoAlgorithm parameter takes the following possible values:
RC4x40: RC4 with key length 40
RC4x128: RC4 with key length 128
AESx128: AES with key length 128
AESx256: AES with key length 256
import os
import base64
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
from shutil import copyfile

# Get Client key and Client ID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
temp_folder = "Temp"

# Upload the PDF file to storage
data_file = "C:/Temp/02_pages.pdf"
remote_name = "02_pages.pdf"
pdf_api.upload_file(remote_name, data_file)

out_path = "EncryptedPDF.pdf"
user_password_encoded = base64.b64encode(b'user $^Password!&')
owner_password_encoded = base64.b64encode(b'owner\//? $12^Password!&')

# Encrypt the PDF document
response = pdf_api.put_encrypt_document(temp_folder + '/' + out_path,
                                        user_password_encoded,
                                        owner_password_encoded,
                                        asposepdfcloud.models.CryptoAlgorithm.AESX128,
                                        file=remote_name)

# Download the encrypted PDF file from storage
response_download = pdf_api.download_file(temp_folder + '/' + out_path)
copyfile(response_download, 'C:/Temp/' + out_path)
print(response)

I would highly recommend the pyAesCrypt module.
It is based on the cryptography module, which is partly written in C.
The module is quite fast, especially on high-spec machines: you can expect a roughly 12-second encryption of a 3 GB file on a higher-end computer, so it really is fast, though not the fastest option.
The one-liners for encryption and decryption are:
import pyAesCrypt
Encrypting:
pyAesCrypt.encryptFile(inputfile, outputfile, password, bufferSize)
Decrypting:
pyAesCrypt.decryptFile(inputfile, outputfile, password, bufferSize)
Since this is not the full explanation, I recommend reading the documentation in full; it is not very long.
You can find it here: https://pypi.org/project/pyAesCrypt/
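For completeness, a minimal sketch putting the two one-liners together (the file names, password and 64 KB buffer size are placeholder values; note this produces a generic encrypted container rather than a password-protected PDF):
import pyAesCrypt

bufferSize = 64 * 1024  # read/write in 64 KB chunks
password = "please-use-a-strong-password"

pyAesCrypt.encryptFile("input.pdf", "input.pdf.aes", password, bufferSize)
pyAesCrypt.decryptFile("input.pdf.aes", "decrypted.pdf", password, bufferSize)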

You can also use PyPDF2 together with this project.
For example, put the PDF_Lock.py file into your project folder.
Then you can use:
import PDF_Lock
and when you want to protect a PDF file use:
PDF_Lock.lock(YourPDFFilePath, YourProtectedPDFFilePath, Password)

Related

Python code to upload files to SharePoint

I want to be able to export .csv or Excel workbooks directly into SharePoint using Python code - is this even possible?
Thanks in advance!
Hi, I found something that can help you!
You need the Office365 REST Python Client library to connect to the Microsoft API and upload your files (here you can find an example that does exactly what you want). I think you can upload both .csv and .xls; you should try it and let us know!
According to my research and testing, I recommend you use Office365-REST-Python-Client to consume the SharePoint REST API.
You can use the following code to upload a file:
import os
from office365.sharepoint.client_context import ClientContext
from tests import test_user_credentials, test_team_site_url

ctx = ClientContext(test_team_site_url).with_credentials(test_user_credentials)
path = "../../data/report #123.csv"
with open(path, 'rb') as content_file:
    file_content = content_file.read()

list_title = "Documents"
target_folder = ctx.web.lists.get_by_title(list_title).root_folder
name = os.path.basename(path)
target_file = target_folder.upload_file(name, file_content).execute_query()
print("File has been uploaded to url: {0}".format(target_file.serverRelativeUrl))
More information for reference: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/examples/sharepoint/files/upload_file.py
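Note that test_user_credentials and test_team_site_url come from the library's own test/example setup; in your project you would supply the site URL and credentials yourself. A minimal sketch, assuming username/password authentication with the library's UserCredential class (the URL, account and list title are placeholders):
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

site_url = "https://yourtenant.sharepoint.com/sites/yoursite"
ctx = ClientContext(site_url).with_credentials(
    UserCredential("user@yourtenant.onmicrosoft.com", "password"))

with open("report.csv", "rb") as f:
    content = f.read()
target_folder = ctx.web.lists.get_by_title("Documents").root_folder
target_file = target_folder.upload_file("report.csv", content).execute_query()
print(target_file.serverRelativeUrl)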

How can I read a PDF file in AWS S3 with boto3 in Python?

I would like to read .pdf files in an S3 bucket, but the problem is that it returns formatted bytes, whereas if the file is .csv or .txt this code works.
What's wrong with .pdf files?
The code:
import boto3

s3client = boto3.client('s3')
fileobj = s3client.get_object(
    Bucket=BUCKET_NAME,
    Key='file.pdf'
)
filedata = fileobj['Body'].read()
contents = filedata
print(contents)
It returns:
b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (Architecture technique)\n/Producer (Skia/PDF m99 Google Docs Renderer)>>\nendobj\n3 0 obj\n<</ca 1\n/BM /Normal>>\nendobj\n6 0 obj\n<</Type /XObject\n/Subtype /Image\n/Width 1424\n/Height 500\n/ColorSpace /DeviceRGB\n/SMask 7 0 R\n/BitsPerComponent 8\n/Filter /FlateDecode\n/Length 26885>> stream\nx\x9c\xed\xdd\xeb\x93$Y\x99\xe7\xf7'
Another solution that I tried, but it did not work either:
import boto3
from PyPDF2 import PdfFileReader
from io import BytesIO

s3 = boto3.resource('s3')
obj = s3.Object(BUCKET_NAME, 'file.pdf')
fs = obj.get()['Body'].read()
pdfFile = PdfFileReader(BytesIO(fs))
It returns:
<PyPDF2.pdf.PdfFileReader at 0x7efbc8aead00>
Start by writing some Python code to access a PDF file on your local disk (search for a Python PDF library on the web).
Once you have that working, you can look at reading the file from Amazon S3.
When reading a file from S3, you have two options:
Use fileobj['Body'].read() (as you already are doing) to obtain the bytes from the file directly, or
Use download_file() to download the file from S3 to the local disk, then process the file from disk.
Which method to choose will depend on the PDF library you decide to use; see the sketch below.
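For what it's worth, the second attempt in the question is already most of the way there: PdfFileReader(BytesIO(...)) returns a reader object, and the printed <PyPDF2.pdf.PdfFileReader ...> is just its repr. A minimal sketch that goes one step further and pulls text out of the pages (the bucket name and key are placeholders, and it assumes the old PyPDF2 1.x API used in the question):
import boto3
from io import BytesIO
from PyPDF2 import PdfFileReader

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'file.pdf')
reader = PdfFileReader(BytesIO(obj.get()['Body'].read()))

# iterate over the pages and print whatever text PyPDF2 can extract
for page_num in range(reader.numPages):
    print(reader.getPage(page_num).extractText())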

Creating view-in-browser functionality with Python

I have been struggling with this problem for a while but can't seem to find a solution for it. The situation is that I need to open a file in browser and after the user closes the file the file is removed from their machine. All I have is the binary data for that file. If it matters, the binary data comes from Google Storage using the download_as_string method.
After doing some research I found that the tempfile module would suit my needs, but I can't get the tempfile to open in browser because the file only exists in memory and not on the disk. Any suggestions on how to solve this?
This is my code so far:
import os
import tempfile
import webbrowser

# grabbing binary data earlier on
temp = tempfile.NamedTemporaryFile()
temp.name = "example.pdf"
temp.write(binary_data_obj)
temp.close()
webbrowser.open('file://' + os.path.realpath(temp.name))
When this is run, my computer gives me an error that says that the file cannot be opened since it is empty. I am on a Mac and am using Chrome if that is relevant.
You could try using a temporary directory instead:
import os
import tempfile
import webbrowser

# I used an existing pdf I had laying around as sample data
with open('c.pdf', 'rb') as fh:
    data = fh.read()

# Gives a temporary directory you have write permissions to.
# The directory and files within will be deleted when the with context exits.
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file_path = os.path.join(temp_dir, 'example.pdf')
    # write a normal file within the temp directory
    with open(temp_file_path, 'wb+') as fh:
        fh.write(data)
    webbrowser.open('file://' + temp_file_path)
This worked for me on Mac OS.

Problems verifying a file signed with Python

I'm trying to create a signed file using OpenSSL and Python. I'm not receiving any error message, but the process is not working properly and I can't find the reason.
Below is my step-by-step to sign the file and check the signature:
First, I create the crt on the command line:
openssl req -nodes -x509 -sha256 -newkey rsa:4096 -keyout "cert.key" -out "cert.crt" -subj "/C=BR/ST=SP/L=SP/O=Company/OU=IT Dept/CN=cert"
At this point, I have two files: cert.key and cert.crt.
Then I sign the file using a Python script like the one below:
import os.path
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA256
from base64 import b64encode, b64decode

class Cert(object):
    def __init__(self):
        folder = os.path.dirname(os.path.realpath(__file__))
        file_path = os.path.join(folder, '../static/cert.key')
        self.key = open(file_path, "r").read()

    def sign_data(self, my_file):
        rsakey = RSA.importKey(self.key)  # I opened the cert.key in __init__
        signer = PKCS1_v1_5.new(rsakey)
        digest = SHA256.new()
        digest.update(my_file)
        sign = signer.sign(digest)
        return sign, b64encode(sign)
All works fine, and after saving the files I have three other files: my_file.csv (the original one), my_file.txt.sha256 and my_file.txt.sha256.base64. At this point, I can decode the base64 file and compare it with the signed one, and both are fine.
The problem is when I try to verify the signature using the following command:
openssl dgst -sha256 -verify <(openssl x509 -in "cert.crt" -pubkey -noout) -signature my_file.txt.sha256 my_file.csv
At this point I always receive "Verification Failure" and don't understand why.
Maybe the problem is my lack of Python knowledge, because when I sign the file using the following command (after step 1 and before using the Python script described in step 2), the same verification works fine:
openssl dgst -sha256 -sign "cert.key" -out my_file.txt.sha256 my_file.csv
Am I doing anything wrong?
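As a sanity check, the signature can also be verified directly in Python with the same pycrypto primitives; a minimal sketch (the file paths are placeholders, and it derives the public key from the same private key file):
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA256

key = RSA.importKey(open('static/cert.key', 'rb').read())
verifier = PKCS1_v1_5.new(key.publickey())

digest = SHA256.new()
digest.update(open('my_file.csv', 'rb').read())

with open('my_file.txt.sha256', 'rb') as f:
    signature = f.read()
# True if the signature matches the digest of the CSV
print(verifier.verify(digest, signature))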
UPDATE
Based on the comments, I tried the script in a local virtualenv with Python 2.7 and it worked, so the problem must be in the read/write operations.
I'm updating this question with the complete script, including the read/write operations, because I can run it locally, but I still don't get any error in the GAE environment and can't understand why.
The first step is the CSV creation and storage in Google Storage (bucket) with the script below:
import logging
import string
import cloudstorage as gcs
from google.appengine.api import app_identity

def create_csv_file(self, filename, cursor=None):
    filename = '/' + self.bucket_name + filename
    try:
        write_retry_params = gcs.RetryParams(backoff_factor=1.1)
        # the cursor stores a MySQL result set
        if cursor is not None:
            gcs_file = gcs.open(filename,
                                'w',
                                content_type='text/csv',
                                retry_params=write_retry_params)
            for row in cursor:
                gcs_file.write(','.join(map(str, row)) + '\n')
            gcs_file.close()
    except Exception as ex:
        logging.critical("Problem to write in the GC Storage with the exception: {}".format(ex))
        raise ex
It works fine and stores a CSV in the correct path inside Google Storage. After that part, the next read/write operation is the signing of the file:
def cert_file(self, original_filename):
    filename = '/' + self.bucket_name + original_filename
    cert = Cert()  # This class has just one method, the one described in my original question, and is used to sign the file.
    with gcs.open(filename) as cloudstorage_file:
        cloudstorage_file.seek(-1024, os.SEEK_END)
        signed_file, encoded_signed_file = cert.sign_data(cloudstorage_file.read())  # the method to sign the file
    signature_content = encoded_signed_file
    signed_file_name = string.replace(original_filename, '.csv', '.txt.sha256')
    encoded_signed_file_name = string.replace(signed_file_name, '.txt.sha256', '.txt.sha256.base64')
    self.inner_upload_file(signed_file, signed_file_name)
    self.inner_upload_file(encoded_signed_file, encoded_signed_file_name)
    return signed_file_name, encoded_signed_file_name, signature_content
The inner_upload_file method just saves the new files in the same bucket:
def inner_upload_file(self, file_data, filename):
    filename = '/' + self.bucket_name + filename
    try:
        write_retry_params = gcs.RetryParams(backoff_factor=1.1)
        gcs_file = gcs.open(filename,
                            'w',
                            content_type='application/octet-stream',
                            retry_params=write_retry_params)
        gcs_file.write(file_data)
        gcs_file.close()
    except Exception as ex:
        logging.critical("Problem to write in the GC Storage with the exception: {}".format(ex))
        raise ex
Here is the app.yaml for reference. The cert.key and cert.crt generated on the command line are stored in a static folder inside the app folder (the same directory where my app.yaml is).
UPDATE 2
Following the comments, I tried to run the signature process locally and then compare the files. Below is the step-by-step and the results.
First, I adapted the signature process to run as python sign.py file_name:
#!/usr/bin/python
import sys
import os
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA256
from base64 import b64encode, b64decode
file_path = 'static/cert.key'
key = open(file_path, "rb").read()
rsakey = RSA.importKey(key)
signer = PKCS1_v1_5.new(rsakey)
digest = SHA256.new()
file_object = open(sys.argv[1], "r")
digest.update(file_object.read())
sign = signer.sign(digest)
signed_path = "signed"
f = open(signed_path + '.txt.sha256', 'w')
f.write(sign)
f.close()
f2 = open(signed_path + '.txt.sha256.base64', 'w')
f2.write(b64encode(sign))
f2.close()
I ran the automatic process that saved the signed file in the GCS bucket (along with the original CSV file). After that, I downloaded both files through the Google web panel for GCS.
I ran the command python sign.py gcs_file_original.csv in a virtualenv with Python 2.7.10, using the CSV file I had just downloaded.
After that, I compared the two signed files with cmp -b gcs_signed.txt.sha256 locally_signed.txt.sha256, resulting in:
gcs_signed.txt.sha256 locally_signed.txt.sha256 differ: byte 1, line 1 is 24 ^T 164 t
Using VisualBinaryDiff, the result looks like two totally different files.
Now I know the problem, but I have no idea how to fix it. This problem is being very tricky.
I finally found the problem. I was so focused on finding a problem in the OpenSSL signature process that I didn't pay attention to the old Ctrl+C/Ctrl+V problem.
For test purposes, I copied the 'Read from GCS' example from this tutorial.
When I moved the test to the real-world application, I didn't read the page again and didn't notice the gcs_file.seek(-1024, os.SEEK_END).
As I said in the original question, I'm not a Python specialist, but this line was reading just part of the GCS file, so the signature was indeed different from the original one.
I just cut that line from my reading methods and now everything works fine.
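In other words, the fix is simply to read the whole object instead of seeking to the last 1024 bytes; a minimal sketch of the corrected reading step (a pared-down version of the cert_file method above):
def cert_file(self, original_filename):
    filename = '/' + self.bucket_name + original_filename
    cert = Cert()
    # read the whole GCS object; no seek(-1024, os.SEEK_END), which only
    # exposed the last 1 KB of the file to the signing routine
    with gcs.open(filename) as cloudstorage_file:
        file_bytes = cloudstorage_file.read()
    signed_file, encoded_signed_file = cert.sign_data(file_bytes)
    return signed_file, encoded_signed_file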

Pickling Python objects to Google Cloud Storage

I've been pickling the objects to the filesystem and reading them back when I need to work with those objects. Currently I have this code for that purpose:
def pickle(self, directory, filename):
    if not os.path.exists(directory):
        os.makedirs(directory)
    with open(directory + '/' + filename, 'wb') as handle:
        pickle.dump(self, handle)

@staticmethod
def load(filename):
    with open(filename, 'rb') as handle:
        element = pickle.load(handle)
    return element
Now I'm moving my application (Django) to Google App Engine and have figured out that App Engine does not allow me to write to the file system. Google Cloud Storage seems to be my only choice, but I can't work out how to pickle my objects as Cloud Storage objects and read them back to recreate the original Python object.
For Python 3 users, you can use the gcsfs library from the Dask creator to solve your issue.
Example reading:
import gcsfs

fs = gcsfs.GCSFileSystem(project='my-google-project')
fs.ls('my-bucket')
>>> ['my-file.txt']
with fs.open('my-bucket/my-file.txt', 'rb') as f:
    print(f.read())
It basically works the same as plain pickle with open(), though:
with fs.open(directory + '/' + filename, 'wb') as handle:
    pickle.dump(self, handle)
To read, this is similar, but replace wb with rb and dump with load:
with fs.open(directory + '/' + filename, 'rb') as handle:
    element = pickle.load(handle)
You can use the Cloud Storage client library.
Instead of open(), use cloudstorage.open() (or gcs.open() if importing cloudstorage as gcs, as in the above-mentioned doc) and note that the full filepath starts with the GCS bucket name (as a dir).
More details in the cloudstorage.open() documentation; a sketch follows below.
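A minimal sketch of what that looks like for pickling, assuming the GAE cloudstorage library is imported as gcs and '/your-bucket' is a placeholder bucket name:
import pickle
import cloudstorage as gcs

def save_object(obj, filename):
    # the path starts with the bucket name, e.g. '/your-bucket/some/file.pkl'
    with gcs.open('/your-bucket/' + filename, 'w',
                  content_type='application/octet-stream') as handle:
        handle.write(pickle.dumps(obj))

def load_object(filename):
    with gcs.open('/your-bucket/' + filename) as handle:
        return pickle.loads(handle.read())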
One other option (I tested it with Tensorflow 2.2.0), which also works with Python 3:
from tensorflow.python.lib.io import file_io

with file_io.FileIO('gs://....', mode='rb') as f:
    pickle.load(f)
This is very useful if you already use Tensorflow, for example.
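For the writing direction, the same FileIO wrapper can be opened in binary write mode; a small sketch (the gs:// path is a placeholder):
import pickle
from tensorflow.python.lib.io import file_io

obj = {"example": 123}
with file_io.FileIO('gs://your-bucket/your-object.pkl', mode='wb') as f:
    pickle.dump(obj, f)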
