I'm trying to create a signed file using OpenSSL and Python. I'm not receiving any error message, but the process is not working properly and I can't find the reason.
Below is my step-by-step process to sign the file and check the signature:
First, I create the certificate on the command line:
openssl req -nodes -x509 -sha256 -newkey rsa:4096 -keyout "cert.key" -out "cert.crt" -subj "/C=BR/ST=SP/L=SP/O=Company/OU=IT Dept/CN=cert"
At this point, I have two files: cert.key and cert.crt
Then I sign the file using a Python script like the one below:
import os.path
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA256
from base64 import b64encode, b64decode

class Cert(object):
    def __init__(self):
        folder = os.path.dirname(os.path.realpath(__file__))
        file_path = os.path.join(folder, '../static/cert.key')
        self.key = open(file_path, "r").read()

    def sign_data(self, my_file):
        rsakey = RSA.importKey(self.key)  # I opened the cert.key in __init__
        signer = PKCS1_v1_5.new(rsakey)
        digest = SHA256.new()
        digest.update(my_file)
        sign = signer.sign(digest)
        return sign, b64encode(sign)
Everything seems to work, and after saving the files I have three files: my_file.csv (the original one), my_file.txt.sha256, and my_file.txt.sha256.base64. At this point, I can decode the base64 file and compare it with the signed one, and they match.
The problem is when I try to verify the signature using the following command:
`openssl dgst -sha256 -verify <(openssl x509 -in "cert.crt" -pubkey -noout) -signature my_file.txt.sha256 my_file.csv`
At this point I always receive "Verification Failure" and I don't understand why.
Maybe the problem is my limited knowledge of Python, because when I sign the file using the following command (after step 1 and before using the Python script described in step 2), the same verification works fine.
openssl dgst -sha256 -sign "cert.key" -out my_file.txt.sha256 my_file.csv
Am I doing anything wrong?
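To help narrow this down, here is a minimal sketch I use to cross-check the signature with the same PyCrypto primitives (file names as above; the key imported from cert.key carries the public part, so it can verify too):
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA256

# assumes the same file names used above
rsakey = RSA.importKey(open('static/cert.key', 'r').read())
verifier = PKCS1_v1_5.new(rsakey)

digest = SHA256.new()
digest.update(open('my_file.csv', 'rb').read())

# True if the signature matches the file content
print(verifier.verify(digest, open('my_file.txt.sha256', 'rb').read()))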
UPDATE
Based on the comments, I tried the script in a local virtualenv with Python 2.7 and it worked, so the problem must be in the read/write operations.
I'm updating this question with the complete script, including the read/write operations, because I can run it locally, but I still don't get any error in the GAE environment and can't understand why.
The first step is the CSV creation and storage in Google Cloud Storage (a bucket) with the script below:
import logging
import string

import cloudstorage as gcs
from google.appengine.api import app_identity

def create_csv_file(self, filename, cursor=None):
    filename = '/' + self.bucket_name + filename
    try:
        write_retry_params = gcs.RetryParams(backoff_factor=1.1)
        # the cursor stores a MySQL result set
        if cursor is not None:
            gcs_file = gcs.open(filename,
                                'w',
                                content_type='text/csv',
                                retry_params=write_retry_params)
            for row in cursor:
                gcs_file.write(','.join(map(str, row)) + '\n')
            gcs_file.close()
    except Exception as ex:
        logging.critical("Problem writing to GC Storage, exception: {}".format(ex))
        raise ex
It works fine and stores a CSV in the correct path inside Google Cloud Storage. After that, the next read/write operation is the signature of the file.
def cert_file(self, original_filename):
    filename = '/' + self.bucket_name + original_filename
    cert = Cert()  # This class has just one method, described in my original question, used to sign the file.
    with gcs.open(filename) as cloudstorage_file:
        cloudstorage_file.seek(-1024, os.SEEK_END)
        signed_file, encoded_signed_file = cert.sign_data(cloudstorage_file.read())  # the method to sign the file
    signature_content = encoded_signed_file
    signed_file_name = string.replace(original_filename, '.csv', '.txt.sha256')
    encoded_signed_file_name = string.replace(signed_file_name, '.txt.sha256', '.txt.sha256.base64')
    self.inner_upload_file(signed_file, signed_file_name)
    self.inner_upload_file(encoded_signed_file, encoded_signed_file_name)
    return signed_file_name, encoded_signed_file_name, signature_content
The inner_upload_file method just saves the new files in the same bucket:
def inner_upload_file(self, file_data, filename):
    filename = '/' + self.bucket_name + filename
    try:
        write_retry_params = gcs.RetryParams(backoff_factor=1.1)
        gcs_file = gcs.open(filename,
                            'w',
                            content_type='application/octet-stream',
                            retry_params=write_retry_params)
        gcs_file.write(file_data)
        gcs_file.close()
    except Exception as ex:
        logging.critical("Problem writing to GC Storage, exception: {}".format(ex))
        raise ex
Here is the app.yaml for reference. The cert.key and cert.crt generated on the command line are stored in a static folder inside the app folder (the same directory as my app.yaml).
UPDATE 2
Following the comments, I tried to run the signature process locally and then compare the files. Below is the step-by-step and the results.
First, I adapted the signature process to run as python sign.py file_name.
#!/usr/bin/python
import sys
import os
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA256
from base64 import b64encode, b64decode
file_path = 'static/cert.key'
key = open(file_path, "rb").read()
rsakey = RSA.importKey(key)
signer = PKCS1_v1_5.new(rsakey)
digest = SHA256.new()
file_object = open(sys.argv[1], "r")
digest.update(file_object.read())
sign = signer.sign(digest)
signed_path = "signed"
f = open(signed_path + '.txt.sha256', 'w')
f.write(sign)
f.close()
f2 = open(signed_path + '.txt.sha256.base64', 'w')
f2.write(b64encode(sign))
f2.close()
I ran the automatic process that saves the signed file in the GCS bucket (along with the original CSV file). After that, I downloaded both files through the Google web panel for GCS.
I ran the command python sign.py gcs_file_original.csv in a virtualenv with Python 2.7.10, using the CSV file I had just downloaded.
Then I compared the two signed files with cmp -b gcs_signed.txt.sha256 locally_signed.txt.sha256, resulting in:
gcs_signed.txt.sha256 locally_signed.txt.sha256 differ: byte 1, line 1 is 24 ^T 164 t
Using VisualBinaryDiff, the result looks like two totally different files.
Now I know where the problem is, but I have no idea how to fix it. This problem is being very tricky.
I finally found the problem. I was so focused on finding a problem in the OpenSSL signature process that I didn't pay attention to the old Ctrl+C/Ctrl+V problem.
For test purposes, I copied the 'Read from GCS' example from this tutorial.
When I moved the test to the real-world application, I didn't read the page again and didn't notice the gcs_file.seek(-1024, os.SEEK_END).
As I said in the original question, I'm not a Python specialist, but that line was reading only the last 1024 bytes of the GCS file, so the signature was indeed different from the original one.
I just removed that line from my reading methods and now everything works fine.
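For reference, this is roughly what the corrected read looks like (the same cert_file method, minus the seek):
with gcs.open(filename) as cloudstorage_file:
    # read the whole object so the signature covers the full file
    file_content = cloudstorage_file.read()
signed_file, encoded_signed_file = cert.sign_data(file_content)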
Related
I need your help badly :D
I wrote some code in Python with PGP. I have a trusted public key and I could encrypt my message perfectly with this code, but when I ran it on Databricks I hit a problem:
gnupghome should be a directory and it isnt
I would like to know how I can access a directory in Databricks.
import gnupg
from pprint import pprint
import os

gpg = gnupg.GPG(gnupghome='/root/.pnugp')
key_data = open("/dbfs/mnt/xxxx/SCO/oracle/xxx/Files/publickey.asc").read()
import_result = gpg.import_keys(key_data)
pprint(import_result.results)

with open("/dbfs/mnt/xxxxx-storage/SCO/oracle/xxx/Files/FileToEncrypt.txt", 'rb') as f:
    status = gpg.encrypt_file(
        f, recipients=['securxxxxfertuca#xx.ca'],
        output='my-encrypted.txt.gpg')

print('ok: ', status.ok)
print('status: ', status.status)
print('stderr: ', status.stderr)
I suspect that this ran successfully locally. It doesn't work on Databricks because it is looking for .pnugp under the root home directory, which Databricks does not let you access.
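If you want to keep using gnupg, a possible workaround (a sketch I have not verified on Databricks; it assumes the gpg binary is installed on the cluster) is to point gnupghome at a directory the driver can actually write to:
import os
import gnupg

# hypothetical writable location; any directory the driver can write to should do
home = '/tmp/gnupghome'
if not os.path.exists(home):
    os.makedirs(home)
gpg = gnupg.GPG(gnupghome=home)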
I use the snippet of code below, which doesn't require access to anything other than the files you plan to encrypt and the keys.
In the code, I have my public key stored in the key vault as a secret named 'publicb64'. If you want to read the .asc version from somewhere, you can just read it into KEY_PUB. Don't forget to install pgpy using pip install pgpy.
# Encrypting a file using a public key
import pgpy
from pgpy.constants import PubKeyAlgorithm, KeyFlags, HashAlgorithm, SymmetricKeyAlgorithm, CompressionAlgorithm
from timeit import default_timer as timer
import base64
import io

KEY_PUB = base64.b64decode(publicb64).decode("ascii").lstrip()
# print(KEY_PUB)
pub_key = pgpy.PGPKey()
pub_key.parse(KEY_PUB)

# -READ THE FILE FROM MOUNT POINT-----------------
with io.open('/dbfs/mnt/sample_data/california_housing_test.csv', "r", newline='') as csv_file:
    input_data = csv_file.read()  # io and newline retain the CRLF

t0 = timer()

# PGP encryption start
msg = pgpy.PGPMessage.new(input_data)
# this returns a new PGPMessage that contains an encrypted form of the original message
encrypted_message = pub_key.encrypt(msg)
pgpstr = str(encrypted_message)

with open('/dbfs/mnt/sample_data/california_housing_test.csv.pgp', "w") as text_file:
    text_file.write(pgpstr)

print("Encryption Complete :" + str(timer() - t0))
Is there a way to encrypt PDF files in Python?
One possibility is to zip the PDFs, but is there another?
Thanks for your help
regards
Felix
You can use PyPDF2:
from PyPDF2 import PdfFileReader, PdfFileWriter

with open("input.pdf", "rb") as in_file:
    input_pdf = PdfFileReader(in_file)
    output_pdf = PdfFileWriter()
    output_pdf.appendPagesFromReader(input_pdf)
    output_pdf.encrypt("password")

    with open("output.pdf", "wb") as out_file:
        output_pdf.write(out_file)
For more information, check out the PdfFileWriter docs.
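To read the encrypted file back, a quick sketch with the same library (file name as above):
from PyPDF2 import PdfFileReader

with open("output.pdf", "rb") as f:
    reader = PdfFileReader(f)
    if reader.isEncrypted:
        reader.decrypt("password")
    print(reader.getNumPages())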
pikepdf, Python's adaptation of QPDF, is by far the better option. It is especially helpful if you have a file that contains text in languages other than English.
import pikepdf
from pikepdf import Pdf

password = 'password'  # choose your own password
pdf = Pdf.open('path/to/file.pdf')
# you can change R from 4 to 6 for 256-bit AES encryption
pdf.save('output_filename.pdf', encryption=pikepdf.Encryption(owner=password, user=password, R=4))
pdf.close()
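Opening the encrypted file again is symmetric; a one-line sketch:
from pikepdf import Pdf

# supply either the user or the owner password
pdf = Pdf.open('output_filename.pdf', password='password')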
You can use PyPDF2
import PyPDF2

pdfFile = open('input.pdf', 'rb')

# Create reader and writer objects
pdfReader = PyPDF2.PdfFileReader(pdfFile)
pdfWriter = PyPDF2.PdfFileWriter()

# Add all pages to the writer (the accepted answer results in blank pages)
for pageNum in range(pdfReader.numPages):
    pdfWriter.addPage(pdfReader.getPage(pageNum))

# Encrypt with your password
pdfWriter.encrypt('password')

# Write it to an output file (you can delete the unencrypted version now)
resultPdf = open('encrypted_output.pdf', 'wb')
pdfWriter.write(resultPdf)
resultPdf.close()
Another option is Aspose.PDF Cloud SDK for Python; it is a REST API solution. You can use cloud storage of your choice from Amazon S3, Dropbox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage, FTP Storage, and Aspose Cloud Storage.
The cryptoAlgorithm parameter takes the following possible values:
RC4x40: RC4 with key length 40
RC4x128: RC4 with key length 128
AESx128: AES with key length 128
AESx256: AES with key length 256
import os
import base64
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
from shutil import copyfile

# Get Client key and Client ID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
temp_folder = "Temp"

# upload PDF file to storage
data_file = "C:/Temp/02_pages.pdf"
remote_name = "02_pages.pdf"
pdf_api.upload_file(remote_name, data_file)
out_path = "EncryptedPDF.pdf"
user_password_encoded = base64.b64encode(b'user $^Password!&')
owner_password_encoded = base64.b64encode(b'owner\//? $12^Password!&')

# Encrypt PDF document
response = pdf_api.put_encrypt_document(temp_folder + '/' + out_path, user_password_encoded, owner_password_encoded, asposepdfcloud.models.CryptoAlgorithm.AESX128, file=remote_name)

# download PDF file from storage
response_download = pdf_api.download_file(temp_folder + '/' + out_path)
copyfile(response_download, 'C:/Temp/' + out_path)
print(response)
I would highly recommend the pyAesCrypt module.
It is based on the cryptography module, which is written partly in C.
The module is quite fast, especially on high-spec computers: you can expect a 3 GB file to encrypt in about 12 seconds on a higher-end machine, so it really is fast, though not the fastest option.
One-liners for encryption and decryption:
import pyAesCrypt
Encrypting:
pyAesCrypt.encryptFile(inputfile, outputfile, password, bufferSize)
Decrypting:
pyAesCrypt.decryptFile(inputfile, outputfile, password, bufferSize)
Since this is not the full explanation, I would recommend reading the documentation in full; it is not very long.
You can find it here: https://pypi.org/project/pyAesCrypt/
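A self-contained round-trip sketch (the file names are just examples; 64 KB is the buffer size suggested in the docs):
import pyAesCrypt

bufferSize = 64 * 1024
password = "use-a-long-random-password"

# encrypt, then decrypt back
pyAesCrypt.encryptFile("data.txt", "data.txt.aes", password, bufferSize)
pyAesCrypt.decryptFile("data.txt.aes", "data_decrypted.txt", password, bufferSize)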
You can also use PyPDF2 with this project.
For example, put the PDF_Lock.py file into your project folder.
Then you can use:
import PDF_Lock
and when you want to protect a PDF file, use:
PDF_Lock.lock(YourPDFFilePath, YourProtectedPDFFilePath, Password)
So, I'm developing a Flask application which uses the GDAL library, where I want to stream a .tif file through a URL.
Right now I have a method that reads a .tif file using gdal.Open(filepath). When run outside of the Flask environment (e.g. in a Python console), it works fine both with a path to a local file and with a URL.
from gdalconst import GA_ReadOnly
import gdal

filename = 'http://xxxxxxx.blob.core.windows.net/dsm/DSM_1km_6349_614.tif'
dataset = gdal.Open(filename, GA_ReadOnly)
if dataset is not None:
    print 'Driver: ', dataset.GetDriver().ShortName, '/', \
        dataset.GetDriver().LongName
However, when the following code is executed inside the Flask environment, I get the following message:
ERROR 4: `http://xxxxxxx.blob.core.windows.net/dsm/DSM_1km_6349_614.tif' does
not exist in the file system,
and is not recognised as a supported dataset name.
If I instead download the file to the local filesystem of the Flask app, and insert the path to the file, like this:
block_blob_service = get_blobservice() #Initialize block service
block_blob_service.get_blob_to_path('dsm', blobname, filename) # Get blob to local filesystem, path to file saved in filename
dataset = gdal.Open(filename, GA_ReadOnly)
That works just fine...
The thing is, since I'm requesting some big files (200 MB), I want to stream the files using the URL instead of a local file reference.
Does anyone have an idea of what could be causing this? I also tried putting "/vsicurl_streaming/" in front of the URL, as suggested elsewhere.
I'm using Python 2.7 (32-bit) with GDAL 2.0.2.
Please try the following code snippet:
from gzip import GzipFile
from io import BytesIO
import urllib2
from uuid import uuid4

from gdalconst import GA_ReadOnly
import gdal

def open_http_query(url):
    try:
        request = urllib2.Request(url,
                                  headers={"Accept-Encoding": "gzip"})
        response = urllib2.urlopen(request, timeout=30)
        if response.info().get('Content-Encoding') == 'gzip':
            return GzipFile(fileobj=BytesIO(response.read()))
        else:
            return response
    except urllib2.URLError:
        return None

url = 'http://xxx.blob.core.windows.net/container/example.tif'
image_data = open_http_query(url)
mmap_name = "/vsimem/" + uuid4().get_hex()
gdal.FileFromMemBuffer(mmap_name, image_data.read())
dataset = gdal.Open(mmap_name)
if dataset is not None:
    print 'Driver: ', dataset.GetDriver().ShortName, '/', \
        dataset.GetDriver().LongName
This uses a GDAL in-memory file (/vsimem) to open an image retrieved via HTTP directly, without saving it to a temporary file.
Refer to https://gist.github.com/jleinonen/5781308 for more info.
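One caveat: the /vsimem buffer stays allocated until it is released, so once you are done with the dataset, something like this sketch should free it:
# close the dataset first, then drop the in-memory file
dataset = None
gdal.Unlink(mmap_name)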
I am using the Docker Python client API's copy method.
The response from copy is of type requests.packages.urllib3.HTTPResponse.
Does it need to be handled differently for different types of files?
I copied a text file from a container, but when I try to read it using response.read(), I get text data mixed with binary data.
I see the content decoders as:
>>> response.CONTENT_DECODERS
['gzip', 'deflate']
What is the best way to handle/read/dump the response from the copy API?
The response from the Docker API is an uncompressed tar file. I had to read Docker's source code to learn the format of the response, as it is not documented. For instance, to download a file at remote_path, you need to do the following:
import tarfile, StringIO, os
reply = docker.copy(container, remote_path)
filelike = StringIO.StringIO(reply.read())
tar = tarfile.open(fileobj = filelike)
file = tar.extractfile(os.path.basename(remote_path))
print file.read()
The code should be modified to work on folders.
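A sketch of the folder case (untested; it simply relies on tarfile handling directories inside the archive):
import tarfile, StringIO

def copy_tree(docker, container, remote_path, dest_dir):
    reply = docker.copy(container, remote_path)
    filelike = StringIO.StringIO(reply.read())
    tar = tarfile.open(fileobj=filelike)
    tar.extractall(dest_dir)  # recreates files and sub-folders under dest_dir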
Here is my Python 3 version with Docker API 1.38; the copy API seems to have been replaced by get_archive.
import io
import tarfile

archive, stat = client.get_archive(path)
filelike = io.BytesIO(b"".join(b for b in archive))
tar = tarfile.open(fileobj=filelike)
fd = tar.extractfile(stat['name'])
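To persist the extracted file, a short sketch continuing the snippet above:
# write the extracted member to the local filesystem
with open(stat['name'], 'wb') as out:
    out.write(fd.read())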
Adjusting @Apr's answer for Python 3:
import tarfile, io, os

def copy_from_docker(client, container_id, src, dest):
    reply = client.copy(container_id, src)
    filelike = io.BytesIO(reply.read())
    tar = tarfile.open(fileobj=filelike)
    file = tar.extractfile(os.path.basename(src))
    with open(dest, 'wb') as f:
        f.write(file.read())
I have the following view code that attempts to "stream" a zipfile to the client for download:
import os
import zipfile
import tempfile
from pyramid.response import FileIter
def zipper(request):
    _temp_path = request.registry.settings['_temp']
    tmpfile = tempfile.NamedTemporaryFile('w', dir=_temp_path, delete=True)
    tmpfile_path = tmpfile.name

    ## creating zipfile and adding files
    z = zipfile.ZipFile(tmpfile_path, "w")
    z.write('somefile1.txt')
    z.write('somefile2.txt')
    z.close()

    ## renaming the zipfile
    new_zip_path = _temp_path + '/somefilegroup.zip'
    os.rename(tmpfile_path, new_zip_path)

    ## re-opening the zipfile with new name
    z = zipfile.ZipFile(new_zip_path, 'r')
    response = FileIter(z.fp)
    return response
However, this is the Response I get in the browser:
Could not convert return value of the view callable function newsite.static.zipper into a response object. The value returned was .
I suppose I am not using FileIter correctly.
UPDATE:
Since updating with Michael Merickel's suggestions, the FileIter function is working correctly. However, a MIME type error still lingers on the client (browser):
Resource interpreted as Document but transferred with MIME type application/zip: "http://newsite.local:6543/zipper?data=%7B%22ids%22%3A%5B6%2C7%5D%7D"
To better illustrate the issue, I have included a tiny .py and .pt file on Github: https://github.com/thapar/zipper-fix
FileIter is not a response object, just like your error message says. It is an iterable that can be used for the response body, that's it. Also, ZipFile can accept a file object, which is more useful here than a file path. Let's try writing into the tmpfile, then rewinding the file pointer back to the start, and using it to write out without doing any fancy renaming.
import os
import zipfile
import tempfile
from pyramid.response import FileIter
def zipper(request):
    _temp_path = request.registry.settings['_temp']
    fp = tempfile.NamedTemporaryFile('w+b', dir=_temp_path, delete=True)

    ## creating zipfile and adding files
    z = zipfile.ZipFile(fp, "w")
    z.write('somefile1.txt')
    z.write('somefile2.txt')
    z.close()

    # rewind fp back to start of the file
    fp.seek(0)

    response = request.response
    response.content_type = 'application/zip'
    response.app_iter = FileIter(fp)
    return response
I changed the mode on NamedTemporaryFile to 'w+b' as per the docs to allow the file to be written to and read from.
The current Pyramid version has two convenience classes for this use case: FileResponse and FileIter. The snippet below will serve a static file. I ran this code; the downloaded file is named "download", like the view name. To change the file name and more, set the Content-Disposition header or have a look at the arguments of pyramid.response.Response.
from pyramid.response import FileResponse

@view_config(name="download")
def zipper(request):
    path = 'path_to_file'
    return FileResponse(path, request)  # passing request is required
docs:
http://docs.pylonsproject.org/projects/pyramid/en/latest/api/response.html#
hint: extract the Zip logic from the view if possible
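For example, a minimal sketch setting the download file name via Content-Disposition (the path and file name are placeholders):
from pyramid.response import FileResponse
from pyramid.view import view_config

@view_config(name="download")
def zipper(request):
    response = FileResponse('path_to_file', request=request, content_type='application/zip')
    # Content-Disposition controls the file name the browser saves under
    response.content_disposition = 'attachment; filename="somefilegroup.zip"'
    return response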