I am using the cryptography library for Python. My goal is to take a string, encrypt it, and then write it to a file.
This may be done multiple times, with each call appending additional encrypted data to the end of the file.
I have tried a few solutions, such as:
Using the hazmat-level API to keep as little metadata as possible in the encrypted text.
Writing each encrypted string to a new line in a text file.
This is the code that uses ECB mode and the hazmat API. It attempts to read the file and decrypt it line by line. I understand it is unsafe; my main use is to log this data only locally to a file and then use a safe PKCS over the wire.
from cryptography import fernet
key = 'WqSAOfEoOdSP0c6i1CiyoOpTH2Gma3ff_G3BpDx52sE='
crypt_obj = fernet.Fernet(key)
file_handle = open('test.txt', 'a')
data = 'Hello1'
data = crypt_obj.encrypt(data.encode())
file_handle.write(data.decode() + '\n')
file_handle.close()
file_handle_two = open('test.txt', 'a')
data_two = 'Hello2'
data_two = crypt_obj.encrypt(data_two.encode())
file_handle_two.write(data_two.decode() + '\n')
file_handle_two.close()
file_read = open('test.txt', 'r')
file_lines = file_read.readlines()
file_content = ''
for line in file_lines:
    line = line[:-2]
    file_content = crypt_obj.decrypt(line.encode()).decode()
    print(file_content)
file_read.close()
For the code above I get the following error:
Traceback (most recent call last):
File "C:\Dev\Python\local_crypt_test\venv\lib\site-packages\cryptography\fernet.py", line 110, in _get_unverified_token_data
data = base64.urlsafe_b64decode(token)
File "C:\Users\19097\AppData\Local\Programs\Python\Python39\lib\base64.py", line 133, in urlsafe_b64decode
return b64decode(s)
File "C:\Users\19097\AppData\Local\Programs\Python\Python39\lib\base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Dev\Python\local_crypt_test\main.py", line 25, in <module>
file_content = crypt_obj.decrypt(line.encode()).decode()
File "C:\Dev\Python\local_crypt_test\venv\lib\site-packages\cryptography\fernet.py", line 83, in decrypt
timestamp, data = Fernet._get_unverified_token_data(token)
File "C:\Dev\Python\local_crypt_test\venv\lib\site-packages\cryptography\fernet.py", line 112, in _get_unverified_token_data
raise InvalidToken
cryptography.fernet.InvalidToken
Process finished with exit code 1
These examples are only to demonstrate the issue; my real code looks much different, so you may ignore errors in the example that do not pertain to my main issue. That is, appending encrypted data to a file and decrypting/reading that data from the file at a later time. The file does not need to be in any specific format, as long as it can be read and decrypted to recover the original message. Also, the mode of operation is not tied to ECB; if your example uses another mode, that works too.
I am honestly stumped and would appreciate any help!
There are a couple of details at play here...
1. Trailing newline character(s) are included in each line
When you loop through file_lines, each line includes the trailing newline character(s).
I say "character(s)" because this can vary based on the platform (e.g. Linux/macOS = '\n' versus Windows = '\r\n').
2. base64 decoding silently discards invalid characters
Fernet.encrypt(data) returns a bytes instance containing a base64 encoded "Fernet token".
Conversely, the first step Fernet.decrypt(token) takes is decoding the token by calling base64.urlsafe_b64decode(). This function uses the default non-validating behavior in which characters not within the base64 set are discarded (described here).
Note: This is why the answer from TheTS happens to work despite leaving the extraneous newline character intact.
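To see point 2 in isolation, here is a standalone sketch (not part of the original code) showing that the default decoder silently drops a trailing newline, while `validate=True` rejects it:

```python
import base64

token = base64.urlsafe_b64encode(b'hello')           # b'aGVsbG8='

# The default, non-validating decode silently discards the '\n':
print(base64.urlsafe_b64decode(token + b'\n'))       # b'hello'

# With validation enabled, the same input is rejected:
try:
    base64.b64decode(token + b'\n', validate=True)
except Exception as e:
    print(type(e).__name__)
```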
Solution
I'd recommend making sure you provide Fernet.decrypt() the token exactly as produced by Fernet.encrypt(). I'm guessing this is what you were trying to do by stripping the last two characters.
Here's an approach that should be safe and not platform dependent.
When you call open() for writing, provide the newline='\n' argument to prevent the default behavior of converting instances of '\n' to the platform dependent os.linesep value (in the section describing the newline argument, see the second bullet point detailing how the argument applies when writing files).
When processing each line, use rstrip('\n') to remove the expected trailing newline.
Here's a code example that demonstrates this:
#!/usr/bin/python3
from cryptography import fernet
to_encrypt = ['Hello1', 'Hello2']
output_file = 'test.txt'
key = 'WqSAOfEoOdSP0c6i1CiyoOpTH2Gma3ff_G3BpDx52sE='
crypt = fernet.Fernet(key)
print("ENCRYPTING...")
for data in to_encrypt:
    data_bytes = data.encode('utf-8')
    token_bytes = crypt.encrypt(data_bytes)
    print(f'data: {data}')
    print(f'token_bytes: {token_bytes}\n')

    with open(output_file, 'a', newline='\n') as f:
        f.write(token_bytes.decode('utf-8') + '\n')
print("\nDECRYPTING...")
with open(output_file, 'r') as f:
    for line in f:
        # Create a copy of line which shows the trailing newline.
        line_escaped = line.encode('unicode_escape').decode('utf-8')
        line_stripped = line.rstrip('\n')
        token_bytes = line_stripped.encode('utf-8')
        data = crypt.decrypt(token_bytes).decode('utf-8')
        print(f'line_escaped: {line_escaped}')
        print(f'token_bytes: {token_bytes}')
        print(f'decrypted data: {data}\n')
Output:
Note the trailing newline when line_escaped is printed.
$ python3 solution.py
ENCRYPTING...
data: Hello1
token_bytes: b'gAAAAABi-LAo-h8w-ayc267hrLbswMZtkT4RQQ9wt0EusYNrZGjuzbpyRLoKDZZF4oQPOU-iH1PnCc7vSIOoTVMLlCFnHTkN6A=='
data: Hello2
token_bytes: b'gAAAAABi-LAoHUT8Iu1bVMcGSIrFRvtVZQFh4O52XYSCgd0leYWS-n38irhv3Ch7oEx6SXazHwAL7a57ncFoMJTQQAms52yf3w=='
DECRYPTING...
line_escaped: gAAAAABi-LAo-h8w-ayc267hrLbswMZtkT4RQQ9wt0EusYNrZGjuzbpyRLoKDZZF4oQPOU-iH1PnCc7vSIOoTVMLlCFnHTkN6A==\n
token_bytes: b'gAAAAABi-LAo-h8w-ayc267hrLbswMZtkT4RQQ9wt0EusYNrZGjuzbpyRLoKDZZF4oQPOU-iH1PnCc7vSIOoTVMLlCFnHTkN6A=='
decrypted data: Hello1
line_escaped: gAAAAABi-LAoHUT8Iu1bVMcGSIrFRvtVZQFh4O52XYSCgd0leYWS-n38irhv3Ch7oEx6SXazHwAL7a57ncFoMJTQQAms52yf3w==\n
token_bytes: b'gAAAAABi-LAoHUT8Iu1bVMcGSIrFRvtVZQFh4O52XYSCgd0leYWS-n38irhv3Ch7oEx6SXazHwAL7a57ncFoMJTQQAms52yf3w=='
decrypted data: Hello2
from cryptography import fernet
key = 'WqSAOfEoOdSP0c6i1CiyoOpTH2Gma3ff_G3BpDx52sE='
crypt_obj = fernet.Fernet(key)
file_handle = open('test.txt', 'a')
data = 'Hello1'
data = crypt_obj.encrypt(data.encode('utf-8'))
file_handle.write(data.decode('utf-8') + '\n')
file_handle.close()
file_handle_two = open('test.txt', 'a')
data_two = 'Hello2'
data_two = crypt_obj.encrypt(data_two.encode('utf-8'))
file_handle_two.write(data_two.decode('utf-8') + '\n')
file_handle_two.close()
file_read = open('test.txt', 'r')
file_lines = file_read.readlines()
file_content = ''
for line in file_lines:
    # line = line[:-2]
    file_content = crypt_obj.decrypt(line.encode('utf-8')).decode()
    print(file_content)
file_read.close()
By removing the last two characters from the string, you also remove characters that are important for decoding.
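A quick standalone illustration of why slicing off a fixed two characters is fragile (the sample token here is made up):

```python
line = 'dG9rZW4=\n'        # a made-up base64 token followed by a single '\n'

print(line[:-2])           # 'dG9rZW4'  -- the '=' padding was removed too
print(line.rstrip('\n'))   # 'dG9rZW4=' -- only the newline is removed
```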
Related
I am stuck on this revision exercise which asks to copy an input file to an output file and return the first and last letters.
def copy_file(filename):
    input_file = open(filename, "r")
    content = input_file.read()
    content[0]
    content[1]
    return content[0] + content[-1]
    input_file.close()
Why do I get an error message when I try to get the first and last letters? And how would I copy the file to the output file?
Here is the test:
input_f = "FreeAdvice.txt"
first_last_chars = copy_file(input_f)
print(first_last_chars)
print_content('cure737.txt')
Error Message:
FileNotFoundError: [Errno 2] No such file or directory: 'hjac737(my username).txt'
All the code after a return statement is never executed; a proper code editor would highlight that for you, so I recommend you use one. So the file was never closed. A good practice is to use a context manager: it will automatically call close for you, even in case of an exception, when you exit its scope (indentation level).
The code you provided also fails to write the file content, which may be causing the error you reported.
I explicitly used the "rt" (and "wt") modes for the files (although they are the defaults), because we want the first and last character of the file, so it supports Unicode (any character, not just ASCII).
def copy_file(filename):
    with open(filename, "rt") as input_file:
        content = input_file.read()
    print(input_file.closed)  # True

    my_username = "LENORMJU"
    output_file_name = my_username + ".txt"
    with open(output_file_name, "wt") as output_file:
        output_file.write(content)
    print(output_file.closed)  # True

    # last: return the result
    return content[0] + content[-1]

print(copy_file("so67730842.py"))
When I run this script (on itself), the file is copied and I get the output d) which is correct.
I am using this code to find a string in Python:
buildSucceeded = "Build succeeded."
datafile = r'C:\PowerBuild\logs\Release\BuildAllPart2.log'
with open(datafile, 'r') as f:
    for line in f:
        if buildSucceeded in line:
            print(line)
I am quite sure the string is in the file, although the search does not return anything.
If I just print the file line by line, it returns a lot of 'NUL' characters between each "valid" character.
EDIT 1:
The problem was the encoding of Windows. I changed the encoding following this post and it worked: Why doesn't Python recognize my utf-8 encoded source file?
Anyway the file looks like this:
Line 1.
Line 2.
...
Build succeeded.
0 Warning(s)
0 Error(s)
...
I am currently testing with the Sublime editor for Windows, which shows a 'NUL' character between each "real" character, which is very odd.
Using python command line I have this output:
C:\Dev>python readFile.py
Traceback (most recent call last):
File "readFile.py", line 7, in <module>
print(line)
File "C:\Program Files\Python35\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 1: character maps to <undefined>
Thanks for your help anyway...
If your file is not that big, you can do a simple find. Otherwise, I would check the file to see if the string is really there, check the location for any spelling mistakes, and try to narrow down the problem.
f = open(datafile, 'r')
lines = f.read()
answer = lines.find(buildSucceeded)
Also note that if it does not find the string answer would be -1.
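Since interleaved 'NUL' characters usually mean the log is UTF-16 encoded (common for Windows build tools), another option is to decode with that codec directly. A self-contained sketch using simulated data rather than the asker's log file:

```python
# Simulate a UTF-16 log file, as Windows build tools often produce.
raw = 'Line 1.\nBuild succeeded.\n'.encode('utf-16')

# Decoded naively as 8-bit text, every other byte shows up as NUL;
# decoding with the right codec yields clean lines instead.
text = raw.decode('utf-16')
matches = [line for line in text.splitlines() if 'Build succeeded.' in line]
print(matches)   # ['Build succeeded.']
```

With a real file, the equivalent is passing encoding='utf-16' to open().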
As explained, the problem was related to encoding. The website below gives a very good explanation of how to convert a file from one encoding to another.
I used the last example (with Python 3, which is my case) and it worked as expected:
buildSucceeded = "Build succeeded."
datafile = 'C:\\PowerBuild\\logs\\Release\\BuildAllPart2.log'
# Open both input and output streams.
#input = open(datafile, "rt", encoding="utf-16")
input = open(datafile, "r", encoding="utf-16")
output = open("output.txt", "w", encoding="utf-8")
# Stream chunks of unicode data.
with input, output:
    while True:
        # Read a chunk of data.
        chunk = input.read(4096)
        if not chunk:
            break
        # Remove vertical tabs.
        chunk = chunk.replace("\u000B", "")
        # Write the chunk of data.
        output.write(chunk)

with open('output.txt', 'r') as f:
    for line in f:
        if buildSucceeded in line:
            print(line)
Source: http://blog.etianen.com/blog/2013/10/05/python-unicode-streams/
I am trying to save an image with Python that is Base64 encoded. The string is too large to post, but here is the image.
When received by Python, the last 2 characters are ==, although the string is not formatted, so I do this:
import base64
data = "data:image/png;base64," + photo_base64.replace(" ", "+")
And then I do this
imgdata = base64.b64decode(data)
filename = 'some_image.jpg' # I assume you have a way of picking unique filenames
with open(filename, 'wb') as f:
    f.write(imgdata)
But this causes this error
Traceback (most recent call last):
File "/var/www/cgi-bin/save_info.py", line 83, in <module>
imgdata = base64.b64decode(data)
File "/usr/lib64/python2.7/base64.py", line 76, in b64decode
raise TypeError(msg)
TypeError: Incorrect padding
I also printed out the length of the string once data:image/png;base64, has been added and the spaces replaced with +, and it has a length of 34354. I have tried a bunch of different images, but for all of them, when I try to open the saved file, it says the file is damaged.
What is happening and why is the file corrupt?
Thanks
EDIT
Here is some base64 that also failed
iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAMAAAAoLQ9TAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAADBQTFRFA6b1q Ci5/f2lt/9yu3 Y8v2cMpb1/DSJbz5i9R2NLwfLrWbw m T8I8////////SvMAbAAAABB0Uk5T////////////////////AOAjXRkAAACYSURBVHjaLI8JDgMgCAQ5BVG3//9t0XYTE2Y5BPq0IGpwtxtTP4G5IFNMnmEKuCopPKUN8VTNpEylNgmCxjZa2c1kafpHSvMkX6sWe7PTkwRX1dY7gdyMRHZdZ98CF6NZT2ecMVaL9tmzTtMYcwbP y3XeTgZkF5s1OSHwRzo1fkILgWC5R0X4BHYu7t/136wO71DbvwVYADUkQegpokSjwAAAABJRU5ErkJggg==
This is what I receive in my python script from the POST Request
Note: I have not replaced the spaces with +'s.
There is no need to add data:image/png;base64, before the data. I tried the code below and it works fine.
import base64
data = 'iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAMAAAAoLQ9TAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAADBQTFRFA6b1q Ci5/f2lt/9yu3 Y8v2cMpb1/DSJbz5i9R2NLwfLrWbw m T8I8////////SvMAbAAAABB0Uk5T////////////////////AOAjXRkAAACYSURBVHjaLI8JDgMgCAQ5BVG3//9t0XYTE2Y5BPq0IGpwtxtTP4G5IFNMnmEKuCopPKUN8VTNpEylNgmCxjZa2c1kafpHSvMkX6sWe7PTkwRX1dY7gdyMRHZdZ98CF6NZT2ecMVaL9tmzTtMYcwbP y3XeTgZkF5s1OSHwRzo1fkILgWC5R0X4BHYu7t/136wO71DbvwVYADUkQegpokSjwAAAABJRU5ErkJggg=='.replace(' ', '+')
imgdata = base64.b64decode(data)
filename = 'some_image.jpg' # I assume you have a way of picking unique filenames
with open(filename, 'wb') as f:
    f.write(imgdata)
If you prepend data:image/png;base64, to data, you get an error. If you have this prefix, you must remove it:
new_data = initial_data.replace('data:image/png;base64,', '')
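A standalone sketch of the whole round trip (the payload bytes here are placeholders, not a real PNG):

```python
import base64

payload = b'fake-png-bytes'
data_uri = 'data:image/png;base64,' + base64.b64encode(payload).decode('ascii')

# Strip everything up to and including the comma, then decode:
b64_part = data_uri.split(',', 1)[1]
print(base64.b64decode(b64_part) == payload)   # True
```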
I have a large file with one string per line. I want to read this file, get the SHA1 hash of each string, and save both the string and its hash to a file...
So far I'm just trying to read the large dictionary file...
DictionaryV = []
with open('Dictionary.txt','r') as inf:
    for line in inf:
        DictionaryV.append(eval(line))

print DictionaryV[0]
I wanted to print to see if anything loaded. I keep getting the following error
Traceback (most recent call last):
File "./script", line 7, in <module>
DictionaryV.append(eval(line))
File "<string>", line 1
!
^
SyntaxError: invalid syntax
These are the first few lines of the file I'm trying to read:
!
!elephant!
!!!
!!!!!
!!!!!!
!!!!!!!
!!!!!!!!
!!!!!!!!!!
!!!!!!1
!!!!!!888888
Don't call eval() on each line before appending it to the list.
eval will just try to evaluate the string ! as a Python expression, which is not what you want.
DictionaryV.append(line)
You can also get all the lines with DictionaryV = inf.readlines() or DictionaryV = list(f).
Also, if you are trying to get the SHA1 hash for each string, there is no need for a dictionary; you can compute the SHA1 using functions from hashlib.
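A minimal sketch of hashing a single line with hashlib (the sample string and expected digest are taken from the data and output elsewhere in this thread):

```python
from hashlib import sha1

# sha1() takes bytes; hexdigest() returns the hash as a hex string.
digest = sha1('!elephant!'.encode('utf-8')).hexdigest()
print(digest)   # 750b8da9d4b0a1d2d472afdbec88d74d0d9c3736
```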
No need to evaluate the string when you append it to the dictionary:
DictionaryV = []
with open('Dictionary.txt','r') as inf:
    for line in inf:
        DictionaryV.append(line)

print DictionaryV
Note that DictionaryV is a list, not a dictionary.
This code reads a text file line by line, strips any trailing white space from the end of the line, computes the SHA1 hash for that line, and then writes the line to the output file with the hexadecimal form of the SHA1 hash appended, with a single space separating the stripped line contents and its hash, and a newline after the hash.
Tested on Python 2.6.6, but it should run correctly on any later versions of Python, too.
from hashlib import sha1
iname = 'qdata'
oname = 'qdata_sha1'
with open(iname, 'r') as ifile:
    with open(oname, 'w') as ofile:
        for line in ifile:
            line = line.rstrip()
            digest = sha1(line).hexdigest()
            ofile.write('{0} {1}\n'.format(line, digest))
Using the data given in the question as the contents of 'qdata', here's the contents of 'qdata_sha1':
! 0ab8318acaf6e678dd02e2b5c343ed41111b393d
!elephant! 750b8da9d4b0a1d2d472afdbec88d74d0d9c3736
!!! 9a7b006d203b362c8cef6da001685678fc1d463a
!!!!! 1227cb28ec9e51942b7dacc0d5453e10d975612f
!!!!!! bae598184569d68359358ff314765c82166f9dfd
!!!!!!! 9b8a410b57694951c5ca9405c741fcc7578af9b1
!!!!!!!! 4cca2690b6ba377b0ed0aae5c6bd746583f34cd6
!!!!!!!!!! f2f7e9980103b41cefff52cb41df97a157de8b40
!!!!!!1 a807638c63c996475e0d1c9bdd84deef9504f7ef
!!!!!!888888 ecd90d1f8bd89fab7001f21a15375f90cfc259c9
I have a large number of files and a parser. What I have to do is strip all non-UTF-8 symbols and put the data in MongoDB.
Currently I have code like this:
with open(fname, "r") as fp:
    for line in fp:
        line = line.strip()
        line = line.decode('utf-8', 'ignore')
        line = line.encode('utf-8', 'ignore')
somehow I still get an error
bson.errors.InvalidStringData: strings in documents must be valid UTF-8:
1/b62010montecassianomcir\xe2\x86\x90ta0\xe2\x86\x90008923304320733/290066010401040101506055soccorin
I don't get it. Is there some simple way to do it?
UPD: it seems like Python and Mongo don't agree about the definition of a valid UTF-8 string.
Try the code line below instead of the last two lines. Hope it helps:
line=line.decode('utf-8','ignore').encode("utf-8")
For python 3, as mentioned in a comment in this thread, you can do:
line = bytes(line, 'utf-8').decode('utf-8', 'ignore')
The 'ignore' parameter prevents an error from being raised if any characters are unable to be decoded.
If your line is already a bytes object (e.g. b'my string') then you just need to decode it with decode('utf-8', 'ignore').
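A quick standalone check of that behavior (the byte string here is made up for illustration):

```python
# b'\x90' is a lone UTF-8 continuation byte, so it is invalid on its own;
# the 'ignore' handler drops it instead of raising UnicodeDecodeError.
raw = b'soccor\x90in'
print(raw.decode('utf-8', 'ignore'))   # soccorin
```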
Example of handling non-UTF-8 characters:
import string
test=u"\n\n\n\n\n\n\n\n\n\n\n\n\n\nHi <<First Name>>\nthis is filler text \xa325 more filler.\nadditilnal filler.\n\nyet more\xa0still more\xa0filler.\n\n\xa0\n\n\n\n\nmore\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nfiller.\x03\n\t\t\t\t\t\t almost there \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nthe end\n\n\n\n\n\n\n\n\n\n\n\n\n"
print ''.join(x for x in test if x in string.printable)
with open(fname, "r") as fp:
    for line in fp:
        line = line.strip()
        line = line.decode('cp1252').encode('utf-8')