Python hash from file not as expected

Python hash from file not as expected - python

My problem is the following:
I want to create a little tool in Python that creates hash values for entered text or from files. I've created all necessary things, GUI, option to select between hash functions, everything is fine.
But when I was testing the program, I realized, that the from files generated hashes aren't the same as the ones given by most download pages. I was confused, downloaded some other hashing tools, they all gave me the same hash as provided on several websites, but my tool always give some other output.
The odd thing is, the hashes generated from "plain text" are in my and in all other tools identical.
The app uses wxPython, but I've extracted my hash function for hash creation from files:
import os, hashlib
path = "C:\file.txt" # Given from some open file dialog, valid file
text = ""
if os.path.isfile(path):
text_file = open(path, "r")
text = text_file.read()
text_file.close()
print hashlib.new("md5", text).hexdigest() # Could be any hash function
Quite simple, but doesn't work as expected.
It seems to work if there's no new line in the file (\n)?
But how to make it work with newline? It's like every file has more than one line.

It is a problem of quoting the backslash character, see https://docs.python.org/2/reference/lexical_analysis.html#literals. Use two backslashes to specify the file name. I would also recommend reading the file in binary mode. As a precaution, print the length of variable text to make sure the file was read.
import os, hashlib
path = "C:\\file.txt" # Given from some open file dialog, valid file
text = ""
if os.path.isfile(path):
text_file = open(path, "rb")
text = text_file.read()
text_file.close()
print len(text)
print hashlib.new("md5", text).hexdigest() # Could be any hash function

Try splitting text update and md5 object creation as below
import hashlib;
md5=hashlib.new('md5')
with open(filepath,'rb') as f:
for line in f:
md5.update(line)
return md5.hexdigest()

Related

Python is reading past the end of the file. Is this a security risk? [duplicate]

Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the codes below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problems. In the example above, openFile is the object used to open file. I have no problems if I want write to it the first time. If I want to use the same openFile to read files or append something to it. It doesn't happen or an error is given. I have to declare the same/different open file object before I can perform another read/write action to the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here or is it just a Pythong thing. I am using Python 2.7. Thanks!

Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, using write method will write the string object to the file based on where the pointer is. In your case, it will append the string "Test abc" to the start of the file. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open

Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.

Reading and Writing happens where the current file pointer is and it advances with each read/write.
In your particular case, writing to the openFile, causes the file-pointer to point to the end of file. Trying to read from the end would result EOF.
You need to reset the file pointer, to point to the beginning of the file before through seek(0) before reading from it

You can read, modify and save to the same file in python but you have actually to replace the whole content in file, and to call before updating file content:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
for file_name in files:
file_path = os.path.join(directories, file_name)
# open file for reading and writing
with io.open(file_path, "r+", encoding="utf-8") as edit_file:
for current_line in edit_file:
if condition in current_line:
# update current line
current_line = current_line.replace('john', 'jack')
new_file_content += current_line
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
# delete actual file content
edit_file.truncate()
# rewrite updated file content
edit_file.write(new_file_content)
# empties new content in order to set for next iteration
new_file_content = ""
edit_file.close()

Python hashlib module producing strange results

I'm using the hashlib module to test a hypothesis about hash algorithms and I'm getting strange results. I check my results with the Windows fciv program. The workflow I'm using is this:
Gather the file and algorithm from the user.
Print out the original filename and hashed file using that algorithm.
Test the results with fciv in Windows.
Add a few bytes or a space character to the file.
Print out he new hashed file using the chosen algorithm.
Test the results with the updated file in fciv.
The problem is this:
When I use a .txt file, I am getting the different results as I expected from my program and from fciv. This works perfectly.
Here is the output:
Original Filename: example_docs\testDocument.txt
Original md5 Hash: 62bef8046d4bcbdc46ac81f5e4202fe7
Updated md5 Hash: 78a96b792cf2ea160db5e4823f4bf0c5
However, when I use an .mp4 video file, fciv shows a different hash, but my program does not.
Here is the output:
Original Filename: example_docs\testVideo.mp4
Original md5 Hash: 9a7dcb986e2e756dda60e851a0b03916
Updated md5 Hash: 9a7dcb986e2e756dda60e851a0b03916
It doesn't matter how many times I run my program, the hash remains the same in the output from my program, but fciv displays different results.
Here is my code snippet:
def getHash(filename, algorithm):
h = hashlib.new(algorithm)
h.update(filename)
return h.hexdigest()
print "Original Filename: {file}".format(file=args.file)
with open(args.file, "a+") as inFile:
h = getHash(inFile.read(), args.algorithm)
print "Original {hashname} Hash: {hashed_file}".format(hashname=args.algorithm, hashed_file=h)
with open(args.file, "a+") as inFile:
inFile.write(b'\x07\x08\x07') # Also worked with inFile.write(" ")
with open(args.file, "a+") as inFile:
h = getHash(inFile.read(), args.algorithm)
print "Updated {hashname} Hash: {hashed_file}".format(hashname=args.algorithm, hashed_file=h)
where args.algorithm is md5 and args.file is the user-provided filename.

Open your files always in binary mode with ab+. Otherwise Python on Windows will use text mode for what it thinks are text files.
But I do wonder why you would be using ab+ rather than rb+ if you intend to read the entire file as with ab+ the file pointer starts out at the end where as with rb+ it starts out at the beginning of the file.
See https://stackoverflow.com/a/23566951 for a nice list of the file modes.

Beginner Python: Reading and writing to the same file

Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the codes below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problems. In the example above, openFile is the object used to open file. I have no problems if I want write to it the first time. If I want to use the same openFile to read files or append something to it. It doesn't happen or an error is given. I have to declare the same/different open file object before I can perform another read/write action to the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here or is it just a Pythong thing. I am using Python 2.7. Thanks!

Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, using write method will write the string object to the file based on where the pointer is. In your case, it will append the string "Test abc" to the start of the file. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open

Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.

Reading and Writing happens where the current file pointer is and it advances with each read/write.
In your particular case, writing to the openFile, causes the file-pointer to point to the end of file. Trying to read from the end would result EOF.
You need to reset the file pointer, to point to the beginning of the file before through seek(0) before reading from it

You can read, modify and save to the same file in python but you have actually to replace the whole content in file, and to call before updating file content:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
for file_name in files:
file_path = os.path.join(directories, file_name)
# open file for reading and writing
with io.open(file_path, "r+", encoding="utf-8") as edit_file:
for current_line in edit_file:
if condition in current_line:
# update current line
current_line = current_line.replace('john', 'jack')
new_file_content += current_line
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
# delete actual file content
edit_file.truncate()
# rewrite updated file content
edit_file.write(new_file_content)
# empties new content in order to set for next iteration
new_file_content = ""
edit_file.close()

Simple Python code to scramble descramble file contents

I'm looking for a small easy program to scramble and descramble a file's contents. The file is a zip file so it might contain any characters. Was thinking something like two's complement or something, but not sure how to do it in Python.
The reason for this is that my current employer has draconian internet and file installation laws and I would like to mail myself files without the mailer detecting the attachment as a zip file (which it does even if you rename it to .jpg).
I already have Python installed on my work machine.

You could try xor'ing the contents of the file with a specific value, creating an XOR-cipher, just be sure to read/write the file in binary mode. You'd have to test this out to see if this works for your purposes (it does fine with text files, and I don't see why it wouldn't work for binaries.)
Of course you have to use the same (key) value (for instance a character) to encode and decode - though you can use the same code for both of these operations.
Here's some code I recently put together which does just that:
import sys
def processData(filename, key):
with open(filename, 'rb') as inf, open(filename+'.xor', 'wb') as outf:
for line in inf:
line = ''.join([chr(ord(c)^ord(key)) for c in line])
outf.write(line)
if __name__ == '__main__':
if len(sys.argv) != 3:
print 'Usage: xor_it filename key'
print ' key = a single character'
else:
processData(sys.argv[1], sys.argv[2])

I have had success with stealthZip getting through e-mail and proxy restrictions!
However, if you do want to do it in python, you could try using pickle. Load the file in binary mode, pickle it, send it over, and unpickle it on the receiving end. I don't know how well pickle functions with binaries though.

You could also simply base64 encode the file. This will make your file larger, but has the benefit of being all ascii text (which you could then just paste in the email.)
Note this will overwrite the current file. It wouldn't be hard to change it to output to another file.
#b64.py
import base64
import sys
def help():
print """Usage: python b64.py -[de] file
-d\tdecode
-e\tencode"""
sys.exit(0)
def decode(filename):
with open(filename) as f:
bin = base64.b64decode(f.read())
with open(filename,'wb') as f:
f.write(bin)
def encode(filename):
with open(filename,'rb') as f:
text = base64.b64encode(f.read())
with open(filename,'w') as f:
f.write(text)
if len(sys.argv) != 3:
help()
if sys.argv[1] == "-d":
decode(sys.argv[2])
elif sys.argv[1] == "-e":
encode(sys.argv[2])
else:
help()

f.read coming up empty

I'm doing all this in the interpreter..
loc1 = '/council/council1'
file1 = open(loc1, 'r')
at this point i can do file1.read() and it prints the file's contents as a string to standard output
but if i add this..
string1 = file1.read()
string 1 comes back empty.. i have no idea what i could be doing wrong. this seems like the most basic thing!
if I go on to type file1.read() again, the output to standard output is just an empty string. so, somehow i am losing my file when i try to create a string with file1.read()

You can only read a file once. After that, the current read-position is at the end of the file.
If you add file1.seek(0) before you re-read it, you should be able to read the contents again. A better approach, however, is to read into a string the first time and then keep it in memory:
loc1 = '/council/council1'
file1 = open(loc1, 'r')
string1 = file1.read()
print string1

You do not lose it, you just move offset pointer to the end of file and try to read some more data. Since it is the end of the file, no more data is available and you get empty string. Try reopening file or seeking to zero position:
f.read()
f.seek(0)
f.read()

Using with is the best syntax to use because it closes the connection to the file after using it(since python 2.5):
with open('/council/council1', 'r') as input_file:
text = input_file.read()
print(text)

To quote the official documentation on read():
To read a file’s contents, call f.read(size)
When size is omitted or negative, the entire contents of the file will
be read and returned;
And the most relevant part:
If the end of the file has been reached, f.read() will return an empty
string ('').
Which means that if you use read() twice consecutively, it is expected that the second time you'll get an empty string. Either store it the first time or use f.seek(0) to go back to the start. Together, they provide a lower level API to give you greater control.
Besides using a context manager to automatically open and close the file, there's another way to read a whole text file, using pathlib, example below:
#!/usr/bin/env python3
from pathlib import Path
txt_file = Path("myfile.txt")
try:
content = txt_file.read_text()
except FileNotFoundError:
print("Could not find file")
else:
print(f"The content is: {content}")
print(f"I can also read again: {txt_file.read_text()}")
As you can see, you can call read_text() several times and you'll get the full content, no surprises. Of course you wouldn't want to do that in production code since read_text() opens and closes the file each time, it's still best to store it. I could recommend pathlib highly when dealing with files and file paths.
It's outside the scope, but it may be worth noting a difference when reading line by line. Unlike the file object obtained by open(), PosixPath returned by Path() is not iterable. The equivalent of:
with open('file.txt') as f:
for line in f:
print(line)
Would be something like:
for line in Path('file.txt').read_text().split('\n'):
print(line)
One advantage of the first approach, with open, is that the entire file is not read into memory at once.

make sure your location is correct. Do you actually have a directory called /council under your root directory (/) ?. also use, os.path.join() to create your path
loc1 = os.path.join("/path","dir1","dir2")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python hash from file not as expected - python

Try splitting text update and md5 object creation as below import hashlib; md5=hashlib.new('md5') with open(filepath,'rb') as f: for line in f: md5.update(line) return md5.hexdigest()

Related

Python is reading past the end of the file. Is this a security risk? [duplicate]

Python hashlib module producing strange results

Beginner Python: Reading and writing to the same file

Simple Python code to scramble descramble file contents

f.read coming up empty

Categories

Resources