Python: Generating an MD5 Hash of a File Using Hashlib

I am trying to generate hashes of files using hashlib inside a Tkinter program.
My goal:
Step 1: Button (clicked) opens a file browser (click the file you want a hash of).
Step 2: Once a file is chosen, choose the output file (.txt) where the hash will be 'printed'.
Step 3: Repeat, with no clashes.
from tkinter.filedialog import askopenfilename
import hashlib

def hashing():
    hash = askopenfilename(title="Select file for Hashing")
    savename = askopenfilename(title="Select output")
    outputhash = open(savename, "w")
    hash1 = open(hash, "r")
    h = hashlib.md5()
    print(h.hexdigest(), file=outputhash)
    outputhash.flush()
It 'works' to some extent: it lets me select an input file and an output file, and it prints the hash into the output file.
HOWEVER - if I choose ANY different file, I get the same hash every time.
I'm new to Python and it's really stumping me.
Thanks in advance.
Thanks for all your comments.
I figured out the problem, and this is my new code:
from tkinter.filedialog import askopenfilename
import hashlib
def hashing():
    hash = askopenfilename(title="Select file for Hashing")
    savename = askopenfilename(title="Select output")
    outputhash = open(savename, "w")
    curfile = open(hash, "rb")
    hasher = hashlib.md5()
    buf = curfile.read()  # reads the whole file into memory
    hasher.update(buf)
    print(hasher.hexdigest(), file=outputhash)
    outputhash.flush()
This code works. You guys rock. :)

In your case you compute the digest of the empty string, so you probably get:
d41d8cd98f00b204e9800998ecf8427e
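You can confirm that this is the MD5 of empty input:
import hashlib

# hashing no data at all yields the well-known empty-input digest
print(hashlib.md5().hexdigest())     # d41d8cd98f00b204e9800998ecf8427e
print(hashlib.md5(b"").hexdigest())  # same value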
I used this method to digest; it is better for big files (see here).
import hashlib

md5 = hashlib.md5()
with open(File, "rb") as f:  # File is the path of the file to hash
    # read 128-byte blocks until read() returns b"" at end of file
    # (the sentinel must be the bytes literal b"" for a binary-mode file)
    for block in iter(lambda: f.read(128), b""):
        md5.update(block)
print(md5.hexdigest())

A very simple way:
from hashlib import md5

f = open("file.txt", "rb")  # binary mode: md5() needs bytes, not str
data = f.read()
f.close()
Hash = md5(data).hexdigest()
out = open("out.txt", "w")
out.write(Hash)
out.close()

Related

Is it possible to name a cryptography.fernet generated key after a variable?

First post here; so as not to waste your time, let's get right into it:
Is it possible to give a key generated by the cryptography.fernet module a name taken from an earlier-defined variable? Example:
# import required modules
import os
from cryptography.fernet import Fernet
from tkinter import *
from tkinter.filedialog import askopenfilename

Tk().withdraw()  # we don't want a full GUI, so keep the root window from appearing
filepath = askopenfilename()  # show an "Open" dialog box and return the path to the selected file
var1 = os.path.basename(filepath)  # cuts the filepath down to a filename

# key generation
key = Fernet.generate_key()

# store the key in a file
with open('filekey.key', 'wb') as filekey:
    filekey.write(key)

# opening key
with open('filekey.key', 'rb') as filekey:
    key = filekey.read()

# using the generated key
fernet = Fernet(key)

# opening the original file to encrypt
with open(filepath, 'rb') as file:
    original = file.read()

# encrypting the file
encrypted = fernet.encrypt(original)

# opening the file in write mode and
# writing the encrypted data
with open(filepath, 'wb') as encrypted_file:
    encrypted_file.write(encrypted)
My goal is to name the generated key file after the value of the var1 variable.
"Sure. open() just takes a string. You can make the string you want by concatenation. with open(var1 + '.key','wb')"
Answered by Mark M in the comments.
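In code, that comment amounts to this small change (using var1 and key from the question above):
# write the key to a file named after the selected file, e.g. "report.txt.key"
with open(var1 + '.key', 'wb') as filekey:
    filekey.write(key)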

Python Image hashing

I'm currently trying to get a hash from an image in Python. I have successfully done this, and it works somewhat.
However, I have this issue:
Image1 and Image2 end up having the same hash, even though they are different. I need a form of hashing which is more accurate and precise.
The hash for both images is: faf0761493939381
I am currently using from PIL import Image, import imagehash, and the imagehash.average_hash function.
Code here:
import os
from PIL import Image
import imagehash

def checkImage():
    for filename in os.listdir('images//'):
        hashedImage = imagehash.average_hash(Image.open('images//' + filename))
        print(filename, hashedImage)
    for filename in os.listdir('checkimage//'):
        check_image = imagehash.average_hash(Image.open('checkimage//' + filename))
        print(filename, check_image)
        if check_image == hashedImage:
            print("Same image")
        else:
            print("Not the same image")
        print(hashedImage, check_image)

checkImage()
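As an aside: in the code above, hashedImage ends up holding only the hash of whichever file came last in images//, so every comparison is against that single image. A sketch of one way to compare a candidate against all reference hashes instead:
import os
from PIL import Image
import imagehash

# map each reference image's filename to its average hash
known = {f: imagehash.average_hash(Image.open('images//' + f))
         for f in os.listdir('images//')}
for filename in os.listdir('checkimage//'):
    candidate = imagehash.average_hash(Image.open('checkimage//' + filename))
    # report every reference image whose hash matches the candidate
    matches = [name for name, h in known.items() if h == candidate]
    print(filename, "matches:", matches if matches else "none")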
Try using hashlib. Just open the file and perform a hash.
import hashlib

# Simple solution
with open("image.extension", "rb") as f:
    hash = hashlib.sha256(f.read()).hexdigest()

# General-purpose solution that can process large files
def file_hash(file_path):
    # https://stackoverflow.com/questions/22058048/hashing-a-file-in-python
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as f:
        while True:
            data = f.read(65536)  # arbitrary number to reduce RAM usage
            if not data:
                break
            sha256.update(data)
    return sha256.hexdigest()
Thanks to Antonín Hoskovec for pointing out that the file should be opened in binary mode (rb), not text mode (r)!
By default, imagehash checks whether image files are nearly identical, and the files you are comparing are more similar than they are not. If you want a more-or-less unique way of fingerprinting files, you can use a different approach, such as employing a cryptographic hashing algorithm:
import hashlib

def get_hash(img_path):
    # This function will return the `md5` checksum for any input image.
    with open(img_path, "rb") as f:
        img_hash = hashlib.md5()
        while chunk := f.read(8192):
            img_hash.update(chunk)
    return img_hash.hexdigest()

Recommended way to redirect file-like streams in Python?

I am writing a backup script for an SQLite database that changes very intermittently. Here's how it is now:
from bz2 import BZ2File
from datetime import datetime
from os.path import dirname, abspath, join
from hashlib import sha512

def backup_target_database(target):
    backup_dir = dirname(abspath(target))
    hash_file = join(backup_dir, 'last_hash')
    new_hash = sha512(open(target, 'rb').read()).digest()
    if new_hash != open(hash_file, 'rb').read():
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_file = join(backup_dir, datetime.now().strftime(fmt))
        BZ2File(snapshot_file, 'wb').write(open(target, 'rb').read())
        open(hash_file, 'wb').write(new_hash)
Currently the database weighs just shy of 20 MB, so it's not that taxing when this runs and reads the whole file into memory (and does so twice when changes are detected), but I don't want to wait until this becomes a problem.
What is the proper way to do this sort of (to use Bash terminology) stream piping?
First, there's duplication in your code (the target file is read twice).
You can use shutil.copyfileobj and hashlib's update() for a memory-efficient routine.
from bz2 import BZ2File
from datetime import datetime
from hashlib import sha512
from os.path import dirname, abspath, join
from shutil import copyfileobj

def backup_target_database(target_path):
    backup_dir = dirname(abspath(target_path))
    hash_path = join(backup_dir, 'last_hash')
    old_hash = open(hash_path, 'rb').read()
    hasher = sha512()
    with open(target_path, 'rb') as target:
        while True:
            data = target.read(1024)
            if not data:
                break
            hasher.update(data)
    new_hash = hasher.digest()
    if new_hash != old_hash:
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_path = join(backup_dir, datetime.now().strftime(fmt))
        with open(target_path, 'rb') as target:
            with BZ2File(snapshot_path, 'wb', compresslevel=9) as snapshot:
                copyfileobj(target, snapshot)
(Note: I didn't test this code. If you have a problem, please let me know.)
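One caveat with the routine above: the very first run will fail on open(hash_path, 'rb') because last_hash does not exist yet. A guarded read is a minimal fix:
def read_old_hash(hash_path):
    # return the stored digest, or b'' on the first run
    # when the last_hash file has not been created yet
    try:
        with open(hash_path, 'rb') as f:
            return f.read()
    except FileNotFoundError:
        return b''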

Error when trying to read and write multiple files

I modified the code based on the comments from experts in this thread. Now the script reads and writes all the individual files: it iterates over the files, highlights the search terms, and writes the output. The current issue is that after highlighting the last instance of the search item, the script drops all remaining content after that last instance in each file's output.
Here is the modified code:
import os
import sys
import re

source = raw_input("Enter the source files path:")
listfiles = os.listdir(source)
for f in listfiles:
    filepath = source + '\\' + f
    infile = open(filepath, 'r+')
    source_content = infile.read()
    color = ('red')
    regex = re.compile(r"(\b be \b)|(\b by \b)|(\b user \b)|(\bmay\b)|(\bmight\b)|(\bwill\b)|(\b's\b)|(\bdon't\b)|(\bdoesn't\b)|(\bwon't\b)|(\bsupport\b)|(\bcan't\b)|(\bkill\b)|(\betc\b)|(\b NA \b)|(\bfollow\b)|(\bhang\b)|(\bbelow\b)", re.I)
    i = 0; output = ""
    for m in regex.finditer(source_content):
        output += "".join([source_content[i:m.start()],
                           "<strong><span style='color:%s'>" % color[0:],
                           source_content[m.start():m.end()],
                           "</span></strong>"])
        i = m.end()
    outfile = open(filepath, 'w+')
    outfile.seek(0)
    outfile.write(output)
    print "\nProcess Completed!\n"
    infile.close()
    outfile.close()
raw_input()
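The truncation happens because output only accumulates text up to the end of the final match; the remainder of the file is never appended. A self-contained illustration of the one-line fix:
import re

source_content = "may be might be"
regex = re.compile(r"\bmay\b|\bmight\b", re.I)
i = 0
output = ""
for m in regex.finditer(source_content):
    output += source_content[i:m.start()] + "<strong>" + m.group() + "</strong>"
    i = m.end()
output += source_content[i:]  # the missing line: keep the text after the last match
print(output)  # <strong>may</strong> be <strong>might</strong> be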
The error message tells you what the error is:
No such file or directory: 'sample1.html'
Make sure the file exists, or use a try statement to give it a default behavior.
The reason you get that error is that the Python script doesn't know where the files you want to open are located.
You have to provide the file path when opening a file, as I have done below. I simply concatenated the source path + '\\' + filename and saved the result in a variable named filepath. Now use this variable to open the file with open().
import os
import sys

source = raw_input("Enter the source files path:")
listfiles = os.listdir(source)
for f in listfiles:
    filepath = source + '\\' + f  # This is the file path
    infile = open(filepath, 'r')
Also, there are a couple of other problems with your code. If you want to open the file for both reading and writing, you have to use r+ mode. Moreover, on Windows, if you open a file in r+ mode you may have to call file.seek() before file.write() to avoid another issue; you can read the reason for using file.seek() here.
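A minimal sketch of that r+ pattern (assuming sample1.html already exists):
with open('sample1.html', 'r+') as f:
    content = f.read()
    f.seek(0)      # rewind before writing; needed when mixing reads and writes, notably on Windows
    f.write(content.replace('old', 'new'))  # example edit
    f.truncate()   # drop leftover bytes if the new text is shorter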

Working from a .txt file to output sha256 hashes in Python

I have created a text file of randomly generated words, and now I would like to write a script that uses that data to create sha256 hashes of those words. I would prefer the hashes to be saved as a .txt file as well, but in my failed attempt here I was simply trying to print them out. Any suggestions?
#!/usr/bin/python
# Filename: doesnt_work
import os
import hashlib

with open("wordlist.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")
        m = sha256(line)
        print(m.hexdigest())
Although you should have posted the exception, I'm guessing this would fix it:
m = hashlib.sha256(line)
There is also an indentation problem:
#!/usr/bin/python
# Filename: works
import os
import hashlib

with open("wl.txt", "r") as f:
    for line in f.readlines():
        line = line.rstrip("\n")
        m = hashlib.sha256(line.encode("utf-8"))  # sha256 needs bytes, not str
        print(m.hexdigest())
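Since the goal was also to save the hashes to a .txt file rather than just print them, here is a minimal sketch of that (hashes.txt is just an example name):
import hashlib

with open("wordlist.txt", "r") as f, open("hashes.txt", "w") as out:
    for line in f:
        word = line.rstrip("\n")
        # sha256 needs bytes, so encode the word first
        out.write(hashlib.sha256(word.encode("utf-8")).hexdigest() + "\n")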
