I have a string which is a SHA256 hash, and I want to pass it to a Python script which will convert it to a SHA256 object. If I do this:
my_hashed_string = ...  # my hashed string here
m = hashlib.sha256()  # note: the constructor is sha256(), not SHA256()
m.update(my_hashed_string.encode())
it will just hash my hash. I don't want to hash twice...it's already been hashed. I just want Python to parse my original hashed string as a hash object. How do I do this?
Unfortunately the hash alone isn't enough information to reconstruct the hash object. The algorithm is stateful: its internal state depends on everything it has processed so far, and that state is what it uses to hash subsequent input. The digest is only a small projection of that state, so it cannot by itself be used to continue hashing additional data.
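A minimal sketch of why this can't work (the message contents are my own example): feeding a digest into a fresh object just hashes the digest, while keeping the original object alive lets you hash incrementally:

```python
import hashlib

# One-shot hash of the full message.
full = hashlib.sha256(b"hello world").hexdigest()

# Feeding the *digest* of part of the message into a fresh object
# merely hashes the digest -- it does not resume the original state.
first = hashlib.sha256(b"hello ").digest()
resumed = hashlib.sha256(first)
resumed.update(b"world")
assert resumed.hexdigest() != full

# To hash incrementally, keep one object and keep calling update():
h = hashlib.sha256()
h.update(b"hello ")
h.update(b"world")
assert h.hexdigest() == full
```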
Related
I'm using following code to get the keccak 256 hash:
import sha3
k = sha3.keccak_256()
k.update(b'age')
print(k.hexdigest())
How do I convert the keccak 256 hash value back to original string? (I'm ok with using any library as needed).
You can't convert the hash value back to the original string. Hash functions are deliberately designed so that recovering the input from the output is computationally infeasible.
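The only practical workaround is a precomputed lookup over candidate inputs you already know. A sketch of that idea, using hashlib.sha256 as a stand-in (the same principle applies to keccak_256; the candidate list is my own example):

```python
import hashlib

# The hash can't be inverted, but if the set of possible inputs is
# small and known, you can precompute a reverse lookup table.
candidates = [b'age', b'name', b'email']
table = {hashlib.sha256(c).hexdigest(): c for c in candidates}

digest = hashlib.sha256(b'age').hexdigest()
assert table.get(digest) == b'age'  # found only because b'age' was enumerated
```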
What is the use of .digest() in this statement? Why do we use it ? I searched on google ( and documentation also) but still I am not able to figure it out.
train_hashes = [hashlib.sha1(x).digest() for x in train_dataset]
What I found is that it converts the hash to a string. Am I right or wrong?
The .digest() method returns the actual digest the hash is designed to produce.
It is a separate method because the hashing API is designed to accept data in multiple pieces:
hash = hashlib.sha1()
for chunk in large_amount_of_data:
    hash.update(chunk)
final_digest = hash.digest()
The above code creates a hashing object without passing any initial data in, then uses the hash.update() method to feed it chunks of data in a loop. This avoids having to load all of the data into memory at once, so you can hash anything between 1 byte and the entire Google index, if you ever had access to something that large.
If hashlib.sha1(x) produced the digest directly, you could never add additional data to hash first. There is also an alternative way to access the digest: as a hexadecimal string, using the hash.hexdigest() method (equivalent to hash.digest().hex(), but more convenient).
The code you found uses the fact that the constructor of the hash object also accepts data; since that's all of the data that you wanted to hash, you can call .digest() immediately.
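For example (the chunk size and data are arbitrary choices of mine), chunked updates match the one-shot constructor, and hexdigest() matches digest().hex():

```python
import hashlib

data = b"abcdef" * 1000

# Feeding the data in chunks ...
h = hashlib.sha1()
for i in range(0, len(data), 64):
    h.update(data[i:i + 64])

# ... gives the same digest as passing everything to the constructor.
assert h.digest() == hashlib.sha1(data).digest()

# hexdigest() is just a convenience wrapper around digest().hex().
assert h.hexdigest() == h.digest().hex()
```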
The module documentation covers it this way:
There is one constructor method named for each type of hash. All return a hash object with the same simple interface. For example: use sha256() to create a SHA-256 hash object. You can now feed this object with bytes-like objects (normally bytes) using the update() method. At any point you can ask it for the *digest of the concatenation of the data fed to it so far* using the digest() or hexdigest() methods.
(emphasis mine).
I'm trying to convert user access log into a pure binary format, which would require me to convert string into int using some hash method, and then the mapping relationship of "id -> string value" would be stored somewhere for further backward retrieve.
Since I'm using Python, in order to save some process time, instead of introducing hashlib to calculate hash, can I simply use
string_hash = id(sys.intern(some_string))  # intern() lives in the sys module in Python 3
as the hash method? Any basic difference to be aware of comparing to MD5 / SHA1? Is the probability of conflict obviously higher than MD5 / SHA1?
Doesn't work. id is not guaranteed to be consistent across interpreter executions; in CPython, it's the memory location of the object. Even if it were consistent, it doesn't have enough bytes for collision resistance. Why not just keep using the strings? ASCII or Unicode, strings can be serialized easily.
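If you need a stable, compact integer id, deriving it from a real hash is a common alternative. A minimal sketch (the helper name and the 8-byte truncation are my choices, not from the question):

```python
import hashlib

def string_key(s: str, nbytes: int = 8) -> int:
    """Derive a stable integer id from a string by truncating its SHA-1.

    Unlike id(), this is identical across interpreter runs. Truncating
    to 8 bytes keeps ids small but raises the collision probability
    compared to the full 20-byte digest.
    """
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:nbytes], "big")

# The same input maps to the same id in any process, on any machine.
assert string_key("user/42/login") == string_key("user/42/login")
```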
Using SHA1 to hash down larger size strings so that they can be used as a keys in a database.
Trying to produce a UUID-size string from the original string that is random enough and big enough to protect against collisions, but much smaller than the original string.
Not using this for anything security related.
Example:
# Take a very long string, hash it down to a smaller string behind the scenes and use
# the hashed key as the data base primary key instead
def _get_database_key(very_long_key):
    # hashlib needs bytes, so encode str input first
    return hashlib.sha1(very_long_key.encode("utf-8")).digest()
Is SHA1 a good algorithm to be using for this purpose? Or is there something else that is more appropriate?
Python has a uuid library, based on RFC 4122.
The version that uses SHA1 is UUIDv5, so the code would be something like this:
import uuid
uuid.uuid5(uuid.NAMESPACE_OID, 'your string here')
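For illustration (the input string is made up), uuid5 is deterministic, which is what makes it usable as a reproducible database key:

```python
import uuid

very_long_key = "some/very/long/request/path?with=query&strings=attached"

# uuid5 hashes namespace + name with SHA-1: the same inputs always
# yield the same UUID, so the key can be recomputed at any time.
key1 = uuid.uuid5(uuid.NAMESPACE_OID, very_long_key)
key2 = uuid.uuid5(uuid.NAMESPACE_OID, very_long_key)
assert key1 == key2
assert len(str(key1)) == 36  # canonical 8-4-4-4-12 text form
```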
Just out of curiosity, really... for example, in python,
hashlib.sha1(b"key" + b"data").hexdigest() != hmac.new(b"key", b"data", hashlib.sha1).hexdigest()
is there some logical distinction I'm missing between the two actions?
hashlib.sha1 simply gives you the SHA-1 hash of the content "keydata" that you pass as a parameter (note that you are just concatenating the two strings). The hmac call gives you a keyed hash of the string "data", using "key" as the key and SHA-1 as the hash function. The fundamental difference between the two calls is that an HMAC can only be reproduced if you know the key, so it also tells you something about who generated it. A plain SHA-1 hash can only be used to detect that content has not changed.
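A runnable sketch of the difference (key and data taken from the question, as bytes):

```python
import hashlib
import hmac

key, data = b"key", b"data"

# Plain SHA-1 of the concatenation -- anyone can recompute this.
plain = hashlib.sha1(key + data).hexdigest()

# HMAC-SHA1 mixes the key into the computation via inner/outer padding,
# so only holders of the key can produce or verify the result.
mac = hmac.new(key, data, hashlib.sha1).hexdigest()

assert plain != mac

# Verify a received MAC in constant time to avoid timing leaks:
assert hmac.compare_digest(mac, hmac.new(key, data, hashlib.sha1).hexdigest())
```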
I found the answer in the manual.
https://en.wikipedia.org/wiki/Hmac#Design_principles