CRC32 checksum in Python with hex input - python

I want to calculate the CRC32 checksum of a string of hex values in Python. I found zlib.crc32(data) and binascii.crc32(data), but all the examples I found pass 'data' as a string ('hello', for example). I want to pass hex values in as data and get the checksum. When I set data to a hex value (0x18329a7e, for example) I get TypeError: must be string or buffer, not int. The function runs when I make the hex value a string ('0x18329a7e', for example), but I don't think it computes the correct checksum. Any help would be appreciated. Thanks!

I think you are looking for binascii.a2b_hex() (the output below is from Python 2; in Python 3 the CRC is always returned as an unsigned value):
>>> binascii.crc32(binascii.a2b_hex('18329a7e'))
-1357533383

>>> import struct,binascii
>>> ncrc = lambda numVal: binascii.crc32(struct.pack('!I', numVal))
>>> ncrc(0x18329a7e)
-1357533383

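Try joining the list of byte characters into a single string. Note that str(t) would checksum the list's printed representation, not the bytes themselves:
import binascii
t = ['\x18', '\x32', '\x9a', '\x7e']
chksum = binascii.crc32(''.join(t))  # Python 2; in Python 3 use bytes([0x18, 0x32, 0x9a, 0x7e])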
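For Python 3, where zlib.crc32 and binascii.crc32 take bytes and return an unsigned result, a minimal sketch of both approaches above might look like this. The helper names are hypothetical; to_signed32 only exists to compare results with the negative numbers Python 2 printed.

```python
import binascii
import zlib


def crc32_from_hex(hex_text: str) -> int:
    """CRC32 of the bytes spelled by a hex string such as '18329a7e'."""
    # bytes.fromhex turns each pair of hex digits into one byte: b'\x18\x32\x9a\x7e'.
    return zlib.crc32(bytes.fromhex(hex_text))


def crc32_from_int(value: int) -> int:
    """CRC32 of a 32-bit integer serialised big-endian, like struct.pack('!I', value)."""
    # binascii.crc32 and zlib.crc32 compute the same CRC-32 checksum.
    return binascii.crc32(value.to_bytes(4, "big"))


def to_signed32(crc: int) -> int:
    """Reinterpret an unsigned 32-bit CRC as the signed value Python 2 printed."""
    return crc - (1 << 32) if crc >= (1 << 31) else crc


crc = crc32_from_hex("18329a7e")
print(crc, to_signed32(crc), crc32_from_int(0x18329a7e) == crc)
```

Both helpers should agree because '18329a7e' and 0x18329a7e describe the same four bytes in big-endian order. Passing the text '0x18329a7e' directly instead checksums its ten ASCII characters, which is why that result looked wrong.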

Related

How can i convert this byte properly?

I have a byte string b'string\x01' that I need to format as "string1". I need to do this for any "string" followed by a byte, e.g. b'string\t' to "string9". Why is my approach not working?
I start with x = b'string\x01', which I am trying to turn into "string1".
So I need to remove the '\x01': s = str(x).split("g", 1) and then byte_part = s[1].rstrip('\'') so I get "\x01" on its own. The next problem is:
I am trying to convert this string to a byte so I can use int.from_bytes(byte_part, 'little') and get the correct integer result, e.g. \x01 = 1.
What actually happens is that converting the string with bytearray(string, 'utf-8') gives me bytearray(b'\\x01'), and int.from_bytes() on b'\\x01' gives 825260124 instead of the 1 I expect for b'\x01'.
The method you are looking for is ord().
ord('\x01') # the result is 1
Also, the following would decode your bytes and return the trailing number:
ord(x.decode().split('string')[1])
Hope this helps.
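In Python 3 the asker's original idea with int.from_bytes also works, as long as it is applied to the raw bytes rather than to str(x). A minimal sketch, assuming the prefix is always the literal b'string' (the helper name is hypothetical):

```python
def suffix_byte_to_text(raw: bytes, prefix: bytes = b"string") -> str:
    """Turn b'string\\x01' into 'string1' by reading the trailing bytes as a number."""
    if not raw.startswith(prefix):
        raise ValueError(f"{raw!r} does not start with {prefix!r}")
    # Slice the bytes object directly; calling str() first would produce the
    # escaped text '\x01' (four characters), which is what broke int.from_bytes.
    tail = raw[len(prefix):]
    number = int.from_bytes(tail, "little")
    return prefix.decode("ascii") + str(number)


print(suffix_byte_to_text(b"string\x01"))  # string1
print(suffix_byte_to_text(b"string\t"))    # string9
```

For a single trailing byte, indexing is enough on Python 3: x[-1] already returns an int.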

Generate ID from string in Python

I'm struggling a bit to generate an integer ID for a given string in Python.
I thought the built-in hash function would be perfect, but the IDs it produces are sometimes too long. That's a problem since I'm limited to 64 bits at most.
My code so far: hash(s) % 10000000000.
The input strings I expect will be 12-512 characters long.
Requirements are:
integers only
generated from provided string
ideally up to 10-12 chars long (I'll have ~5 million items only)
low probability of collision..?
I would be glad if someone can provide any tips / solutions.
I would do something like this:
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"some string")  # hashlib needs bytes in Python 3
>>> str(int(m.hexdigest(), 16))[0:12]
'120665287271'
The idea:
Calculate the hash of a string with MD5 (or SHA-1 or ...) in hexadecimal form (see module hashlib)
Convert the hex string into an integer and convert that back to a string in base 10 (so the result contains only digits)
Use the first 12 characters of the string.
If characters a-f are also okay, I would do m.hexdigest()[0:12].
If you're not allowed to add an extra dependency, you can keep using the hash function in the following way:
>>> my_string = "whatever"
>>> str(hash(my_string))[1:13]
'460440266319'
NB:
I am ignoring the first character because it may be the negative sign.
hash may return different values for the same string on different runs, because string hashing is randomized per process unless the PYTHONHASHSEED environment variable is set. You may want to set it to a fixed value; see the Python documentation on PYTHONHASHSEED.
Encoding to UTF-8 was needed for mine to work (hashlib works on bytes, not str):
def unique_name_from_str(string: str, last_idx: int = 12) -> str:
    """
    Generates a unique id name
    refs:
    - md5: https://stackoverflow.com/questions/22974499/generate-id-from-string-in-python
    - sha3: https://stackoverflow.com/questions/47601592/safest-way-to-generate-a-unique-hash
    (- guid/uuid: https://stackoverflow.com/questions/534839/how-to-create-a-guid-uuid-in-python?noredirect=1&lq=1)
    """
    import hashlib
    m = hashlib.md5()
    string = string.encode('utf-8')
    m.update(string)
    unique_name: str = str(int(m.hexdigest(), 16))[0:last_idx]
    return unique_name
See my ultimate-utils Python library.
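A sketch of a deterministic alternative that fits the 64-bit limit, assuming a non-negative integer is acceptable. It swaps in SHA-256 (any hashlib digest would do) and keeps the first 8 bytes of the digest; the function names are hypothetical. A rough birthday-bound estimate shows the trade-off: with about 5 million items the expected number of colliding pairs is roughly n^2/(2N), about 12.5 for a 12-digit space (N = 10^12) but under one in a million for the full 64-bit space (N = 2^64).

```python
import hashlib


def stable_id64(text: str) -> int:
    """Deterministic unsigned 64-bit ID derived from a SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes (64 bits). The same input gives the same ID on
    # every run, unlike the built-in hash() under hash randomization.
    return int.from_bytes(digest[:8], "big")


def stable_id_digits(text: str, digits: int = 12) -> int:
    """Shorter ID with at most `digits` decimal digits (far more collision-prone)."""
    return stable_id64(text) % 10**digits


print(stable_id64("some string"))
print(stable_id_digits("some string"))
```

If the ID must go into a signed 64-bit column, shifting the result right by one bit (>> 1) keeps it below 2**63.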

Convert decimal int to little endian string ('\x##\x##...')

I want to convert an integer value to a string of hex values, in little endian. For example, 5707435436569584000 would become '\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f'.
All my googlefu is finding for me is hex(..) which gives me '0x4f34e24a4f34e180' which is not what I want.
I could probably split up that string manually and build the one I want, but I'm hoping someone can point me to a better option.
You need to use the struct module:
>>> import struct
>>> struct.pack('<Q', 5707435436569584000)
'\x80\xe14OJ\xe24O'
>>> struct.pack('<Q', 5707435436569584202)
'J\xe24OJ\xe24O'
Here < indicates little-endian, and Q that we want to pack an unsigned long long (8 bytes).
Note that Python displays any byte that falls within the printable ASCII range as the corresponding ASCII character, hence the 4OJ, 4O and J parts of the above results:
>>> struct.pack('<Q', 5707435436569584202).encode('hex')
'4ae2344f4ae2344f'
>>> '\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f'
'J\xe24OJ\xe24O'
I know it is an old thread, but it is still useful. Here are my two cents using Python 3:
hex_string = hex(5707435436569584202)  # '0x4f34e24a4f34e24a'
ba = bytearray.fromhex(hex_string[2:])
ba.reverse()  # reverses in place and returns None, so keep the reference to ba
So, the key is to convert it to a bytearray and reverse it.
In one line:
bytearray.fromhex(hex(5707435436569584202)[2:])[::-1] # bytearray(b'J\xe24OJ\xe24O')
PS: You can treat "bytearray" data like "bytes" and even mix them with b'raw bytes'
Update:
As Will points out in the comments, you can also handle negative integers:
To make this work with negative integers you need to mask your input with your preferred int type output length. For example, -16 as a little endian uint32_t would be bytearray.fromhex(hex(-16 & (2**32-1))[2:])[::-1], which evaluates to bytearray(b'\xf0\xff\xff\xff')
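On Python 3.2+ there is also int.to_bytes, which takes the length and byte order directly and handles negative numbers with signed=True (two's complement, like the masking trick above). A minimal sketch; the wrapper name is hypothetical:

```python
import struct


def int_to_le_bytes(value: int, size: int = 8, signed: bool = False) -> bytes:
    """Little-endian encoding of `value` in exactly `size` bytes."""
    # int.to_bytes pads with zero bytes and raises OverflowError if the value
    # does not fit; slicing hex() output can instead yield an odd number of
    # digits, which bytearray.fromhex rejects.
    return value.to_bytes(size, "little", signed=signed)


n = 5707435436569584202
print(int_to_le_bytes(n))                          # b'J\xe24OJ\xe24O'
print(int_to_le_bytes(-16, 4, signed=True))        # b'\xf0\xff\xff\xff'
print(int_to_le_bytes(n) == struct.pack("<Q", n))  # True
```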

How to convert hexadecimal string to bytes in Python?

I have a long Hex string that represents a series of values of different types. I need to convert this Hex String into bytes or bytearray so that I can extract each value from the raw data. How can I do this?
For example, the string "ab" should convert to the bytes b"\xab" or equivalent byte array. Longer example:
>>> # what to use in place of `convert` here?
>>> convert("8e71c61de6a2321336184f813379ec6bf4a3fb79e63cd12b")
b'\x8eq\xc6\x1d\xe6\xa22\x136\x18O\x813y\xeck\xf4\xa3\xfby\xe6<\xd1+'
Suppose your hex string is something like
>>> hex_string = "deadbeef"
Convert it to a bytearray (Python 3 and 2.7):
>>> bytearray.fromhex(hex_string)
bytearray(b'\xde\xad\xbe\xef')
Convert it to a bytes object (Python 3):
>>> bytes.fromhex(hex_string)
b'\xde\xad\xbe\xef'
Note that bytes is an immutable version of bytearray.
Convert it to a string (Python ≤ 2.7):
>>> hex_data = hex_string.decode("hex")
>>> hex_data
'\xde\xad\xbe\xef'
There is a built-in function in bytearray that does what you intend.
bytearray.fromhex("de ad be ef 00")
It returns a bytearray and it reads hex strings with or without space separator.
Provided I understood correctly, you should look for binascii.unhexlify:
import binascii
a = '45222e'
s = binascii.unhexlify(a)
b = [ord(x) for x in s]  # Python 2; in Python 3 use list(s), since iterating bytes yields ints
Assuming you have a byte string like so
"\x12\x45\x00\xAB"
and you know the amount of bytes and their type you can also use this approach
import struct
data = b'\x12\x45\x00\xAB'  # avoid naming it "bytes", which shadows the built-in
val = struct.unpack('<BBH', data)
#val = (18, 69, 43776)
As I specified little endian (using the '<' char) at the start of the format string the function returned the decimal equivalent.
0x12 = 18
0x45 = 69
0xAB00 = 43776
B is equal to one byte (8 bit) unsigned
H is equal to two bytes (16 bit) unsigned
More format characters and their byte sizes are listed in the struct module documentation.
Advantages:
You can unpack more than one value at once and specify the endianness of the values
Disadvantages:
You really need to know the type and length of the data you're dealing with
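Putting the pieces together for Python 3: a sketch of the convert function the question asks for, followed by extracting typed values with struct. The field layout used for unpacking is a hypothetical example, since the real layout depends on the data.

```python
import binascii
import struct


def convert(hex_text: str) -> bytes:
    """Hex digits (optionally space-separated between bytes) to raw bytes."""
    return bytes.fromhex(hex_text)


hex_text = "8e71c61de6a2321336184f813379ec6bf4a3fb79e63cd12b"
raw = convert(hex_text)
print(raw == binascii.unhexlify(hex_text))  # True

# Hypothetical layout: two unsigned bytes, then a little-endian unsigned
# 16-bit integer. unpack_from only needs the buffer to be at least that long.
first, second, word = struct.unpack_from("<BBH", convert("124500ab"))
print(first, second, word)  # 18 69 43776
```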
You can use the codecs module from the Python standard library, e.g.:
import codecs
codecs.decode(hexstring, 'hex_codec')
You should be able to build a string holding the binary data using something like:
data = "fef0babe"
bits = ""
for x in range(0, len(data), 2):
    bits += chr(int(data[x:x+2], 16))
This is probably not the fastest way (many string appends), but it is quite simple and uses only core Python.
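A good one-liner, if hex_string already holds the raw byte characters (e.g. '\x12\x45') rather than hex digits, is:
byte_list = map(ord, hex_string)
This runs each character in the string through the ord() function. Only tested on Python 2.6; in Python 3, map() returns a lazy iterator, and iterating over a bytes object already yields ints.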
-Josh

How can I unpack binary hex formatted data in Python?

Using the PHP pack() function, I have converted a string into a binary hex representation:
$string = md5(time()); // 32 character length
$packed = pack('H*', $string);
The H* formatting means "Hex string, high nibble first".
To unpack this in PHP, I would simply use the unpack() function with the H* format flag.
How would I unpack this data in Python?
There's an easy way to do this with the binascii module:
>>> import binascii
>>> binascii.hexlify("ABCZ")
'4142435a'
>>> binascii.unhexlify("4142435a")
'ABCZ'
Unless I'm misunderstanding something about the nibble ordering (high-nibble first is the default… anything different is insane), that should be perfectly sufficient!
Furthermore, Python's hashlib.md5 objects have a hexdigest() method to automatically convert the MD5 digest to an ASCII hex string, so that this method isn't even necessary for MD5 digests. Hope that helps.
There's no corresponding "hex nibble" code for struct.pack, so you'll either need to manually pack into bytes first, like:
import struct
hex_string = 'abcdef12'
hexdigits = [int(x, 16) for x in hex_string]
data = b''.join(struct.pack('B', (high << 4) + low)
                for high, low in zip(hexdigits[::2], hexdigits[1::2]))
Or better, you can just use the hex codec. ie.
>>> data = hex_string.decode('hex')
>>> data
'\xab\xcd\xef\x12'
To unpack, you can encode the result back to hex similarly
>>> data.encode('hex')
'abcdef12'
However, note that for your example there's probably no need to take the round trip through a hex representation at all when encoding. Just use the MD5 binary digest directly, i.e.
>>> import hashlib
>>> x = hashlib.md5('some string')
>>> x.digest()
'Z\xc7I\xfb\xee\xc96\x07\xfc(\xd6f\xbe\x85\xe7:'
This is equivalent to your pack()ed representation. To get the hex representation, encode the digest back to hex as above:
>>> x.digest().encode('hex')
'5ac749fbeec93607fc28d666be85e73a'
>>> x.hexdigest()
'5ac749fbeec93607fc28d666be85e73a'
[Edit]: Updated to use better method (hex codec)
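The str.encode('hex') and str.decode('hex') calls above only exist in Python 2. A sketch of rough Python 3 equivalents of PHP's pack('H*') and unpack('H*'), using bytes.fromhex and bytes.hex (3.5+), and assuming the PHP side produced plain high-nibble-first hex as described in the question:

```python
import hashlib

hex_string = "abcdef12"

# Like PHP pack('H*', $s): hex digits, high nibble first, to raw bytes.
packed = bytes.fromhex(hex_string)

# Like PHP unpack('H*', $packed): raw bytes back to lowercase hex digits.
unpacked = packed.hex()

# 16 raw bytes, the same form as pack('H*', md5(...)) produces.
digest = hashlib.md5(b"some string").digest()

print(packed)    # b'\xab\xcd\xef\x12'
print(unpacked)  # abcdef12
print(digest.hex() == hashlib.md5(b"some string").hexdigest())  # True
```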
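In Python you use the struct module for this. (The '>' prefix selects big-endian byte order with standard sizes; without it, the output and sizes depend on the platform.)
>>> from struct import *
>>> pack('>hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>hhl')
8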
HTH
