Python code for hashing byte values with md5 - python

Please accept this question from an inexperienced (and very enthusiastic) programmer who tries to learn:
I need to calculate the md5 hashes of every byte combination from 0x00 to 0xff. I tried to do this with Python, but I'm not sure how Python interprets my input. As said, I need to hash the byte values, not the character 'fa' or '00', but the values themselves.
Here is an example of one code that I have tested. The problems are that the output from bytes.fromhex show some of the hex numbers represented as ascii. I suppose then that the ascii-representation is hashed, not the byte-value. The second problem is that I'm unsure of how to use hashlib correctly so that the byte value is hashed.
import hashlib
# Global variables
HEX_VALUES = {0:"0",1:"1",2:"2",3:"3",4:"4",5:"5",6:"6",7:"7",8:"8",9:"9",10:"a",11:"b",12:"c",13:"d",14:"e",15:"f"}
# Helper function for converting decimal number to another base.
def dec_to_base(num,base):
exp = 0
list1 = []
while (num // base ** exp) > 0:
num2 = (num // base ** exp) % base
list1.insert(0,num2)
exp += 1
return list1
# Function for converting decimal to hex numbers.
def dec_to_hex(num):
ret_val = []
for x in dec_to_base(num,16):
x = HEX_VALUES[x]
ret_val.append(x)
ret_val_str = ''.join(ret_val)
ret_val_str_pad = ret_val_str.zfill(4)
# Returns the hex number as a string with four zero-padding.
return ret_val_str_pad
for i in range(1,65536):
hex_number = bytes.fromhex(dec_to_hex(i))
print(hex_number)
h = hashlib.md5(hex_number)
md5_hash = h.hexdigest()
print(md5_hash)
# Checks after TARGET STRING

md5 accepts bytes but not string.
md5(b'00').hexdigest() # 'b4b147bc522828731f1a016bfa72c073'
md5('00').hexdigest() # TypeError: Unicode-objects must be encoded before hashing
String must be encoded into bytes before passing to md5.
md5('00'.encode()).hexdigest() # 'b4b147bc522828731f1a016bfa72c073'
'00'.encode() == b'00' # True
b'\x30\x30' == b'00' # True
In above case, in C terms, you're passing byte array {0x30, 0x30} as argument. In your code, hex_number = bytes.fromhex(dec_to_hex(i)) returns size 2 bytes. Depending on your goal, you might not get what you want.
hex_number = bytes.fromhex(dec_to_hex(1)) # b'\x00\x01'
md5(b'\x00\x01').hexdigest() # '441077cc9e57554dd476bdfb8b8b8102'
md5(b'\x01').hexdigest() # '55a54008ad1ba589aa210d2629c1df41'

Related

How to convert a test message to an integer number to produce ciphertext in RSA

The function ConvertToInt(message) should convert a text message to an integer number so ciphertext in RSA can be produced using the formula M^e mod n. Here M is the message that must be encoded into a single number. Instead, my following function ConvertToInt returns an array with elements each of which is the ASCII value of the characters. So the result becomes a character by character encryption instead of a string.
Which is the proper way to convert the message to an integer and calculate the proper RSA encrypted result?
Here is my code:
def ConvertToInt(message):
l = len(message)
arra = []
i = 0
while(i<l):
j=ord(message[i])
arra.append(j)
i += 1
return arra
def mod_ex(b, k, m):
i = 1
j = 0
while(j<=k):
b = (b*i) % m
i = b
j += 1
return b
def PowMod(s,modulo,exponent):
bin_e = bin(exponent)
bin_e = bin_e [::-1]
ln = len(bin_e)
result = 1
slen = len(s)
for i in range(0,slen,+1):
for j in range(0,ln-2,+1):
if(bin_e[j]=='1'):
result *= mod_ex(s[i],j,modulo)
s[i] = result%modulo
result = 1
return s
def Encrypt(message, modulo, exponent):
s = ConvertToInt(message)
return PowMod(s, modulo, exponent)
x = Encrypt("Aa",473,17)
print(x)
Here is a ConvertToInt function that effectively calculated a number out of a string which can be used in RSA Encryption, cz RSA encryption must need a number to operate.
def ConvertToInt(message):
grd = 1
num = 0
message = message [::-1]
for i in range(0,len(message),+1):
num = num+ord(message[i])*grd
grd *= 256
return num
Get the full RSA Code in python here:
https://github.com/Sazzad-Saju/RSA_Algorithm
I think it depends on your application. But, as we are using encryption for communicating often, it is better to use Encoding that is more compatible with communication systems. For example, Base64 is one of the popular encodings used for encryption schemes(either Symmetric or Asymmetric).

Randrange whitin static method

I wrote a method (in Python) get_16bit_error for generate a random integer of 16 bits. I'm trying to understand why this code always output "similar" numbers. For an instance when I ran this code two times I obtained 0x10000, 0x100000 or 0x800000, 0x8000000.
class Util(object):
#staticmethod
def get_16bit_error():
i = randrange(0, 16)
e = bin(2 ** i)[2:]
len_e = len(e)
e = "0"*(16 - len_e) + e
return int(e + "0"*(16), 2)
for i in range(2):
print hex(Util.get_16bit_error())
Bit is a binary digit, so 16 bits is 16 binary digits.
import random
class Util(object):
#staticmethod
def get_16bit_error():
string = ''
for i in range(16):
string += random.choice(['1', '0'])
return '0b' + string
the_binary = hex(Util.get_16bit_error())
in_decimal = print int(the_binary, 2)
print the_binary # or in_decimal
You're generating numbers of the form 2**(16 + randrange(0,16)). I'm guessing that's not what you want. Let's break it down:
i = randrange(0, 16)
i is a random number between 0 and 16. Good so far. Let's proceed assuming i = 3.
e = bin(2 ** i)[2:]
So now you've exponentiated 2 by i (in our example, 2**i = 8. Then you convert it to binary representation and take the back end of the string, so e = 1000.
len_e = len(e)
The length of e, which is 4 in our example.
e = "0"*(16 - len_e) + e
This essentially pads e with zeros so that it has length 16. In our example, e = '0000000000001000'.
return int(e + "0"*(16), 2)
First you add sixteen zeros to the end of the binary representation. This has the effect of taking what was 2**i and making it 2**(16 + i). You then convert it to an integer. That's why all your hex representations look the same.
If you want to generate a random 16 bit number, try this:
import random
def rand_16_bit_int():
return random.randrange(2**16)
Lastly, I'm not sure what the purpose of the Util class and the staticmethod is here... perhaps you're coming from Java where you have to put all your methods in classes. Not so in python. It's sufficient to define rand_16_bit_int as a free function in your module.

Write boolean string to binary file?

I have a string of booleans and I want to create a binary file using these booleans as bits. This is what I am doing:
# first append the string with 0s to make its length a multiple of 8
while len(boolString) % 8 != 0:
boolString += '0'
# write the string to the file byte by byte
i = 0
while i < len(boolString) / 8:
byte = int(boolString[i*8 : (i+1)*8], 2)
outputFile.write('%c' % byte)
i += 1
But this generates the output 1 byte at a time and is slow. What would be a more efficient way to do it?
It should be quicker if you calculate all your bytes first and then write them all together. For example
b = bytearray([int(boolString[x:x+8], 2) for x in range(0, len(boolString), 8)])
outputFile.write(b)
I'm also using a bytearray which is a natural container to use, and can also be written directly to your file.
You can of course use libraries if that's appropriate such as bitarray and bitstring. Using the latter you could just say
bitstring.Bits(bin=boolString).tofile(outputFile)
Here's another answer, this time using an industrial-strength utility function from the PyCrypto - The Python Cryptography Toolkit where, in version 2.6 (the current latest stable release), it's defined inpycrypto-2.6/lib/Crypto/Util/number.py.
The comments preceeding it say:
Improved conversion functions contributed by Barry Warsaw, after careful benchmarking
import struct
def long_to_bytes(n, blocksize=0):
"""long_to_bytes(n:long, blocksize:int) : string
Convert a long integer to a byte string.
If optional blocksize is given and greater than zero, pad the front of the
byte string with binary zeros so that the length is a multiple of
blocksize.
"""
# after much testing, this algorithm was deemed to be the fastest
s = b('')
n = long(n)
pack = struct.pack
while n > 0:
s = pack('>I', n & 0xffffffffL) + s
n = n >> 32
# strip off leading zeros
for i in range(len(s)):
if s[i] != b('\000')[0]:
break
else:
# only happens when n == 0
s = b('\000')
i = 0
s = s[i:]
# add back some pad bytes. this could be done more efficiently w.r.t. the
# de-padding being done above, but sigh...
if blocksize > 0 and len(s) % blocksize:
s = (blocksize - len(s) % blocksize) * b('\000') + s
return s
You can convert a boolean string to a long using data = long(boolString,2). Then to write this long to disk you can use:
while data > 0:
data, byte = divmod(data, 0xff)
file.write('%c' % byte)
However, there is no need to make a boolean string. It is much easier to use a long. The long type can contain an infinite number of bits. Using bit manipulation you can set or clear the bits as needed. You can then write the long to disk as a whole in a single write operation.
You can try this code using the array class:
import array
buffer = array.array('B')
i = 0
while i < len(boolString) / 8:
byte = int(boolString[i*8 : (i+1)*8], 2)
buffer.append(byte)
i += 1
f = file(filename, 'wb')
buffer.tofile(f)
f.close()
A helper class (shown below) makes it easy:
class BitWriter:
def __init__(self, f):
self.acc = 0
self.bcount = 0
self.out = f
def __del__(self):
self.flush()
def writebit(self, bit):
if self.bcount == 8 :
self.flush()
if bit > 0:
self.acc |= (1 << (7-self.bcount))
self.bcount += 1
def writebits(self, bits, n):
while n > 0:
self.writebit( bits & (1 << (n-1)) )
n -= 1
def flush(self):
self.out.write(chr(self.acc))
self.acc = 0
self.bcount = 0
with open('outputFile', 'wb') as f:
bw = BitWriter(f)
bw.writebits(int(boolString,2), len(boolString))
bw.flush()
Use the struct package.
This can be used in handling binary data stored in files or from network connections, among other sources.
Edit:
An example using ? as the format character for a bool.
import struct
p = struct.pack('????', True, False, True, False)
assert p == '\x01\x00\x01\x00'
with open("out", "wb") as o:
o.write(p)
Let's take a look at the file:
$ ls -l out
-rw-r--r-- 1 lutz lutz 4 Okt 1 13:26 out
$ od out
0000000 000001 000001
000000
Read it in again:
with open("out", "rb") as i:
q = struct.unpack('????', i.read())
assert q == (True, False, True, False)

Convert hex to binary

I have ABC123EFFF.
I want to have 001010101111000001001000111110111111111111 (i.e. binary repr. with, say, 42 digits and leading zeroes).
How?
For solving the left-side trailing zero problem:
my_hexdata = "1a"
scale = 16 ## equals to hexadecimal
num_of_bits = 8
bin(int(my_hexdata, scale))[2:].zfill(num_of_bits)
It will give 00011010 instead of the trimmed version.
import binascii
binary_string = binascii.unhexlify(hex_string)
Read
binascii.unhexlify
Return the binary data represented by the hexadecimal string specified as the parameter.
Convert hex to binary
I have ABC123EFFF.
I want to have 001010101111000001001000111110111111111111 (i.e. binary
repr. with, say, 42 digits and leading zeroes).
Short answer:
The new f-strings in Python 3.6 allow you to do this using very terse syntax:
>>> f'{0xABC123EFFF:0>42b}'
'001010101111000001001000111110111111111111'
or to break that up with the semantics:
>>> number, pad, rjust, size, kind = 0xABC123EFFF, '0', '>', 42, 'b'
>>> f'{number:{pad}{rjust}{size}{kind}}'
'001010101111000001001000111110111111111111'
Long answer:
What you are actually saying is that you have a value in a hexadecimal representation, and you want to represent an equivalent value in binary.
The value of equivalence is an integer. But you may begin with a string, and to view in binary, you must end with a string.
Convert hex to binary, 42 digits and leading zeros?
We have several direct ways to accomplish this goal, without hacks using slices.
First, before we can do any binary manipulation at all, convert to int (I presume this is in a string format, not as a literal):
>>> integer = int('ABC123EFFF', 16)
>>> integer
737679765503
alternatively we could use an integer literal as expressed in hexadecimal form:
>>> integer = 0xABC123EFFF
>>> integer
737679765503
Now we need to express our integer in a binary representation.
Use the builtin function, format
Then pass to format:
>>> format(integer, '0>42b')
'001010101111000001001000111110111111111111'
This uses the formatting specification's mini-language.
To break that down, here's the grammar form of it:
[[fill]align][sign][#][0][width][,][.precision][type]
To make that into a specification for our needs, we just exclude the things we don't need:
>>> spec = '{fill}{align}{width}{type}'.format(fill='0', align='>', width=42, type='b')
>>> spec
'0>42b'
and just pass that to format
>>> bin_representation = format(integer, spec)
>>> bin_representation
'001010101111000001001000111110111111111111'
>>> print(bin_representation)
001010101111000001001000111110111111111111
String Formatting (Templating) with str.format
We can use that in a string using str.format method:
>>> 'here is the binary form: {0:{spec}}'.format(integer, spec=spec)
'here is the binary form: 001010101111000001001000111110111111111111'
Or just put the spec directly in the original string:
>>> 'here is the binary form: {0:0>42b}'.format(integer)
'here is the binary form: 001010101111000001001000111110111111111111'
String Formatting with the new f-strings
Let's demonstrate the new f-strings. They use the same mini-language formatting rules:
>>> integer = 0xABC123EFFF
>>> length = 42
>>> f'{integer:0>{length}b}'
'001010101111000001001000111110111111111111'
Now let's put this functionality into a function to encourage reusability:
def bin_format(integer, length):
return f'{integer:0>{length}b}'
And now:
>>> bin_format(0xABC123EFFF, 42)
'001010101111000001001000111110111111111111'
Aside
If you actually just wanted to encode the data as a string of bytes in memory or on disk, you can use the int.to_bytes method, which is only available in Python 3:
>>> help(int.to_bytes)
to_bytes(...)
int.to_bytes(length, byteorder, *, signed=False) -> bytes
...
And since 42 bits divided by 8 bits per byte equals 6 bytes:
>>> integer.to_bytes(6, 'big')
b'\x00\xab\xc1#\xef\xff'
bin(int("abc123efff", 16))[2:]
>>> bin( 0xABC123EFFF )
'0b1010101111000001001000111110111111111111'
Use Built-in format() function and int() function
It's simple and easy to understand. It's little bit simplified version of Aaron answer
int()
int(string, base)
format()
format(integer, # of bits)
Example
# w/o 0b prefix
>> format(int("ABC123EFFF", 16), "040b")
1010101111000001001000111110111111111111
# with 0b prefix
>> format(int("ABC123EFFF", 16), "#042b")
0b1010101111000001001000111110111111111111
# w/o 0b prefix + 64bit
>> format(int("ABC123EFFF", 16), "064b")
0000000000000000000000001010101111000001001000111110111111111111
See also this answer
"{0:020b}".format(int('ABC123EFFF', 16))
Here's a fairly raw way to do it using bit fiddling to generate the binary strings.
The key bit to understand is:
(n & (1 << i)) and 1
Which will generate either a 0 or 1 if the i'th bit of n is set.
import binascii
def byte_to_binary(n):
return ''.join(str((n & (1 << i)) and 1) for i in reversed(range(8)))
def hex_to_binary(h):
return ''.join(byte_to_binary(ord(b)) for b in binascii.unhexlify(h))
print hex_to_binary('abc123efff')
>>> 1010101111000001001000111110111111111111
Edit: using the "new" ternary operator this:
(n & (1 << i)) and 1
Would become:
1 if n & (1 << i) or 0
(Which TBH I'm not sure how readable that is)
This is a slight touch up to Glen Maynard's solution, which I think is the right way to do it. It just adds the padding element.
def hextobin(self, hexval):
'''
Takes a string representation of hex data with
arbitrary length and converts to string representation
of binary. Includes padding 0s
'''
thelen = len(hexval)*4
binval = bin(int(hexval, 16))[2:]
while ((len(binval)) &lt thelen):
binval = '0' + binval
return binval
Pulled it out of a class. Just take out self, if you're working in a stand-alone script.
I added the calculation for the number of bits to fill to Onedinkenedi's solution. Here is the resulting function:
def hextobin(h):
return bin(int(h, 16))[2:].zfill(len(h) * 4)
Where 16 is the base you're converting from (hexadecimal), and 4 is how many bits you need to represent each digit, or log base 2 of the scale.
Replace each hex digit with the corresponding 4 binary digits:
1 - 0001
2 - 0010
...
a - 1010
b - 1011
...
f - 1111
hex --> decimal then decimal --> binary
#decimal to binary
def d2b(n):
bStr = ''
if n < 0: raise ValueError, "must be a positive integer"
if n == 0: return '0'
while n > 0:
bStr = str(n % 2) + bStr
n = n >> 1
return bStr
#hex to binary
def h2b(hex):
return d2b(int(hex,16))
# Python Program - Convert Hexadecimal to Binary
hexdec = input("Enter Hexadecimal string: ")
print(hexdec," in Binary = ", end="") # end is by default "\n" which prints a new line
for _hex in hexdec:
dec = int(_hex, 16) # 16 means base-16 wich is hexadecimal
print(bin(dec)[2:].rjust(4,"0"), end="") # the [2:] skips 0b, and the
Just use the module coden (note: I am the author of the module)
You can convert haxedecimal to binary there.
Install using pip
pip install coden
Convert
a_hexadecimal_number = "f1ff"
binary_output = coden.hex_to_bin(a_hexadecimal_number)
The converting Keywords are:
hex for hexadeimal
bin for binary
int for decimal
_to_ - the converting keyword for the function
So you can also format:
e. hexadecimal_output = bin_to_hex(a_binary_number)
Another way:
import math
def hextobinary(hex_string):
s = int(hex_string, 16)
num_digits = int(math.ceil(math.log(s) / math.log(2)))
digit_lst = ['0'] * num_digits
idx = num_digits
while s > 0:
idx -= 1
if s % 2 == 1: digit_lst[idx] = '1'
s = s / 2
return ''.join(digit_lst)
print hextobinary('abc123efff')
The binary version of ABC123EFFF is actually 1010101111000001001000111110111111111111
For almost all applications you want the binary version to have a length that is a multiple of 4 with leading padding of 0s.
To get this in Python:
def hex_to_binary( hex_code ):
bin_code = bin( hex_code )[2:]
padding = (4-len(bin_code)%4)%4
return '0'*padding + bin_code
Example 1:
>>> hex_to_binary( 0xABC123EFFF )
'1010101111000001001000111110111111111111'
Example 2:
>>> hex_to_binary( 0x7123 )
'0111000100100011'
Note that this also works in Micropython :)
i have a short snipped hope that helps :-)
input = 'ABC123EFFF'
for index, value in enumerate(input):
print(value)
print(bin(int(value,16)+16)[3:])
string = ''.join([bin(int(x,16)+16)[3:] for y,x in enumerate(input)])
print(string)
first i use your input and enumerate it to get each symbol. then i convert it to binary and trim from 3th position to the end. The trick to get the 0 is to add the max value of the input -> in this case always 16 :-)
the short form ist the join method. Enjoy.
HEX_TO_BINARY_CONVERSION_TABLE = {
'0': '0000',
'1': '0001',
'2': '0010',
'3': '0011',
'4': '0100',
'5': '0101',
'6': '0110',
'7': '0111',
'8': '1000',
'9': '1001',
'a': '1010',
'b': '1011',
'c': '1100',
'd': '1101',
'e': '1110',
'f': '1111'}
def hex_to_binary(hex_string):
binary_string = ""
for character in hex_string:
binary_string += HEX_TO_BINARY_CONVERSION_TABLE[character]
return binary_string
when I time hex_to_binary("123ade")
%timeit hex_to_binary("123ade")
here is the result:
316 ns ± 2.52 ns per loop
Alternatively, you could use "join" method:
def hex_to_binary_join(hex_string):
hex_array=[]
for character in hex_string:
hex_array.append(HEX_TO_BINARY_CONVERSION_TABLE[character])
return "".join(hex_array)
I timed this too:
%timeit hex_to_binary_join("123ade")
397 ns ± 4.64 ns per loop
a = raw_input('hex number\n')
length = len(a)
ab = bin(int(a, 16))[2:]
while len(ab)<(length * 4):
ab = '0' + ab
print ab
import binascii
hexa_input = input('Enter hex String to convert to Binary: ')
pad_bits=len(hexa_input)*4
Integer_output=int(hexa_input,16)
Binary_output= bin(Integer_output)[2:]. zfill(pad_bits)
print(Binary_output)
"""zfill(x) i.e. x no of 0 s to be padded left - Integers will overwrite 0 s
starting from right side but remaining 0 s will display till quantity x
[y:] where y is no of output chars which need to destroy starting from left"""
def conversion():
e=raw_input("enter hexadecimal no.:")
e1=("a","b","c","d","e","f")
e2=(10,11,12,13,14,15)
e3=1
e4=len(e)
e5=()
while e3<=e4:
e5=e5+(e[e3-1],)
e3=e3+1
print e5
e6=1
e8=()
while e6<=e4:
e7=e5[e6-1]
if e7=="A":
e7=10
if e7=="B":
e7=11
if e7=="C":
e7=12
if e7=="D":
e7=13
if e7=="E":
e7=14
if e7=="F":
e7=15
else:
e7=int(e7)
e8=e8+(e7,)
e6=e6+1
print e8
e9=1
e10=len(e8)
e11=()
while e9<=e10:
e12=e8[e9-1]
a1=e12
a2=()
a3=1
while a3<=1:
a4=a1%2
a2=a2+(a4,)
a1=a1/2
if a1<2:
if a1==1:
a2=a2+(1,)
if a1==0:
a2=a2+(0,)
a3=a3+1
a5=len(a2)
a6=1
a7=""
a56=a5
while a6<=a5:
a7=a7+str(a2[a56-1])
a6=a6+1
a56=a56-1
if a5<=3:
if a5==1:
a8="000"
a7=a8+a7
if a5==2:
a8="00"
a7=a8+a7
if a5==3:
a8="0"
a7=a8+a7
else:
a7=a7
print a7,
e9=e9+1
no=raw_input("Enter your number in hexa decimal :")
def convert(a):
if a=="0":
c="0000"
elif a=="1":
c="0001"
elif a=="2":
c="0010"
elif a=="3":
c="0011"
elif a=="4":
c="0100"
elif a=="5":
c="0101"
elif a=="6":
c="0110"
elif a=="7":
c="0111"
elif a=="8":
c="1000"
elif a=="9":
c="1001"
elif a=="A":
c="1010"
elif a=="B":
c="1011"
elif a=="C":
c="1100"
elif a=="D":
c="1101"
elif a=="E":
c="1110"
elif a=="F":
c="1111"
else:
c="invalid"
return c
a=len(no)
b=0
l=""
while b<a:
l=l+convert(no[b])
b+=1
print l

How to convert a string of bytes into an int?

How can I convert a string of bytes into an int in python?
Say like this: 'y\xcc\xa6\xbb'
I came up with a clever/stupid way of doing it:
sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))
I know there has to be something builtin or in the standard library that does this more simply...
This is different from converting a string of hex digits for which you can use int(xxx, 16), but instead I want to convert a string of actual byte values.
UPDATE:
I kind of like James' answer a little better because it doesn't require importing another module, but Greg's method is faster:
>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244
My hacky method:
>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943
FURTHER UPDATE:
Someone asked in comments what's the problem with importing another module. Well, importing a module isn't necessarily cheap, take a look:
>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371
Including the cost of importing the module negates almost all of the advantage that this method has. I believe that this will only include the expense of importing it once for the entire benchmark run; look what happens when I force it to reload every time:
>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794
Needless to say, if you're doing a lot of executions of this method per one import than this becomes proportionally less of an issue. It's also probably i/o cost rather than cpu so it may depend on the capacity and load characteristics of the particular machine.
In Python 3.2 and later, use
>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163
or
>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713
according to the endianness of your byte-string.
This also works for bytestring-integers of arbitrary length, and for two's-complement signed integers by specifying signed=True. See the docs for from_bytes.
You can also use the struct module to do this:
>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L
As Greg said, you can use struct if you are dealing with binary values, but if you just have a "hex number" but in byte format you might want to just convert it like:
s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)
...this is the same as:
num = struct.unpack(">L", s)[0]
...except it'll work for any number of bytes.
I use the following function to convert data between int, hex and bytes.
def bytes2int(str):
return int(str.encode('hex'), 16)
def bytes2hex(str):
return '0x'+str.encode('hex')
def int2bytes(i):
h = int2hex(i)
return hex2bytes(h)
def int2hex(i):
return hex(i)
def hex2int(h):
if len(h) > 1 and h[0:2] == '0x':
h = h[2:]
if len(h) % 2:
h = "0" + h
return int(h, 16)
def hex2bytes(h):
if len(h) > 1 and h[0:2] == '0x':
h = h[2:]
if len(h) % 2:
h = "0" + h
return h.decode('hex')
Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html
import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]
Warning: the above is strongly platform-specific. Both the "I" specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.
In Python 2.x, you could use the format specifiers <B for unsigned bytes, and <b for signed bytes with struct.unpack/struct.pack.
E.g:
Let x = '\xff\x10\x11'
data_ints = struct.unpack('<' + 'B'*len(x), x) # [255, 16, 17]
And:
data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'
That * is required!
See https://docs.python.org/2/library/struct.html#format-characters for a list of the format specifiers.
>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163
Test 1: inverse:
>>> hex(2043455163)
'0x79cca6bb'
Test 2: Number of bytes > 8:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L
Test 3: Increment by one:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L
Test 4: Append one byte, say 'A':
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L
Test 5: Divide by 256:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L
Result equals the result of Test 4, as expected.
I was struggling to find a solution for arbitrary length byte sequences that would work under Python 2.x. Finally I wrote this one, it's a bit hacky because it performs a string conversion, but it works.
Function for Python 2.x, arbitrary length
def signedbytes(data):
"""Convert a bytearray into an integer, considering the first bit as
sign. The data must be big-endian."""
negative = data[0] & 0x80 > 0
if negative:
inverted = bytearray(~d % 256 for d in data)
return -signedbytes(inverted) - 1
encoded = str(data).encode('hex')
return int(encoded, 16)
This function has two requirements:
The input data needs to be a bytearray. You may call the function like this:
s = 'y\xcc\xa6\xbb'
n = signedbytes(s)
The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:
n = signedbytes(s[::-1])
Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).
int.from_bytes is the best solution if you are at version >=3.2.
The "struct.unpack" solution requires a string so it will not apply to arrays of bytes.
Here is another solution:
def bytes2int( tb, order='big'):
if order == 'big': seq=[0,1,2,3]
elif order == 'little': seq=[3,2,1,0]
i = 0
for j in seq: i = (i<<8)+tb[j]
return i
hex( bytes2int( [0x87, 0x65, 0x43, 0x21])) returns '0x87654321'.
It handles big and little endianness and is easily modifiable for 8 bytes
As mentioned above using unpack function of struct is a good way. If you want to implement your own function there is an another solution:
def bytes_to_int(bytes):
result = 0
for b in bytes:
result = result * 256 + int(b)
return result
In python 3 you can easily convert a byte string into a list of integers (0..255) by
>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]
A decently speedy method utilizing array.array I've been using for some time:
predefined variables:
offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]
to int: (read)
val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v
from int: (write)
val = 16384
arr[offset:offset+size] = \
array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,not big)]
It's possible these could be faster though.
EDIT:
For some numbers, here's a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():
========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:
______________________________________code___|_______min______|_______max______|_______avg______|_efficiency
⣿⠀⠀⠀⠀⡇⢀⡀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⡀⠀⢰⠀⠀⠀⢰⠀⠀⠀⢸⠀⠀⢀⡇⠀⢀⠀⠀⠀⠀⢠⠀⠀⠀⠀⢰⠀⠀⠀⢸⡀⠀⠀⠀⢸⠀⡇⠀⠀⢠⠀⢰⠀⢸⠀
⣿⣦⣴⣰⣦⣿⣾⣧⣤⣷⣦⣤⣶⣾⣿⣦⣼⣶⣷⣶⣸⣴⣤⣀⣾⣾⣄⣤⣾⡆⣾⣿⣿⣶⣾⣾⣶⣿⣤⣾⣤⣤⣴⣼⣾⣼⣴⣤⣼⣷⣆⣴⣴⣿⣾⣷⣧⣶⣼⣴⣿⣶⣿⣶
val = 0 \nfor v in arr: val = (val<<8)|v | 5373.848ns | 850009.965ns | ~8649.64ns | 62.128%
⡇⠀⠀⢀⠀⠀⠀⡇⠀⡇⠀⠀⣠⠀⣿⠀⠀⠀⠀⡀⠀⠀⡆⠀⡆⢰⠀⠀⡆⠀⡄⠀⠀⠀⢠⢀⣼⠀⠀⡇⣠⣸⣤⡇⠀⡆⢸⠀⠀⠀⠀⢠⠀⢠⣿⠀⠀⢠⠀⠀⢸⢠⠀⡀
⣧⣶⣶⣾⣶⣷⣴⣿⣾⡇⣤⣶⣿⣸⣿⣶⣶⣶⣶⣧⣷⣼⣷⣷⣷⣿⣦⣴⣧⣄⣷⣠⣷⣶⣾⣸⣿⣶⣶⣷⣿⣿⣿⣷⣧⣷⣼⣦⣶⣾⣿⣾⣼⣿⣿⣶⣶⣼⣦⣼⣾⣿⣶⣷
val = reduce( shift, arr ) | 6489.921ns | 5094212.014ns | ~12040.269ns | 53.902%
This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.
I should probably also note efficiency is measured by accuracy to the average time.

Categories

Resources