Related
I want to convert an 32-byte (although I might need other lengths) integer to a bytes object in python. Is there a clean and simple way to do this?
to_bytes(length, byteorder[, signed]) is all you need starting from 3.2. In this case, someidentifier.to_bytes(4,'big') should give you the bytes string you need.
I'm guessing you need a 32-bit integer, and big-endian to boot:
>>> from ctypes import c_uint32
>>> l = c_uint32(0x12345678)
>>> bytes(l)
b'xV4\x12'
There is c_uint8, c_uint16 and c_uint64 as well. For longer ints you need to make it manually, using divmod(x, 256).
>>> def bytify(v):
... v, r = divmod(v, 256)
... yield r
... if v == 0:
... raise StopIteration
... for r in bytify(v):
... yield r
...
>>> [x for x in bytify(0x12345678)]
[120, 86, 52, 18]
>>> bytes(bytify(0x12345678))
b'xV4\x12
>>> bytes(bytify(0x123456789098765432101234567890987654321))
b'!Ce\x87\t\x89gE#\x01!Ce\x87\t\x89gE#\x01'
You can use bytes("iterable") directly. Where every value in iterable will be specific byte in bytes(). Example for little endian encoding:
>>> var=0x12345678
>>> var_tuple=((var)&0xff, (var>>8)&0xff, (var>>16)&0xff, (var>>24)&0xff)
>>> bytes(var_tuple)
b'xV4\x12'
Suppose you have
var = 'і' # var is ukrainian і
We want to get binary from it.
Flow is this. value/which is string => bytes => int => binary
binary_var = '{:b}'.format(int.from_bytes(var.encode('utf-8'), byteorder='big'))
Now binary_var is '1101000110010110'. It type is string.
Now go back, you want get unicode value from binary:
int_var = int(binary_var, 2) # here we get int value, int_var = 53654
Now we need convert integer to bytes. Ukrainian 'і' is not gonna fit into 1 byte but in 2. We convert to actual bytes bytes_var = b'\xd1\x96'
bytes_var = int_var.to_bytes(2, byteorder='big')
Finally we decode our bytes.
ukr_i = bytes_var.decode('utf-8') # urk_i = 'і'
I have int in python that I want to reverse
x = int(1234567899)
I want to result will be 3674379849
explain : = 1234567899 = 0x499602DB and 3674379849 = 0xDB029649
How to do that in python ?
>>> import struct
>>> struct.unpack('>I', struct.pack('<I', 1234567899))[0]
3674379849
>>>
This converts the integer to a 4-byte array (I), then decodes it in reverse order (> vs <).
Documentation: struct
If you just want the result, use sabiks approach - if you want the intermediate steps for bragging rights, you would need to
create the hex of the number (#1) and maybe add a leading 0 for correctness
reverse it 2-byte-wise (#2)
create an integer again (#3)
f.e. like so
n = 1234567899
# 1
h = hex(n)
if len(h) % 2: # fix for uneven lengthy inputs (f.e. n = int("234",16))
h = '0x0'+h[2:]
# 2 (skips 0x and prepends 0x for looks only)
bh = '0x'+''.join([h[i: i+2] for i in range(2, len(h), 2)][::-1])
# 3
b = int(bh, 16)
print(n, h, bh, b)
to get
1234567899 0x499602db 0xdb029649 3674379849
I'm currently working on an encryption/decryption program and I need to be able to convert bytes to an integer. I know that:
bytes([3]) = b'\x03'
Yet I cannot find out how to do the inverse. What am I doing terribly wrong?
Assuming you're on at least 3.2, there's a built in for this:
int.from_bytes( bytes, byteorder, *, signed=False )
...
The argument bytes must either be a bytes-like object or an iterable
producing bytes.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the
native byte order of the host system, use sys.byteorder as the byte
order value.
The signed argument indicates whether two’s complement is used to
represent the integer.
## Examples:
int.from_bytes(b'\x00\x01', "big") # 1
int.from_bytes(b'\x00\x01', "little") # 256
int.from_bytes(b'\x00\x10', byteorder='little') # 4096
int.from_bytes(b'\xfc\x00', byteorder='big', signed=True) #-1024
Lists of bytes are subscriptable (at least in Python 3.6). This way you can retrieve the decimal value of each byte individually.
>>> intlist = [64, 4, 26, 163, 255]
>>> bytelist = bytes(intlist) # b'#\x04\x1a\xa3\xff'
>>> for b in bytelist:
... print(b) # 64 4 26 163 255
>>> [b for b in bytelist] # [64, 4, 26, 163, 255]
>>> bytelist[2] # 26
list() can be used to convert bytes to int (works in Python 3.7):
list(b'\x03\x04\x05')
[3, 4, 5]
int.from_bytes( bytes, byteorder, *, signed=False )
doesn't work with me
I used function from this website, it works well
https://coderwall.com/p/x6xtxq/convert-bytes-to-int-or-int-to-bytes-in-python
def bytes_to_int(bytes):
result = 0
for b in bytes:
result = result * 256 + int(b)
return result
def int_to_bytes(value, length):
result = []
for i in range(0, length):
result.append(value >> (i * 8) & 0xff)
result.reverse()
return result
In case of working with buffered data I found this useful:
int.from_bytes([buf[0],buf[1],buf[2],buf[3]], "big")
Assuming that all elements in buf are 8-bit long.
An old question that I stumbled upon while looking for an existing solution. Rolled my own and thought I'd share because it allows you to create a 32-bit integer from a list of bytes, specifying an offset.
def bytes_to_int(bList, offset):
r = 0
for i in range(4):
d = 32 - ((i + 1) * 8)
r += bList[offset + i] << d
return r
#convert bytes to int
def bytes_to_int(value):
return int.from_bytes(bytearray(value), 'little')
bytes_to_int(b'\xa231')
I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b:
>>> c.bin[2:]
'11111111'
The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This will convert the hexadecimal string you have to an integer and that integer to a string in which each byte is set to 0/1 depending on the bit-value of the integer.
As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removepreffix('0b')
'11111111'
Note: using lstrip("0b") here will lead to 0 integer being converted to an empty string. This is almost always not what you want to do.
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bit 7 and 8 only, use e.g.
val = (byte >> 6) & 3
(this is: shift the byte 6 bits to the right - dropping them. Then keep only the last two bits 3 is the number with the first two bits set...)
These can easily be translated into simple CPU operations that are super fast.
using python format string syntax
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All byte objects have a .hex() function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int() function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It is using the Format Specification Mini-Language format_spec. Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, which is how we get the nice 0000 padding, and the b sets the type to binary.
I prefer this method over the bin() method because using a format string gives a lot more flexibility.
I think simplest would be use numpy here. For example you can read a file as bytes and then expand it to bits easily like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Here how to do it using format()
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
It is important the 08b . That means it will be a maximum of 8 leading zeros be appended to complete a byte. If you don't specify this then the format will just have a variable bit length for each converted byte.
To binary:
bin(byte)[2:].zfill(8)
Use ord when reading reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format():
'{:08b}'.format(ord(f.read(1)))
The other answers here provide the bits in big-endian order ('\x01' becomes '00000001')
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc -
here's a snippet for that:
def bits_little_endian_from_bytes(s):
return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
One line function to convert bytes (not string) to bit list. There is no endnians issue when source is from a byte reader/writer to another byte reader/writer, only if source and target are bit reader and bit writers.
def byte2bin(b):
return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16).
Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np
def bit_positions_numpy(val):
"""
Given an integer value, return the positions of the on bits.
"""
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
bit_arr = np.unpackbits(arr, bitorder='big')
bit_positions = np.where(bit_arr[::-1])[0].tolist()
return bit_positions
Then using string logic:
def bit_positions_str(val):
is_negative = val < 0
if is_negative:
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
neg_position = (length * 8) - 1
# special logic for negatives to get twos compliment repr
max_val = 1 << neg_position
val_ = max_val + val
else:
val_ = val
binary_string = '{:b}'.format(val_)[::-1]
bit_positions = [pos for pos, char in enumerate(binary_string)
if char == '1']
if is_negative:
bit_positions.append(neg_position)
return bit_positions
And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
positions = [pos for pos, mask in pos_masks if (mask & i)]
BYTE_TO_POSITIONS.append(positions)
def bit_positions_lut(val):
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
bit_positions = []
for offset, b in enumerate(bytestr[::-1]):
pos = BYTE_TO_POSITIONS[b]
if offset == 0:
bit_positions.extend(pos)
else:
pos_offset = (8 * offset)
bit_positions.extend([p + pos_offset for p in pos])
return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
# for val in [-0, -1, -3, -4, -9999]:
test_values = [
# -1, -2, -3, -4, -8, -32, -290, -9999,
# 0, 1, 2, 3, 4, 8, 32, 290, 9999,
4324, 1028, 1024, 3000, -100000,
999999999999,
-999999999999,
2 ** 32,
2 ** 64,
2 ** 128,
2 ** 128,
]
for val in test_values:
r1 = bit_positions_str(val)
r2 = bit_positions_numpy(val)
r3 = bit_positions_lut(val)
print(f'val={val}')
print(f'r1={r1}')
print(f'r2={r2}')
print(f'r3={r3}')
print('---')
assert r1 == r2
import xdev
xdev.profile_now(bit_positions_numpy)(val)
xdev.profile_now(bit_positions_str)(val)
xdev.profile_now(bit_positions_lut)(val)
import timerit
ti = timerit.Timerit(10000, bestof=10, verbose=2)
for timer in ti.reset('str'):
for val in test_values:
bit_positions_str(val)
for timer in ti.reset('numpy'):
for val in test_values:
bit_positions_numpy(val)
for timer in ti.reset('lut'):
for val in test_values:
bit_positions_lut(val)
for timer in ti.reset('raw_bin'):
for val in test_values:
bin(val)
for timer in ti.reset('raw_bytes'):
for val in test_values:
val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
I have a file where the first byte contains encoded information. In Matlab I can read the byte bit by bit with var = fread(file, 8, 'ubit1'), and then retrieve each bit by var(1), var(2), etc.
Is there any equivalent bit reader in python?
Read the bits from a file, low bits first.
def bits(f):
bytes = (ord(b) for b in f.read())
for b in bytes:
for i in xrange(8):
yield (b >> i) & 1
for b in bits(open('binary-file.bin', 'r')):
print b
The smallest unit you'll be able to work with is a byte. To work at the bit level you need to use bitwise operators.
x = 3
#Check if the 1st bit is set:
x&1 != 0
#Returns True
#Check if the 2nd bit is set:
x&2 != 0
#Returns True
#Check if the 3rd bit is set:
x&4 != 0
#Returns False
With numpy it is easy like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html
You won't be able to read each bit one by one - you have to read it byte by byte. You can easily extract the bits out, though:
f = open("myfile", 'rb')
# read one byte
byte = f.read(1)
# convert the byte to an integer representation
byte = ord(byte)
# now convert to string of 1s and 0s
byte = bin(byte)[2:].rjust(8, '0')
# now byte contains a string with 0s and 1s
for bit in byte:
print bit
Joining some of the previous answers I would use:
[int(i) for i in "{0:08b}".format(byte)]
For each byte read from the file. The results for an 0x88 byte example is:
>>> [int(i) for i in "{0:08b}".format(0x88)]
[1, 0, 0, 0, 1, 0, 0, 0]
You can assign it to a variable and work as per your initial request.
The "{0.08}" is to guarantee the full byte length
To read a byte from a file: bytestring = open(filename, 'rb').read(1). Note: the file is opened in the binary mode.
To get bits, convert the bytestring into an integer: byte = bytestring[0] (Python 3) or byte = ord(bytestring[0]) (Python 2) and extract the desired bit: (byte >> i) & 1:
>>> for i in range(8): (b'a'[0] >> i) & 1
...
1
0
0
0
0
1
1
0
>>> bin(b'a'[0])
'0b1100001'
There are two possible ways to return the i-th bit of a byte. The "first bit" could refer to the high-order bit or it could refer to the lower order bit.
Here is a function that takes a string and index as parameters and returns the value of the bit at that location. As written, it treats the low-order bit as the first bit. If you want the high order bit first, just uncomment the indicated line.
def bit_from_string(string, index):
i, j = divmod(index, 8)
# Uncomment this if you want the high-order bit first
# j = 8 - j
if ord(string[i]) & (1 << j):
return 1
else:
return 0
The indexing starts at 0. If you want the indexing to start at 1, you can adjust index in the function before calling divmod.
Example usage:
>>> for i in range(8):
>>> print i, bit_from_string('\x04', i)
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
Now, for how it works:
A string is composed of 8-bit bytes, so first we use divmod() to break the index into to parts:
i: the index of the correct byte within the string
j: the index of the correct bit within that byte
We use the ord() function to convert the character at string[i] into an integer type. Then, (1 << j) computes the value of the j-th bit by left-shifting 1 by j. Finally, we use bitwise-and to test if that bit is set. If so return 1, otherwise return 0.
Supposing you have a file called bloom_filter.bin which contains an array of bits and you want to read the entire file and use those bits in an array.
First create the array where the bits will be stored after reading,
from bitarray import bitarray
a=bitarray(size) #same as the number of bits in the file
Open the file,
using open or with, anything is fine...I am sticking with open here,
f=open('bloom_filter.bin','rb')
Now load all the bits into the array 'a' at one shot using,
f.readinto(a)
'a' is now a bitarray containing all the bits
This is pretty fast I would think:
import itertools
data = range(10)
format = "{:0>8b}".format
newdata = (False if n == '0' else True for n in itertools.chain.from_iterable(map(format, data)))
print(newdata) # prints tons of True and False
I think this is a more pythonic way:
a = 140
binary = format(a, 'b')
The result of this block is:
'10001100'
I was to get bit planes of the image and this function helped me to write this block:
def img2bitmap(img: np.ndarray) -> list:
if img.dtype != np.uint8 or img.ndim > 2:
raise ValueError("Image is not uint8 or gray")
bit_mat = [np.zeros(img.shape, dtype=np.uint8) for _ in range(8)]
for row_number in range(img.shape[0]):
for column_number in range(img.shape[1]):
binary = format(img[row_number][column_number], 'b')
for idx, bit in enumerate("".join(reversed(binary))[:]):
bit_mat[idx][row_number, column_number] = 2 ** idx if int(bit) == 1 else 0
return bit_mat
Also by this block, I was able to make primitives image from extracted bit planes
img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
out = img2bitmap(img)
original_image = np.zeros(img.shape, dtype=np.uint8)
for i in range(original_image.shape[0]):
for j in range(original_image.shape[1]):
for data in range(8):
x = np.array([original_image[i, j]], dtype=np.uint8)
data = np.array([data], dtype=np.uint8)
flag = np.array([0 if out[data[0]][i, j] == 0 else 1], dtype=np.uint8)
mask = flag << data[0]
x[0] = (x[0] & ~mask) | ((flag[0] << data[0]) & mask)
original_image[i, j] = x[0]