How do I manipulate bits in Python? - python

In C I could, for example, zero out bit #10 in a 32 bit unsigned value like so:
unsigned long value = 0xdeadbeef;
value &= ~(1<<10);
How do I do that in Python ?

Bitwise operations on Python ints work much like in C. The &, | and ^ operators in Python work just like in C. The ~ operator works as for a signed integer in C; that is, ~x computes -x-1.
You have to be somewhat careful with left shifts, since Python integers aren't fixed-width. Use bit masks to obtain the low order bits. For example, to do the equivalent of shift of a 32-bit integer do (x << 5) & 0xffffffff.

value = 0xdeadbeef
value &= ~(1<<10)

Some common bit operations that might serve as example:
def get_bit(value, n):
return ((value >> n & 1) != 0)
def set_bit(value, n):
return value | (1 << n)
def clear_bit(value, n):
return value & ~(1 << n)
Usage e.g.
>>> get_bit(5, 2)
True
>>> get_bit(5, 1)
False
>>> set_bit(5, 1)
7
>>> clear_bit(5, 2)
1
>>> clear_bit(7, 2)
3

Python has C style bit manipulation operators, so your example is literally the same in Python except without type keywords.
value = 0xdeadbeef
value &= ~(1 << 10)

You should also check out BitArray, which is a nice interface for dealing with sequences of bits.

Omit the 'unsigned long', and the semi-colons are not needed either:
value = 0xDEADBEEF
value &= ~(1<<10)
print value
"0x%08X" % value

Have you tried copying and pasting your code into the Python REPL to see what will happen?
>>> value = 0xdeadbeef
>>> value &= ~(1<<10)
>>> hex (value)
'0xdeadbaef'

If you're going to do a lot of bit manipulation ( and you care much more about readability rather than performance for your application ) then you may want to create an integer wrapper to enable slicing like in Verilog or VHDL:
import math
class BitVector:
def __init__(self,val):
self._val = val
def __setslice__(self,highIndx,lowIndx,newVal):
assert math.ceil(math.log(newVal)/math.log(2)) <= (highIndx-lowIndx+1)
# clear out bit slice
clean_mask = (2**(highIndx+1)-1)^(2**(lowIndx)-1)
self._val = self._val ^ (self._val & clean_mask)
# set new value
self._val = self._val | (newVal<<lowIndx)
def __getslice__(self,highIndx,lowIndx):
return (self._val>>lowIndx)&(2L**(highIndx-lowIndx+1)-1)
b = BitVector(0)
b[3:0] = 0xD
b[7:4] = 0xE
b[11:8] = 0xA
b[15:12] = 0xD
for i in xrange(0,16,4):
print '%X'%b[i+3:i]
Outputs:
D
E
A
D

a = int('00001111', 2)
b = int('11110000', 2)
bin(a & b)[2:].zfill(8)
bin(a | b)[2:].zfill(8)
bin(a << 2)[2:].zfill(8)
bin(a >> 2)[2:].zfill(8)
bin(a ^ b)[2:].zfill(8)
int(bin(a | b)[2:].zfill(8), 2)

Related

Python: Extracting bits from a byte

I'm reading a binary file in python and the documentation for the file format says:
Flag (in binary)Meaning
1 nnn nnnn Indicates that there is one data byte to follow
that is to be duplicated nnn nnnn (127 maximum)
times.
0 nnn nnnn Indicates that there are nnn nnnn bytes of image
data to follow (127 bytes maximum) and that
there are no duplications.
n 000 0000 End of line field. Indicates the end of a line
record. The value of n may be either zero or one.
Note that the end of line field is required and
that it is reflected in the length of line record
field mentioned above.
When reading the file I'm expecting the byte I'm at to return 1 nnn nnnn where the nnn nnnn part should be 50.
I've been able to do this using the following:
flag = byte >> 7
numbytes = int(bin(byte)[3:], 2)
But the numbytes calculation feels like a cheap workaround.
Can I do more bit math to accomplish the calculation of numbytes?
How would you approach this?
The classic approach of checking whether a bit is set, is to use binary "and" operator, i.e.
x = 10 # 1010 in binary
if x & 0b10: # explicitly: x & 0b0010 != 0
print('First bit is set')
To check, whether n^th bit is set, use the power of two, or better bit shifting
def is_set(x, n):
return x & 2 ** n != 0
# a more bitwise- and performance-friendly version:
return x & 1 << n != 0
is_set(10, 1) # 1 i.e. first bit - as the count starts at 0-th bit
>>> True
You can strip off the leading bit using a mask ANDed with a byte from file. That will leave you with the value of the remaining bits:
mask = 0b01111111
byte_from_file = 0b10101010
value = mask & byte_from_file
print bin(value)
>> 0b101010
print value
>> 42
I find the binary numbers easier to understand than hex when doing bit-masking.
EDIT: Slightly more complete example for your use case:
LEADING_BIT_MASK = 0b10000000
VALUE_MASK = 0b01111111
values = [0b10101010, 0b01010101, 0b0000000, 0b10000000]
for v in values:
value = v & VALUE_MASK
has_leading_bit = v & LEADING_BIT_MASK
if value == 0:
print "EOL"
elif has_leading_bit:
print "leading one", value
elif not has_leading_bit:
print "leading zero", value
If I read your description correctly:
if (byte & 0x80) != 0:
num_bytes = byte & 0x7F
there you go:
class ControlWord(object):
"""Helper class to deal with control words.
Bit setting and checking methods are implemented.
"""
def __init__(self, value = 0):
self.value = int(value)
def set_bit(self, bit):
self.value |= bit
def check_bit(self, bit):
return self.value & bit != 0
def clear_bit(self, bit):
self.value &= ~bit
Instead of int(bin(byte)[3:], 2), you could simply use: int(bin(byte>>1),2)
not sure I got you correctly, but if I did, this should do the trick:
>>> x = 154 #just an example
>>> flag = x >> 1
>>> flag
1
>>> nb = x & 127
>>> nb
26
You can do it like this:
def GetVal(b):
# mask off the most significant bit, see if it's set
flag = b & 0x80 == 0x80
# then look at the lower 7 bits in the byte.
count = b & 0x7f
# return a tuple indicating the state of the high bit, and the
# remaining integer value without the high bit.
return (flag, count)
>>> testVal = 50 + 0x80
>>> GetVal(testVal)
(True, 50)

Concatenate two 32 bit int to get a 64 bit long in Python

I want to generate 64 bits long int to serve as unique ID's for documents.
One idea is to combine the user's ID, which is a 32 bit int, with the Unix timestamp, which is another 32 bits int, to form an unique 64 bits long integer.
A scaled-down example would be:
Combine two 4-bit numbers 0010 and 0101 to form the 8-bit number 00100101.
Does this scheme make sense?
If it does, how do I do the "concatenation" of numbers in Python?
Left shift the first number by the number of bits in the second number, then add (or bitwise OR - replace + with | in the following examples) the second number.
result = (user_id << 32) + timestamp
With respect to your scaled-down example,
>>> x = 0b0010
>>> y = 0b0101
>>> (x << 4) + y
37
>>> 0b00100101
37
>>>
foo = <some int>
bar = <some int>
foobar = (foo << 32) + bar
This should do it:
(x << 32) + y
For the next guy (which was me in this case was me). Here is one way to do it in general (for the scaled down example):
def combineBytes(*args):
"""
given the bytes of a multi byte number combine into one
pass them in least to most significant
"""
ans = 0
for i, val in enumerate(args):
ans += (val << i*4)
return ans
for other sizes change the 4 to a 32 or whatever.
>>> bin(combineBytes(0b0101, 0b0010))
'0b100101'
None of the answers before this cover both merging and splitting the numbers. Splitting can be as much a necessity as merging.
NUM_BITS_PER_INT = 4 # Replace with 32, 48, 64, etc. as needed.
MAXINT = (1 << NUM_BITS_PER_INT) - 1
def merge(a, b):
c = (a << NUM_BITS_PER_INT) | b
return c
def split(c):
a = (c >> NUM_BITS_PER_INT) & MAXINT
b = c & MAXINT
return a, b
# Test
EXPECTED_MAX_NUM_BITS = NUM_BITS_PER_INT * 2
for a in range(MAXINT + 1):
for b in range(MAXINT + 1):
c = merge(a, b)
assert c.bit_length() <= EXPECTED_MAX_NUM_BITS
assert (a, b) == split(c)

Convert a Python int into a big-endian string of bytes

I have a non-negative int and I would like to efficiently convert it to a big-endian string containing the same data. For example, the int 1245427 (which is 0x1300F3) should result in a string of length 3 containing three characters whose byte values are 0x13, 0x00, and 0xf3.
My ints are on the scale of 35 (base-10) digits.
How do I do this?
In Python 3.2+, you can use int.to_bytes:
If you don't want to specify the size
>>> n = 1245427
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x13\x00\xf3'
If you don't mind specifying the size
>>> (1245427).to_bytes(3, byteorder='big')
b'\x13\x00\xf3'
You can use the struct module:
import struct
print(struct.pack('>I', your_int))
'>I' is a format string. > means big endian and I means unsigned int. Check the documentation for more format chars.
This is fast and works for small and (arbitrary) large ints:
def Dump(n):
s = '%x' % n
if len(s) & 1:
s = '0' + s
return s.decode('hex')
print repr(Dump(1245427)) #: '\x13\x00\xf3'
Probably the best way is via the built-in struct module:
>>> import struct
>>> x = 1245427
>>> struct.pack('>BH', x >> 16, x & 0xFFFF)
'\x13\x00\xf3'
>>> struct.pack('>L', x)[1:] # could do it this way too
'\x13\x00\xf3'
Alternatively -- and I wouldn't usually recommend this, because it's mistake-prone -- you can do it "manually" by shifting and the chr() function:
>>> x = 1245427
>>> chr((x >> 16) & 0xFF) + chr((x >> 8) & 0xFF) + chr(x & 0xFF)
'\x13\x00\xf3'
Out of curiosity, why do you only want three bytes? Usually you'd pack such an integer into a full 32 bits (a C unsigned long), and use struct.pack('>L', 1245427) but skip the [1:] step?
def tost(i):
result = []
while i:
result.append(chr(i&0xFF))
i >>= 8
result.reverse()
return ''.join(result)
Single-source Python 2/3 compatible version based on #pts' answer:
#!/usr/bin/env python
import binascii
def int2bytes(i):
hex_string = '%x' % i
n = len(hex_string)
return binascii.unhexlify(hex_string.zfill(n + (n & 1)))
print(int2bytes(1245427))
# -> b'\x13\x00\xf3'
The shortest way, I think, is the following:
import struct
val = 0x11223344
val = struct.unpack("<I", struct.pack(">I", val))[0]
print "%08x" % val
This converts an integer to a byte-swapped integer.
Using the bitstring module:
>>> bitstring.BitArray(uint=1245427, length=24).bytes
'\x13\x00\xf3'
Note though that for this method you need to specify the length in bits of the bitstring you are creating.
Internally this is pretty much the same as Alex's answer, but the module has a lot of extra functionality available if you want to do more with your data.
Very easy with pwntools , the tools created for software hacking
(Un-ironically, I stumbled across this thread and tried solutions here, until I realised there exists conversion functionality in pwntools)
import pwntools
x2 = p32(x1)

Python: Set Bits Count (popcount)

Few blob's have been duplicated in my database(oracle 11g), performed XOR operations on the blob using UTL_RAW.BIT_XOR. After that i wanted to count the number of set bits in the binary string, so wrote the code above.
During a small experiment, i wanted to see what is the hex and the integer value produced and wrote this procedure..
SQL> declare
2
3 vblob1 blob;
4
5 BEGIN
6
7 select leftiriscode INTO vblob1 FROM irisdata WHERE irisid=1;
8
9 dbms_output.put_line(rawtohex(vblob1));
10
11
12 dbms_output.put_line(UTL_RAW.CAST_TO_binary_integer(vblob1));
13
14
15 END;
16 /
OUTPUT: HEXVALUE:
0F0008020003030D030C1D1C3C383C330A3311373724764C54496C0A6B029B84840547A341BBA83D
BB5FB9DE4CDE5EFE96E1FC6169438344D604681D409F9F9F3BC07EE0C4E0C033A23B37791F59F84F
F94E4F664E3072B0229DA09D9F0F1FC600C2E380D6988C198B39517D157E7D66FE675237673D3D28
3A016C01411003343C76740F710F0F4F8FE976E1E882C186D316A63C0C7D7D7D7D397F016101B043
0176C37E767C7E0C7D010C8302C2D3E4F2ACE42F8D3F3F367A46F54285434ABB61BDB53CBF6C7CC0
F4C1C3F349B3F7BEB30E4A0CFE1C85180DC338C2C1C6E7A5CE3104303178724CCC5F451F573F3B24
7F24052000202003291F130F1B0E070C0E0D0F0E0F0B0B07070F1E1B330F27073F3F272E2F2F6F7B
2F2E1F2E4F7EFF7EDF3EBF253F3D2F39BF3D7F7FFED72FF39FE7773DBE9DBFBB3FE7A76E777DF55C
5F5F7ADF7FBD7F6AFE7B7D1FBE7F7F7DD7F63FBFBF2D3B7F7F5F2F7F3D7F7D3B3F3B7FFF4D676F7F
5D9FAD7DD17F7F6F6F0B6F7F3F767F1779364737370F7D3F5F377F2F3D3F7F1F2FE7709FB7BCB77B
0B77CF1DF5BF1F7F3D3E4E7F197F571F7D7E3F7F7F7D7F6F4F75FF6F7ECE2FFF793EFFEDB7BDDD1F
FF3BCE3F7F3FBF3D6C7FFF7F7F4FAF7F6FFFFF8D7777BF3AE30FAEEEEBCF5FEEFEE75FFEACFFDF0F
DFFFF77FFF677F4FFF7F7F1B5F1F5F146F1F1E1B3B1F3F273303170F370E250B
INTEGER VALUE: 15
There was a variance between the hex code and the integer value produced, so used the following python code to check the actual integer value.
print int("0F0008020003030D030C1D1C3C383C330A3311373724764C54496C0A6B029B84840547A341BBA83D
BB5FB9DE4CDE5EFE96E1FC6169438344D604681D409F9F9F3BC07EE0C4E0C033A23B37791F59F84F
F94E4F664E3072B0229DA09D9F0F1FC600C2E380D6988C198B39517D157E7D66FE675237673D3D28
3A016C01411003343C76740F710F0F4F8FE976E1E882C186D316A63C0C7D7D7D7D397F016101B043
0176C37E767C7E0C7D010C8302C2D3E4F2ACE42F8D3F3F367A46F54285434ABB61BDB53CBF6C7CC0
F4C1C3F349B3F7BEB30E4A0CFE1C85180DC338C2C1C6E7A5CE3104303178724CCC5F451F573F3B24
7F24052000202003291F130F1B0E070C0E0D0F0E0F0B0B07070F1E1B330F27073F3F272E2F2F6F7B
2F2E1F2E4F7EFF7EDF3EBF253F3D2F39BF3D7F7FFED72FF39FE7773DBE9DBFBB3FE7A76E777DF55C
5F5F7ADF7FBD7F6AFE7B7D1FBE7F7F7DD7F63FBFBF2D3B7F7F5F2F7F3D7F7D3B3F3B7FFF4D676F7F
5D9FAD7DD17F7F6F6F0B6F7F3F767F1779364737370F7D3F5F377F2F3D3F7F1F2FE7709FB7BCB77B
0B77CF1DF5BF1F7F3D3E4E7F197F571F7D7E3F7F7F7D7F6F4F75FF6F7ECE2FFF793EFFEDB7BDDD1F
FF3BCE3F7F3FBF3D6C7FFF7F7F4FAF7F6FFFFF8D7777BF3AE30FAEEEEBCF5FEEFEE75FFEACFFDF0F
DFFFF77FFF677F4FFF7F7F1B5F1F5F146F1F1E1B3B1F3F273303170F370E250B",16)
Answer:
611951595100708231079693644541095422704525056339295086455197024065285448917042457
942011979060274412229909425184116963447100932992139876977824261789243946528467423
887840013630358158845039770703659333212332565531927875442166643379024991542726916
563271158141698128396823655639931773363878078933197184072343959630467756337300811
165816534945075483141582643531294791665590339000206551162697220540050652439977992
246472159627917169957822698172925680112854091876671868161705785698942483896808137
210721991100755736178634253569843464062494863175653771387230991126430841565373390
924951878267929443498220727531299945275045612499928105876210478958806304156695438
684335624641395635997624911334453040399012259638042898470872203581555352191122920
004010193837249388365999010692555403377045768493630826307316376698443166439386014
145858084176544890282148970436631175577000673079418699845203671050174181808397880
048734270748095682582556024378558289251964544327507321930196203199459115159756564
507340111030285226951393012863778670390172056906403480159339130447254293412506482
027099835944315172972281427649277354815211185293109925602315480350955479477144523
387689192243720928249121486221114300503766209279369960344185651810101969585926336
07333771272398091
To get the set-bit count I have written the following code in C:
int bitsoncount(unsigned x)
{
unsigned int b=0;
if(x > 1)
b=1;
while(x &= (x - 1))
b++;
return b;
}
When I tried the same code in python it did not work. I am new to python through curiosity I'm experimenting, excuse me if am wrong.
def bitsoncount(x):
b=0;
if(x>1):
b=1;
while(x &= (x-1)):
I get an error at the last line, need some help in resolving this and implementing the logic in python :-)
I was interested in checking out the set bits version in python after what i have seen!
Related question: Best algorithm to count the number of set bits in a 32-bit integer?
In Python 3.10+, there is int.bit_count():
>>> 123 .bit_count()
6
Python 2.6 or 3.0:
def bitsoncount(x):
return bin(x).count('1')
Example:
>>> x = 123
>>> bin(x)
'0b1111011'
>>> bitsoncount(x)
6
Or
Matt Howells's answer in Python:
def bitsoncount(i):
assert 0 <= i < 0x100000000
i = i - ((i >> 1) & 0x55555555)
i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24
Starting with Python 3.10 you can use int.bit_count():
x = 826151739
print(x.bit_count()) # 16
what you're looking for is called the Hamming Weight.
in python 2.6/3.0 it can be found rather easily with:
bits = sum( b == '1' for b in bin(x)[2:] )
What version of Python are you using?
First off, Python uses white space not semicolon's, so to start it should look something like this...
def bitsoncount(x):
b=0
while(x > 0):
x &= x - 1
b+=1
return b
The direct translation of your C algorithm is as follows:
def bitsoncount(x):
b = 0
while x > 0:
x &= x - 1
b += 1
return b
Maybe this is what you mean?
def bits_on_count(x):
b = 0
while x != 0:
if x & 1: # Last bit is a 1
b += 1
x >>= 1 # Shift the bits of x right
return b
There's also a way to do it simply in Python 3.0:
def bits_on_count(x):
return sum(c=='1' for c in bin(x))
This uses the fact that bin(x) gives a binary representation of x.
Try this module:
import sys
if sys.maxint < 2**32:
msb2= 2**30
else:
msb2= 2**62
BITS=[-msb2*2] # not converted into long
while msb2:
BITS.append(msb2)
msb2 >>= 1
def bitcount(n):
return sum(1 for b in BITS if b&n)
This should work for machine integers (depending on your OS and the Python version). It won't work for any long.
How do you like this one:
def bitsoncount(x):
b = 0
bit = 1
while bit <= x:
b += int(x & bit > 0)
bit = bit << 1
return b
Basically, you use a test bit that starts right and gets shifted all the way through up to the bit length of your in parameter. For each position the bit & x yields a single bit which is on, or none. Check > 0 and turn the resulting True|False into 1|0 with int(), and add this to the accumulator. Seems to work nicely for longs :-) .
How to count the number of 1-bits starting with Python 3.10: https://docs.python.org/3/library/stdtypes.html#int.bit_count
# int.bit_count()
n = 19
bin(n)
# '0b10011'
n.bit_count() # <-- this is how
# 3
(-n).bit_count()
# 3
Is equivalent to (as per page linked above), but more efficient than:
def bit_count(self):
return bin(self).count("1")

How to convert a string of bytes into an int?

How can I convert a string of bytes into an int in python?
Say like this: 'y\xcc\xa6\xbb'
I came up with a clever/stupid way of doing it:
sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))
I know there has to be something builtin or in the standard library that does this more simply...
This is different from converting a string of hex digits for which you can use int(xxx, 16), but instead I want to convert a string of actual byte values.
UPDATE:
I kind of like James' answer a little better because it doesn't require importing another module, but Greg's method is faster:
>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244
My hacky method:
>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943
FURTHER UPDATE:
Someone asked in comments what's the problem with importing another module. Well, importing a module isn't necessarily cheap, take a look:
>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371
Including the cost of importing the module negates almost all of the advantage that this method has. I believe that this will only include the expense of importing it once for the entire benchmark run; look what happens when I force it to reload every time:
>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794
Needless to say, if you're doing a lot of executions of this method per one import than this becomes proportionally less of an issue. It's also probably i/o cost rather than cpu so it may depend on the capacity and load characteristics of the particular machine.
In Python 3.2 and later, use
>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163
or
>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713
according to the endianness of your byte-string.
This also works for bytestring-integers of arbitrary length, and for two's-complement signed integers by specifying signed=True. See the docs for from_bytes.
You can also use the struct module to do this:
>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L
As Greg said, you can use struct if you are dealing with binary values, but if you just have a "hex number" but in byte format you might want to just convert it like:
s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)
...this is the same as:
num = struct.unpack(">L", s)[0]
...except it'll work for any number of bytes.
I use the following function to convert data between int, hex and bytes.
def bytes2int(str):
return int(str.encode('hex'), 16)
def bytes2hex(str):
return '0x'+str.encode('hex')
def int2bytes(i):
h = int2hex(i)
return hex2bytes(h)
def int2hex(i):
return hex(i)
def hex2int(h):
if len(h) > 1 and h[0:2] == '0x':
h = h[2:]
if len(h) % 2:
h = "0" + h
return int(h, 16)
def hex2bytes(h):
if len(h) > 1 and h[0:2] == '0x':
h = h[2:]
if len(h) % 2:
h = "0" + h
return h.decode('hex')
Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html
import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]
Warning: the above is strongly platform-specific. Both the "I" specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.
In Python 2.x, you could use the format specifiers <B for unsigned bytes, and <b for signed bytes with struct.unpack/struct.pack.
E.g:
Let x = '\xff\x10\x11'
data_ints = struct.unpack('<' + 'B'*len(x), x) # [255, 16, 17]
And:
data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'
That * is required!
See https://docs.python.org/2/library/struct.html#format-characters for a list of the format specifiers.
>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163
Test 1: inverse:
>>> hex(2043455163)
'0x79cca6bb'
Test 2: Number of bytes > 8:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L
Test 3: Increment by one:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L
Test 4: Append one byte, say 'A':
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L
Test 5: Divide by 256:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L
Result equals the result of Test 4, as expected.
I was struggling to find a solution for arbitrary length byte sequences that would work under Python 2.x. Finally I wrote this one, it's a bit hacky because it performs a string conversion, but it works.
Function for Python 2.x, arbitrary length
def signedbytes(data):
"""Convert a bytearray into an integer, considering the first bit as
sign. The data must be big-endian."""
negative = data[0] & 0x80 > 0
if negative:
inverted = bytearray(~d % 256 for d in data)
return -signedbytes(inverted) - 1
encoded = str(data).encode('hex')
return int(encoded, 16)
This function has two requirements:
The input data needs to be a bytearray. You may call the function like this:
s = 'y\xcc\xa6\xbb'
n = signedbytes(s)
The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:
n = signedbytes(s[::-1])
Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).
int.from_bytes is the best solution if you are at version >=3.2.
The "struct.unpack" solution requires a string so it will not apply to arrays of bytes.
Here is another solution:
def bytes2int( tb, order='big'):
if order == 'big': seq=[0,1,2,3]
elif order == 'little': seq=[3,2,1,0]
i = 0
for j in seq: i = (i<<8)+tb[j]
return i
hex( bytes2int( [0x87, 0x65, 0x43, 0x21])) returns '0x87654321'.
It handles big and little endianness and is easily modifiable for 8 bytes
As mentioned above using unpack function of struct is a good way. If you want to implement your own function there is an another solution:
def bytes_to_int(bytes):
result = 0
for b in bytes:
result = result * 256 + int(b)
return result
In python 3 you can easily convert a byte string into a list of integers (0..255) by
>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]
A decently speedy method utilizing array.array I've been using for some time:
predefined variables:
offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]
to int: (read)
val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v
from int: (write)
val = 16384
arr[offset:offset+size] = \
array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,not big)]
It's possible these could be faster though.
EDIT:
For some numbers, here's a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():
========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:
______________________________________code___|_______min______|_______max______|_______avg______|_efficiency
⣿⠀⠀⠀⠀⡇⢀⡀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⡀⠀⢰⠀⠀⠀⢰⠀⠀⠀⢸⠀⠀⢀⡇⠀⢀⠀⠀⠀⠀⢠⠀⠀⠀⠀⢰⠀⠀⠀⢸⡀⠀⠀⠀⢸⠀⡇⠀⠀⢠⠀⢰⠀⢸⠀
⣿⣦⣴⣰⣦⣿⣾⣧⣤⣷⣦⣤⣶⣾⣿⣦⣼⣶⣷⣶⣸⣴⣤⣀⣾⣾⣄⣤⣾⡆⣾⣿⣿⣶⣾⣾⣶⣿⣤⣾⣤⣤⣴⣼⣾⣼⣴⣤⣼⣷⣆⣴⣴⣿⣾⣷⣧⣶⣼⣴⣿⣶⣿⣶
val = 0 \nfor v in arr: val = (val<<8)|v | 5373.848ns | 850009.965ns | ~8649.64ns | 62.128%
⡇⠀⠀⢀⠀⠀⠀⡇⠀⡇⠀⠀⣠⠀⣿⠀⠀⠀⠀⡀⠀⠀⡆⠀⡆⢰⠀⠀⡆⠀⡄⠀⠀⠀⢠⢀⣼⠀⠀⡇⣠⣸⣤⡇⠀⡆⢸⠀⠀⠀⠀⢠⠀⢠⣿⠀⠀⢠⠀⠀⢸⢠⠀⡀
⣧⣶⣶⣾⣶⣷⣴⣿⣾⡇⣤⣶⣿⣸⣿⣶⣶⣶⣶⣧⣷⣼⣷⣷⣷⣿⣦⣴⣧⣄⣷⣠⣷⣶⣾⣸⣿⣶⣶⣷⣿⣿⣿⣷⣧⣷⣼⣦⣶⣾⣿⣾⣼⣿⣿⣶⣶⣼⣦⣼⣾⣿⣶⣷
val = reduce( shift, arr ) | 6489.921ns | 5094212.014ns | ~12040.269ns | 53.902%
This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.
I should probably also note efficiency is measured by accuracy to the average time.

Categories

Resources