How to pack bits into variable length words in python?

I can pack characters into 32-bit words (or any other fixed size), but I want to make the word size in bits a parameter.
Here's what works for 32 bits:
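def vectorize_key(key):
    # split() is the asker's own helper (not shown); presumably it groups an iterable into chunks of n items
    return (v[0] << 24 | v[1] << 16 | v[2] << 8 | v[3] for v in split((ord(k) for k in key), 4))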
And here's what doesn't work. Python complains that int and tuple are bad operands for |, but I cannot see how I get a tuple there. I explicitly "unpack" the tuple!
def vectorize_key(key, word_size=32):
    return (reduce(lambda p, (e, f): p | (e << f), ((x[i], i*8) for i in range(word_size/8))) for x in split((ord(k) for k in key), word_size/8))

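Got it. I was missing the initializer value for reduce. Without it, reduce uses the first (e, f) tuple as the starting accumulator p, so the first step evaluates a tuple | int: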
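def vectorize_key(key, word_size=32):
    # Python 2 only: tuple parameters in lambda and integer division with / were removed in Python 3
    return (reduce(lambda p, (e, f): p | (e << f), ((x[i], i*8) for i in range(word_size/8)), 0) for x in split((ord(k) for k in key), word_size/8))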
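For Python 3, where tuple parameters in lambda and integer / no longer exist, a word-size-parameterized packer might look like the sketch below. Note that the reduce version above puts x[0] in the lowest byte (shift i*8 with i = 0), the reverse of the 32-bit version; this sketch keeps the 32-bit version's order (first character most significant). It assumes one byte per character (code points below 256), builds chunks inline instead of using the asker's unshown split() helper, and zero-pads a short final chunk, which may not match what split() actually does.

```python
def vectorize_key(key, word_size=32):
    """Yield word_size-bit integers built from the characters of key.

    Assumptions: every character has a code point below 256, word_size is a
    multiple of 8, and a short final chunk is zero-padded on the right.
    """
    if word_size % 8:
        raise ValueError("word_size must be a multiple of 8")
    n = word_size // 8  # characters per word
    codes = [ord(k) for k in key]
    for start in range(0, len(codes), n):
        chunk = codes[start:start + n]
        chunk += [0] * (n - len(chunk))  # zero-pad the last chunk
        word = 0
        for c in chunk:
            word = (word << 8) | c  # shift earlier bytes up, append the next one
        yield word
```

With word_size=32, list(vectorize_key("abcd")) gives [0x61626364], the same value as the 32-bit version.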

Related

Is it possible in python to pack a set of integers in non-byte increments?

Let's say I have three unsigned integers, A, B, and C, and I know the maximum values of each. Both A and B are less than 2^14, and C is less than 2^4. Since 14 + 14 + 4 = 32, I should be able to store these three integers in 32 bits, right?
If so, how can I do this in Python? struct.pack only appears to support byte-sized fields, so I would be limited to 'HHB', which is 40 bits (8 more than I need). Can this be done, or am I missing some fundamental concepts here?
Pack it yourself with bit operators:
(A<<18)|(B<<4)|C
Then use struct.pack on the 32-bit result.
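A minimal round-trip sketch of that approach for the 14/14/4 layout in the question. The helper names pack_abc and unpack_abc are hypothetical, and the big-endian '>I' format is an assumption; use '<I' if the consumer expects little-endian.

```python
import struct

def pack_abc(a, b, c):
    """Pack a (14 bits), b (14 bits) and c (4 bits) into 4 bytes."""
    if not (0 <= a < 1 << 14 and 0 <= b < 1 << 14 and 0 <= c < 1 << 4):
        raise ValueError("field out of range")
    word = (a << 18) | (b << 4) | c  # a in bits 31..18, b in 17..4, c in 3..0
    return struct.pack('>I', word)   # '>I': big-endian unsigned 32-bit, no padding

def unpack_abc(data):
    """Inverse of pack_abc."""
    (word,) = struct.unpack('>I', data)
    return word >> 18, (word >> 4) & 0x3FFF, word & 0xF
```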
The library sounds like the way to go, but just for fun with bitwise operations:
a_in, b_in, c_in = 15220, 9021, 9
a_bits, b_bits, c_bits = 14, 14, 4
packed = (a_in << (b_bits + c_bits)) | (b_in << c_bits) | c_in
a_out = (packed & ((2 ** a_bits - 1) << (b_bits + c_bits))) >> (b_bits + c_bits)
b_out = (packed & ((2 ** b_bits - 1) << c_bits)) >> c_bits
c_out = packed & (2 ** c_bits - 1)
print(packed) # 3989976025 = 11101101110100100011001111011001
print(a_out) # 15220 = 11101101110100
print(b_out) # 9021 = 10001100111101
print(c_out) # 9 = 1001
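The same shifting and masking generalizes to any list of field widths. A sketch with hypothetical helper names pack_fields and unpack_fields; it assumes unsigned fields, with the first value in the most significant position as in the example above.

```python
def pack_fields(values, widths):
    """Pack unsigned ints into one int; the first value ends up most significant."""
    packed = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        packed = (packed << width) | value  # make room, then drop the field in
    return packed

def unpack_fields(packed, widths):
    """Inverse of pack_fields: peel fields off the low end, last width first."""
    values = []
    for width in reversed(widths):
        values.append(packed & ((1 << width) - 1))
        packed >>= width
    return list(reversed(values))
```

pack_fields([15220, 9021, 9], [14, 14, 4]) reproduces the 3989976025 above.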

Bitwise Rotate Right

I am trying to convert this C function into Python:
typedef unsigned long var;
/* Bit rotate rightwards */
var ror(var v,unsigned int bits) {
    return (v>>bits)|(v<<(8*sizeof(var)-bits));
}
I have tried Googling for some solutions, but I can't seem to get any of them to give the same results as the one here.
This is one solution I found in another program:
def mask1(n):
    """Return a bitmask of length n (suitable for masking against an
    int to coerce the size to a given length)
    """
    if n >= 0:
        return 2**n - 1
    else:
        return 0

def ror(n, rotations=1, width=8):
    """Return a given number of bitwise right rotations of an integer n,
    for a given bit field width.
    """
    rotations %= width
    if rotations < 1:
        return n
    n &= mask1(width)
    return (n >> rotations) | ((n << (8 * width - rotations)))
I am trying to bit-rotate key = 0x0f0f0f0f0f123456. The C code gives 000000000f0f0f12 when it is called with ror(key, 8 << 1), and Python gives 0x0f0f0f0f0f123456 (the original input!).
Your C output doesn't match the function that you provided. That is presumably because you are not printing it correctly. This program:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>  /* PRIx64: the portable printf format for uint64_t */

/* bits must be in 1..63: shifting a uint64_t by 64 is undefined */
uint64_t ror(uint64_t v, unsigned int bits)
{
    return (v>>bits) | (v<<(8*sizeof(uint64_t)-bits));
}

int main(void)
{
    printf("%" PRIx64 "\n", ror(0x0123456789abcdef, 4));
    printf("%" PRIx64 "\n", ror(0x0123456789abcdef, 8));
    printf("%" PRIx64 "\n", ror(0x0123456789abcdef, 12));
    printf("%" PRIx64 "\n", ror(0x0123456789abcdef, 16));
    return 0;
}
produces the following output:
f0123456789abcde
ef0123456789abcd
def0123456789abc
cdef0123456789ab
To produce an ror function in Python I refer you to this excellent article: http://www.falatic.com/index.php/108/python-and-bitwise-rotation
This Python 2 code produces the same output as the C program above:
ror = lambda val, r_bits, max_bits: \
((val & (2**max_bits-1)) >> r_bits%max_bits) | \
(val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1))
print "%x" % ror(0x0123456789abcdef, 4, 64)
print "%x" % ror(0x0123456789abcdef, 8, 64)
print "%x" % ror(0x0123456789abcdef, 12, 64)
print "%x" % ror(0x0123456789abcdef, 16, 64)
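The shortest way I've found in Python
(note that this works only with integer inputs, and n must fit in width bits):
def ror(n, rotations, width):
    rotations %= width  # keeps the shift count width - rotations non-negative
    return (2**width-1)&(n>>rotations|n<<(width-rotations))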
There are several problems in your question.
C part:
You use a key that is a 64-bit value (0x0f0f0f0f0f123456), but the output shows that for your compiler unsigned long is only 32 bits wide. So what the C code does is rotate the 32-bit value 0x0f123456 by 16 bits, giving 0x34560f12.
If you had used unsigned long long (assuming it is 64 bits on your architecture, as it is on mine), you would have got 0x34560f0f0f0f0f12 (a 16-bit rotation of the 64-bit value).
Python part:
The meaning of width is not consistent between mask1 and ror: mask1 takes a width in bits, whereas ror takes a width in bytes (one byte = 8 bits).
The ror function should be:
def ror(n, rotations=1, width=8):
    """Return a given number of bitwise right rotations of an integer n,
    for a given bit field width (in bytes).
    """
    rotations %= width * 8  # width bytes = 8 * width bits
    if rotations < 1:
        return n
    mask = mask1(8 * width)  # store the mask
    n &= mask
    return (n >> rotations) | ((n << (8 * width - rotations)) & mask)  # apply the mask to the result
That way, with key = 0x0f0f0f0f0f123456, you get:
>>> hex(ror(key, 16))
'0x34560f0f0f0f0f12L'
>>> hex(ror(key, 16, 4))
'0x34560f12L'
exactly the same as the C output.
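A sketch of a rotate pair that takes the width in bits rather than bytes, which sidesteps the bits-versus-bytes mix-up discussed above. The names rotr and rotl are hypothetical, and the default width of 64 is an assumption chosen to mirror the uint64_t C program.

```python
def rotr(value, shift, width=64):
    """Rotate value right by shift bits within a width-bit field."""
    mask = (1 << width) - 1
    value &= mask   # drop anything above the field, like a fixed-size C type
    shift %= width  # rotating by 0 or by width is a no-op
    return ((value >> shift) | (value << (width - shift))) & mask

def rotl(value, shift, width=64):
    """Rotate left = rotate right by the complementary amount."""
    return rotr(value, width - (shift % width), width)
```

rotr(0x0123456789abcdef, 4) gives 0xf0123456789abcde, matching the first line of the C program's output.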
I know this question is nearly 6 years old, but I always find it easier to use string slices than bitwise operations:
def rotate_left(x, n):
    s = f"{x:032b}"  # 32-digit binary string; assumes 0 <= x < 2**32 and 0 <= n <= 32
    return int(s[n:] + s[:n], 2)
def rotate_right(x, n):
    s = f"{x:032b}"
    return int(s[-n:] + s[:-n], 2)
def rotation_value(value, rotations, width=32):
    """Return value rotated bitwise within a bit field of the given width.

    Positive rotations rotate right; negative rotations rotate left.
    """
    if int(rotations) != abs(int(rotations)):
        rotations = width + int(rotations)
    return (int(value) << (width - (rotations % width)) | (int(value) >> (rotations % width))) & ((1 << width) - 1)

Concatenate two 32 bit int to get a 64 bit long in Python

I want to generate 64-bit integers to serve as unique IDs for documents.
One idea is to combine the user's ID, which is a 32-bit int, with the Unix timestamp, which is another 32-bit int, to form a unique 64-bit integer.
A scaled-down example would be:
Combine two 4-bit numbers 0010 and 0101 to form the 8-bit number 00100101.
Does this scheme make sense?
If it does, how do I do the "concatenation" of numbers in Python?
Left shift the first number by the number of bits in the second number, then add the second number (or bitwise OR it in: replace + with | in the following examples; both give the same result as long as the second number fits in that many bits).
result = (user_id << 32) + timestamp
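A sketch of the 32/32 scheme with range checks and the inverse split; make_doc_id and split_doc_id are hypothetical names. One caveat of the scheme itself: two documents created by the same user within the same second get the same ID, so the IDs are only unique if that cannot happen.

```python
def make_doc_id(user_id, timestamp):
    """Combine a 32-bit user ID and a 32-bit Unix timestamp into one 64-bit ID."""
    for name, value in (("user_id", user_id), ("timestamp", timestamp)):
        if not 0 <= value < (1 << 32):
            raise ValueError(f"{name} must fit in 32 unsigned bits")
    return (user_id << 32) | timestamp

def split_doc_id(doc_id):
    """Recover (user_id, timestamp) from an ID built by make_doc_id."""
    return doc_id >> 32, doc_id & 0xFFFFFFFF
```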
With respect to your scaled-down example,
>>> x = 0b0010
>>> y = 0b0101
>>> (x << 4) + y
37
>>> 0b00100101
37
>>>
foo = <some int>
bar = <some int>
foobar = (foo << 32) + bar
This should do it:
(x << 32) + y
For the next person (which in this case was me), here is one way to do it in general (for the scaled-down example):
def combineBytes(*args):
    """
    Given the pieces of a multi-piece number, combine them into one.
    Pass them in order from least to most significant.
    """
    ans = 0
    for i, val in enumerate(args):
        ans += (val << i*4)  # 4 = bits per piece
    return ans
For other sizes, change the 4 to 32 or whatever width you need.
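A generalized sketch of combineBytes that takes the piece width as a parameter instead of hard-coding 4 (the name combine is hypothetical). For 8-bit pieces it agrees with the built-in int.from_bytes in little-endian order, since both treat the first piece as least significant.

```python
def combine(pieces, bits):
    """Combine pieces (least significant first), each `bits` wide, into one int."""
    result = 0
    for i, piece in enumerate(pieces):
        result |= piece << (i * bits)  # piece i occupies bits i*bits .. (i+1)*bits - 1
    return result
```

combine([0b0101, 0b0010], 4) gives 0b100101, as combineBytes does.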
>>> bin(combineBytes(0b0101, 0b0010))
'0b100101'
None of the answers before this cover both merging and splitting the numbers. Splitting can be as much a necessity as merging.
NUM_BITS_PER_INT = 4  # Replace with 32, 48, 64, etc. as needed.
MAXINT = (1 << NUM_BITS_PER_INT) - 1

def merge(a, b):
    c = (a << NUM_BITS_PER_INT) | b
    return c

def split(c):
    a = (c >> NUM_BITS_PER_INT) & MAXINT
    b = c & MAXINT
    return a, b

# Test
EXPECTED_MAX_NUM_BITS = NUM_BITS_PER_INT * 2
for a in range(MAXINT + 1):
    for b in range(MAXINT + 1):
        c = merge(a, b)
        assert c.bit_length() <= EXPECTED_MAX_NUM_BITS
        assert (a, b) == split(c)

Persistent Hashing of Strings in Python

How would you convert an arbitrary string into a unique integer, which would be the same across Python sessions and platforms? For example hash('my string') wouldn't work because a different value is returned for each Python session and platform.
Use a hash algorithm such as MD5 or SHA1, then convert the hexdigest via int():
>>> import hashlib
>>> int(hashlib.md5('Hello, world!').hexdigest(), 16)
144653930895353261282233826065192032313L
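On Python 3, hashlib.md5() needs bytes, so the string has to be encoded first; the snippet above passes a str, which only works on Python 2. A sketch assuming UTF-8 encoding and a 64-bit result by default (the name stable_int and the bits parameter are hypothetical). Like any truncated hash, distinct strings can collide, so the result is stable but not guaranteed unique.

```python
import hashlib

def stable_int(s, bits=64):
    """Deterministic integer for a string, the same in every session and on every platform."""
    if not 0 < bits <= 128:
        raise ValueError("bits must be between 1 and 128 (MD5 gives 128 bits)")
    digest = hashlib.md5(s.encode("utf-8")).digest()  # 16 raw bytes
    full = int.from_bytes(digest, "big")              # same value as int(hexdigest, 16)
    return full >> (128 - bits)                       # keep the top `bits` bits
```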
If a hash function really won't work for you, you can turn the string into a number.
my_string = 'my string'
def string_to_int(s):
    ord3 = lambda x: '%.3d' % ord(x)  # each character becomes exactly 3 decimal digits
    return int(''.join(map(ord3, s)))
In[10]: string_to_int(my_string)
Out[11]: 109121032115116114105110103L
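This is invertible by mapping each triplet through chr, as long as every code point is below 1000 and the first character's code point is at least 100 (otherwise int() drops that character's leading zeros).
def int_to_string(n):
    s = str(n)
    return ''.join([chr(int(s[i:i+3])) for i in range(0, len(s), 3)])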
In[12]: int_to_string(109121032115116114105110103L)
Out[13]: 'my string'
Here are my Python 2.7 implementations of the algorithms listed here: http://www.cse.yorku.ca/~oz/hash.html.
I have no idea whether they are efficient.
from functools import reduce  # built in on Python 2; this import also makes it work on Python 3
from ctypes import c_ulong

# c_ulong is 32 bits on Windows but 64 bits on most 64-bit Linux/macOS builds,
# so the *_l variants below can give different results on different platforms.
def ulong(i): return c_ulong(i).value  # numpy would be better if available

def djb2(L):
    """
    h = 5381
    for c in L:
        h = ((h << 5) + h) + ord(c) # h * 33 + c
    return h
    """
    return reduce(lambda h,c: ord(c) + ((h << 5) + h), L, 5381)

def djb2_l(L):
    return reduce(lambda h,c: ulong(ord(c) + ((h << 5) + h)), L, 5381)

def sdbm(L):
    """
    h = 0
    for c in L:
        h = ord(c) + (h << 6) + (h << 16) - h
    return h
    """
    return reduce(lambda h,c: ord(c) + (h << 6) + (h << 16) - h, L, 0)

def sdbm_l(L):
    return reduce(lambda h,c: ulong(ord(c) + (h << 6) + (h << 16) - h), L, 0)

def loselose(L):
    """
    h = 0
    for c in L:
        h += ord(c);
    return h
    """
    return sum(ord(c) for c in L)

def loselose_l(L):
    return reduce(lambda h,c: ulong(ord(c) + h), L, 0)
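Because c_ulong is 32 bits on some platforms and 64 bits on others, the *_l variants above are not portable across platforms, which the question requires. A sketch that pins the arithmetic to 32 bits by masking after every step; the 32-bit width and the names djb2_32, sdbm_32 and MASK32 are assumptions. It hashes code points via ord(), so matching a byte-oriented C implementation on non-ASCII text would need hashing the UTF-8 bytes instead (not shown).

```python
from functools import reduce

MASK32 = 0xFFFFFFFF  # keep results in 32 unsigned bits on every platform

def djb2_32(s):
    """djb2 (h * 33 + c), truncated to 32 bits after every step."""
    return reduce(lambda h, c: ((h << 5) + h + ord(c)) & MASK32, s, 5381)

def sdbm_32(s):
    """sdbm (c + (h << 6) + (h << 16) - h), truncated to 32 bits after every step."""
    return reduce(lambda h, c: (ord(c) + (h << 6) + (h << 16) - h) & MASK32, s, 0)
```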
First off, you probably don't really want the integers to be truly unique. If you do, your numbers can be unlimited in size. If that really is what you want, you could use a bignum library (Python's built-in int already is one) and interpret the bits of the string as the representation of a (potentially very large) integer. If your strings can include the \0 character, you should prepend a 1, so you can distinguish e.g. "\0\0" from "\0".
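A sketch of that exact, invertible mapping in Python, where int is already arbitrary-precision. It assumes UTF-8 encoding and uses a leading 0x01 byte as the "prepended 1" so that leading "\0" characters survive the round trip; text_to_bigint and bigint_to_text are hypothetical names.

```python
def text_to_bigint(s):
    """Exact and invertible: a 0x01 sentinel byte plus the UTF-8 bytes, read as one big-endian int."""
    return int.from_bytes(b"\x01" + s.encode("utf-8"), "big")

def bigint_to_text(n):
    """Inverse of text_to_bigint."""
    data = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return data[1:].decode("utf-8")  # drop the sentinel byte
```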
Now, if you prefer bounded-size numbers, you'll be using some form of hashing. MD5 will work, but it's overkill for the stated purpose. I recommend using sdbm instead; it works very well. In C it looks like this:
static unsigned long sdbm(unsigned char *str)
{
    unsigned long hash = 0;
    int c;
    while ((c = *str++))  /* extra parentheses: the assignment is the loop condition */
        hash = c + (hash << 6) + (hash << 16) - hash;
    return hash;
}
The source, http://www.cse.yorku.ca/~oz/hash.html, also presents a few other hash functions.
Here's another option: quite crude (it probably has many collisions) and not very legible.
It worked for the purpose of generating an int (and later on, a random color) for different strings:
from functools import reduce  # needed on Python 3
aString = "don't panic"
# each character's code point weighted by its 1-based position, then summed
reduce(lambda x, y: x + y, map(lambda x: ord(x[0]) * x[1], zip(aString, range(1, len(aString) + 1))))

How do I manipulate bits in Python?

In C I could, for example, zero out bit #10 in a 32 bit unsigned value like so:
unsigned long value = 0xdeadbeef;
value &= ~(1<<10);
How do I do that in Python?
Bitwise operations on Python ints work much like in C. The &, | and ^ operators in Python work just like in C. The ~ operator works as for a signed integer in C; that is, ~x computes -x-1.
You have to be somewhat careful with left shifts, since Python integers aren't fixed-width. Use bit masks to obtain the low-order bits. For example, to do the equivalent of a left shift on a 32-bit integer, use (x << 5) & 0xffffffff.
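A small sketch of emulating 32-bit unsigned C behavior with masks, assuming a 32-bit width is what you want; shl32, not32 and MASK32 are hypothetical names. Clearing a bit with value & ~(1 << 10) needs no mask, because ANDing a non-negative value with a negative number keeps the value's own width.

```python
MASK32 = 0xFFFFFFFF

def shl32(x, n):
    """Left shift that drops bits pushed past bit 31, like a uint32_t in C."""
    return (x << n) & MASK32

def not32(x):
    """~x as a 32-bit unsigned value (plain ~x in Python is -x - 1)."""
    return ~x & MASK32
```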
value = 0xdeadbeef
value &= ~(1<<10)
Some common bit operations that might serve as examples:
def get_bit(value, n):
    return ((value >> n & 1) != 0)
def set_bit(value, n):
    return value | (1 << n)
def clear_bit(value, n):
    return value & ~(1 << n)
Usage, e.g.:
>>> get_bit(5, 2)
True
>>> get_bit(5, 1)
False
>>> set_bit(5, 1)
7
>>> clear_bit(5, 2)
1
>>> clear_bit(7, 2)
3
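In the same style, a sketch of toggling a bit and reading or writing a multi-bit field. toggle_bit, get_field and set_field are hypothetical helpers, and set_field silently drops any bits of the new field that do not fit (an assumption; you may prefer to raise instead).

```python
def toggle_bit(value, n):
    """Flip bit n."""
    return value ^ (1 << n)

def get_field(value, lo, width):
    """Return `width` bits of value starting at bit `lo`."""
    return (value >> lo) & ((1 << width) - 1)

def set_field(value, lo, width, field):
    """Replace `width` bits starting at `lo` with field; excess high bits of field are dropped."""
    mask = ((1 << width) - 1) << lo
    return (value & ~mask) | ((field << lo) & mask)
```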
Python has C style bit manipulation operators, so your example is literally the same in Python except without type keywords.
value = 0xdeadbeef
value &= ~(1 << 10)
You should also check out BitArray, which is a nice interface for dealing with sequences of bits.
Omit the 'unsigned long', and the semi-colons are not needed either:
value = 0xDEADBEEF
value &= ~(1<<10)
print value
"0x%08X" % value
Have you tried copying and pasting your code into the Python REPL to see what will happen?
>>> value = 0xdeadbeef
>>> value &= ~(1<<10)
>>> hex (value)
'0xdeadbaef'
If you're going to do a lot of bit manipulation (and you care much more about readability than performance in your application), you may want to create an integer wrapper to enable slicing like in Verilog or VHDL:
class BitVector:  # Python 2 only: Python 3 removed __getslice__/__setslice__
    def __init__(self, val):
        self._val = val
    def __setslice__(self, highIndx, lowIndx, newVal):
        # newVal must fit in the slice (bit_length() also handles newVal == 0)
        assert newVal.bit_length() <= (highIndx - lowIndx + 1)
        # clear out bit slice
        clean_mask = (2**(highIndx+1)-1) ^ (2**(lowIndx)-1)
        self._val = self._val ^ (self._val & clean_mask)
        # set new value
        self._val = self._val | (newVal << lowIndx)
    def __getslice__(self, highIndx, lowIndx):
        return (self._val >> lowIndx) & (2**(highIndx-lowIndx+1)-1)
b = BitVector(0)
b[3:0] = 0xD
b[7:4] = 0xE
b[11:8] = 0xA
b[15:12] = 0xD
for i in xrange(0, 16, 4):
    print '%X' % b[i+3:i]
Outputs:
D
E
A
D
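__getslice__ and __setslice__ were removed in Python 3; there, b[3:0] calls __getitem__ or __setitem__ with slice(3, 0, None). A Python 3 sketch of the same wrapper under that protocol, keeping the Verilog-style [high:low] order; the error handling and the __int__ method are additions of this sketch, not part of the answer above.

```python
class BitVector:
    """Integer wrapper supporting Verilog-style b[high:low] slices (Python 3)."""

    def __init__(self, val=0):
        self._val = val

    @staticmethod
    def _bounds(key):
        # b[3:0] arrives here as slice(3, 0, None)
        if not isinstance(key, slice) or key.step is not None:
            raise TypeError("index with b[high:low]")
        high, low = key.start, key.stop
        if high is None or low is None or not 0 <= low <= high:
            raise IndexError("need 0 <= low <= high")
        return high, low

    def __getitem__(self, key):
        high, low = self._bounds(key)
        return (self._val >> low) & ((1 << (high - low + 1)) - 1)

    def __setitem__(self, key, new_val):
        high, low = self._bounds(key)
        width = high - low + 1
        if new_val < 0 or new_val.bit_length() > width:
            raise ValueError(f"{new_val:#x} does not fit in {width} bits")
        mask = ((1 << width) - 1) << low
        self._val = (self._val & ~mask) | (new_val << low)  # clear the slice, then set it

    def __int__(self):
        return self._val


b = BitVector(0)
b[3:0] = 0xD
b[7:4] = 0xE
b[11:8] = 0xA
b[15:12] = 0xD
for i in range(0, 16, 4):
    print('%X' % b[i+3:i])
```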
a = int('00001111', 2)
b = int('11110000', 2)
bin(a & b)[2:].zfill(8)   # '00000000'
bin(a | b)[2:].zfill(8)   # '11111111'
bin(a << 2)[2:].zfill(8)  # '00111100'
bin(a >> 2)[2:].zfill(8)  # '00000011'
bin(a ^ b)[2:].zfill(8)   # '11111111'
int(bin(a | b)[2:].zfill(8), 2)  # 255
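format(x, '08b') (equivalently f'{x:08b}') produces the same zero-padded strings as bin(x)[2:].zfill(8). Neither truncates, so a shift past bit 7 yields more than eight digits; the sketch below masks to the low 8 bits first, assuming byte-sized results are what you want (bits8 is a hypothetical name).

```python
a = 0b00001111
b = 0b11110000

def bits8(x):
    """Render the low 8 bits of x as a zero-padded binary string."""
    return format(x & 0xFF, "08b")  # mask first: "08b" pads but never truncates

print(bits8(a & b))   # 00000000
print(bits8(a | b))   # 11111111
print(bits8(a << 2))  # 00111100
print(bits8(a >> 2))  # 00000011
print(bits8(a ^ b))   # 11111111
print(bits8(a << 5))  # 11100000 (bit 8 of 0b111100000 is masked off)
```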
