i am trying to combine several variables into 1 element of a bytearray.
I have the variables:
version, padding, extension, cc
of sizes: 2b, 1b, 1b, 4b
how do i combine them in that order as one byte?
If the variables are integers, you can just use bit-shifts and bitwise-or operations to form a value comprised of 8-bits and then store that where you want in the bytearray.
ba[i] = version << 6 | padding << 5 | extension << 4 | cc
You can pack them into a byte using shift and bit-masking.
version, padding, extension, cc = 2, 0, 1, 3
byte = ((version & 3) << 6) | ((padding & 1) << 5) | ((extension & 1) << 4) | (cc & 7)
byte
# OUT: 147
Note that you have to mask them first or else if the value exceeds the range it will clobber the other fields.
Related
I have a hex code like this:
\xf0\x9f\x94\xb4
And I want to encode this like this:
1F534
How can I transform it with a method in python 2.7?
Thanks
Here you are just asking: how can I find the unicode code of the character represented in utf8 with the (byte) string '\xf0\x9f\x94\xb4'?
In Python3 it would be as simple as:
>>> hex(ord(b'\xf0\x9f\x94\xb4'.decode()))
'0x1f534'
In a Python2 version compiled with --enable-unicode=ucs4, it would be more or less the same:
>>> hex(ord('\xf0\x9f\x94\xb4'.decode('utf-8')))
'0x1f534'
But after your comments, you have a Python 2.7 version compiled with --enable-unicode=ucs2. In that case, Unicode strings actually contain a UTF16 representation of the string:
>>> print [hex(ord(i)) for i in '\xf0\x9f\x94\xb4'.decode('utf-8')]
['0xd83d', '0xdd34']
with no direct way to find the true unicode code point of the U+1F534 LARGE RED CIRCLE character.
The last option is then to decode the utf8 sequence by hand. You can find the description of the UTF8 encoding on wikipedia. The following function take an utf-8 representation of an unicode character and return its code point:
def from_utf8(bstr):
b = [ord(i) for i in bstr]
if b[0] & 0x80 == 0: return b
if b[0] & 0xe0 == 0xc0:
return ((b[0] & 0x1F) << 6) | (b[1] & 0x3F)
if b[0] & 0xf0 == 0xe0:
return ((b[0] & 0xF) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)
else:
return ((b[0] & 7) << 18) | ((b[1] & 0x3F) << 12) | \
((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
Beware, no control is done here to make sure that the string is a correct UTF-8 representation of a single character... But at least it gives the expected result:
>>> print hex(from_utf8("\xf0\x9f\x94\xb4"))
0x1f534
Let's say I have three unsigned integers, A, B, and C, and I know the maximum values of each. Both A and B are less than 214, and C is less than 24. Since 14 + 14 + 4 = 32, I should be able to store these three integers in 32 bits, right?
If so, how can I do this in python? Struct.pack only appears to support byte-sized registers, so I would be limited to 'HHB', which is 40 bits (8 more than I need). Can this be done, or am I missing some fundamental concepts here?
Pack it yourself with bit operators:
(A<<18)|(B<<4)|C
Then use struct.pack on the 32-bit result.
The library sounds like the way to go, but just for fun with bitwise operations:
a_in, b_in, c_in = 15220, 9021, 9
a_bits, b_bits, c_bits = 14, 14, 4
packed = a_in << b_bits + c_bits | b_in << c_bits | c_in
a_out = (packed & 2 ** a_bits - 1 << b_bits + c_bits) >> b_bits + c_bits
b_out = (packed & 2 ** b_bits - 1 << c_bits) >> c_bits
c_out = packed & 2 ** c_bits - 1
print packed # 3989976025 = 11101101110100100011001111011001
print a_out # 15220 = 11101101110100
print b_out # 9021 = 10001100111101
print c_out # 9 = 1001
I am trying to convert this C function into Python;
typedef unsigned long var;
/* Bit rotate rightwards */
var ror(var v,unsigned int bits) {
return (v>>bits)|(v<<(8*sizeof(var)-bits));
}
I have tried Googling for some solutions, but I can't seem to get any of them to give the same results as the one here.
This is one solution I have found from another program;
def mask1(n):
"""Return a bitmask of length n (suitable for masking against an
int to coerce the size to a given length)
"""
if n >= 0:
return 2**n - 1
else:
return 0
def ror(n, rotations=1, width=8):
"""Return a given number of bitwise right rotations of an integer n,
for a given bit field width.
"""
rotations %= width
if rotations < 1:
return n
n &= mask1(width)
return (n >> rotations) | ((n << (8 * width - rotations)))
I am trying to btishift key = 0xf0f0f0f0f123456. The C code gives 000000000f0f0f12 when it is called with; ror(key, 8 << 1) and Python gives; 0x0f0f0f0f0f123456 (the original input!)
Your C output doesn't match the function that you provided. That is presumably because you are not printing it correctly. This program:
#include <stdio.h>
#include <stdint.h>
uint64_t ror(uint64_t v, unsigned int bits)
{
return (v>>bits) | (v<<(8*sizeof(uint64_t)-bits));
}
int main(void)
{
printf("%llx\n", ror(0x0123456789abcdef, 4));
printf("%llx\n", ror(0x0123456789abcdef, 8));
printf("%llx\n", ror(0x0123456789abcdef, 12));
printf("%llx\n", ror(0x0123456789abcdef, 16));
return 0;
}
produces the following output:
f0123456789abcde
ef0123456789abcd
def0123456789abc
cdef0123456789ab
To produce an ror function in Python I refer you to this excellent article: http://www.falatic.com/index.php/108/python-and-bitwise-rotation
This Python 2 code produces the same output as the C program above:
ror = lambda val, r_bits, max_bits: \
((val & (2**max_bits-1)) >> r_bits%max_bits) | \
(val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1))
print "%x" % ror(0x0123456789abcdef, 4, 64)
print "%x" % ror(0x0123456789abcdef, 8, 64)
print "%x" % ror(0x0123456789abcdef, 12, 64)
print "%x" % ror(0x0123456789abcdef, 16, 64)
The shortest way I've found in Python:
(note this works only with integers as inputs)
def ror(n,rotations,width):
return (2**width-1)&(n>>rotations|n<<(width-rotations))
There are different problems in your question.
C part :
You use a value of key that is a 64 bits value (0x0f0f0f0f0f123456), but the output shows that for you compiler unsigned long is only 32 bits wide. So what C code does is rotating the 32 bits value 0x0f123456 16 times giving 0x34560f12
If you had used unsigned long long (assuming it is 64 bits on your architecture as it is on mine), you would have got 0x34560f0f0f0f0f12 (rotation 16 times of a 64 bits)
Python part :
The definition of width between mask1 and ror is not consistent. mask1 takes a width in bits, where ror takes a width in bytes and one byte = 8 bits.
The ror function should be :
def ror(n, rotations=1, width=8):
"""Return a given number of bitwise right rotations of an integer n,
for a given bit field width.
"""
rotations %= width * 8 # width bytes give 8*bytes bits
if rotations < 1:
return n
mask = mask1(8 * width) # store the mask
n &= mask
return (n >> rotations) | ((n << (8 * width - rotations)) & mask) # apply the mask to result
That way with key = 0x0f0f0f0f0f123456, you get :
>>> hex(ror(key, 16))
'0x34560f0f0f0f0f12L'
>>> hex(ror(key, 16, 4))
'0x34560f12L'
exactly the same as C output
i know its nearly 6 years old
I always find it easier to use string slices than bitwise operations.
def rotate_left(x, n):
return int(f"{x:032b}"[n:] + f"{x:032b}"[:n], 2)
def rotate_right(x, n):
return int(f"{x:032b}"[-n:] + f"{x:032b}"[:-n], 2)
def rotation_value(value, rotations, widht=32):
""" Return a given number of bitwise left or right rotations of an interger
value,
for a given bit field widht.
if rotations == -rotations:
left
else:
right
"""
if int(rotations) != abs(int(rotations)):
rotations = widht + int(rotations)
return (int(value)<<(widht-(rotations%widht)) | (int(value)>>(rotations%widht))) & ((1<<widht)-1)
I can pack characters into 32 bit words (or any other fixed size), but I want to make the size in bits a parameter:
Here's what works for 32 bits :
def vectorize_key(key):
return (v[0] << 24 | v[1] << 16 | v[2] << 8 | v[3] for v in split((ord(k) for k in key),4) )
And here's what doesn't work. It says int and tuple bad operands for | but I can not see how I get a tuple there. I explicitly "unpack" the tuple! :
def vectorize_key(key,word_size=32):
return (reduce(lambda p, (e,f) : p | (e << f),((x[i],i*8) for i in range(word_size/8))) for x in split((ord(k) for k in key),word_size/8))
Got it. I was missing the initilizer value for the reduce :
def vectorize_key(key,word_size=32):
return (reduce(lambda p, (e,f) : p | (e << f),((x[i],i*8) for i in range(word_size/8)),0) for x in split((ord(k) for k in key),word_size/8))
I want to generate 64 bits long int to serve as unique ID's for documents.
One idea is to combine the user's ID, which is a 32 bit int, with the Unix timestamp, which is another 32 bits int, to form an unique 64 bits long integer.
A scaled-down example would be:
Combine two 4-bit numbers 0010 and 0101 to form the 8-bit number 00100101.
Does this scheme make sense?
If it does, how do I do the "concatenation" of numbers in Python?
Left shift the first number by the number of bits in the second number, then add (or bitwise OR - replace + with | in the following examples) the second number.
result = (user_id << 32) + timestamp
With respect to your scaled-down example,
>>> x = 0b0010
>>> y = 0b0101
>>> (x << 4) + y
37
>>> 0b00100101
37
>>>
foo = <some int>
bar = <some int>
foobar = (foo << 32) + bar
This should do it:
(x << 32) + y
For the next guy (which was me in this case was me). Here is one way to do it in general (for the scaled down example):
def combineBytes(*args):
"""
given the bytes of a multi byte number combine into one
pass them in least to most significant
"""
ans = 0
for i, val in enumerate(args):
ans += (val << i*4)
return ans
for other sizes change the 4 to a 32 or whatever.
>>> bin(combineBytes(0b0101, 0b0010))
'0b100101'
None of the answers before this cover both merging and splitting the numbers. Splitting can be as much a necessity as merging.
NUM_BITS_PER_INT = 4 # Replace with 32, 48, 64, etc. as needed.
MAXINT = (1 << NUM_BITS_PER_INT) - 1
def merge(a, b):
c = (a << NUM_BITS_PER_INT) | b
return c
def split(c):
a = (c >> NUM_BITS_PER_INT) & MAXINT
b = c & MAXINT
return a, b
# Test
EXPECTED_MAX_NUM_BITS = NUM_BITS_PER_INT * 2
for a in range(MAXINT + 1):
for b in range(MAXINT + 1):
c = merge(a, b)
assert c.bit_length() <= EXPECTED_MAX_NUM_BITS
assert (a, b) == split(c)