Interesting ways to check the size of an int [duplicate] - python

How can I find out the number of Bytes a certain integer number takes up to store?
E.g. for
hexadecimal \x00 - \xff (or decimal 0 - 255 = binary 0000 0000 - 1111 1111) I'm looking to get 1 (Byte),
hexadecimal \x100 - \xffff (or decimal 256 - 65535 = binary 0000 0001 0000 0000 - 1111 1111 1111 1111) would give me 2 (Bytes)
and so on.
Any clue for hexadecimal or decimal format as the input?

def byte_length(i):
    return (i.bit_length() + 7) // 8
Of course, as Jon Clements points out, this isn't the size of the actual PyIntObject (PyLongObject in Python 3). That object has a PyObject header, stores the value as a bignum in whatever layout is easiest to work with rather than the most compact one, and needs at least one pointer (4 or 8 bytes) referring to it on top of the object itself, and so on.
But this is the byte length of the number itself. It's almost certainly the most efficient answer, and probably also the easiest to read.
Or is ceil(i.bit_length() / 8.0) more readable?
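A minimal sketch of the bit_length formula in use. Two choices here are assumptions rather than part of the answer: zero is counted as needing 1 byte (plain (0).bit_length() is 0, so the formula alone gives 0), and negative input is rejected. int.to_bytes (Python 3) is used only to show that the computed length really is enough:

```python
def byte_length(i):
    """Bytes needed for a non-negative int; 0 is assumed to need 1 byte."""
    if i < 0:
        raise ValueError("negative values need a signed encoding")
    # round the bit count up to whole bytes; max() covers i == 0
    return max(1, (i.bit_length() + 7) // 8)


for n in (0x00, 0xff, 0x100, 0xffff, 0x10000):
    k = byte_length(n)
    # to_bytes raises OverflowError if k bytes were too few
    print(hex(n), k, n.to_bytes(k, "big"))
```

For 0x00, 0xff, 0x100, 0xffff and 0x10000 this reports 1, 1, 2, 2 and 3 bytes.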

Unless you're dealing with an array.array or a numpy.array - the size always has object overhead. And since Python deals with BigInts naturally, it's really, really hard to tell...
>>> i = 5
>>> import sys
>>> sys.getsizeof(i)
24
So on a 64-bit build of CPython 2 it takes 24 bytes to store what could be stored in 3 bits (64-bit CPython 3 reports 28 for the same value).
However, if you did,
>>> s = '\x05'
>>> sys.getsizeof(s)
38
So no, not really - you've got the memory-overhead of the definition of the object rather than raw storage...
If you then take:
>>> import array
>>> a = array.array('i', [3])
>>> a
array('i', [3])
>>> sys.getsizeof(a)
60L
>>> a = array.array('i', [3, 4, 5])
>>> sys.getsizeof(a)
68L
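Then each extra element costs only its native item size on top of a fixed overhead (here 4 bytes per 'i' item: 60 to 68 bytes for two extra elements), i.e. what would be called normal byte boundaries.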
If you just want what "purely" needs to be stored, minus object overhead, then from Python 2.7 (and 3.1) onward you can use some_int.bit_length() (on older versions, bit-shift it as other answers have shown) and work from there.
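To see object size and raw storage side by side, here is a small sketch for CPython. sys.getsizeof reports CPython object sizes, which differ between versions and platforms, so no numbers are hard-coded:

```python
import array
import sys

n = 5
# CPython object size: header plus digits; varies by version/platform
print("int object:", sys.getsizeof(n), "bytes")
# what the value itself needs, ignoring object overhead
print("raw value:", max(1, (n.bit_length() + 7) // 8), "byte(s)")

a = array.array("i", [3, 4, 5])
# an array stores C ints back to back; itemsize is the per-element cost
payload = a.itemsize * len(a)
print("array payload:", payload, "bytes; getsizeof:", sys.getsizeof(a))
```

The array's tobytes() output has exactly the payload length, while getsizeof adds the fixed object overhead on top.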

You can use simple math:
>>> from math import log
>>> def bytes_needed(n):
...     if n == 0:
...         return 1
...     return int(log(n, 256)) + 1
...
>>> bytes_needed(0x01)
1
>>> bytes_needed(0x100)
2
>>> bytes_needed(0x10000)
3
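math.log works in floating point, so for values at or near exact powers of 256 the truncation in int(log(n, 256)) can, in principle, land on the wrong side. A sketch that stays in integer arithmetic (a variant, not the answer's own code) sidesteps that:

```python
def bytes_needed(n):
    """Bytes needed for a non-negative int, using only integer comparisons."""
    if n < 0:
        raise ValueError("negative values are not handled here")
    count = 1
    limit = 256  # smallest value that no longer fits in `count` bytes
    while n >= limit:
        count += 1
        limit <<= 8  # multiply the limit by 256
    return count


print(bytes_needed(0x01), bytes_needed(0x100), bytes_needed(0x10000))  # 1 2 3
```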

By repeatedly shifting the value right by 8 bits (one byte) until it reaches zero, you can count how many bytes are needed to store a number.
It's probably worth noting that while this method is very generic, it does not work on negative numbers (and returns 0 for 0), and it only looks at the binary value of the variable without taking into account what it is stored in.
a = 256
i = 0
while a > 0:
    a = a >> 8
    i += 1
print(i)
The program behaves as follows:
a is 0000 0001 0000 0000 in binary (256).
Each run of the loop shifts it to the right by 8:
loop 1: 0000 0001 0000 0000 >> 8 = 0000 0001, i = 1 (1 > 0, so continue)
loop 2: 0000 0001 >> 8 = 0000 0000, i = 2
END: 0 is not > 0, so the loop stops
so there are 2 bytes needed to store the number.
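The answer notes that the loop does not handle negative numbers. If negatives should be stored in two's complement (the encoding int.to_bytes(..., signed=True) uses), one way to size them is to count the bits of n (or of ~n for negatives) plus a sign bit. This is a sketch under that two's-complement assumption:

```python
def signed_byte_length(n):
    """Bytes needed to hold n in two's complement (sign bit included)."""
    # for negative n, ~n == -n - 1 is non-negative and has the same magnitude bits
    magnitude = ~n if n < 0 else n
    bits = magnitude.bit_length() + 1  # +1 for the sign bit
    return (bits + 7) // 8


for n in (0, 127, 128, -128, -129, 256):
    k = signed_byte_length(n)
    print(n, k, n.to_bytes(k, "big", signed=True))
```

So 127 and -128 fit in one byte, while 128 and -129 need two.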

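On the Python command prompt you can use ctypes.sizeof, although it reports the size of the C type on your platform (commonly 4 bytes for c_int), not the number of bytes a particular value needs:
>>> import ctypes
>>> ctypes.sizeof(ctypes.c_int)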
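ctypes.sizeof answers a different question from the rest of this thread: how big a C type is, regardless of the value stored in it. A short sketch contrasting a few types (c_long, for one, is commonly 8 bytes on 64-bit Linux but 4 on 64-bit Windows):

```python
import ctypes
import struct

# Sizes of C types on this platform; they do not depend on any particular value
for name in ("c_int8", "c_int16", "c_int", "c_long", "c_longlong"):
    print(name, ctypes.sizeof(getattr(ctypes, name)))

# struct's native mode (the default) reports the same C int size
print("struct 'i':", struct.calcsize("i"))
```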

# Python 3
import math
nbr = 0xff  # 255 defined in hexadecimal
nbr = "{0:b}".format(nbr)  # Format the number as a string of binary digits
bit_length = len(nbr)  # Number of binary digits
byte_length = math.ceil(bit_length / 8)  # Minimum number of bytes
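The string-length trick and int.bit_length() agree for positive numbers. They differ at 0 (the string "0" has one digit while (0).bit_length() is 0), and for negatives the "-" sign is counted as a character. A quick comparison sketch:

```python
import math

for nbr in (0, 1, 0xff, 0x100, 0xffff, -5):
    digits = "{0:b}".format(nbr)
    # columns: value, string length, bit_length(), bytes from the string length
    print(nbr, len(digits), nbr.bit_length(), math.ceil(len(digits) / 8))
```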

Related

Efficient way to extract individual bit values from a hexa string

I'm developing a real-time application where I have to process a line of data as fast as possible to send it over to an app. These lines arrive at a very fast rate, around 40k per minute. The task is to extract the value of certain individual bits from the hex data in the line. I have a solution already, but I doubt it's the most efficient one, so I'm asking if you can improve it.
A sample line of data:
p1 p2 p3 len data
1497383697 0120 000 5 00 30 63 4f 15
len is how many bytes are in the data, and data is what we're working with. Let's say I want to extract 3 bits starting from the 11th from the left. Converting the hex to binary with padding:
0x0030634f15 = 0000 0000 0011 0000 0110 0011 0100 1111 0001 0101
The wanted value is 0b110 which is 6 in decimal.
My working solution for the problem is this:
# 10 and 3 in the example (start is the 0-based index of the 11th bit)
start = config.getint(p, 'start')
length = config.getint(p, 'length')
parts = line.split()
hexadata = ''.join(parts[4:])
bindata = bin(int(hexadata, 16))[2:].zfill(len(hexadata) * 4)
val = int(bindata[start:start + length], 2)
val will hold the value 6 in the end. Is there a more efficient way to do this? Thank you.
Instead of using string operations, it's faster to convert the input to a number and use bit operations:
parts = line.split(maxsplit=4)
# remove spaces in the number and convert it to int from base 16
num = int(parts[4].replace(' ', ''), 16)
# create a bit mask with exactly `length` 1s
mask = (1 << length) - 1
# calculate the offset from the right (40 = 10 hex digits * 4 bits in this sample)
shift = 40 - start - length
# shift the value to the right and apply the binary mask to get our value
val = (num >> shift) & mask
According to my timings, the bit operations are faster by about 20%. Timing results with 1 million iterations:
string_ops 2.735653492003621 seconds
bit_ops 2.190693126998667 seconds
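The bit-operation answer hard-codes 40, the bit width of this 5-byte sample. A sketch that derives the width from the data instead; the function name extract_bits is illustrative, and start is a 0-based index, so "the 11th bit from the left" is start=10:

```python
def extract_bits(line, start, length):
    """Return `length` bits starting at 0-based bit index `start`, counted
    from the left of the hex data in fields 5 and onward of `line`."""
    hexdata = "".join(line.split()[4:])
    total_bits = len(hexdata) * 4          # 4 bits per hex digit
    shift = total_bits - start - length    # offset of the field from the right
    if shift < 0 or start < 0:
        raise ValueError("bit range falls outside the data")
    num = int(hexdata, 16)
    mask = (1 << length) - 1               # exactly `length` ones
    return (num >> shift) & mask


sample = "1497383697 0120 000 5 00 30 63 4f 15"
print(extract_bits(sample, 10, 3))  # 6, i.e. 0b110
```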

The best way to base64 encode the last 6k bits of a python integer

I illustrate the case where k = 2 (so, the bottom 12 bits):
import base64
# Hi
# 7 - 34
# 000111 - 100010
# 0001 - 1110 - 0010 = 0x1E2 = 482
# 1
integer = int(bin(482)[-12:] + '0' * 20, 2)
encoded = base64.b64encode(base64.b16decode('{0:08X}'.format(integer)))
print(encoded)
# 2
encoded = base64.b64encode(base64.b16decode('{0:08X}'.format(482 << 20)))
print(encoded)
Both output HiAAAA== as desired
An ideone link for your convenience: http://ideone.com/O73kQs
Intuitively these are very clear, and I'm favoring #2 by quite a bit.
One thing that "irks" me about #1 is that if integers in Python are not 32 bits, then I'm in trouble.
How can I get the proper size of an int? (Total Python newbie question?) (Edit: yes, apparently a newbie-ish question: How can I determine the exact size of a type used by Python?)
It would be nice, however, if there was a way to simply do something like
encoded = base64.b64encode('{0:08X}'.format(482 << 20))
Moreover, how can I go from
bin(1)
which equals
'0b1'
to the actual binary literal
0b1
You can go from bin back with int, which takes an optional second parameter, the base:
int(bin(18)[2:], 2)
Since you use this earlier, you must know about it, so I can only assume you mean something other than its integer value by "binary literal", although for the life of me I'm not sure what that is.
You can do
print(0b1)
and see that the actual repr is just the decimal value, 1.
To get the last 12 bits of an int:
my_int = 482
k = 2
mask = int("1" * (6 * k), 2)  # 6*k one-bits, i.e. 0b111111111111 for k = 2
last_bits = my_int & mask
Then you can shift it left by 20 (or whatever you need).
To turn those bits into bytes, first get the last 12 bits as demonstrated above, then pack them with struct:
import struct
print(struct.pack('H', last_bits))
print(struct.pack('H', 0b100001))
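struct.pack('H', ...) uses the machine's native byte order (and alignment), so the bytes it returns differ between platforms. A sketch with an explicit byte-order prefix instead:

```python
import struct

last_bits = 482  # 0x1E2, the bottom 12 bits from the example
print(struct.pack(">H", last_bits))  # big-endian: b'\x01\xe2'
print(struct.pack("<H", last_bits))  # little-endian: b'\xe2\x01'
```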
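Alternatively, you could use a generator that emits the bytes least-significant first:
def get_chars(int_val):
    while int_val > 0:
        yield chr(int_val & 0xFF)
        int_val >>= 8  # shift right; shifting left (<<=) would never reach 0
print(repr("".join(get_chars(last_bits))))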
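For the base64 question itself: Python ints are arbitrary precision, so there is no 32-bit limit to worry about; the 4 passed to to_bytes below is simply the number of output bytes chosen. One sketch (not from either answer; b64_low_bits is an illustrative name) maps each 6-bit group straight onto the standard base64 alphabet, which gives exactly k characters with no padding. The last line repeats approach #2 using int.to_bytes in place of the hex round trip:

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_low_bits(n, k):
    """Encode the lowest 6*k bits of n as k base64 characters, no padding."""
    low = n & ((1 << (6 * k)) - 1)   # keep only the bottom 6*k bits
    chars = []
    for i in range(k - 1, -1, -1):   # most significant 6-bit group first
        chars.append(ALPHABET[(low >> (6 * i)) & 0x3F])
    return "".join(chars)


print(b64_low_bits(482, 2))  # Hi
print(base64.b64encode((482 << 20).to_bytes(4, "big")))  # b'HiAAAA=='
```

When 6*k is a multiple of 24 (k a multiple of 4), the result matches base64.b64encode of the low bits exactly, since no padding is involved.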

Extracting/Shifting bits in python for websocket RFC 6455

I am trying to implement my own WebSocket server in Python. I'm following the RFC 6455 spec, and I'm running into problems extracting the bits from the base frame header.
I'm not having problems with the protocol; I'm having problems with basic binary/hex math magic.
According to the spec, the first 4 bits are single-bit values,
so to get the first bit I do something like this (d being my data from the websocket):
first_byte = ord(d[0])
print "finished bit",(first_byte >> 7) & 1
And later on, if I want to get the payload size, I do:
sec_byte = ord(d[1])
print "payload size",sec_byte & 0x7f
However, later in the spec I need to grab a 4-bit value for the opcode.
This is what I need help with, maybe even a link to how this math works. I've googled/duckduckgoed my brains out, and most results are from Stack Overflow.
With even more tinkering it's starting to fall into place. I had been stuck on this for about 4 days and it's still unsolved, so I'd welcome any more info anyone can give.
If you need to consider only the first (most significant) 4 bits, right shift by 4. The extra masking with AND may be unnecessary (e.g. if your value is in the range 0-255), but it makes explicit which bits you are interested in. E.g.:
>>> d = [128, 80, 40]
>>> (d[0] >> 4) & 15
8
>>> (d[1] >> 4) & 15
5
>>> (d[2] >> 4) & 15
2
128 is in binary 1000 0000; right shifting by 4 gives 0000 1000 ("new" 0 bits enter from left), i.e. 8; 80 is 0101 0000, so you obtain 0000 0101; and finally 40 is 0010 1000 and we obtain 0000 0010.
In general, consider an octet like abcd efgh where each letter is a bit. You have to shift and AND in order to isolate the bits you are interested in. E.g. suppose your spec says the cd bits define four different kinds of something. To obtain that 0-3 number, you right shift by 4 again and AND with 3, that is 0000 0011, i.e. you "isolate" the bits you want.
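Applied to the question's frame header: the opcode is the low 4 bits of the first byte, so it needs only a mask, while FIN and the three RSV bits are single bits above it. A sketch assuming frame is a Python 3 bytes object (so no ord() is needed); parse_header is an illustrative name:

```python
def parse_header(frame):
    """Split the first two bytes of a WebSocket frame (RFC 6455, section 5.2)."""
    b0, b1 = frame[0], frame[1]      # indexing bytes gives ints in Python 3
    return {
        "fin": (b0 >> 7) & 1,         # bit 7 of byte 0
        "rsv1": (b0 >> 6) & 1,
        "rsv2": (b0 >> 5) & 1,
        "rsv3": (b0 >> 4) & 1,
        "opcode": b0 & 0x0F,          # low 4 bits of byte 0, no shift needed
        "mask": (b1 >> 7) & 1,        # bit 7 of byte 1
        "payload_len": b1 & 0x7F,     # low 7 bits; 126/127 mean an extended length follows
    }


print(parse_header(b"\x81\x85"))  # FIN set, opcode 1 (text), masked, length 5
```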

Getting a specific bit value in a byte string

There is a byte at a specific index in a byte string which represents eight flags, one flag per bit in the byte. If a flag is set, its corresponding bit is 1; otherwise it's 0. For example, if I've got
b'\x15'
the flags would be
0001 0101 # Three flags are set at indexes 0, 2 and 4
# and the others are not set
What would be the best way to get each bit value in that byte, so I know whether a particular flag is set or not? (Preferably using bitwise operations)
Typically, the least-significant bit is bit index 0 and the most-significant bit is bit index 7. Using this terminology, we can determine if bit index k is set by taking the bitwise-and with 1 shifted to the left by k. If the bitwise and is non-zero, then that means that index k has a 1; otherwise, index k has a 0. So:
def get_bit(byteval, idx):
    return (byteval & (1 << idx)) != 0
This will correctly determine the value of bits at indices 0...7 of the byte, going from right to left (i.e. from the least significant bit to the most significant bit, or equivalently from the 1s place to the 2^7 = 128 place).
Why it works
I figured I should add an explanation of why it works...
1<<0 is 1 = 0000 0001
1<<1 is 2 = 0000 0010
1<<2 is 4 = 0000 0100
As you can see, 1<<k is equivalent to 2^k and contains a 1 at exactly the index we are interested in and at no other location. Consequently, the bitwise AND with 1<<k will return either 0 or 1<<k. It will be 0 if the bit at the index we are interested in is 0 (because 1 AND 0 is 0, and all other bits in 1<<k are zero). If the bit we are interested in is 1, then we get a 1 AND a 1 in that position, and a 0 AND something else everywhere else.
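Putting get_bit to work on the question's data: indexing a bytes object already returns an int in Python 3, so the flag byte can be passed straight in. This sketch assumes the example byte is 0x15 (0001 0101), the value that matches flags 0, 2 and 4:

```python
def get_bit(byteval, idx):
    """True if bit `idx` (0 = least significant) of byteval is set."""
    return (byteval & (1 << idx)) != 0


data = b"\x15"          # 0001 0101, the example flag byte
flags = data[0]         # indexing a bytes object yields an int in Python 3
print([idx for idx in range(8) if get_bit(flags, idx)])  # [0, 2, 4]
```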
Test x & 1, x & 2, x & 4, x & 8, etc.:
if the result is > 0, then the 1st, 2nd, 3rd, 4th, etc. bit (counting from the right) is set.
Specify bit masks (read about bit masks on Wikipedia):
FLAG_1 = 1 # 0000 0001
FLAG_2 = 2 # 0000 0010
FLAG_3 = 4 # 0000 0100
...
And then use AND to check whether a bit is set (flags contains your byte):
if flags & FLAG_1:  # bit 0 is set, example: 0001 0101 & 0000 0001 = 0000 0001
    ...
if flags & FLAG_2:  # bit 1 is set? example: 0001 0101 & 0000 0010 = 0000 0000, so no
    ...
...
Of course you should name FLAG_1, etc to something meaningful, depending on the context. E.g. ENABLE_BORDER.
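In Python 3.6+ the named-mask idea can also be expressed with the standard library's enum.IntFlag, which behaves like an int in bitwise expressions. This is a swapped-in technique, not the answer's; the class name Style and all flag names are hypothetical:

```python
import enum


class Style(enum.IntFlag):
    # hypothetical flag names; use whatever your format defines
    ENABLE_BORDER = 1   # 0000 0001
    SHADOW = 2          # 0000 0010
    BOLD = 4            # 0000 0100
    ITALIC = 8          # 0000 1000
    UNDERLINE = 16      # 0001 0000


flags = Style(0x15)  # 0001 0101
for member in (Style.ENABLE_BORDER, Style.SHADOW, Style.BOLD):
    print(member.name, bool(flags & member))
```

This prints True for ENABLE_BORDER and BOLD and False for SHADOW.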
Update:
I was confused by your comment about which bits are set, but after reading another answer I realized you are counting the bits from the wrong end. Bits are numbered zero-based from the right.
The function would be:
def get_site_value(byteval, index):
    '''Function to get local value at a given index'''
    return (byteval >> index) & 1
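Since the confusion here was about which end the numbering starts from, a small sketch showing both conventions side by side; get_bit_from_left is a hypothetical helper for an 8-bit byte, not part of the answer:

```python
def get_site_value(byteval, index):
    """Bit at `index`, counting from the right (0 = least significant)."""
    return (byteval >> index) & 1


def get_bit_from_left(byteval, index):
    """Same bit lookup, but counting from the left of an 8-bit byte."""
    return get_site_value(byteval, 7 - index)


flags = 0x15  # 0001 0101
print([get_site_value(flags, i) for i in range(8)])     # [1, 0, 1, 0, 1, 0, 0, 0]
print([get_bit_from_left(flags, i) for i in range(8)])  # [0, 0, 0, 1, 0, 1, 0, 1]
```

The second list reads the same as the byte written out in binary, which is where counting "from the wrong end" comes from.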
