Python character length of a binary value - python

I am looking for a way to determine the number of characters the binary representation of a value takes up.
For example, if my values were 4, 20, and 60 I'd get the following results:
bin(4), 0b100 = 3
bin(20), 0b10100 = 5
bin(60), 0b111100 = 6

a = 20
a.bit_length()
>> 5

A positive integer n has b bits when 2^(b-1) ≤ n ≤ 2^b - 1. So the number of bits required to represent an integer n is:
floor(log n)+1 # note that base of log is 2
And since the string returned by bin() has a leading 0b, you need to add 2 to the aforementioned formula.
So it would be:
floor(log n) + 3
In Python you can use the math module as follows:
math.floor(math.log(n, 2)) + 3
Example :
>>> math.floor(math.log(10, 2)) + 3
6.0
>>>
>>> len(bin(10))
6
>>> math.floor(math.log(77, 2)) + 3
9.0
>>> len(bin(77))
9
As a more Pythonic way, you can also use int.bit_length, which returns the number of bits needed to represent an integer. To get the number of required characters, add 2 to it:
n.bit_length() + 2
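To make the comparison concrete, here is a minimal sketch that checks the bit_length approach against len(bin(n)). The helper name binary_width is only an illustration, not something from the answers above, and it deliberately rejects zero and negative numbers because the "+ 2" rule does not hold for them.

```python
def binary_width(n: int) -> int:
    """Length of bin(n) for a positive integer n, without building the string."""
    if n <= 0:
        # Assumption of this sketch: only positive integers are supported.
        raise ValueError("only positive integers are handled in this sketch")
    return n.bit_length() + 2  # +2 accounts for the '0b' prefix


# The two approaches agree for positive integers.
for n in (4, 20, 60, 10, 77):
    assert binary_width(n) == len(bin(n))

print(binary_width(20))                      # 7, i.e. len('0b10100')

# Edge cases where "bit_length() + 2" differs from len(bin(n)):
print(len(bin(0)), (0).bit_length() + 2)     # 3 2  (bin(0) is '0b0')
print(len(bin(-4)), (-4).bit_length() + 2)   # 6 5  (bin(-4) is '-0b100')
```

If you only want the number of binary digits without the prefix, as in the question's 3, 5 and 6, n.bit_length() on its own is enough.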

Related

Pandas: divide column into three bins of exact same size

What I have right now looks like this:
spread
0 0.00000787
1 0.00000785
2 0.00000749
3 0.00000788
4 0.00000786
5 0.00000538
6 0.00000472
7 0.00000759
And I would like to add a new column next to it: if the value of spread is between (for example) 0 and 0.00005 then it is part of bin A, if it is between (for example) 0.00005 and 0.0006 then bin B (there are three bins in total). What I have tried so far:
minspread = df['spread'].min()
maxspread = df['spread'].max()
born = (float(maxspread)-float(minspread))/3
born1 = born + float(minspread)
born2 = float(maxspread) - born
df['Bin'] = df['spread'].apply(lambda x: 'A' if x < born1 else ( 'B' if born1 < x <= born2 else 'C'))
But when I do so everything ends up in the Bin A:
spread Bin
0 0.00000787 A
1 0.00000785 A
2 0.00000749 A
3 0.00000788 A
4 0.00000786 A
Does anyone have an idea on how to divide the column 'spread' in three bins (A-B-C) with the same number of observations in it? Thanks!
If you get the error:
unsupported operand type(s) for +: 'decimal.Decimal' and 'float'
it means the column type is Decimal, which works poorly with pandas and should be converted to a numeric type.
One possible solution is to multiply the column by some big number, e.g. 10**15, and convert it to integer, which avoids the precision loss of converting to floats, and then use qcut:
import numpy as np
import pandas as pd

#sample data
#from decimal import Decimal
#df['spread'] = [Decimal(x) for x in df['spread']]
df['spread1'] = (df['spread'] * 10**15).astype(np.int64)
df['bins'] = pd.qcut(df['spread1'], 3, labels=list('ABC'))
print (df)
spread spread1 bins
0 0.00000787 7870000000 C
1 0.00000785 7850000000 B
2 0.00000749 7490000000 A
3 0.00000788 7880000000 C
4 0.00000786 7860000000 C
5 0.00000538 5380000000 A
6 0.00000472 4720000000 A
7 0.00000759 7590000000 B
Solution with no new column:
s = (df['spread'] * 10**15).astype(np.int64)
df['bins'] = pd.qcut(s, 3, labels=list('ABC'))
print (df)
spread bins
0 0.00000787 C
1 0.00000785 B
2 0.00000749 A
3 0.00000788 C
4 0.00000786 C
5 0.00000538 A
6 0.00000472 A
7 0.00000759 B

Math function to find the biggest multiple of a number within a range

I want to know if there is a math expression that I can use to find this relation between two numbers.
Some examples of the input and expected output are below:
Input Multiple Result
4 3 3
6 3 6
8 3 6
4 4 4
12 4 12
16 5 15
Also, the expressions below from Wolfram Alpha show me the expected result but since they don't expand on the explanation on how to do it I can't learn from them...
Biggest multiple of 4 from 10
Biggest multiple of 4 from 12
Try the // and % operators!
For //, you would do
Result = (Input // Multiple) * Multiple
This way you get how many times Multiple fits into Input. This number is then multiplied by Multiple itself and therefore gives you the expected result!
EDIT: how to do it with modulo %?
Result = Input - (Input % Multiple)
taken from MCO's answer!
You can employ modulo for this. For example, to calculate the biggest multiple of 4 that is less than or equal to 13:
13 % 4 = 1
13 - 1 = 12
In Python, that could look like this:
def biggest_multiple(multiple_of, input_number):
    return input_number - input_number % multiple_of
So you use it as:
>>> biggest_multiple(4, 9)
8
>>> biggest_multiple(4, 12)
12
Here's how I would do it:
return int(input / multiple) * multiple
It truncates the division so that you get an integer, which you can multiply.
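For comparison, here is a small sketch that puts the three approaches from the answers side by side and checks them against the table in the question. The function names are made up for this illustration. Note that the int(value / multiple) variant goes through a float, so it may lose precision for very large integers, and it truncates toward zero, which gives a different result for negative inputs.

```python
def multiple_floor(value, multiple):
    # Floor division: how many times multiple fits, times multiple.
    return (value // multiple) * multiple


def multiple_mod(value, multiple):
    # Subtract the remainder left over after floor division.
    return value - value % multiple


def multiple_trunc(value, multiple):
    # True division, truncated toward zero by int().
    return int(value / multiple) * multiple


# (Input, Multiple, Result) rows taken from the question.
table = [(4, 3, 3), (6, 3, 6), (8, 3, 6), (4, 4, 4), (12, 4, 12), (16, 5, 15)]
for value, multiple, expected in table:
    assert multiple_floor(value, multiple) == expected
    assert multiple_mod(value, multiple) == expected
    assert multiple_trunc(value, multiple) == expected

# The variants only disagree once negative numbers are involved.
print(multiple_floor(-7, 3), multiple_mod(-7, 3), multiple_trunc(-7, 3))  # -9 -9 -6
```

For non-negative inputs and a positive multiple all three agree, so the choice is mostly a matter of taste; the integer-only versions avoid floats entirely.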
This may be trivial, but it's damn easy to understand. It also handles a multiple that is negative or zero:
Multiple=[3,3,3,4,4,5,0,-5]
Input=[4,6,8,4,12,16,1,8]
Result=[]
for input,multiple in zip(Input,Multiple):
    if(multiple):
        Result.append((range(multiple,input+1,abs(multiple)))[-1])
    else:
        Result.append(0)
print(Result)
Output:
[3, 6, 6, 4, 12, 15, 0, 5]

Two's Complement in Python (with as few bits as possible)

I am trying to output the binary representation of a negative number using as few bits as possible each time.
Example:
-3 -> 101
-10 -> 10110
Here's a way to do this using the .bit_length method of Python 3 integers. It also uses the string .format method to do the integer to binary string conversion. This function returns a string starting with '0' for non-negative numbers so that they can be distinguished from negative numbers.
def twos_complement(n):
    m = n + 1 if n < 0 else n
    bitlen = 1 + m.bit_length()
    mask = (1 << bitlen) - 1
    return '{0:0{1}b}'.format(n & mask, bitlen)
for i in (-10, -3, 0, 3, 10):
    print('{:3}: {}'.format(i, twos_complement(i)))
print('- ' * 30)
for i in range(-15, 16):
    print(i, twos_complement(i))
output
-10: 10110
-3: 101
0: 0
3: 011
10: 01010
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-15 10001
-14 10010
-13 10011
-12 10100
-11 10101
-10 10110
-9 10111
-8 1000
-7 1001
-6 1010
-5 1011
-4 100
-3 101
-2 10
-1 1
0 0
1 01
2 010
3 011
4 0100
5 0101
6 0110
7 0111
8 01000
9 01001
10 01010
11 01011
12 01100
13 01101
14 01110
15 01111
How it works
Python uses a modified form of two's complement to represent integers. Python integers have no size limit, so negative integers behave as if they have an infinite number of leading 1 bits, as explained in the Python Wiki article on Bitwise Operators.
The int.bit_length method tells us the minimum number of bits required to represent a number. We want one more bit than that so that all our non-negative numbers will start with 0 and all the negative numbers start with a 1. We need to modify that slightly to ensure that numbers of the form -2**n only get a single leading one bit; we do that by adding 1 to all the negative numbers when calculating the bit length.
To select the bits we want we need a bit mask of the appropriate length. If the bit length is 4, we want a mask of 1111 = 2**4 - 1; we could calculate it by using exponentiation, but it's more efficient to use bit shifting: (1 << bitlen) - 1. We then do the bitwise AND operation n & mask to select the bits we want. Fortunately, Python gives us a non-negative number when we perform such masking operations. :)
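As a quick illustration of the masking step, the sketch below masks -3 down to 3 bits and then decodes a two's complement string back into an integer. The helper from_twos_complement is hypothetical (it is not part of the answer) and assumes its input is a non-empty string of 0s and 1s whose first character is the sign bit.

```python
n = -3
bitlen = 3
mask = (1 << bitlen) - 1        # 0b111
masked = n & mask               # keeps only the low 3 bits, result is non-negative
print(masked, bin(masked))      # 5 0b101


def from_twos_complement(bits: str) -> int:
    """Inverse of the encoding: interpret the first character as the sign bit."""
    value = int(bits, 2)
    if bits[0] == '1':
        # A leading 1 means the value is negative: subtract 2**len(bits).
        value -= 1 << len(bits)
    return value


for text, expected in [('101', -3), ('10110', -10), ('011', 3), ('0', 0), ('1', -1)]:
    assert from_twos_complement(text) == expected
```

Decoding the strings from the output listing this way is an easy sanity check that each one really is the shortest two's complement form of its number.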
Finally we convert the resulting integer to a string using the .format method. We use a nested format specification so we can dynamically specify the correct length of the output string. In
'{0:0{1}b}'.format(n & mask, bitlen)
the first 0 of the format spec says that we're converting the value of the 0 arg in the argument list (n & mask), the :0{1}b says to convert it to binary, padded with leading zeroes if necessary, using the value of the 1 arg in the argument list (bitlen) as the total string length.
You can read about nested format specs in the Format String Syntax section of the docs:
A format_spec field can also include nested replacement fields
within it. These nested replacement fields may contain a field name,
conversion flag and format specification, but deeper nesting is not
allowed. The replacement fields within the format_spec are
substituted before the format_spec string is interpreted. This
allows the formatting of a value to be dynamically specified.
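To see the nested format spec in isolation, here is a short sketch with made-up values (value and width are just example names). It also shows two equivalent spellings, an f-string and the built-in format(), in case you find them easier to read.

```python
value, width = 5, 8

a = '{0:0{1}b}'.format(value, width)   # nested field {1} supplies the width
b = f'{value:0{width}b}'               # same thing as an f-string
c = format(value, f'0{width}b')        # build the spec first, then apply it

print(a)              # 00000101
assert a == b == c

# The width is a minimum: longer values are not truncated.
print('{0:0{1}b}'.format(10, 2))       # 1010
```

Because the width is only a minimum, the masking step is what actually limits the output to bitlen digits; the format spec just pads with zeroes.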

How is remainder calculated with modulo when solving fractions?

As you can see, dividing 3/7 yields a fraction. But when I do 3%7 it yields 3. How could this be? I suppose I expected an output value of 4 (because it would take 4 to complete 7) or 0, (because there is no remainder at all if you use integer division such as 3//7).
>>> 3/7
0.42857142857142855
>>> 3%7
3
>>>
Just trying to understand the depths of Python. Thanks!
Remember long division? Before you learned about fractions, 50 divided by 7 would be 7, remainder 1. The remainder is the modulus. It is the numerator of the 1/7 remaining after integer division.
Let's use different numbers for demonstration.
42 divided by 5 gives a quotient of 8 and a remainder of 2. That means 42 // 5 == 8 and 42 % 5 == 2.
3 divided by 7 gives a quotient of 0 and a remainder of 3. That means 3 // 7 == 0 and 3 % 7 == 3.
In Python, // and % represent the quotient and remainder you probably learned about before you learned about fractions and real numbers. The only (possible) difference is that // floors and % matches the sign of the right-hand operand.
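A short sketch of that relationship, using the built-in divmod, which returns the quotient and remainder together. The number pairs are just examples; the last two show the sign rule mentioned above.

```python
pairs = [(42, 5), (3, 7), (50, 7), (-7, 3), (7, -3)]

for a, b in pairs:
    q, r = divmod(a, b)
    # divmod is the same as computing // and % separately.
    assert (q, r) == (a // b, a % b)
    # The quotient and remainder always rebuild the original number.
    assert a == q * b + r
    print(f"{a} = {q} * {b} + {r}")

print(3 // 7, 3 % 7)     # 0 3
print(-7 // 3, -7 % 3)   # -3 2   (remainder takes the sign of the divisor 3)
print(7 // -3, 7 % -3)   # -3 -2  (remainder takes the sign of the divisor -3)
```

The identity a == (a // b) * b + a % b is what ties the two operators together: once // floors the quotient, % is whatever is left.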
modulo returns the remainder left over after integer (floor) division.
>>> 3//7
0 # with remainder 3
>>> 3%7
3
>>> 5//2
2 # with remainder 1
>>> 5%2
1
Re-reading your question, it occurred to me you got your terms mixed up and that may have been the underlying confusion that none of us properly answered.
But when I do 3%7 it yields 3. How could this be? I suppose I expected an output value of 4 (because it would take 4 to complete 7)
First, note where the 4 comes from: 7 - 3 = 4. Since 4 is still greater than 3, 3 can be subtracted a second time, leaving 1. So 1 is the output of 7 % 3.
But you asked about 3 % 7, even though you then proceeded to explain 7 % 3. 3 / 7 is less than 1 (because 7 > 3), so integer division gives you 0 and the whole 3 is left over. That is why the modulus is still 3.
Take the first term (3) divided by the second term (7) using integer division (resulting in 0). Multiply that result by the second term (0 * 7 = 0), subtract it from the first term (3), and you get 3. So: 3 % 7 = 3

Python: Why does right shift >> round down and where should it be used?

I've never used the >> and << operators, not because I've never needed them, but because I don't know if I could have used them, or where I should have.
100 >> 3 outputs 12 instead of 12.5. Why is this? Perhaps learning where to best use right shift will answer that implicitly, but I'm curious.
Right shift is not division
Let's look at what right-shift actually does, and it will become clear.
First, recall that a number is stored in memory as a collection of binary digits. If we have 8 bits of memory, we can store 2 as 00000010 and 5 as 00000101.
Right-shift takes those digits and shifts them to the right. For example, right-shifting the two numbers above by one will give 00000001 and 00000010 respectively.
Notice that the lowest digit (right-most) is shifted off the end entirely and has no effect on the final result.
>> and << are the right and left bit shift operators, respectively. You should look at the binary representation of the numbers.
>>> bin(100)
'0b1100100'
>>> bin(12)
'0b1100'
The other answers explain the idea of bitshifting, but here's specifically what happens for 100>>3
100
128 64 32 16 8 4 2 1
0 1 1 0 0 1 0 0 = 100
100 >> 1
128 64 32 16 8 4 2 1
0 0 1 1 0 0 1 0 = 50
100 >> 2
128 64 32 16 8 4 2 1
0 0 0 1 1 0 0 1 = 25
100 >> 3
128 64 32 16 8 4 2 1
0 0 0 0 1 1 0 0 = 12
You won't often need to use it, unless you need some really quick division by 2, but even then, DON'T USE IT. It makes the code much more complicated than it needs to be, and the speed difference is unnoticeable.
The main time you'd ever need to use it would be if you're working with binary data, and you specifically need to shift the bits around. The only real use I've had for it was reading & writing ID3 tags, which store size information in 7-bit bytes, like so:
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx.
which would need to be put together like this:
0000xxxx xxxxxxxx xxxxxxxx xxxxxxxx
to give a normal integer in memory.
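Here is a rough sketch of how that packing could look with shifts. The function names decode_syncsafe and encode_syncsafe are invented for this example (ID3v2 refers to these values as synchsafe integers), and the sketch assumes exactly four bytes, each with its top bit clear.

```python
def decode_syncsafe(data: bytes) -> int:
    """Combine four 7-bit bytes (0xxxxxxx each) into one 28-bit integer."""
    if len(data) != 4 or any(b & 0x80 for b in data):
        raise ValueError("expected 4 bytes with the high bit clear")
    value = 0
    for b in data:
        # Make room for 7 more bits, then drop the next byte in.
        value = (value << 7) | b
    return value


def encode_syncsafe(value: int) -> bytes:
    """Split a 28-bit integer back into four 7-bit bytes."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in 28 bits")
    # Right shift moves each 7-bit group down; & 0x7F keeps only that group.
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


raw = bytes([0x00, 0x00, 0x02, 0x01])
print(decode_syncsafe(raw))                 # 257, i.e. (2 << 7) | 1
assert encode_syncsafe(257) == raw
```

This is the kind of job where << and >> are the natural tool: you are rearranging bit fields, not doing arithmetic.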
Bit shifting an integer gives another integer. For instance, the number 12 is written in binary as 0b1100. If we bit shift by 1 to the right, we get 0b110 = 6. If we bit shift by 2, we get 0b11 = 3. And lastly, if we bitshift by 3, we get 0b1 = 1 rather than 1.5. This is because the bits that are shifted beyond the register are lost.
One easy way to think of it is that bitshifting to the right by N is the same as dividing by 2^N and then rounding the result down (flooring), which for non-negative numbers is the same as truncating.
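A quick check of that rule, with a few arbitrary sample numbers. For negative integers Python's right shift rounds toward negative infinity, just like //, which is not the same as truncating toward zero.

```python
samples = (100, 12, 5, -5, -100)

for n in samples:
    for k in range(4):
        # Right shift by k matches floor division by 2**k.
        assert n >> k == n // 2**k

print(100 >> 3)      # 12
print(12 >> 3)       # 1
print(-5 >> 1)       # -3  (floored)
print(int(-5 / 2))   # -2  (truncated toward zero, for contrast)
```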
I have read the answers above and just wanted to add a slightly more practical example that I had seen before.
Let us assume that you want to create a list of powers of two. You can do this using left shift:
n = 10
list_ = [1<<i for i in range(1, n+1)] # Where n is a maximum power.
print(list_)
# Output: [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
You can time it with timeit if you want, but I am pretty sure that the code above is one of the fastest solutions for this problem. But what I cannot understand is when you can use right shift.
