Converting string to binary then xor binary - python

So I am trying to convert a string to binary and then XOR the binary, using the following functions:
def string_to_binary(s):
    return ' '.join(map(bin,bytearray(s,encoding='utf-8')))
def xor_bin(a,b):
    return int(a,2) ^ int(b,2)
When I try and run the xor_bin function I get the following error:
Exception has occurred: exceptions.ValueError
invalid literal for int() with base 2: '0b1100010 0b1111001 0b1100101 0b1100101 0b1100101'
I can't see what's wrong here.

bin is bad here; it doesn't pad out to eight digits (so you'll lose data alignment whenever the high bit is a 0 and misinterpret all bits to the left of that loss as being lower magnitude than they should be), and it adds a 0b prefix that you don't want. str.format can fix both issues, by zero padding and omitting the 0b prefix (I also removed the space in the joiner string, since you don't want spaces in the result):
def string_to_binary(s):
    return ''.join(map('{:08b}'.format, bytearray(s, encoding='utf-8')))
With that, string_to_binary('byeee') gets you '0110001001111001011001010110010101100101' which is what you want, as opposed to '0b1100010 0b1111001 0b1100101 0b1100101 0b1100101' which is obviously not a (single) valid base-2 integer.
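A minimal sketch of the whole round trip, assuming both inputs encode to the same number of UTF-8 bytes (the helper name xor_strings is hypothetical). The padded bit strings go straight into the question's xor_bin(), and the result is re-padded to the input width, because bin() would drop any leading zero bits.

```python
def string_to_binary(s):
    # 8 zero-padded bits per UTF-8 byte, no '0b' prefix, no separators
    return ''.join(map('{:08b}'.format, bytearray(s, encoding='utf-8')))

def xor_bin(a, b):
    return int(a, 2) ^ int(b, 2)

def xor_strings(s1, s2):
    # Hypothetical helper: XOR the bit strings of two equal-length inputs
    a, b = string_to_binary(s1), string_to_binary(s2)
    if len(a) != len(b):
        raise ValueError('inputs must encode to the same number of bytes')
    # Re-pad to the input width so leading zero bits are kept
    return '{:0{}b}'.format(xor_bin(a, b), len(a))

print(string_to_binary('byeee'))  # 0110001001111001011001010110010101100101
print(xor_strings('ab', 'AB'))    # 0010000000100000
```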

Your question is unclear because you don't show how the two functions you defined were being used when the error occurred, so this answer is a guess.
You can convert a binary string representation of an integer into a Python int (which is stored internally as a binary value) by simply passing it to the int() function with a base of 2, as you're doing in the xor_bin() function. Once you have two int values, you can XOR them "in binary" by simply using the ^ operator, which again, you seem to know.
This means that XORing the binary string representations of two integers can be done with your xor_bin() function just as it is, and bin() converts the result back into a binary string representation. Here's what I mean:
def xor_bin(a, b):
    return int(a, 2) ^ int(b, 2)
s1 = '0b11000101111001110010111001011100101'
s2 = '0b00000000000000000000000000001111111'
# ---------------------------------------
# '0b11000101111001110010111001010011010' expected result of xoring them
result = xor_bin(s1, s2)
print(bin(result))  # -> 0b11000101111001110010111001010011010
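If the goal is instead to keep the question's original space-separated output, one sketch (the helper name xor_bin_words is hypothetical) is to XOR the numbers pair by pair. int(x, 2) accepts a single number with an optional 0b prefix, so the ValueError comes from handing it several numbers at once, not from the prefix.

```python
def string_to_binary(s):
    # The question's original version: space-separated, '0b'-prefixed bytes
    return ' '.join(map(bin, bytearray(s, encoding='utf-8')))

def xor_bin_words(a, b):
    # XOR the numbers pairwise; zip() silently stops at the shorter input
    return [int(x, 2) ^ int(y, 2) for x, y in zip(a.split(), b.split())]

a = string_to_binary('byeee')
b = string_to_binary('BYEEE')
print(a)  # 0b1100010 0b1111001 0b1100101 0b1100101 0b1100101
# Upper and lower case ASCII letters differ only in the 0x20 bit
print([bin(n) for n in xor_bin_words(a, b)])  # ['0b100000', '0b100000', '0b100000', '0b100000', '0b100000']
```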

Related

Comparing bit representation of objects in Python

I am watching a video named The Mighty Dictionary which has the following code:
k1 = bits(hash('Monty'))
k2 = bits(hash('Money'))
diff = ('^ '[a==b] for a,b in zip(k1,k2))
print(k1,k2,''.join(diff))
As I understand it, bits is not a built-in function in Python but his own function, similar to format(x, 'b'). Or is it something that existed in Python 2? (I've never written code in Python 2.)
I've tried to accomplish the same, get the bits representation of the strings and check where the bits differ:
k1 = format(hash('Monty'),'b')
k2 = format(hash('Money'),'b')
diff = ('^ ' [a==b] for a,b in zip(k1,k2))
print(k1,'\n',k2,'\n',''.join(diff))
I do get the expected result:
UPDATED
I had to shift the first line by 1 space to match the symbols:
110111010100001110100101100000100110111111110001001101111000110
-1000001111101001011101001010101101000111001011011000011110100
^ ^^^ ^ ^^ ^^^ ^^^^^^^ ^ ^^^^^ ^^ ^^ ^^^^^^^ ^ ^ ^^^
Also, the bit lengths are not the same. I understood that no matter the string, the hash would take the same number of bits (64 in my case), but it's 63 and 62:
print(len(format(hash('Monty'),'b')))
print(len(format(hash('Money'),'b')))
63
62
So, to sum up my question:
Is bits a built-in function in Python 2?
Is the following the recommended way to compare the bit representation of an object?
def fn():
    pass
print(format(hash(fn),'b'))
# -111111111111111111111111111111111101111000110001011100000000101
Shouldn't every object be represented by the same number of bits, depending on the processor? If I run the following code several times I get these results:
def fn():
    pass
def nf():
    pass
print(format(hash(fn),'b'))
print(format(hash(nf),'b'))
# first time
# 10001001010011010111110000100
# -111111111111111111111111111111111101110110101100101000001000001
# second time
# 10001001010011010111111101010
# 10001001010011010111110000100
# third time
# 10001001010011010111101010001
# -111111111111111111111111111111111101110110101100101000001000001
No, bits is not a built-in function in Python 2 or Python 3.
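By default format() doesn't show leading zeroes. Use the format string 032b to format the number in a field at least 32 characters wide, padded with leading zeroes; on a build where sys.hash_info.width is 64 (typical for 64-bit Python 3), hash() returns 64-bit values, so use 064b instead.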
>>> format(hash('Monty'), '032b')
'1001000100011010010110101101101011000010101011100110001010001'
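A quick illustration of how the width in the format spec behaves, using only the built-in format(): it is a minimum, not a fixed size, and negative numbers keep their minus sign rather than showing two's complement bits.

```python
# '0Nb' pads with zeros to at least N digits; it never truncates
print(format(5, '08b'))            # 00000101
print(len(format(2**40, '032b')))  # 41: already wider than 32, so no padding
# Negative numbers get a '-' sign, not a two's complement bit pattern
print(format(-5, '08b'))           # -0000101
```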
Another problem you're running into is that hash() can return negative numbers, and format() shows those with a minus sign. His bits() function presumably shows the two's complement bits of the number instead. You can do this by normalizing the input:
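import sys
def bits(n, width=sys.hash_info.width):
    # sys.hash_info.width is 64 on 64-bit builds, 32 on 32-bit ones
    if n < 0:
        n = 2**width + n
    return format(n, '0{}b'.format(width))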
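An equivalent sketch (bits_masked is a hypothetical name) replaces the if branch with a bit mask. ANDing with width one-bits produces the same two's complement pattern as adding 2**width to a negative number, because Python ints behave as if they had infinitely many sign bits.

```python
import sys

WIDTH = sys.hash_info.width  # 64 on typical 64-bit builds, 32 on 32-bit ones

def bits_masked(n, width=WIDTH):
    # Keep only the low `width` bits, then zero-pad to exactly that width
    return format(n & ((1 << width) - 1), '0{}b'.format(width))

print(bits_masked(-1, 8))  # 11111111
print(bits_masked(5, 8))   # 00000101
```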
Every time you run the code, you define new fn and nf functions. Different functions will not necessarily have the same hash code, even if they have the same name.
If you don't redefine the functions, you should get the same hash codes each time.
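Within a single process, hashing strings and numbers depends only on their contents (although since Python 3.3, str and bytes hashes are randomized per process unless PYTHONHASHSEED is set), but the default hash of other objects, such as functions, depends on the specific instance (in CPython it is derived from the object's memory address).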
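Putting the pieces together, a sketch that reproduces the video's comparison with fixed-width bits. It assumes Python 3.2+ for sys.hash_info; because str hashes are randomized per process since Python 3.3, the printed bits change from run to run, so no output is shown.

```python
import sys

def bits(n, width=sys.hash_info.width):
    # Two's complement bit pattern of n, zero-padded to `width` digits
    if n < 0:
        n += 2 ** width
    return format(n, '0{}b'.format(width))

k1 = bits(hash('Monty'))
k2 = bits(hash('Money'))
# '^ '[a == b] picks '^' where the bits differ and ' ' where they match
diff = ''.join('^ '[a == b] for a, b in zip(k1, k2))
print(k1)
print(k2)
print(diff)
```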

Discrepancy of floating representation

In this SO answer, a user provided this short function that returns the binary representation of a floating-point value:
import struct
import sys
def float_to_bin(f):
    """ Convert a float into a binary string. """
    if sys.version_info >= (3,):  # Python 3?
        ba = struct.pack('>d', f)
    else:
        ba = bytearray(struct.pack('>d', f))  # Convert str result.
    s = ''.join('{:08b}'.format(b) for b in ba)
    return s[:-1].lstrip('0') + s[0]  # Strip but one leading zero.
When I call this function with the value 7/3-4/3 (in Python 3.5), or with 1.0000000000000002, I get this binary representation:
11111111110000000000000000000000000000000000000000000000000000
Using this online tool, with the same values, I get this binary representation:
11111111110000000000000000000000000000000000000000000000000001
Why is there a difference between these two representations?
Why is float_to_bin returning the floating representation of 1.0 for 1.0000000000000002?
Is there some precision loss induced somewhere in float_to_bin (maybe when calling struct.pack)?
The logic in that function to "strip but one leading zero" is completely wrong, and is removing significant digits from the result.
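A small check of that claim, reusing the question's function with only the Python 3 branch kept (float_to_bin_buggy is just a renamed copy). lstrip('0') also eats the leading zero of the exponent, and s[:-1] throws away the last mantissa bit, so 1.0 and 1.0000000000000002 collapse to the same 62-character string.

```python
import struct

def float_to_bin_buggy(f):
    # The question's function, Python 3 branch only, last line unchanged
    s = ''.join('{:08b}'.format(b) for b in struct.pack('>d', f))
    return s[:-1].lstrip('0') + s[0]

a = float_to_bin_buggy(1.0)
b = float_to_bin_buggy(7/3 - 4/3)  # 7/3 - 4/3 evaluates to 1.0000000000000002
print(a == b, len(a))  # True 62
```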
The correct representation of the value is neither of the values mentioned in your question; it is:
0011111111110000000000000000000000000000000000000000000000000001
which can be retrieved by replacing the last line of that function with:
return s
or by using the simpler implementation:
import struct
def float_to_bin(f):
    [d] = struct.unpack(">Q", struct.pack(">d", f))
    return '{:064b}'.format(d)
Leading and trailing zeroes in floating-point values are significant, and cannot be removed without altering the value.
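To see that the 64-character string really is lossless, here is a sketch that splits it into the IEEE 754 sign, exponent and mantissa fields and converts it back; bin_to_float is a hypothetical inverse, not part of the original answer.

```python
import struct

def float_to_bin(f):
    # Reinterpret the 8 bytes of a big-endian double as an unsigned 64-bit int
    [d] = struct.unpack('>Q', struct.pack('>d', f))
    return '{:064b}'.format(d)

def bin_to_float(s):
    # Inverse: parse the 64 bits as an int, then reinterpret them as a double
    return struct.unpack('>d', struct.pack('>Q', int(s, 2)))[0]

s = float_to_bin(1.0000000000000002)
sign, exponent, mantissa = s[0], s[1:12], s[12:]
print(sign, exponent)                         # 0 01111111111
print(mantissa == '0' * 51 + '1')             # True
print(bin_to_float(s) == 1.0000000000000002)  # True
```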

Python define bitwise

I have a function that accepts 'data' as a parameter. Being new to Python, I wasn't really sure that was even a type.
I noticed that when printing something of that type it would be
b'h'
if I encoded the letter h, which doesn't make a ton of sense to me. Is there a way to define bits in Python, such as 1 or 0? I guess b'h' must be in hex? Is there a way for me to simply define an eight-bit string, like
bits1 = 10100000
You're conflating a number of unrelated things.
First of all, (in Python 3), quoted literals prefixed with b are of type bytes -- that means a string of raw byte values. Example:
x = b'abc'
print(type(x)) # will output `<class 'bytes'>`
This is in contrast to the str type, which is a (Unicode) string.
Integer literals can be expressed in binary using an 0b prefix, e.g.
y = 0b10100000
print(y) # Will output 160
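To connect the two ideas in Python 3: indexing a bytes object gives back plain ints, and the same byte can be written as a character, as a hex escape, or built from a binary integer literal.

```python
x = b'h'
print(type(x), len(x))           # <class 'bytes'> 1
print(x[0])                      # 104: indexing bytes yields an int
print(x == b'\x68')              # True: the same byte as a hex escape
print(x == bytes([0b01101000]))  # True: built from a binary int literal
print(format(x[0], '08b'))       # 01101000
```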
As far as I know, 'data' is not a type. Your function (probably) accepts anything you pass to it, regardless of its type.
Now, b'h' is a bytes object holding a single byte: the number whose value maps to the character 'h'. This is not hexadecimal; it is one 8-bit value (1 byte), which Python displays as the matching ASCII character when it is printable.
The ASCII code for 'h' is 104 (decimal), which is 01101000 in binary and 68 in hex, so b'h' can also be written b'\x68' or bytes([0b01101000]).
So, here is the answer I think you are looking for: bytes literals have no escape for binary digits (\b is the backspace character), so to write an 8-bit value from its binary representation, use an integer literal such as 0b01101000 (for 104) and wrap it in bytes([...]) if you need a bytes object. I would recommend using hex instead, to make it more compact and readable. In hex, every four bits make one symbol from 0 to f, and the symbols are concatenated to form a larger number. So the bit sequence 0110 1000 becomes the symbols 6 and 8, written b'\x68' (a \x escape always takes exactly two hex digits). The preceding b, before the quote marks, tells Python to create a bytes object (raw byte values) rather than a str.
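For the bits1 example in the question, a sketch that assumes the eight bits are kept as text (the question does not say where they come from), then turns them into an int and a one-byte bytes object with int() and int.to_bytes():

```python
bits1 = '10100000'                      # the bits as text
value = int(bits1, 2)                   # 160
as_bytes = value.to_bytes(1, 'big')     # one byte, big-endian
print(value, as_bytes, as_bytes.hex())  # 160 b'\xa0' a0
print(format(as_bytes[0], '08b'))       # 10100000

# Several bytes at once from a longer bit string
word_bits = '0110100001101001'
print(int(word_bits, 2).to_bytes(len(word_bits) // 8, 'big'))  # b'hi'
```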

How to compute a double precision float score from the first 8 bytes of a string in Python?

Trying to get a double-precision floating point score from a UTF-8 encoded string object in Python. The idea is to grab the first 8 bytes of the string and create a float, so that the strings, ordered by their score, would be ordered lexicographically according to their first 8 bytes (or possibly their first 63 bits, after forcing them all to be positive to avoid sign errors).
For example:
get_score(u'aaaaaaa') < get_score(u'aaaaaaab') < get_score(u'zzzzzzzz')
I have tried to compute the score in an integer using bit-shift-left and XOR, but I am not sure of how to translate that into a float value. I am also not sure if there is a better way to do this.
How should the score for a string be computed so the condition I specified before is met?
Edit: The string object is UTF-8 encoded (as per @Bakuriu's comment).
float won't give you 64 bits of precision. Use integers instead: encode the string to UTF-8, take the first 8 bytes and pad them on the right with zero bytes (padding on the left would make 'b' score lower than 'aa'):
import struct
def get_score(s):
    return struct.unpack('>Q', s.encode('utf-8')[:8].ljust(8, b'\0'))[0]
This works in Python 3, and in Python 2 when s is a unicode string.
EDIT:
For floats, using 6 bytes:
def get_score(s):
    return struct.unpack('>d', b'\0\1' + s.encode('utf-8')[:6].ljust(6, b'\0'))[0]
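A quick way to check the ordering claim, as a sketch assuming Python 3: sort some strings by score and compare with sorting by their first UTF-8 bytes. One caveat of zero padding is that strings differing only by trailing NUL characters get equal scores.

```python
import struct

def get_score(s):
    # First 8 UTF-8 bytes, right-padded with zero bytes, as a big-endian uint64
    return struct.unpack('>Q', s.encode('utf-8')[:8].ljust(8, b'\0'))[0]

def get_float_score(s):
    # b'\0\1' keeps the sign and exponent bits at zero, so every result is a
    # positive subnormal double, and those order like their bit patterns
    return struct.unpack('>d', b'\0\1' + s.encode('utf-8')[:6].ljust(6, b'\0'))[0]

words = ['zzzzzzzz', 'aaaaaaab', 'b', 'aaaaaaa', 'aa', '']
by_text = sorted(words, key=lambda w: w.encode('utf-8')[:8])
print(sorted(words, key=get_score) == by_text)  # True
print(get_score('aaaaaaa') < get_score('aaaaaaab') < get_score('zzzzzzzz'))  # True
```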
You will need to set up the entire alphabet and do the conversion by hand, since conversions to bases above 36 are not built in; to do that you only need to define the complete alphabet to use. If it were an ASCII string, for instance, you would convert the input string to a long in base 256, using the whole ASCII table as the alphabet.
You have an example of the full functions to do it here: string to base 62 number
Also you don't need to worry about negative-positive numbers when doing this, since the encoding of the string with the first character in the alphabet will yield the minimum possible number in the representation, which is the negative value with the highest absolute value, in your case -2**63 which is the correct value and allows you to use < > against it.
Hope it helps!
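A sketch of the by-hand conversion this answer describes, with a base-62 alphabet chosen as an assumption (digits, then upper case, then lower case, which matches ASCII order); to_number is a hypothetical helper. Each string is read as a fixed-length number, right-padded with the alphabet's first symbol so that shorter strings sort first. For the ASCII base-256 case, int.from_bytes(s.encode('ascii'), 'big') already does the digit arithmetic.

```python
import string

# Assumed alphabet: base 62, in ASCII order
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def to_number(s, alphabet=ALPHABET, length=8):
    # Read s as a fixed-length number in base len(alphabet);
    # str.index raises ValueError for characters outside the alphabet
    base = len(alphabet)
    n = 0
    for ch in s[:length].ljust(length, alphabet[0]):
        n = n * base + alphabet.index(ch)
    return n

words = ['beta', 'b', 'Zeta', 'alpha', '42', 'Alpha']
print(sorted(words, key=to_number))  # ['42', 'Alpha', 'Zeta', 'alpha', 'b', 'beta']
```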

Convert binary string to binary literal

I'm using Python 3.2.2.
I'm looking for a function that converts a binary string, e.g. '0b1010' or '1010', into a binary literal, e.g. 0b1010 (not a string or a decimal integer literal).
It's an easy matter to roll-my-own, but I prefer to use either a standard function or one that is well-established: I don't want to 're-invent the wheel.'
Regardless, I'm happy to look at any efficient algorithms y'all might have.
The string is a literal.
3>> bin(int('0b1010', 2))
'0b1010'
3>> bin(int('1010', 2))
'0b1010'
3>> 0b1010
10
3>> int('0b1010', 2)
10
Try the following code:
#!python3
def fn(s, base=10):
    prefix = s[0:2]
    if prefix == '0x':
        base = 16
    elif prefix == '0b':
        base = 2
    return bin(int(s, base))
print(fn('15'))
print(fn('0xF'))
print(fn('0b1111'))
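As an alternative to fn(), int() also accepts base 0, which infers the base from the prefix just as Python source literals do; the to_int wrapper is only for illustration. One difference to keep in mind: with base 0, a decimal string with leading zeros such as '015' raises ValueError.

```python
def to_int(s):
    # Base 0: '0b' -> binary, '0o' -> octal, '0x' -> hex, no prefix -> decimal
    return int(s, 0)

for s in ('15', '0xF', '0b1111', '0o17'):
    print(s, '->', to_int(s), bin(to_int(s)))  # every line ends in: 15 0b1111
```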
If you are sure you have s = "'0b010111'" and you only want to get 010111, then you can just slice the middle like:
s = s[3:-1]
i.e. from index 3 (skipping the opening quote and the 0b prefix) to the one before the last.
But as Ignacio and Antti wrote, numbers are abstract. 0b11 is one of the string representations of the number 3, in the same way as 3 is another string representation of the number 3.
The repr() always returns a string. The only thing that can be done with the repr result is to strip the apostrophes -- because the repr adds the apostrophes to the string representation to emphasize it is the string representation. If you want a binary representation of a number (as a string without apostrophes) then bin() is the ultimate answer.
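If the quoted text really is the repr() of a string, a sketch using the standard library's ast.literal_eval can undo the repr safely instead of counting characters, after which slicing or int() handles the 0b prefix:

```python
import ast

s = "'0b010111'"             # repr of a string, quotes included
inner = ast.literal_eval(s)  # evaluates the literal safely: '0b010111'
print(inner[2:])             # 010111
print(int(inner, 2))         # 23
```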
