python string every two chars to byte - do it fast

Got a binary blob string like:
input = "AB02CF4AFF"
Every pair ("AB", "02", "CF", "4A", "FF") constitutes a byte.
I'm doing this:
data = StringIO()
for j in range(0, len(input)/2):
    bit = input[j*2:j*2+2]
    data.write('%c' % int(bit,16))
data.seek(0)
This works, but with large binary blobs it becomes unacceptably slow and sometimes even throws a MemoryError.
struct.unpack comes to mind, but no luck thus far.
Any way to speed this up?

Use binascii.unhexlify:
>>> import binascii
>>> binascii.unhexlify('AB02CF4AFF')
b'\xab\x02\xcfJ\xff'
(In Python 2 you can decode with the hex codec but this isn't portable to Python 3.)
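In Python 3 the same conversion is also available as the built-in classmethod bytes.fromhex. A minimal sketch, assuming the input has an even number of hex digits (the variable names are only illustrative):

```python
import binascii

hex_string = "AB02CF4AFF"

# Both calls decode pairs of hex digits into raw bytes in C code,
# with no Python-level loop over the string.
via_binascii = binascii.unhexlify(hex_string)
via_fromhex = bytes.fromhex(hex_string)

print(via_fromhex)  # b'\xab\x02\xcfJ\xff'
```

Either result can be wrapped in io.BytesIO if a file-like object is needed, as in the question.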

Give input.decode('hex') a try :) (Python 2 only; str has no decode method in Python 3.)
It's always a good idea to use built-in solutions.

How about something like this?
def chrToInt(c):
    if c >= '0' and c <= '9':
        return ord(c) - ord('0')
    elif c >= 'A' and c <= 'F':
        return ord(c) - ord('A') + 10
    else:
        # invalid hex character (note: lowercase a-f is not handled)
        raise ValueError('invalid hex character: %r' % c)
def hexToBytes(input):
    result = []  # renamed from "bytes" to avoid shadowing the built-in
    # step through the string two characters at a time
    for i in range(0, len(input) - 1, 2):
        val = (chrToInt(input[i]) * 16) + chrToInt(input[i + 1])
        result.append(val)
    return result
print(hexToBytes("AB02CF4AFF"))
You could speed it up quite a bit by making chrToInt branchless by using binary operations, and you could also modify hexToBytes to say exactly how many characters it should read if you decide you want to use something bigger than bytes (so it returns it in groups of 4 for a short or 8 for an int).

Related

I want to save a string of 0's and 1's as bits to a file in Python, so it'll take less space [duplicate]

I'd simply like to convert a base-2 binary number string into an int, something like this:
>>> '11111111'.fromBinaryToInt()
255
Is there a way to do this in Python?

You use the built-in int() function, and pass it the base of the input number, i.e. 2 for a binary number:
>>> int('11111111', 2)
255
Here is documentation for Python 2, and for Python 3.
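A short sketch of a few related int() variants, using only built-in functions: the string may carry a matching 0b prefix, base 0 infers the base from the prefix, and format() converts back to a binary string. The variable names are illustrative.

```python
bits = '11111111'

plain = int(bits, 2)            # plain binary digits
prefixed = int('0b' + bits, 2)  # a matching 0b prefix is accepted
inferred = int('0b' + bits, 0)  # base 0 infers the base from the prefix
back = format(plain, 'b')       # int back to a binary string

print(plain, prefixed, inferred, back)  # 255 255 255 11111111
```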

Just type 0b11111111 in python interactive interface:
>>> 0b11111111
255

Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> b = BitArray(bin='11111111')
>>> b.uint
255
Note that the unsigned integer (uint) is different from the signed integer (int):
>>> b.int
-1
Your question is really asking for the unsigned integer representation; this is an important distinction.
The bitstring module isn't a requirement, but it has lots of performant methods for converting input to and from bits, as well as for manipulating them.

Using int with base is the right way to go. I used to do this before I found int takes base also. It is basically a reduce applied on a list comprehension of the primitive way of converting binary to decimal ( e.g. 110 = 2**0 * 0 + 2 ** 1 * 1 + 2 ** 2 * 1)
from functools import reduce  # built-in in Python 2; needs this import in Python 3
add = lambda x,y : x + y
reduce(add, [int(x) * 2 ** y for x, y in zip(list(binstr), range(len(binstr) - 1, -1, -1))])

If you wanna know what is happening behind the scene, then here you go.
class Binary():
    def __init__(self, binNumber):
        self._binNumber = binNumber
        self._binNumber = self._binNumber[::-1]
        self._binNumber = list(self._binNumber)
        self._x = [1]
        self._count = 1
        self._change = 2
        self._amount = 0
        print(self._ToNumber(self._binNumber))
    def _ToNumber(self, number):
        self._number = number
        for i in range (1, len (self._number)):
            self._total = self._count * self._change
            self._count = self._total
            self._x.append(self._count)
        self._deep = zip(self._number, self._x)
        for self._k, self._v in self._deep:
            if self._k == '1':
                self._amount += self._v
        return self._amount
mo = Binary('101111110')

Here's another concise way to do it not mentioned in any of the above answers:
>>> eval('0b' + '11111111')
255
Admittedly, it's probably not very fast, and it's a very very bad idea if the string is coming from something you don't have control over that could be malicious (such as user input), but for completeness' sake, it does work.
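If the literal-style approach is still wanted, ast.literal_eval from the standard library is a safer stand-in for eval, since it only accepts Python literals. This is a sketch of that swap, not something the answer above proposed:

```python
import ast

bits = '11111111'

# literal_eval parses the text as a literal only, so a string containing
# arbitrary code is rejected instead of executed.
value = ast.literal_eval('0b' + bits)
print(value)  # 255

try:
    ast.literal_eval('__import__("os")')
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)  # True
```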

A recursive Python implementation of the reverse direction (int to a list of bits):
def int2bin(n):
    return int2bin(n >> 1) + [n & 1] if n > 1 else [n]

If you are using Python 3.6 or later, you can use an f-string to do the conversion.
Binary to decimal:
>>> print(f'{0b1011010:#0}')
90
>>> bin_2_decimal = int(f'{0b1011010:#0}')
>>> bin_2_decimal
90
Binary to octal, hexadecimal, etc.:
>>> f'{0b1011010:#o}'
'0o132' # octal
>>> f'{0b1011010:#x}'
'0x5a' # hexadecimal
>>> f'{0b1011010:#0}'
'90' # decimal
Pay attention to the two pieces of information separated by the colon.
In this way, you can convert from any of {binary, octal, hexadecimal, decimal} to any of {binary, octal, hexadecimal, decimal} by changing the right side of the colon:
:#b -> converts to binary
:#o -> converts to octal
:#x -> converts to hexadecimal
:#0 -> converts to decimal, as in the example above
Try changing the left side of the colon to an octal, hexadecimal, or decimal literal.
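The examples above start from a literal typed into the source. When the number arrives as text instead, one way to get the same conversions is to parse it with int(text, base) and then apply the same format codes. A small sketch, with illustrative variable names:

```python
n = int('5a', 16)  # parse hexadecimal text into an int (90)

as_binary = format(n, '#b')   # '0b1011010'
as_octal = format(n, '#o')    # '0o132'
as_hex = format(n, '#x')      # '0x5a'
as_decimal = format(n, 'd')   # '90'

print(as_binary, as_octal, as_hex, as_decimal)
```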

For a large matrix (10**5 rows and up) it is better to use a vectorized matrix multiplication. Pass in all rows and columns in one shot. It is extremely fast, with no looping in Python. I originally designed it for converting many binary (0/1) columns, such as the 10 or so genre columns in MovieLens, into a single integer for each example row.
import numpy as np
def BitsToIntAFast(bits):
    m,n = bits.shape
    a = 2**np.arange(n)[::-1] # powers of 2, most significant first, one per column of bits
    return bits @ a # matrix-vector product gives one integer per row

For the record to go back and forth in basic python3:
a = 10
bin(a)
# '0b1010'
int(bin(a), 2)
# 10
eval(bin(a))
# 10

Converting ASCII text to a number and back to ASCII

I am writing Python code that has two functions:
textToNumber(t)
numberToText(n)
The textToNumber(t) function takes a plain-text parameter ('Hello world') and converts it to a very big number using the ASCII value of each letter. The function looks like this:
def textToNumber (txt):
    text_number = 0
    for letter in txt:
        text_number = (text_number * 256) + ord(letter)
    return text_number
numberToText(n) takes a number and converts it to its corresponding plain text. This function is the exact inverse of the first function. It looks like this:
def numberToText (nm):
    n = nm
    number_text = ""
    while n > 0:
        part_n = int(n) & 255
        number_text = chr(part_n) + number_text
        n = n - part_n
        n = n / 256
    return number_text
So, when we apply the second function to the output of the first, it should give us the original text back. It works fine with short text but gives gibberish when the text is long. I think Python puts no limit on the size of integers as long as the machine has the memory. So why does this happen, and how do I solve it?
Error output:
>>> numberToText(textToNumber('Hello world'))
'Hello x\x00\x00\x00d'

Use // instead of / to get integer division.
Otherwise you get floating point numbers, and they don't have as much precision as large integers.
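A minimal sketch of the round trip with integer division applied, assuming Python 3. The snake_case names are illustrative renamings of the question's functions, and divmod is used here as one way to get the floor quotient and the remainder together:

```python
def text_to_number(txt):
    number = 0
    for letter in txt:
        number = number * 256 + ord(letter)
    return number

def number_to_text(n):
    text = ""
    while n > 0:
        # divmod performs integer (floor) division, so n stays an exact int
        # instead of turning into a float with limited precision
        n, part = divmod(n, 256)
        text = chr(part) + text
    return text

message = 'Hello world'
print(number_to_text(text_to_number(message)))  # Hello world
```

For pure ASCII text, the first function computes the same value as int.from_bytes(message.encode('ascii'), 'big').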

Encoding in Lua like Python(ord function)

I want to translate the code segment below into Lua:
def toLong(s):
    ls = [ord(i) for i in s]
    l = len(ls) -1
    sum = 0
    for i, v in enumerate(ls):
        sum += v*(256**(l-i))
    return sum
print(toLong("\x00\x00\x01f\xd3d\x80X"))
It prints the original number: 1541144871000

This one works with Lua 5.2 if you install bigint:
local bigint = require "bigint"
function toLong(s)
    ret = bigint:new(0)
    for i=1,string.len(s),1 do
        -- (leftshift(8) is just like times-256, but faster)
        ret = ret:leftshift(8) + bigint:new(string.byte(s, i))
    end
    return ret
end
Prior to Lua 5.2, the "\xAB"-style syntax wasn't supported, but you could do decimals like "\65" for an ASCII capital A.
BTW, you can do this without bigint like:
function numLong(s)
    ret = 0
    for i=1,string.len(s),1 do
        ret = (ret * 256) + string.byte(s,i)
    end
    return ret
end
The big difference is that bigint can represent arbitrarily large numbers, whereas the normal number value in Lua is a float by default and has a limit on the number of bits of precision that are actually usable (though on my machine, the two representations came out the same when I tested your specific case).
That said: if you need arbitrarily-large number representation, use bigint or go insane.
Oh, BTW: You do realize you're using big-endian (or "network byte order") in calculating your numbers, right? Do be careful swapping between char strings and uint64s (especially if your target machine is little-endian like an Intel box)...
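On the Python side, the byte-order point can be illustrated with the built-in int.from_bytes, which takes the byte order as an explicit argument. This sketch is only for comparison with the question's toLong; it is not part of the Lua answer:

```python
data = b"\x00\x00\x01f\xd3d\x80X"

big = int.from_bytes(data, "big")        # most significant byte first
little = int.from_bytes(data, "little")  # least significant byte first

print(big)            # 1541144871000
print(big == little)  # False
```

Only the big-endian reading reproduces the number from the question.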

Two's complement function outputs wrong result for -1

I am generating the input for an FPGA program to use the trapezoidal integration method. Basically, the functions of interest here are the invert() and twos_comp() functions; the rest is just testing (creating a square wave signal, and then iterating through and converting it into two's complement).
signals = []
bit_signals = []
def invert(bit_val):
    new_val = []
    for i in bit_val:
        new_val.append(str(int(not(int(i)))))
    return ''.join(new_val)
def twos_comp(val):
    if val < 0:
        bin_val = format(val, '08b')[1:]
        return format(int(invert(bin_val),2) + int('1', 2), '08b')
    else:
        bin_val = format(val, '08b')[1:]
        return bin_val
x = 0
signal = 1
while x <= 25:
    if x % 2 == 0:
        signal*=-1
    signals.append(signal)
    x+=1
print(signals)
for i in signals:
    bit_signals.append(twos_comp(i))
print(bit_signals)
The problem here is that this outputs the two's complement for -1 as 01111111, not 11111111. The output of invert() seems to be correct, the output of twos_comp() for positive numbers seems to be correct, and the generation of the signal also appears to be correct, so I think it must be something with the line
return format(int(invert(bin_val),2) + int('1', 2), '08b')
but looking around on SO and google this is how other people have handled adding in binary.
Please note that all inputs to twos_comp() will be 8 bits. Any help would be appreciated as to why this is not working. There are no outright errors, just an incorrect output.

Step through the values when val is -1:
>>> format(-1, '08b')
'-0000001'
You may have already spotted the error: 08b means 8 characters wide, not 8 digits. For a negative number, the - takes up 1 of the characters, so you only get 7 digits. But in case it isn't obvious why that's a problem, let's keep going:
>>> format(val, '08b')[1:]
'0000001'
>>> invert('0000001')
'1111110'
>>> int(invert('0000001'), 2)
126
>>> int('1', 2) # BTW, why do you need this instead of just 1, exactly?
1
>>> 126 + 1
127
>>> format(127, '08b')
'01111111'
If you want a hacky solution (and I suspect you do, since you're already going back and forth between strings and numbers all over the place), just do this:
bin_val = format(val, '09b')[-8:]
That will work for both positive and negative numbers.
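A different technique that avoids the string inversion altogether is to mask the integer to the desired width before formatting. This is a sketch of that alternative, not the fix described above; the bits parameter is an assumption added for illustration:

```python
def twos_comp(val, bits=8):
    # Masking keeps only the low `bits` bits. Python integers behave as if
    # negative values had infinitely many leading 1 bits, so the masked
    # value is exactly the two's complement bit pattern.
    mask = (1 << bits) - 1
    return format(val & mask, '0{}b'.format(bits))

print(twos_comp(-1))  # 11111111
print(twos_comp(1))   # 00000001
```

This assumes the input fits in the chosen width; out-of-range values are silently truncated by the mask.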

Unicode recursion for a number

I want to take any number, e.g. 14892, and return it as 25903
(according to each character's Unicode value).
This is what I have so far:
def convert(n):
    if len(n)>0:
        x = (chr(ord(str((int(n[0])+1)))))
        return x

def convert(n):
    return int(''.join([str(int(elem)+1)[-1] for elem in str(n)]))
You could use a list comprehension.

To perform this transformation, you need to get each digit and add one to it, with 10 wrapping around to 0. The simple way to do that wrapping is to use the modulus operator. We can also use integer division and modulus to extract the digits, and we can do both operations using the built-in divmod function. We store the modified digits in a list, since the simple way to combine the digits back into a single number needs the digits in reverse order.
def convert(n):
    a = []
    while n:
        n, r = divmod(n, 10)
        a.append((r + 1) % 10)
    n = 0
    for u in reversed(a):
        n = 10 * n + u
    return n
# Test
print(convert(14892))
Output:
25903
That algorithm is fairly close to the usual way to do this transformation in traditional languages. However, in Python, it's actually faster to do this sort of thing using strings, since the str and int constructors can do most of their work at C speed. The resulting code is a little more cryptic, but much more compact.
def convert(n):
    return int(''.join([str((int(c) + 1) % 10) for c in str(n)]))

You could convert the number to a string, use the translate function to swap out the numbers, and convert back to integer again:
>>> t=str.maketrans('1234567890','2345678901')
>>> x = 14892
>>> y = int(str(x).translate(t))
>>> y
25903
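Since the question title mentions recursion, here is a sketch of a recursive variant of the same digit-by-digit transformation. It assumes n is a non-negative integer and is not taken from any of the answers above:

```python
def convert(n):
    # Base case: a single digit, with 9 wrapping around to 0.
    if n < 10:
        return (n + 1) % 10
    # Recursive case: convert the leading digits, then append the
    # incremented last digit.
    return convert(n // 10) * 10 + (n % 10 + 1) % 10

print(convert(14892))  # 25903
```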
