I need to convert the elements of a Python 2.7 bytearray(), string, or bytes() into integers for processing. In many languages (e.g. C), bytes and 'chars' are more or less 8-bit ints that you can perform math operations on. How can I convince Python to let me use (appropriate) bytearrays or strings interchangeably?
Consider toHex(stringlikeThing):
zerof = '0123456789ABCDEF'
def toHex(strg):
    ba = bytearray(len(strg)*2)
    for xx in range(len(strg)):
        vv = ord(strg[xx])
        ba[xx*2] = zerof[vv>>4]
        ba[xx*2+1] = zerof[vv&0xf]
    return ba
which should take a string-like thing (i.e. a bytearray or string) and make a printable string-like thing of hexadecimal text. It converts "string" to the hex ASCII:
>>> toHex("string")
bytearray(b'737472696E67')
However, when given a bytearray:
>>> nobCom.toHex(bytearray("bytes"))
EX ord() expected string of length 1, but int found: 0 bytes
The ord() in the 'for' loop gets strg[xx], an item of a bytearray, which seems to be an integer (whereas an item of a str is a single-element string).
So ord() wants a char (single-element string), not an int.
Is there some method or function that takes an argument that is a byte, char, small int, or one-element string and returns its value?
Of course you could check the type(strg[xx]) and handle the cases laboriously.
The unvoiced question is: what is the reasoning behind Python being so picky about the difference between a byte and a char (normal or unicode), i.e. a single-element string?
When you index a bytearray object in Python, you get an integer. This integer is the code for the corresponding character in the bytearray, or in other words, the very thing that the ord function would return.
There is no built-in function in Python that takes a byte, character, small integer, or one-element string and returns its value. Writing one is simple, however.
def toInt(x):
    # Items of a bytearray are already ints; one-character strings go through ord().
    return x if isinstance(x, int) else ord(x)
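As a rough sketch of how the question's toHex could accept both kinds of input, one option is to normalise the argument to a bytearray first, so that every item is already an integer and ord() is never needed. The names to_hex and HEX_DIGITS are made up for this example, and encoding text strings as Latin-1 is an assumption (it only works for code points below 256), not something the question specifies.

```python
HEX_DIGITS = b'0123456789ABCDEF'

def to_hex(data):
    """Return the uppercase hex text of `data` as a bytearray.

    `data` may be bytes, a bytearray, or a text string. Text is encoded
    as Latin-1 here, which is an assumption made for this sketch.
    """
    if isinstance(data, type(u'')):
        data = data.encode('latin-1')
    raw = bytearray(data)              # iterating a bytearray always yields ints
    out = bytearray(len(raw) * 2)
    for i, value in enumerate(raw):
        out[2 * i] = HEX_DIGITS[value >> 4]        # high nibble
        out[2 * i + 1] = HEX_DIGITS[value & 0x0F]  # low nibble
    return out
```

With this, to_hex("string") and to_hex(bytearray(b"string")) should give the same result, because both inputs are reduced to the same sequence of integers before any arithmetic happens.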
Related
I'm trying to get the first char of a byte-string in python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Note that indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one; str is the only sequence type that contains elements of its own type, always.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255. Whether a plain, unqualified char is signed or unsigned is implementation-defined, and some C compilers default to unsigned; text is modelled as a char[] array.
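A small Python 3 illustration of the difference described above, using the same byte string as the question. The variable names are only for this example.

```python
my_bytes = b'just a byte string'

first_value = my_bytes[0]        # indexing gives an int
first_byte = my_bytes[0:1]       # slicing gives a bytes object of length 1
rebuilt = bytes([first_value])   # an int can be wrapped back into bytes

# Iterating behaves like indexing: every item is an int in range(256).
all_ints = all(isinstance(item, int) for item in my_bytes)

# Text strings are the exception: indexing and slicing both give a str.
text_index = 'abc'[0]
text_slice = 'abc'[0:1]
```

So the slice my_bytes[0:1] is not really a hack; it is the documented way to get a length-1 bytes object, and bytes([value]) goes the other way from an integer.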
I'm having a problem with ord(); my code is given below:
def write(data):
    for block_idx in list(range(0, len(data), 10)):
        chksum = 0
        for byte_idx in list(range(block_idx, 2)):
            chksum += ord(data[byte_idx])
write(b'123')
TypeError: ord() expected string of length 1, but int found
You're passing b'123' to the function, which is a bytes object, so indexing it at a specific position gives you an int. When you then pass that int to ord(), the exception is thrown, because:
Given string of length 1, the ord() function returns
an integer representing the Unicode code point of the character when
the argument is a Unicode object, or the value of the byte when the
argument is an 8-bit string.
Use:
chksum += ord(chr(data[byte_idx]))
As the Python bytes documentation states:
While bytes literals and representations are based on ASCII text,
bytes objects actually behave like immutable sequences of integers
When b"123"[0] is called it returns an int which is an invalid argument to the builtin ord function.
You don't even need to call ord in this case:
chksum += data[byte_idx]
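For illustration, here is one way a per-block checksum could be written without ord() at all. The function name block_checksums, the block_size parameter, and the idea that each checksum is simply the sum of the bytes in a block are all assumptions for this sketch; the question's code is incomplete and does not say what the checksum should be.

```python
def block_checksums(data, block_size=10):
    """Return one checksum (sum of byte values) per block of `data`.

    Hypothetical helper: `data` is assumed to be bytes or a bytearray.
    """
    raw = bytearray(data)  # items of a bytearray are ints in both 2.7 and 3.x
    return [
        sum(raw[start:start + block_size])
        for start in range(0, len(raw), block_size)
    ]
```

For b'123' this gives a single block whose checksum is 49 + 50 + 51 = 150. Converting to bytearray first is what lets the same code run on Python 2.7 and 3.x.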
I have an array of integers, and I need to transform it into a string.
[1,2,3,4] => '\x01\x02\x03\x04'
What function can I use for it? I tried with str(), but it returns '1234'.
string = ""
for val in [1,2,3,4]:
    string += str(val) # '1234'
''.join([chr(x) for x in [1, 2, 3, 4]])
You can convert a list of small numbers directly to a bytearray:
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
And you can convert a bytearray directly to a str (2.x) or bytes (3.x, or 2.6+).
In fact, in 3.x, you can even convert the list straight to bytes without going through bytearray:
constructor arguments are interpreted as for bytearray().
So:
str(bytearray([1,2,3,4])) # 2.6-2.7 only
bytes(bytearray([1,2,3,4])) # 2.6-2.7, 3.0+
bytes([1,2,3,4]) # 3.0+ only
If you really want a string in 3.x, as opposed to a byte string, you need to decode it:
bytes(bytearray([1,2,3,4])).decode('ascii')
See Binary Sequence Types in the docs for more details.
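A short round-trip sketch of the conversions above. The variable names are illustrative, and Latin-1 is used for decoding instead of ASCII as an assumption, because Latin-1 maps every byte value 0-255 to the code point with the same number, while 'ascii' only accepts 0-127.

```python
values = [1, 2, 3, 4]

packed = bytes(bytearray(values))    # byte string; works on 2.6+ and 3.x
text = packed.decode('latin-1')      # text string with the same code points
unpacked = list(bytearray(packed))   # back to a list of ints

# Values outside range(256) are rejected when building the bytearray.
try:
    bytearray([256])
    rejected = False
except ValueError:
    rejected = True
```

On Python 3, packed is b'\x01\x02\x03\x04' and text is '\x01\x02\x03\x04', which is the output the question asked for.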
Simple solution
digits = [1,2,3,4]
print(''.join(map(str,digits)))