Why do I get an int when I index bytes? - python

I'm trying to get the first char of a byte string in Python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?

The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Bold emphasis mine. Note that indexing a string is a bit of an exception among the sequence types: 'abc'[0] gives you a str object of length one, and str is the only sequence type whose elements are always of its own type.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255 (on the usual platforms where a byte has 8 bits). Whether an unqualified char is signed or unsigned is implementation-defined in C, and text is modelled as a char[] array.
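As a minimal sketch of the behaviour quoted above (Python 3 assumed; the variable names are only illustrative), the following shows what indexing, slicing, and iteration each return, and two ways to obtain a bytes object of length 1:

```python
my_bytes = b'just a byte string'

first_int = my_bytes[0]             # indexing yields an int
first_slice = my_bytes[0:1]         # slicing yields a bytes object of length 1
first_rebuilt = bytes([first_int])  # wrap the int in a list to rebuild a bytes object

print(first_int)       # 106
print(first_slice)     # b'j'
print(first_rebuilt)   # b'j'
print(chr(first_int))  # j  (the ASCII character for code 106)

# Iterating over bytes also yields ints, just like iterating over a tuple of ints.
print(list(my_bytes[:4]))  # [106, 117, 115, 116]
```

So the slice is not really a hack: it is the sequence operation that preserves the container type, exactly as slicing a tuple returns a tuple.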

Related

Why 'list(bytestring)' is displaying list of decimal values? [duplicate]


str.join TypeError when decoding binary file using struct.unpack [duplicate]


Converting integer to a pair of bytes produces unexpected format?

I am using Python 3.8.5 and trying to convert an integer in the range 0 to 65535 to a pair of bytes. I am currently using the following code:
import struct
input_integer = 2111
bytes_val = input_integer.to_bytes(2, 'little')
output_data = struct.pack('bb', bytes_val[1], bytes_val[0])
print(output_data)
This produces the following output:
b'\x08?'
This \x08 is 8 in hex, the most significant byte, and ? is 63 in ascii. So together, the numbers add up to 2111 (8*256+63=2111). What I can't figure out is why the least significant byte is coming out in ascii instead of hex? It's very strange to me that it's in a different format than the MSB right next to it. I want it in hex for the output data, and am trying to figure out how to achieve that.
I have also tried modifying the format string in the last line to the following:
output_data = struct.pack('cc',bytes_val[1],bytes_val[0])
which produces the following error:
struct.error: char format requires a bytes object of length 1
I checked the types at each step, and it looks like bytes_val is a bytes object of length 2, but when I take one of the individual elements, say bytes_val[1], it is an integer rather than a bytes object.
Any ideas?
All your observations can be verified from the docs for the bytes class:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers
In the repr of a Python string, printable ASCII letters and punctuation are shown as themselves, while control codes (0-31 and 127) are shown as escape sequences, mostly in hexadecimal. You can see this by evaluating repr(''.join(map(chr, range(128)))). Bytes representations follow the same convention, so ? and \x3f are two spellings of the same byte. The difference from str is that individual elements are integers; for example, output_data[0] is 8.
If you want to represent everything as hex:
>>> output_data.hex()
'083f'
>>> bytes.fromhex('083f') # to recover
b'\x08?'
As of Python 3.8, bytes.hex() also supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output.
>>> b'abcdef'.hex(' ', 2)
'6162 6364 6566'
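The point that only the display differs, not the data, can be checked directly. This is a short sketch assuming the goal is the big-endian pair (most significant byte first) for 2111; the name value is illustrative and not from the question:

```python
import struct

value = 2111  # 0x083F

# Big-endian byte order puts the most significant byte first.
pair = value.to_bytes(2, 'big')
packed = struct.pack('>H', value)  # '>H' = big-endian unsigned 16-bit integer

print(pair)                    # b'\x08?'
print(pair == packed)          # True
print(pair == b'\x08\x3f')     # True: '?' is just how byte 0x3f is displayed
print(pair.hex())              # 083f
print([hex(b) for b in pair])  # ['0x8', '0x3f']
```

Either call produces both bytes in one step, so the manual swap of bytes_val[1] and bytes_val[0] is not needed.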

Python array[0:1] not the same as array[0]

I'm using Python to split a string of 2 bytes b'\x01\x00'. The string of bytes is stored in a variable called flags.
Why do I get b'\x00' when I say flags[0], but the expected answer of b'\x01' when I say flags[0:1]?
Should both of these operations not be exactly the same?
What I did:
>>> flags = b'\x01\x00'
>>> flags[0:1]
b'\x01'
>>> bytes(flags[0])
b'\x00'
In Python 3, bytes is a sequence type containing integers (each in the range 0 - 255) so indexing to a specific index gives you an integer.
And just like slicing a list produces a new list object for the slice, so does slicing a bytes object produce a new bytes instance. And the representation of a bytes instance tries to show you a b'...' literal syntax with the integers represented as either printable ASCII characters or an applicable escape sequence when the byte isn't printable. All this is great for developing but may hide the fact that bytes are really a sequence of integers.
However, you will still get the same piece of information; flags[0:1] is a one-byte long bytes value with the \x01 byte in it, and flags[0] will give you the integer 1:
>>> flags = b'\x01\x00'
>>> flags[0]
1
>>> flags[0:1]
b'\x01'
What you really did was not use flags[0], you used bytes(flags[0]) instead. Passing in a single integer to the bytes() type creates a new bytes object of the specified length, pre-filled with \x00 bytes:
>>> flags[0]
1
>>> bytes(1)
b'\x00'
Since flags[0] produces the integer 1, you told bytes() to return a new bytes value of length 1, filled with \x00 bytes.
From the bytes documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256.
[...]
In addition to the literal forms, bytes objects can be created in a number of other ways:
A zero-filled bytes object of a specified length: bytes(10)
Bold emphasis mine.
If you wanted to create a new bytes object with that one byte in it, you'll need to put the integer value in a list first:
>>> bytes([flags[0]])
b'\x01'
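To make the difference concrete, here is a small sketch (Python 3 assumed) that compares the constructor forms side by side:

```python
flags = b'\x01\x00'

zero_filled = bytes(flags[0])  # bytes(1): one zero byte, NOT the byte 0x01
one_byte = bytes([flags[0]])   # bytes([1]): a bytes object holding the value 1
sliced = flags[0:1]            # slicing keeps the bytes type

print(zero_filled)  # b'\x00'
print(one_byte)     # b'\x01'
print(sliced)       # b'\x01'
print(bytes(3))     # b'\x00\x00\x00'
```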
In Python 2 you would get the same thing in both cases, '\x01', because bytes is just an alias for str there. If you see something different, flags is probably not what you think it is.
>>> flags = b'\x01\x00'
>>> flags[0]
'\x01'
>>> flags[0:1]
'\x01'

How do you convert a python sequence item to an integer

I need to convert the elements of a Python 2.7 bytearray(), string, or bytes() into integers for processing. In many languages (e.g. C) bytes and 'chars' are more or less 8-bit ints that you can perform math operations on. How can I convince Python to let me use (appropriate) bytearrays or strings interchangeably?
Consider toHex(stringlikeThing):
zerof = '0123456789ABCDEF'
def toHex(strg):
    ba = bytearray(len(strg)*2)
    for xx in range(len(strg)):
        vv = ord(strg[xx])
        ba[xx*2] = zerof[vv>>4]
        ba[xx*2+1] = zerof[vv&0xf]
    return ba
which should take a string like thing (ie bytearray or string) and make a printable string like thing of hexadecimal text. It converts "string" to the hex ASCII:
>>> toHex("string")
bytearray(b'737472696E67')
However, when given a bytearray:
>>> nobCom.toHex(bytearray("bytes"))
EX ord() expected string of length 1, but int found: 0 bytes
The ord() in the 'for' loop gets strg[xx], an item of a bytearray, which seems to be an integer (whereas an item of a str is a single-element string).
So ord() wants a char (a single-element string), not an int.
Is there some method or function that takes an argument that is a byte, char, small int, or one-element string and returns its value?
Of course you could check the type(strg[xx]) and handle the cases laboriously.
The unvoiced question is: Why (what is the reasoning) for Python to be so picky about the difference between a byte and char (normal or unicode) (ie single element string)?
When you index a bytearray object in python, you get an integer. This integer is the code for the corresponding character in the bytearray, or in other words, the very thing that the ord function would return.
There is no built-in function in Python that takes a byte, character, small integer, or one-element string and returns its value. Writing one is simple, however:
def toInt(x):
    return x if type(x) == int else ord(x)
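For comparison, here is a sketch of how the hex conversion from the question could look in Python 3, where the same distinction applies (items of bytes and bytearray are ints, items of str are strings). The name to_hex and the choice of Latin-1 for str input are assumptions made for illustration; they are not part of the original code:

```python
def to_hex(data):
    """Return uppercase hex text (as a bytearray) for str, bytes, or bytearray input."""
    if isinstance(data, str):
        # Assumption: every character has a code below 256, so Latin-1 maps
        # each character to exactly one byte. Other text raises UnicodeEncodeError.
        data = data.encode('latin-1')
    # bytes.hex() works on the integer values directly; no ord() is needed.
    return bytearray(bytes(data).hex().upper(), 'ascii')


print(to_hex("string"))             # bytearray(b'737472696E67')
print(to_hex(bytearray(b"bytes")))  # bytearray(b'6279746573')
```

Normalizing the input to bytes once at the top avoids checking the type of every element inside the loop.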
