I'm using Python to split a string of 2 bytes b'\x01\x00'. The string of bytes is stored in a variable called flags.
Why when I say flags[0] do I get b'\x00' but when I say flags[0:1] I get the expected answer of b'\x01'.
Should both of these operations not be exactly the same?
What I did:
>>> flags = b'\x01\x00'
>>> flags[0:1]
b'\x01'
>>> bytes(flags[0])
b'\x00'
In Python 3, bytes is a sequence type containing integers (each in the range 0 - 255) so indexing to a specific index gives you an integer.
And just like slicing a list produces a new list object for the slice, so does slicing a bytes object produce a new bytes instance. And the representation of a bytes instance tries to show you a b'...' literal syntax with the integers represented as either printable ASCII characters or an applicable escape sequence when the byte isn't printable. All this is great for developing but may hide the fact that bytes are really a sequence of integers.
However, you will still get the same piece of information; flags[0:1] is a one-byte long bytes value with the \x01 byte in it, and flags[0] will give you the integer 1:
>>> flags = b'\x01\x00'
>>> flags[0]
1
>>> flags[0:1]
b'\x01'
What you really did was not use flags[0], you used bytes(flags[0]) instead. Passing in a single integer to the bytes() type creates a new bytes object of the specified length, pre-filled with \x00 bytes:
>>> flags[0]
1
>>> bytes(1)
b'\x00'
Since flags[0] produces the integer 1, you told bytes() to return a new bytes value of length 1, filled with \x00 bytes.
From the bytes documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256.
[...]
In addition to the literal forms, bytes objects can be created in a number of other ways:
A zero-filled bytes object of a specified length: bytes(10)
Bold emphasis mine.
If you wanted to create a new bytes object with that one byte in it, you'll need to put the integer value in a list first:
>>> bytes([flags[0]])
b'\x01'
Yes, you should get the same thing. In both cases b'\x01'. flags is probably not what you think it is.
>>> flags = b'\x01\x00'
>>> flags[0]
'\x01'
>>> flags[0:1]
'\x01'
Related
I'm trying to get the first char of a byte-string in python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Bold emphasis mine. Note than indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one; str is the only sequence type that contains elements of its own type, always.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255. Many C compilers default to unsigned if you use an unqualified char type, and text is modelled as a char[] array.
I'm trying to get the first char of a byte-string in python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Bold emphasis mine. Note than indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one; str is the only sequence type that contains elements of its own type, always.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255. Many C compilers default to unsigned if you use an unqualified char type, and text is modelled as a char[] array.
I am using python 3.8.5, and trying to convert from an integer in the range (0,65535) to a pair of bytes. I am currently using the following code:
from struct import pack
input_integer = 2111
bytes_val = voltage.to_bytes(2,'little')
output_data = struct.pack('bb',bytes_val[1],bytes_val[0])
print(output_data)
This produces the following output:
b'\x08?'
This \x08 is 8 in hex, the most significant byte, and ? is 63 in ascii. So together, the numbers add up to 2111 (8*256+63=2111). What I can't figure out is why the least significant byte is coming out in ascii instead of hex? It's very strange to me that it's in a different format than the MSB right next to it. I want it in hex for the output data, and am trying to figure out how to achieve that.
I have also tried modifying the format string in the last line to the following:
output_data = struct.pack('cc',bytes_val[1],bytes_val[0])
which produces the following error:
struct.error: char format requires a bytes object of length 1
I checked the types at each step, and it looks like bytes_val is a bytearray of length 2, but when I take one of the individual elements, say bytes_val[1], it is an integer rather than a byte array.
Any ideas?
All your observations can be verified from the docs for the bytes class:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers
In Python strings any letters and punctuation are represented by themselves in ASCII, while any control codes by their hexadecimal value (0-31, 127). You can see this by printing ''.join(map(chr, range(128))). Bytes literals follow the same convention, except that individual byte elements are integer, i.e., output_data[0].
If you want to represent everything as hex
>>> output_data.hex()
'083f'
>>> bytes.fromhex('083f') # to recover
b'\x08?'
As of version 3.8 bytes.hex() now supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output.
>>> b'abcdef'.hex(' ', 2)
'6162 6364 6566'
I'm trying to get the first char of a byte-string in python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Bold emphasis mine. Note than indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one; str is the only sequence type that contains elements of its own type, always.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255. Many C compilers default to unsigned if you use an unqualified char type, and text is modelled as a char[] array.
Is this the only way to get a byte of binary data?
index = 40
data = afile.read()
byte = data[index:index+1] # <- here
If I use byte = data[index] it throws an error:
TypeError: 'int' does not support the buffer interface
From the bytes documentation:
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
So byte = data[index] assigns an integer value to byte - which as your error message points out, does not support the buffer interface.
You could, instead of byte = data[index:index+1], write this:
byte = bytes([data[index]])
... relying on the fact that the bytes constructor, if passed a sequence (note the brackets around data[index]), will use it to create a bytes object - but that's not really any more readable, and is significantly slower (by a factor of six on my machine with Python 3.4).