ord() expected string, but int found - python

I'm having a problem with the ord() function. My code is given below:
def write(data):
    for block_idx in list(range(0, len(data), 10)):
        chksum = 0
        for byte_idx in list(range(block_idx, 2)):
            chksum += ord(data[byte_idx])
write(b'123')
TypeError: ord() expected string of length 1, but int found

You're passing b'123' to the function, which is a bytes object. In Python 3, indexing a bytes object at a specific position gives you an int, and passing an int to ord() raises the exception, because:
Given string of length 1, the ord() function returns
an integer representing the Unicode code point of the character when
the argument is a Unicode object, or the value of the byte when the
argument is an 8-bit string.
Use:
chksum += ord(chr(data[byte_idx]))

As the Python bytes documentation states:
While bytes literals and representations are based on ASCII text,
bytes objects actually behave like immutable sequences of integers
Evaluating b"123"[0] returns an int, which is not a valid argument to the built-in ord function.
You don't even need to call ord in this case:
chksum += data[byte_idx]
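Putting the two answers together, a corrected version of the function might look like the sketch below. It assumes Python 3 and that the checksum is a plain sum of the byte values in each block. The question does not say what happens to chksum afterwards, so the function name, the block_size parameter and the returned list are illustrative assumptions only.

```python
def block_checksums(data, block_size=10):
    """Return the sum of the byte values in each block of `data`.

    Hypothetical helper: the name, the block_size parameter and the
    returned list are assumptions, not part of the original question.
    """
    checksums = []
    for block_idx in range(0, len(data), block_size):
        # Slicing a bytes object gives another bytes object.
        block = data[block_idx:block_idx + block_size]
        # Iterating over bytes yields ints in Python 3, so ord() is not needed.
        checksums.append(sum(block))
    return checksums


print(block_checksums(b'123'))  # [150], i.e. 49 + 50 + 51
```

Summing the slice directly avoids indexing altogether, which is why no ord() call appears.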

Related

Why 'list(bytestring)' is displaying list of decimal values? [duplicate]

I'm trying to get the first char of a byte-string in python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?

The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Bold emphasis mine. Note that indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one; str is the only sequence type that contains elements of its own type, always.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255, and text is modelled as a char[] array. Whether an unqualified char is signed or unsigned is implementation-defined, so it varies between compilers and platforms.
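The difference between indexing and slicing can be checked with a short Python 3 session like the sketch below. The variable names are illustrative only; the values follow from the ASCII codes of the characters.

```python
data = b'just a byte string'
text = 'just a text string'

first_int = data[0]       # indexing bytes gives an int
first_bytes = data[0:1]   # slicing bytes gives a bytes object of length 1
first_char = text[0]      # indexing str gives a str of length 1

print(first_int, first_bytes, first_char)  # 106 b'j' j
print(list(data[:4]))                      # [106, 117, 115, 116]

# Converting between the three views of the same byte:
print(bytes([first_int]) == first_bytes)   # True
print(chr(first_int) == first_char)        # True
```

This is also why list() applied to a bytes object shows a list of decimal values: the elements are the integers themselves.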

Python array[0:1] not the same as array[0]

I'm using Python to split a string of 2 bytes, b'\x01\x00'. The string of bytes is stored in a variable called flags.
Why do I get b'\x00' when I use flags[0], but the expected answer of b'\x01' when I use flags[0:1]?
Shouldn't both of these operations give exactly the same result?
What I did:
>>> flags = b'\x01\x00'
>>> flags[0:1]
b'\x01'
>>> bytes(flags[0])
b'\x00'

In Python 3, bytes is a sequence type containing integers (each in the range 0 - 255) so indexing to a specific index gives you an integer.
And just like slicing a list produces a new list object for the slice, so does slicing a bytes object produce a new bytes instance. And the representation of a bytes instance tries to show you a b'...' literal syntax with the integers represented as either printable ASCII characters or an applicable escape sequence when the byte isn't printable. All this is great for developing but may hide the fact that bytes are really a sequence of integers.
However, you will still get the same piece of information; flags[0:1] is a one-byte long bytes value with the \x01 byte in it, and flags[0] will give you the integer 1:
>>> flags = b'\x01\x00'
>>> flags[0]
1
>>> flags[0:1]
b'\x01'
What you really did was not use flags[0], you used bytes(flags[0]) instead. Passing in a single integer to the bytes() type creates a new bytes object of the specified length, pre-filled with \x00 bytes:
>>> flags[0]
1
>>> bytes(1)
b'\x00'
Since flags[0] produces the integer 1, you told bytes() to return a new bytes value of length 1, filled with \x00 bytes.
From the bytes documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256.
[...]
In addition to the literal forms, bytes objects can be created in a number of other ways:
A zero-filled bytes object of a specified length: bytes(10)
Bold emphasis mine.
If you want to create a new bytes object with just that one byte in it, you need to put the integer value in a list first:
>>> bytes([flags[0]])
b'\x01'
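The different ways of getting from flags to a one-byte bytes object can be compared side by side. The following is a small sketch for Python 3; the variable names are illustrative, and int.to_bytes is an alternative not mentioned in the answer above.

```python
flags = b'\x01\x00'

zero_filled = bytes(flags[0])               # bytes(1): one zero byte, not the value 1
from_list = bytes([flags[0]])               # bytes built from an iterable of ints
from_slice = flags[0:1]                     # slicing keeps the bytes type
from_to_bytes = flags[0].to_bytes(1, 'big') # int -> bytes of an explicit length

print(zero_filled)     # b'\x00'
print(from_list)       # b'\x01'
print(from_slice)      # b'\x01'
print(from_to_bytes)   # b'\x01'
```

Only the first form falls into the "integer means length" constructor; the other three all carry the byte value itself.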
In Python 2, where bytes is simply an alias for str, both expressions do give the same thing, '\x01'. If you see something different, flags is probably not what you think it is.
>>> flags = b'\x01\x00'
>>> flags[0]
'\x01'
>>> flags[0:1]
'\x01'

How do you convert a python sequence item to an integer

I need to convert the elements of a Python 2.7 bytearray(), string, or bytes() into integers for processing. In many languages (e.g. C), bytes and chars are more or less 8-bit ints that you can perform math operations on. How can I convince Python to let me use bytearrays or strings interchangeably, where appropriate?
Consider toHex(stringlikeThing):
zerof = '0123456789ABCDEF'
def toHex(strg):
    ba = bytearray(len(strg)*2)
    for xx in range(len(strg)):
        vv = ord(strg[xx])
        ba[xx*2] = zerof[vv>>4]
        ba[xx*2+1] = zerof[vv&0xf]
    return ba
which should take a string-like thing (i.e. a bytearray or string) and produce a printable string-like thing of hexadecimal text. It converts "string" to hex ASCII:
>>> toHex("string")
bytearray(b'737472696E67')
However, when given a bytearray:
>>> nobCom.toHex(bytearray("bytes"))
EX ord() expected string of length 1, but int found: 0 bytes
The ord() in the for loop gets strg[xx], an item of a bytearray, which turns out to be an integer (whereas an item of a str is a single-element string).
So ord() wants a char (single-element string), not an int.
Is there some method or function that takes an argument that is a byte, char, small int, or one-element string and returns its value?
Of course, you could check type(strg[xx]) and handle each case laboriously.
The unvoiced question is: why is Python so picky about the difference between a byte and a char (normal or unicode), i.e. a single-element string?

When you index a bytearray object in Python, you get an integer. This integer is the code for the corresponding character in the bytearray, in other words the very thing that the ord function would return.
There is no single built-in function in Python that takes a byte, character, small integer, or one-element string and returns its value. Writing one is simple, however:
def toInt(x):
    # Items of a bytearray are already ints; items of a str need ord().
    return x if isinstance(x, int) else ord(x)
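An alternative to checking each item is to normalise the input once, so that every element is already an integer. The sketch below is written for Python 3 rather than the Python 2.7 of the question. The to_hex name and the choice of latin-1 for text input are assumptions made for illustration; latin-1 maps each character below 256 to a single byte and raises an error for anything above that.

```python
HEX_DIGITS = '0123456789ABCDEF'


def to_hex(data):
    """Return uppercase hex text for a str, bytes or bytearray.

    Hypothetical helper. Text input is encoded as latin-1, which is an
    assumption: one byte per character, code points above 255 are rejected.
    """
    if isinstance(data, str):
        data = data.encode('latin-1')
    digits = []
    for value in bytearray(data):  # iterating a bytearray always yields ints
        digits.append(HEX_DIGITS[value >> 4])    # high nibble
        digits.append(HEX_DIGITS[value & 0xF])   # low nibble
    return ''.join(digits)


print(to_hex('string'))             # 737472696E67
print(to_hex(bytearray(b'bytes')))  # 6279746573
```

In Python 3.5 and later, bytes(data).hex().upper() gives the same text without the manual loop.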
