Convert Python byte to "unsigned 8 bit integer"

I am reading in a byte array/list from socket. I want Python to treat the first byte as an "unsigned 8 bit integer". How is it possible to get its integer value as an unsigned 8 bit integer?

Use the struct module.
import struct
value = struct.unpack('B', data[0:1])[0]  # slice, not index: in Python 3, data[0] is already an int
Note that unpack always returns a tuple, even if you're only unpacking one item.
Also, have a look at this SO question.

bytes/bytearray is a sequence of integers. If you just access an element by its index you'll have an integer:
>>> b'abc'
b'abc'
>>> _[0]
97
By their very definition, bytes and bytearrays contain integers in the range(0, 256). So they're "unsigned 8-bit integers".
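A minimal sketch of the point above, assuming Python 3 and a made-up two-byte payload (`data` is a hypothetical stand-in for whatever the socket returned). Indexing already gives the unsigned value; the signed reading is included only for contrast:

```python
# Hypothetical payload standing in for bytes read from a socket.
data = bytes([0x83, 0x08])

# Indexing a bytes object in Python 3 yields an int in range(0, 256).
first_unsigned = data[0]

# For contrast: the same single byte interpreted as a signed 8-bit integer.
first_signed = int.from_bytes(data[:1], byteorder="big", signed=True)

print(first_unsigned, first_signed)
```

The same indexing works on a bytearray, so no conversion step is needed in either case.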

Another very reasonable and simple option, if you just need the first byte's integer value and data is a Python 2 str (where data[0] is a one-character string), would be something like the following:
value = ord(data[0])
If you want to unpack all of the elements of your received data at once (and they’re not just a homogeneous array), or if you are dealing with multibyte objects like 32-bit integers, then you’ll need to use something like the struct module.
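As a rough illustration of the mixed-field case, here is a sketch assuming Python 3 and an invented five-byte message layout (one unsigned byte followed by a big-endian 32-bit unsigned integer). The names `message`, `kind`, and `length` are hypothetical and not from the question:

```python
import struct

# Hypothetical 5-byte message: one unsigned byte, then a 32-bit unsigned integer.
message = b"\x02\x00\x00\x01\x00"

# "!" selects network (big-endian) order with no padding; B = uint8, I = uint32.
kind, length = struct.unpack("!BI", message)

print(kind, length)
```

With a byte-order prefix such as "!" the struct module uses standard sizes and no alignment padding, so the format here covers exactly five bytes.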

Related

change endian of hex in bitstring unpack

I'm using the bitstring module to unpack a 24 byte boundary file. I don't have control over the input file. The module's default interpretation is apparently big-endian, which is easy to fix when unpacking data types like int or float, but I want some of the data represented as hex values. When I unpack with the hex format, the bytes are displayed in the wrong order. Is there a fix for this? Example input: D806 desired output: 06D8
from bitstring import ConstBitStream
fp = ConstBitStream(filename="testfile.bin")
firstChunk = fp.read(2*8)
data=firstChunk.unpack('hex:16')
print(data)

You could use ordinary Python formatting on a little-endian integer interpretation.
Rather than a read then unpack you also can do both together:
print('{:0>4X}'.format(fp.read('uintle:16')))
This reads the next 16 bits from the stream, interprets them as an unsigned little-endian integer, then formats the result as four characters of hexadecimal, right-aligned and padded with zeros.
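The same idea can be checked without bitstring. The following is a standard-library sketch (not the bitstring API), assuming Python 3 and using the two example bytes from the question; `raw`, `value`, and `swapped_hex` are illustrative names:

```python
# The two example bytes from the question, in file order.
raw = bytes.fromhex("D806")

# Interpret them as an unsigned little-endian integer.
value = int.from_bytes(raw, byteorder="little")

# Format as four hexadecimal characters, right-aligned and zero-padded.
swapped_hex = "{:0>4X}".format(value)

print(swapped_hex)
```

For a plain byte swap like this one, reversing the bytes and hex-encoding them gives the same text.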

What is a bytearray? Why was it used?

I'm going over other people's code in CoderByte exercises. I was just reviewing the first exercise to review a string.
Here is the code:
def FirstReverse(s):
    ar = bytearray(s)
    ar.reverse()
    return str(ar)

print FirstReverse("Argument goes here")
I printed ar after the first line and just got the string back so I'm unclear how the bytearray helped. I also still didn't understand it after reading the documentation here: https://docs.python.org/2/library/functions.html#bytearray
So what is a bytearray? Did it make sense to use it in this example?

As the doc says,
Return a new array of bytes. ... is a mutable sequence of integers in the range 0 <= x < 256
For example,
>>> s = 'hello world'
>>> print bytearray(s)
hello world
>>> bytearray(s)[0]
104
and 104 is the ASCII code of h.
The bytearray class has a reverse method, but str doesn't. To reverse the string, this code first builds a bytearray from it, reverses that in place, and finally converts the result back to a string with str.
In addition, you can use [::-1] to reverse a string.
>>> 'Argument goes here'[::-1]
'ereh seog tnemugrA'

The difference between a str and a bytearray is that a str is a sequence of Unicode code points, whereas a bytearray is a sequence of bytes. A single Unicode String may be represented by multiple different bytearrays, depending on the encoding format (e.g. there would be different bytearrays for the UTF-8 representation and the UTF-16 representation of the same str). In addition, str is intended to represent text; by contrast, bytearray may be used to represent arbitrary byte sequences that do not correspond to text at all (e.g. sequences of bytes that are not valid Unicode in any standard encoding format and that will, in fact, be interpreted as something completely different from text altogether such as integer sequences, serialized objects, extended precision integers, or anything else you would want to represent as a sequence of bytes).
In addition to this distinction, str is immutable whereas bytearray is mutable. This means that transformations on str necessarily perform copying operations; by contrast, the contents of a bytearray may be updated / modified in place.
In this particular example, there really is no reason to use a bytearray (and in fact, doing that is more dangerous than using a reversed slice of str, because bytearray.reverse() reverses the underlying bytes... for characters that are encoded by more than one byte, this may result in totally invalid Unicode sequences when interpreting back into Unicode code points). However, if you want to examine or manipulate the encoded form of a string or perform something that is totally unrelated to raw text (like populate the bytes of a datagram packet), that would be a use case for bytearray.
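To make the risk concrete, here is a small sketch assuming Python 3 and UTF-8, with a sample word chosen for illustration. Reversing the encoded bytes splits the two-byte sequence for the accented letter, so decoding fails, while slicing the str reverses code points safely:

```python
text = "h\u00e9llo"  # the accented e takes two bytes in UTF-8

encoded = bytearray(text.encode("utf-8"))
encoded.reverse()  # reverses raw bytes, not characters

try:
    encoded.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # The continuation byte now comes before its lead byte.
    decoded_ok = False

reversed_text = text[::-1]  # reverses code points, stays valid text

print(decoded_ok, reversed_text)
```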

I don't see how it helped personally. You can do this type of reversal natively with a string by just slicing it with a step size of -1:
def FirstReverse(s):
    return s[::-1]

print FirstReverse("Argument goes here")
I timed the bytearray version and this version using Python 2.7.10 and didn't see one being faster than the other.
So I guess it is a different approach, but I don't see it as a better approach.
The only advantage I could see is if the string were unicode and you are using Python 2.x instead of 3.x (because Python 2.x strings were not natively unicode). However, to pull a unicode string into a bytearray, you need to specify the encoding, which wasn't done here. So it must not have been for that purpose.

Difference between bytearray and list

What is the difference between bytearray and for example, a list or tuple?
As the name suggests, bytearray must be an array that carries byte objects.
In python, it seems that bytes and str are treated equally
>>> bytes
<type 'str'>
So, what is the difference?
Also, if you print a bytearray, the result is pretty weird
>>> v = bytearray([200, 201])
>>> print v
ÈÉ
It seems that it transforms the integer in chr(integer) , is that right? What is the use of a bytearray then?

You are partly correct: in Python 2, bytes is synonymous with the str type. This is because originally there was no bytes object; there were only str and unicode (the latter being for Unicode strings, i.e. text that may need more than one byte per character). When Python 3 came, string handling was overhauled: unicode became the default Python 3 str type, and bytes was added as the type for raw byte sequences (making it equivalent to Python 2's str object).
So while in Python 3 you distinguish between str and bytes, the corresponding types in Python 2 are unicode and str.
Now what makes the bytearray type interesting is that it’s mutable. All string and byte sequences above are immutable, so with every change, you are creating a new object. But you can modify bytearray objects, making them interesting for various purposes where you need to modify individual bytes in a sequence.
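A short sketch of that difference, assuming Python 3 (the variable names are illustrative only). Item assignment fails on bytes but succeeds in place on a bytearray:

```python
frozen = b"abc"
try:
    frozen[0] = 0x41  # bytes objects do not support item assignment
    bytes_mutable = True
except TypeError:
    bytes_mutable = False

buf = bytearray(b"abc")
buf[0] = 0x41      # modifies the existing object in place
buf.extend(b"!")   # grows the same object, no new sequence needed

print(bytes_mutable, buf)
```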

How to unpack from a binary file a byte array using Python?

I'm giving myself a crash course in reading a binary file using Python. I'm new to both, so please bear with me.
The file format's documentation tells me that the first 16 bytes are a GUID and further reading tells me that this GUID is formatted thus:
typedef struct {
    unsigned long Data1;
    unsigned short Data2;
    unsigned short Data3;
    byte Data4[8];
} GUID,
  UUID,
  *PGUID;
I've got as far as being able to unpack the first three entries in the struct, but I'm getting stumped on #4. It's an array of 8 bytes, I think, but I'm not sure how to unpack it.
import struct
fp = open("./file.bin", mode='rb')
Data1 = struct.unpack('<L', fp.read(4)) # unsigned long, little-endian
Data2 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data3 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data4 = struct.unpack('<s', bytearray(fp.read(8))) # byte array with 8 entries?
struct.error: unpack requires a bytes object of length 1
What am I doing wrong for Data4? (I'm using Python 3.2 BTW)
Data1 thru 3 are OK. If I use hex() on them I am getting the correct data that I'd expect to see (woohoo) I'm just failing over on the syntax of this byte array.
Edit: Answer
I'm reading a GUID as defined in MS-DTYP and this nailed it:
data = uuid.UUID(bytes_le=fp.read(16))

If you want an 8-byte string, you need to put the number 8 in there:
struct.unpack('<8s', bytearray(fp.read(8)))
From the docs:
A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.
…
For the 's' format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
However, I'm not sure why you're doing this in the first place.
fp.read(8) gives you an 8-byte bytes object. You want an 8-byte bytes object. So, just do this:
Data4 = fp.read(8)
Converting the bytes to a bytearray has no effect except to make a mutable copy. Unpacking it just gives you back a copy of the same bytes you started with. So… why?
Well, actually, struct.unpack returns a tuple whose one value is a copy of the same bytes you started with, but you can do that with:
Data4 = (fp.read(8),)
Which raises the question of why you want four single-element tuples in the first place. You're going to be doing Data1[0], etc. all over the place for no good reason. Why not this?
Data1, Data2, Data3, Data4 = struct.unpack('<LHH8s', fp.read(16))
Of course if this is meant to read a UUID, it's always better to use the "batteries included" than to try to build your own batteries from nickel and cadmium ore. As icktoofay says, just use the uuid module:
data = uuid.UUID(bytes_le=fp.read(16))
But keep in mind that Python's uuid uses the 4-2-2-1-1-6 format, not the 4-2-2-8 format. If you really need exactly that format, you'll need to convert it, which means either struct or bit twiddling anyway. (Microsoft's GUID makes things even more fun by using a 4-2-2-2-6 format, which is not the same as either, and representing the first 3 in native-endian and the last two in big-endian, because they like to make things easier…)
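To see how the two approaches line up, here is a sketch assuming Python 3, with sixteen made-up bytes standing in for fp.read(16). It unpacks the 4-2-2-8 layout with struct and compares the pieces with the fields that uuid exposes:

```python
import struct
import uuid

# Hypothetical stand-in for fp.read(16); not real file contents.
raw = bytes(range(16))

# 4-2-2-8 layout: little-endian uint32, two uint16s, then 8 raw bytes.
data1, data2, data3, data4 = struct.unpack("<LHH8s", raw)

# bytes_le tells uuid that the first three fields are little-endian.
guid = uuid.UUID(bytes_le=raw)

print(guid)
print(hex(data1), hex(data2), hex(data3), data4.hex())
```

The first three struct values should match guid.time_low, guid.time_mid, and guid.time_hi_version, and the trailing eight bytes correspond to guid.bytes[8:].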

UUIDs are supported by Python with the uuid module. Do something like this:
import uuid
my_uuid = uuid.UUID(bytes_le=fp.read(16))

python reverse byteorder from network service

I get the following bytes from a network service: \x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01 These are 8 bit number. I want to change the representation to my system's representation (32 bits) to be able to work on the bytes. How would I do this with python? Is there a special 'reverse' function for this?
best regards

If you have 8-bit numbers the byte order is irrelevant, as there is only one byte in each of them. If you want to convert every character to integer you can write:
struct.unpack("11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
struct.unpack("!11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
map(ord, "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
It's equivalent.
If the string contains 16-bit or 32-bit integers, you can write things like:
struct.unpack("!IIHB", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
which would be decoded as two 4-byte, one 2-byte and one 1-byte unsigned integers. The ! (which is equivalent to big-endian >) means that the string is in network byte order, so all integers larger than one byte can be converted correctly to your native byte order.
EDIT: If what you want is to get eleven numbers and process them in reversed order, you should use one of the above methods and call reversed, for example: reversed(map(ord, data)); but this reverses the order regardless of your native byte order. You didn't say what the data really is, though, and I'm not convinced endianness matters here.
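For reference, a Python 3 sketch of the same operations on the eleven bytes from the question. In Python 3 the data must be a bytes literal, and iterating over bytes already yields integers, so the map(ord, ...) step is not needed there:

```python
import struct

data = b"\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01"

# Eleven unsigned 8-bit values; byte order is irrelevant for single bytes.
as_bytes = struct.unpack("11B", data)
same_values = list(data)

# Network byte order: two uint32, one uint16, one uint8 (11 bytes in total).
fields = struct.unpack("!IIHB", data)

# Processing the eleven values in reverse order, independent of endianness.
backwards = list(reversed(data))

print(as_bytes)
print([hex(f) for f in fields])
```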

Determine which byte order the bytes are in, and supply the correct byte order character to struct.unpack.

If you want to reverse all of the bytes in a string, you can do this:
'example string'[::-1]
I would recommend the struct module for unpacking network or otherwise binary data, as you otherwise don't have a good way to tell where exactly the reversing needs to happen. It allows you to specify the byte order.

I'm not sure what you mean by 8308040460020081150101, but the struct package should have everything you need.

Have you looked at the core struct library? It has methods for converting byte orders.
