I get the following bytes from a network service: \x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01 These are 8 bit number. I want to change the representation to my system's representation (32 bits) to be able to work on the bytes. How would I do this with python? Is there a special 'reverse' function for this?
best regards
If you have 8-bit numbers the byte order is irrelevant, as there is only one byte in each of them. If you want to convert every character to integer you can write:
struct.unpack("11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
struct.unpack("!11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
map(ord, "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
It's equivalent.
If string contains 16-bit or 32-bit integers, you can write things like:
struct.unpack("!IIHB", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
which would be decoded as two 4-byte, one 2-byte and one 1-byte unsigned integers. The ! (which is equivalent to big-endian >) means that string is in network byte order, so all integers larger than one byte can be converted correctly to your native byte order.
EDIT: If what you want is to get eleven numbers and process them in reversed order, you should use one of above methods and call reversed, for example: reversed(map(ord, data)); but this reverses the order regardless of your native byte order. You didn't say what the data really is thou and I'm not convinced endianness does matter here.
Determine which byte order the bytes are in, and supply the correct byte order character to struct.unpack.
If you want to reverse all of the bytes in a string, you can do this:
'example string'[::-1]
I would recommend the struct module for unpacking network or otherwise binary data, as you otherwise don't have a good way to tell where exactly the reversing needs to happen. It allows you to specify the byte order.
I'm not sure what you mean by 8308040460020081150101, but the struct package should have everything you need.
Have you looked at the core struct library? It has methods for converting byte orders.
Related
Currently, I have a system that converts a list of integers to their binary representations. I calculate the number of bytes each number requires and then use the to_bytes() function to convert them to bytes, like so:
o = open(outFileName, "wb")
for n in result:
numBytes = math.ceil(n.bit_length()/8)
o.write(n.to_bytes(numBytes, 'little'))
o.close()
However, since the bytes are of varying lengths, what would be the method to allow an unpacking program/function to know how long each byte was? I have heard uses of the struct module and specifically the pack function, but with a focus on efficiency and reducing the size of the file as much as possible in mind, what would be the best way of approaching this to allow such an unpacking program to retrieve the exact list of originally encoded integers?
You can't. Your encoding maps different lists of integers to the same sequence of bytes. It is then impossible to know which one was the original input.
You need a different encoding.
Take a look at using the high bit each byte. There are other ways that might be better, depending on the distribution of your integers, such as Golomb coding.
I have a piece of hardware sending data at a fixed length: 2bytes, 1 bytes, 4 bytes, 4 bytes, 2 bytes, 4bytes for a total of 17 bytes. If I change my format to 18bytes the code works but values are incorrect.
format = '<2s1s4s4s2s4s'
print(struct.calcsize(format))
print(len(hardware_data))
splitdata = struct.unpack(format,hardware_data)
The output is 17, 18 and an error because of the mismatch. I think this is caused by alignment but I'm unsure and nothing I've tried had fixed this. Below are a couple typical strings, if I print(hardware_data) I noticed the 'R' and 'n' characters but I'm unsure how to handle.
b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\xd8\xff\x00\x00\x00\x00\x80'
b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x80'
Odds are whatever is sending the data is padding it in some way you're not expecting.
For example, if the first four byte field is supposed to represent an int, C struct padding rules would require a padding byte, after the one byte field (to align the next four byte field to four byte alignment). So just add the padding byte explicitly, changing your format string to:
format = '<2s1sx4s4s2s4s'
The x in there says "I expect a byte here, but it's padding, don't unpack it to anything." It's possible the pad byte belongs elsewhere (I have no idea what your hardware is doing); I notice the third byte is the NUL (\0) byte in both examples, but the spot I assumed would be padding is 'R', so it's possible you want:
format = '<2sx1s4s4s2s4s'
instead. Or it could be somewhere else (without knowing which of the fields is a char array in the hardware struct, and which are larger types with alignment requirements, it's impossible to say). Point is, your hardware is sending 18 bytes; figure out which one is garbage, and put the x pad byte at the appropriate location.
Side-note: The repr of bytes objects will use ASCII or simpler ASCII escapes when available. That's why you see an R and a \n in your output; b'R' and b'\x52' are equivalent literals, as are b'\n' and b'\x0a' and Python chooses to use the "more readable" version (when the bytes is actually just ASCII, this is much more readable).
For example given an arbitrary string. Could be chars or just random bytes:
string = '\xf0\x9f\xa4\xb1'
I want to output:
b'\xf0\x9f\xa4\xb1'
This seems so simple, but I could not find an answer anywhere. Of course just typing the b followed by the string will do. But I want to do this runtime, or from a variable containing the strings of byte.
if the given string was AAAA or some known characters I can simply do string.encode('utf-8'), but I am expecting the string of bytes to just be random. Doing that to '\xf0\x9f\xa4\xb1' ( random bytes ) produces unexpected result b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'.
There must be a simpler way to do this?
Edit:
I want to convert the string to bytes without using an encoding
The Latin-1 character encoding trivially (and unlike every other encoding supported by Python) encodes every code point in the range 0x00-0xff to a byte with the same value.
byteobj = '\xf0\x9f\xa4\xb1'.encode('latin-1')
You say you don't want to use an encoding, but the alternatives which avoid it seem far inferior.
The UTF-8 encoding is unsuitable because, as you already discovered, code points above 0x7f map to a sequence of multiple bytes (up to four bytes) none of which are exactly the input code point as a byte value.
Omitting the argument to .encode() (as in a now-deleted answer) forces Python to guess an encoding, which produces system-dependent behavior (probably picks UTF-8 on most systems except Windows, where it will typically instead choose something much more unpredictable, as well as usually much more sinister and horrible).
I found a working solution
import struct
def convert_string_to_bytes(string):
bytes = b''
for i in string:
bytes += struct.pack("B", ord(i))
return bytes
string = '\xf0\x9f\xa4\xb1'
print (convert_string_to_bytes(string)))
output:
b'\xf0\x9f\xa4\xb1'
I know for integers one can use htonl and ntohl but what about pickle byte streams?
If I know that the next 150 bytes that are received are a pickle object, do I still have to reverse byte-order just in case one machine uses big-endian and the other is little-endian?
So I think I figured out the answer.
By default, the pickle data format uses a printable ASCII
representation.
since ASCII is a single byte representation the endianess does not matter.
I'm going over other people's code in CoderByte exercises. I was just reviewing the first exercise to review a string.
Here is the code:
def FirstReverse(s):
ar = bytearray(s)
ar.reverse()
return str(ar)
print FirstReverse("Argument goes here")
I printed ar after the first line and just got the string back so I'm unclear how the bytearray helped. I also still didn't understand it after reading the documentation here: https://docs.python.org/2/library/functions.html#bytearray
So what is a bytearray? Did it make sense to use it in this example?
As the doc says,
Return a new array of bytes. ... is a mutable sequence of integers in the range 0 <= x < 256
For example,
>>> s = 'hello world'
>>> print bytearray(s)
hello world
>>> bytearray(s)[0]
104
and 104 is the ASCII side of h.
Class bytearray has the method reverse, but string doesn't. In order to reverse the string, this code first gets its bytes array, and then reserves, finally gets the reversed string by str.
In addition, you can use [::-1] to reverse a string.
>>> 'Argument goes here'[::-1]
'ereh seog tnemugrA'
The difference between a str and a bytearray is that a str is a sequence of Unicode code points, whereas a bytearray is a sequence of bytes. A single Unicode String may be represented by multiple different bytearrays, depending on the encoding format (e.g. there would be different bytearrays for the UTF-8 representation and the UTF-16 representation of the same str). In addition, str is intended to represent text; by contrast, bytearray may be used to represent arbitrary byte sequences that do not correspond to text at all (e.g. sequences of bytes that are not valid Unicode in any standard encoding format and that will, in fact, be interpreted as something completely different from text altogether such as integer sequences, serialized objects, extended precision integers, or anything else you would want to represent as a sequence of bytes).
In addition to this distinction, str is immutable whereas bytearray is mutable. This means that transformations on str necessarily perform copying operations; by contrast, the contents of a bytearray may be updated / modified in place.
In this particular example, there really is no reason to use a bytearray (and in fact, doing that is more dangerous than using a reversed slice of str, because bytearray.reverse() reverses the underlying bytes... for characters that are encoded by more than one byte, this may result in totally invalid Unicode sequences when interpreting back into Unicode code points). However, if you want to examine or manipulate the encoded form of a string or perform something that is totally unrelated to raw text (like populate the bytes of a datagram packet), that would be a use case for bytearray.
I don't see how it helped personally. You can do this type of reversal natively with a string by just slicing it with a step size of -1:
def FirstReverse(s):
return s[::-1]
print FirstReverse("Argument goes here")
I timed the bytearray version and this version using Python 2.7.10 and didn't see one being faster than the other.
So I guess it is a different approach, but I don't see it as a better approach.
The only advantage I could see is if the string were unicode and you are using Python 2.x instead of 3.x (because Python 2.x strings were not natively unicode). However, to pull a unicode string into a bytearray, you need to specify the encoding, which wasn't done here. So it must not have been for that purpose.