Python hex bit flipping ascii - python

The following statement is from documentation I'm following.
“7c bd 9c 91” 2442968444(919cbd7c hex)usec = 2442.9sec
If you assume:
7c -> a
bd -> b
9c -> c
91 -> d
Then it's easy to see how they got 919cbd7c simply by flipping abcd to dcba.
What I don't understand is why they aren't flipping the actual bits.
That is to say, I expected 19c9dbc7 rather than 919cbd7c.
Is there a way to convert the original string to what they expect?
E.g., convert 7cbd9c91 to 919cbd7c?
I know that I can split the string into pairs and reverse their order. But is there a way for Python to be aware of this and decode it automatically?
Here is the documentation. The part in question is on the 2nd line of page 22.

I think you're trying to put too much thought into it. The hex pairs you're seeing are actually single bytes, and the order of the bits within the bytes is unambiguous. It's only the byte-order of the higher-level multi-byte integer that can go more than one way. Fortunately, byte-order swapping is very easy, since computers have to do it all the time (network byte order is big-endian, but most PCs these days are little-endian internally).
In Python, just pass the raw bytestring you're getting (which would be b"\x7c\xbd\x9c\x91" for the example data shown in the documentation) to struct.unpack with an appropriate format parameter. Since the documentation says it's a little endian 4-byte number, use "<L" as the format code to specify a "little-endian unsigned long integer":
>>> import struct
>>> bytestring = b"\x7c\xbd\x9c\x91" # from wherever
>>> struct.unpack("<L", bytestring)
(2442968444,)
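If the data arrives as hex text such as 7cbd9c91 rather than as raw bytes, the same conversion can be sketched with bytes.fromhex and int.from_bytes (both standard in Python 3). The variable names below are illustrative only.

```python
# Hex text as it appears in the documentation example.
hex_text = "7cbd9c91"

# Turn the hex digits into the four raw bytes 0x7c 0xbd 0x9c 0x91.
raw = bytes.fromhex(hex_text)

# Interpret those four bytes as one little-endian unsigned integer.
value = int.from_bytes(raw, byteorder="little")
print(value)                 # 2442968444
print(format(value, "08x"))  # 919cbd7c

# Reversing the byte order by hand yields the same hex digits.
print(raw[::-1].hex())       # 919cbd7c
```

Note that only the order of the bytes changes; the two hex digits inside each byte stay in place, which is why the result is 919cbd7c and not 19c9dbc7.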

Related

Python Decode OctetString 7-bit Characters

I'm currently playing around with decoded asn1 data and can't wrap my head around correctly decoding the data into strings (if the data is numerical it's working absolutely fine)
Example:
Hex String -> 0ddc2f93c6c7bb10
Expected Result -> MegaFon
According to the spec the first two octets are meta info and starting with octet 3 there should be two 7 bit chars in each octet
I tried to use the solutions mentioned in "decode 7-bit GSM", but I just get garbage back. I would highly appreciate any ideas.
I managed to solve the riddle in the meantime (@BoarGules, you are right, the spec is misleading from my perspective). First of all, for characters (the hex starts with d0 in this case), the nibbles must not be rotated as is done for numerical output. Then just cut off the leading octet (d0 in our case, i.e. the first two hex digits) and run the rest through the gsm7bitdecode function mentioned in the other Stack Overflow thread (linked in the question). To stay with the example: 'CD' => 11001101; cutting the first bit or setting it to 0 gives us 01001101, or 4D in hex, which is M in ASCII!
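The steps described above can be sketched as follows. This is not the gsm7bitdecode function from the linked thread; swap_nibbles and unpack_septets are hypothetical helpers written for illustration. The sketch assumes that the hex string in the question is the nibble-rotated form, that the leading d0 octet is metadata, that a trailing zero septet is padding, and that every character falls in the range where the GSM default alphabet coincides with ASCII.

```python
def swap_nibbles(hex_text: str) -> str:
    """Swap the two hex digits of every octet, e.g. "0d" -> "d0"."""
    return "".join(hex_text[i + 1] + hex_text[i] for i in range(0, len(hex_text), 2))


def unpack_septets(packed: bytes) -> str:
    """Unpack 7-bit characters that are packed LSB-first into octets."""
    # Treat the octets as one long little-endian bit string ...
    bits = int.from_bytes(packed, "little")
    # ... and slice it into consecutive 7-bit groups.
    count = len(packed) * 8 // 7
    septets = [(bits >> (7 * i)) & 0x7F for i in range(count)]
    # Assumption: septets map directly to ASCII, and trailing zeros are padding.
    return "".join(chr(s) for s in septets).rstrip("\x00")


octets = bytes.fromhex(swap_nibbles("0ddc2f93c6c7bb10"))
print(octets.hex())                # d0cdf2396c7cbb01
print(unpack_septets(octets[1:]))  # MegaFon
```

A complete decoder would map each septet through the GSM 03.38 default alphabet table instead of chr, since some code points (for example 0x00, which is "@") differ from ASCII.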

Use string as bytes [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
My problem is as follows:
I'm reading a .csv generated by some software, and to read it I'm using Pandas. Pandas reads the .csv properly, but one of the columns stores byte sequences representing vectors, and Pandas stores them as strings.
So I have data (a string) and I want to use np.frombuffer() to get the proper vector. The problem is that data is a string, so it's already encoded; when I use .encode() to turn it into bytes, the sequence is not the original one.
Example: The .csv contains \x00\x00 representing the vector [0,0] with dtype=np.uint8. Pandas stores it as a string and when I try to process it something like this happens:
data = df.data[x] # With x any row.
type(data)
<class 'str'>
print(data)
\x00\x00
e_data = data.encode("latin1")
print(e_data)
b'\\x00\\x00'
v = np.frombuffer(e_data, np.uint8)
print(v)
array([ 92 120 48 48 92 120 48 48], dtype=uint8)
I just want to get b'\x00\x00' from data instead of b'\\x00\\x00' which I understand is a little encoding mess I have not been able to fix yet.
Any way to do this?
Thanks!
Issue: you (apparently) have a string that contains literal backslash escape sequences, such as:
>>> x = r'\x00' # note the use of a raw string literal
>>> x # Python's representation of the string escapes the backslash
'\\x00'
>>> print(x) # but it looks right when printing
\x00
From this, you wish to create a corresponding bytes object, wherein the backslash-escape sequences are translated into the corresponding byte.
Handling these kinds of escape sequences is done using the unicode-escape string encoding. As you may be aware, string encodings convert between bytes and str objects, specifying the rules for which byte sequences correspond to what Unicode code points.
However, the unicode-escape codec assumes that the escape sequences are on the bytes side of the equation and that the str side will have the corresponding Unicode characters:
>>> rb'\x00'.decode('unicode-escape') # create a string with a NUL char
'\x00'
Applying .encode to the string will reverse that process; so if you start with the backslash-escape sequence, it will re-escape the backslash:
>>> r'\x00'.encode('unicode-escape') # the result contains two backslashes, represented as four
b'\\\\x00'
>>> list(r'\x00'.encode('unicode-escape')) # let's look at the numeric values of the bytes
[92, 92, 120, 48, 48]
As you can see, that is clearly not what we want.
We want to convert from bytes to str to do the backslash-escaping. But we have a str to start with, so we need to change that to bytes; and we want bytes at the end, so we need to change the str that we get from the backslash-escaping back into bytes. In both cases, we need each Unicode code point from 0 to 255 inclusive to correspond to a single byte with the same value.
The encoding we need for that task is called latin-1, also known as iso-8859-1.
For example:
>>> r'\x00'.encode('latin-1')
b'\\x00'
Thus, we can reason out the overall conversion:
>>> r'\x00'.encode('latin-1').decode('unicode-escape').encode('latin-1')
b'\x00'
As desired: our str with a literal backslash, lowercase x and two zeros, is converted to a bytes object containing a single zero byte.
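The chain above can be wrapped in a small helper. The name unescape_to_bytes is made up for this sketch; it assumes the input contains only characters and escape sequences that map to code points 0-255 (for example \xNN), because anything larger cannot be encoded as latin-1.

```python
def unescape_to_bytes(text: str) -> bytes:
    """Turn literal escapes such as backslash-x-0-0 into the bytes they denote."""
    # str -> bytes (one byte per character), process the escapes, then back to bytes.
    return text.encode("latin-1").decode("unicode-escape").encode("latin-1")


data = r"\x00\x00"           # what Pandas handed back: 8 characters
raw = unescape_to_bytes(data)
print(raw)                   # b'\x00\x00'
print(list(raw))             # [0, 0]
print(list(unescape_to_bytes(r"\x01\xff")))  # [1, 255]
```

The resulting bytes object can then be passed to np.frombuffer(raw, np.uint8) as in the question.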
Alternately: we can request that backslash-escapes are processed while decoding, by using escape_decode from the codecs standard library module. However, this isn't documented and isn't really meant to be used that way - it's internal stuff used to implement the unicode-escape codec and possibly some other things.
If you want to expose yourself to the risk of that breaking in the future, it looks like:
>>> import codecs
>>> codecs.escape_decode(r'\x00\x00')
(b'\x00\x00', 8)
We get a 2-tuple, with the desired bytes and what I assume is the number of Unicode code points that were decoded (i.e. the length of the string). From my testing, it appears that it can only use UTF-8 encoding for the non-backslash sequences (but this could be specific to how Python is configured), and you can't change this; there is no actual parameter to specify the encoding, for a decode method. Like I said - not meant for general use.
Yes, all of that is as awkward as it seems. The reason you don't get easy support for this kind of thing is that it isn't really how you're intended to design your system. Fundamentally, all data is bytes; text is an abstraction that is encoded by that byte data. Using a single byte (with value 0) to represent four characters of text (the symbols \, x, 0 and 0) is not a normal encoding, and not a reversible one (how do I know whether to decode the byte as those four characters, or as a single NUL character?). Instead, you should strongly consider using some other friendly string representation of your data (perhaps a plain hex dump) and a non-text-encoding-related way to parse it. For example:
>>> data = '41 42' # a string in a simple hex dump format
>>> bytes.fromhex(data) # support is built-in, and works simply
b'AB'
>>> list(bytes.fromhex(data))
[65, 66]
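If you control how the file is written, a round trip through a plain hex dump could look like the sketch below. The variable names are illustrative, and whether the software producing the .csv can emit this format is an assumption.

```python
# Bytes of a small uint8 vector, e.g. [0, 0, 255].
vector = bytes([0, 0, 255])

# Writing side: store the bytes as hex text, which is safe inside a CSV cell.
text = vector.hex()
print(text)            # 0000ff

# Reading side: recover exactly the same bytes, no text encoding involved.
restored = bytes.fromhex(text)
print(list(restored))  # [0, 0, 255]
```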

Alignment/Packing in Python Struct.Unpack

I have a piece of hardware sending data at a fixed length: 2 bytes, 1 byte, 4 bytes, 4 bytes, 2 bytes, 4 bytes, for a total of 17 bytes. If I change my format to 18 bytes, the code works but the values are incorrect.
format = '<2s1s4s4s2s4s'
print(struct.calcsize(format))
print(len(hardware_data))
splitdata = struct.unpack(format,hardware_data)
The output is 17, 18 and an error because of the mismatch. I think this is caused by alignment, but I'm unsure, and nothing I've tried has fixed it. Below are a couple of typical strings. When I print(hardware_data) I notice the 'R' and 'n' characters, but I'm unsure how to handle them.
b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\xd8\xff\x00\x00\x00\x00\x80'
b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x80'
Odds are whatever is sending the data is padding it in some way you're not expecting.
For example, if the first four byte field is supposed to represent an int, C struct padding rules would require a padding byte, after the one byte field (to align the next four byte field to four byte alignment). So just add the padding byte explicitly, changing your format string to:
format = '<2s1sx4s4s2s4s'
The x in there says "I expect a byte here, but it's padding, don't unpack it to anything." It's possible the pad byte belongs elsewhere (I have no idea what your hardware is doing); I notice the third byte is the NUL (\0) byte in both examples, but the spot I assumed would be padding is 'R', so it's possible you want:
format = '<2sx1s4s4s2s4s'
instead. Or it could be somewhere else (without knowing which of the fields is a char array in the hardware struct, and which are larger types with alignment requirements, it's impossible to say). Point is, your hardware is sending 18 bytes; figure out which one is garbage, and put the x pad byte at the appropriate location.
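As a quick check, one of the sample packets from the question can be unpacked with the second candidate format. Where the pad byte really belongs is still an assumption; this only shows that the sizes line up and how the fields fall out.

```python
import struct

# First sample packet from the question (18 bytes).
sample = b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\xd8\xff\x00\x00\x00\x00\x80'

# Candidate layout: 2 bytes, one pad byte, then 1, 4, 4, 2 and 4 bytes.
fmt = '<2sx1s4s4s2s4s'
print(struct.calcsize(fmt), len(sample))  # 18 18

# The pad byte is skipped, so six fields come back.
fields = struct.unpack(fmt, sample)
print(fields)
```

If the decoded values look wrong, move the x to another position in the format string and compare against values you know the hardware should be sending.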
Side-note: The repr of bytes objects will use ASCII or simpler ASCII escapes when available. That's why you see an R and a \n in your output; b'R' and b'\x52' are equivalent literals, as are b'\n' and b'\x0a' and Python chooses to use the "more readable" version (when the bytes is actually just ASCII, this is much more readable).
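A short sketch of that equivalence, plus two ways to look at the raw values when the repr is confusing (the data value is made up for illustration):

```python
# The escaped and the "readable" spellings denote the same bytes.
print(b'\x52' == b'R')   # True
print(b'\x0a' == b'\n')  # True

# To see the numeric values without the ASCII shortcuts, use hex() or list().
data = b'\x00R\x1f\n'
print(data.hex())        # 00521f0a
print(list(data))        # [0, 82, 31, 10]
```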

What is a bytearray? Why was it used?

I'm going over other people's code in CoderByte exercises. I was just reviewing the first exercise, which is to reverse a string.
Here is the code:
def FirstReverse(s):
    ar = bytearray(s)
    ar.reverse()
    return str(ar)
print FirstReverse("Argument goes here")
I printed ar after the first line and just got the string back so I'm unclear how the bytearray helped. I also still didn't understand it after reading the documentation here: https://docs.python.org/2/library/functions.html#bytearray
So what is a bytearray? Did it make sense to use it in this example?
As the doc says,
Return a new array of bytes. ... is a mutable sequence of integers in the range 0 <= x < 256
For example,
>>> s = 'hello world'
>>> print bytearray(s)
hello world
>>> bytearray(s)[0]
104
and 104 is the ASCII code of h.
The bytearray class has a reverse method, but str doesn't. In order to reverse the string, this code first builds a bytearray from it, then reverses that in place, and finally converts the reversed bytes back to a string with str.
In addition, you can use [::-1] to reverse a string.
>>> 'Argument goes here'[::-1]
'ereh seog tnemugrA'
The difference between a str and a bytearray is that a str is a sequence of Unicode code points, whereas a bytearray is a sequence of bytes. A single Unicode String may be represented by multiple different bytearrays, depending on the encoding format (e.g. there would be different bytearrays for the UTF-8 representation and the UTF-16 representation of the same str). In addition, str is intended to represent text; by contrast, bytearray may be used to represent arbitrary byte sequences that do not correspond to text at all (e.g. sequences of bytes that are not valid Unicode in any standard encoding format and that will, in fact, be interpreted as something completely different from text altogether such as integer sequences, serialized objects, extended precision integers, or anything else you would want to represent as a sequence of bytes).
In addition to this distinction, str is immutable whereas bytearray is mutable. This means that transformations on str necessarily perform copying operations; by contrast, the contents of a bytearray may be updated / modified in place.
In this particular example, there really is no reason to use a bytearray (and in fact, doing that is more dangerous than using a reversed slice of str, because bytearray.reverse() reverses the underlying bytes... for characters that are encoded by more than one byte, this may result in totally invalid Unicode sequences when interpreting back into Unicode code points). However, if you want to examine or manipulate the encoded form of a string or perform something that is totally unrelated to raw text (like populate the bytes of a datagram packet), that would be a use case for bytearray.
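The risk with multi-byte characters can be demonstrated with a short sketch. It is written for Python 3 (the code in the question is Python 2), and the sample text is arbitrary.

```python
text = "héllo"  # "é" takes two bytes in UTF-8

# Reversing the str reverses code points, so the result is still valid text.
print(text[::-1])  # olléh

# Reversing the encoded bytes in place splits the two bytes of "é" apart.
encoded = bytearray(text.encode("utf-8"))
encoded.reverse()

try:
    encoded.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # The continuation byte now comes before its lead byte: invalid UTF-8.
    decoded_ok = False
print(decoded_ok)  # False

# In-place mutation, on the other hand, is what bytearray is good at.
packet = bytearray(4)
packet[0] = 0x7C
print(packet.hex())  # 7c000000
```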
I don't see how it helped personally. You can do this type of reversal natively with a string by just slicing it with a step size of -1:
def FirstReverse(s):
    return s[::-1]
print FirstReverse("Argument goes here")
I timed the bytearray version and this version using Python 2.7.10 and didn't see one being faster than the other.
So I guess it is a different approach, but I don't see it as a better approach.
The only advantage I could see is if the string were unicode and you are using Python 2.x instead of 3.x (because Python 2.x strings were not natively unicode). However, to pull a unicode string into a bytearray, you need to specify the encoding, which wasn't done here. So it must not have been for that purpose.

python reverse byteorder from network service

I get the following bytes from a network service: \x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01. These are 8-bit numbers. I want to change the representation to my system's representation (32 bits) to be able to work on the bytes. How would I do this with Python? Is there a special 'reverse' function for this?
best regards
If you have 8-bit numbers, the byte order is irrelevant, as there is only one byte in each of them. If you want to convert every character to an integer, you can write:
struct.unpack("11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
struct.unpack("!11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
map(ord, "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
It's equivalent.
If the string contains 16-bit or 32-bit integers, you can write things like:
struct.unpack("!IIHB", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
which would be decoded as two 4-byte, one 2-byte and one 1-byte unsigned integers. The ! (which is equivalent to big-endian >) means that the string is in network byte order, so all integers larger than one byte are converted correctly to your native byte order.
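The calls above use Python 2 byte strings. A Python 3 sketch of the same idea, using a bytes literal, is shown below; the split into two 32-bit, one 16-bit and one 8-bit field is only the illustrative layout from this answer, not something known about the actual service.

```python
import struct

data = b"\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01"

# Eleven independent 8-bit values: byte order does not matter here.
print(struct.unpack("11B", data))
print(list(data))  # iterating over bytes yields the same integers

# Hypothetical layout: two uint32, one uint16, one uint8, network (big-endian) order.
a, b, c, d = struct.unpack("!IIHB", data)
print(hex(a), hex(b), hex(c), hex(d))  # 0x83080404 0x60020081 0x1501 0x1

# The same bytes read as little-endian give different multi-byte values.
a_le = struct.unpack("<IIHB", data)[0]
print(hex(a_le))  # 0x4040883
```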
EDIT: If what you want is to get eleven numbers and process them in reversed order, you should use one of the above methods and call reversed, for example: reversed(map(ord, data)); but this reverses the order regardless of your native byte order. You didn't say what the data really is, though, and I'm not convinced endianness matters here.
Determine which byte order the bytes are in, and supply the correct byte order character to struct.unpack.
If you want to reverse all of the bytes in a string, you can do this:
'example string'[::-1]
I would recommend the struct module for unpacking network or otherwise binary data, as you otherwise don't have a good way to tell where exactly the reversing needs to happen. It allows you to specify the byte order.
I'm not sure what you mean by 8308040460020081150101, but the struct package should have everything you need.
Have you looked at the core struct library? It has methods for converting byte orders.
