I am building a parser, and I'm fairly new to this.
I have a problem decoding specific bytes: they always return the same int (and they shouldn't), so I must be doing it wrong.
byte = ser.read(1)
byte += ser.read(ser.inWaiting())
a = 0
for i in byte:
    if i == 0x04:
        value = struct.unpack("<h", bytes([i, a]))[0]
        print(value)
I receive bytes like this:
b'\xaa\x04\x80\x02\xff\xfb\x83\xaa\xaa\x04\x80\
And I need to decode packet 0x04. I am using Python 3.6
Try something like:
value = int.from_bytes(byte, byteorder='little')
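A likely reason the loop in the question always prints the same number: `bytes([i, a])` is built from the marker byte itself (`0x04`) and the constant `a = 0`, so `struct.unpack` always sees `b'\x04\x00'` and returns 4. Below is a minimal sketch of reading the bytes that follow the marker instead. The packet layout (a two-byte little-endian value directly after `0x04`) is an assumption made for illustration, and `decode_after_marker` is a hypothetical helper; the real offsets and byte order have to come from the device's documentation.

```python
import struct

def decode_after_marker(buf, marker=0x04, fmt="<h"):
    """Return the values found directly after each marker byte.

    ASSUMPTION: the payload starts right after the marker and has the
    size and byte order described by fmt. Adjust both for the real device.
    """
    size = struct.calcsize(fmt)
    values = []
    for pos, b in enumerate(buf):  # iterating over bytes yields ints
        if b == marker and pos + 1 + size <= len(buf):
            payload = buf[pos + 1 : pos + 1 + size]
            values.append(struct.unpack(fmt, payload)[0])
    return values

sample = b"\xaa\x04\x80\x02\xff\xfb\x83\xaa\xaa\x04\x80"

# What the original loop computes on every match: always 4
always_same = struct.unpack("<h", bytes([0x04, 0]))[0]

# Reading the two bytes after the marker instead
decoded = decode_after_marker(sample)
print(always_same, decoded)
```

The second `0x04` in the sample is skipped because fewer than two bytes follow it; on a live serial stream those bytes would arrive with the next read.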
Related
So, I have this string 01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000
and I want to decode it using python, I'm getting this error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 280: invalid start byte
According to this website: https://www.binaryhexconverter.com/binary-to-ascii-text-converter
The output should be S�ellotherehowyoudoingimfineareyoufineP
Here's my code:
def decodeAscii(bin_string):
    binary_int = int(bin_string, 2);
    byte_number = binary_int.bit_length() + 7 // 8
    binary_array = binary_int.to_bytes(byte_number, "big")
    ascii_text = binary_array.decode()
    print(ascii_text)
How do I fix it?
Your bytes simply cannot be decoded as utf-8, just as the error message tells you.
utf-8 is the default encoding parameter of decode - and the best way to put in the correct encoding value is to know the encoding - otherwise you'll have to guess.
And guessing is probably what the website does, too, by trying the most common encodings, until one does not throw an exception:
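def decodeAscii(bin_string):
    binary_int = int(bin_string, 2)
    # Parentheses matter: without them, 7 // 8 is evaluated first and adds 0
    byte_number = (binary_int.bit_length() + 7) // 8
    binary_array = binary_int.to_bytes(byte_number, "big")
    ascii_text = "Bin string cannot be decoded"
    # 'ansi' only exists on Windows; cp1252 is the usual Western European ANSI code page
    for enc in ['utf-8', 'ascii', 'cp1252']:
        try:
            ascii_text = binary_array.decode(encoding=enc)
            break
        except UnicodeDecodeError:
            pass
    print(ascii_text)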
s = "01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000"
decodeAscii(s)
Output:
S°ellotherehowyoudoingimfineareyoufineP
But there's no guarantee that you find the "correct" encoding by guessing.
Your binary string is just not a valid ascii or utf-8 string. You can tell decode to ignore invalid sequences by saying
ascii_text = binary_array.decode(errors='ignore')
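Two details can be checked with a short sketch. First, in the question's code `binary_int.bit_length() + 7 // 8` parses as `bit_length() + (7 // 8)`, i.e. `bit_length() + 0`, so the byte array is padded with hundreds of leading zero bytes; that is why the error is reported at position 280 rather than position 1. Second, the `errors` argument of `decode` changes what happens to the invalid byte. The example below uses only the first four bytes of the question's string to keep it short; the variable names are illustrative.

```python
# First 32 bits of the string from the question: 'S', 0xB0, 'e', 'l'
bin_string = "01010011101100000110010101101100"
value = int(bin_string, 2)

# Operator precedence: // binds tighter than +
buggy_count = value.bit_length() + 7 // 8    # bit_length() + 0
fixed_count = (value.bit_length() + 7) // 8  # rounds up to whole bytes

raw = value.to_bytes(fixed_count, "big")

ignored = raw.decode("utf-8", errors="ignore")    # drops the invalid byte
replaced = raw.decode("utf-8", errors="replace")  # substitutes U+FFFD
latin1 = raw.decode("latin-1")                    # maps every byte to a code point

print(buggy_count, fixed_count, raw)
print(repr(ignored), repr(replaced), repr(latin1))
```

Which of the three results is "right" still depends on what encoding the data was produced with; `latin-1` never fails, but that does not make it correct.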
It could be solved in one line:
Try this:
def bin_to_text(bin_str):
    bin_to_str = "".join([chr(int(bin_str[i:i+8],2)) for i in range(0,len(bin_str),8)])
    return bin_to_str
bin_str = '01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000'
bin_to_str = bin_to_text(bin_str)
print(bin_to_str)
Output:
S°ellotherehowyoudoingimfineareyoufineP
I would like to scan through data files from a GPS receiver byte-wise (in practice it will be a continuous stream, but for now I want to test the code with offline data). If I find a match, I then check the next 2 bytes for the 'length', get the next 2 bytes and shift them 2 bits (not bytes) to the right, and so on. I haven't handled binary data before, so I'm stuck on a simple task. I can read the binary file byte by byte, but cannot find a way to match my desired pattern (i.e. D3).
with open("COM6_200417.ubx", "rb") as f:
    byte = f.read(1) # read 1-byte at a time
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
        print(byte)
The output is:
b'\x82'
b'\xc2'
b'\xe3'
b'\xb8'
b'\xe0'
b'\x00'
b'#'
b'\x13'
b'\x05'
b'!'
b'\xd3'
b'\x00'
b'\x13'
....
How do I check whether that byte is '\xd3' (D3)?
I would also like to know how to shift bit-wise, as I need to check a decimal value consisting of 6 bits
(one byte and the first 2 bits of the next byte). I am considering taking 2 bytes (16 bits) and then right-shifting by 2 bits
to get the 6 bits. Is that possible in Python? Any improvements, additions, or changes are very much appreciated.
P.S. Can I get rid of that pesky 'b' at the front? If ignoring it has no effect, then it's no problem.
Thanks in advance.
'That byte' is shown with a b'' in front, indicating that it is a bytes object. To get rid of it, you can convert it to an int:
thatbyte = b'\xd3'
byteint = thatbyte[0] # or
int.from_bytes(thatbyte, 'big') # 'big' or 'little' endian, which results in the same when converting a single byte
To compare, you can do:
thatbyte == b'\xd3'
This compares one bytes object with another.
The shift operators << and >> work on int only.
To convert an int back to bytes (assuming it is [0..255]) you can use:
bytes([byteint]) # note the extra brackets!
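As a small sketch of the shifting itself: combine the bytes into one int first, then shift and mask. The field widths below (6 bits, 10 bits) are only illustrations of the technique and are not taken from any receiver specification; the example bytes are made up.

```python
two_bytes = b"\xb4\xc0"                  # 0b10110100 0b11000000 (made-up example)
word = int.from_bytes(two_bytes, "big")  # one 16-bit int

top6 = word >> 10      # keep only the 6 most significant bits
top10 = word >> 6      # first byte plus the top 2 bits of the second byte
low10 = word & 0x3FF   # mask off everything except the 10 least significant bits
shifted2 = word >> 2   # plain right shift by 2 bits

print(bin(word), top6, top10, low10, shifted2)
```

Because Python ints are arbitrary precision, nothing overflows on a left shift; mask with `&` whenever a fixed width is needed.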
And as for improvements, I would suggest to read the whole binary file at once:
with open("COM6_200417.ubx", "rb") as f:
    allbytes = f.read() # read all
for val in allbytes:
    # Do stuff with val, val is int !!!
    print(bytes([val]))
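Once the data is in a single bytes object, searching for the marker does not need an explicit loop. The sketch below uses the bytes printed in the question instead of the file, and reads the two bytes after the marker as one big-endian number. Treating those two bytes as a length field follows the question's description; the exact bit layout is an assumption, and the names are illustrative.

```python
# Bytes as printed in the question (stand-in for the contents of the .ubx file)
allbytes = b"\x82\xc2\xe3\xb8\xe0\x00#\x13\x05!\xd3\x00\x13"

marker = b"\xd3"
pos = allbytes.find(marker)  # index of the first match, or -1 if absent

if pos != -1 and pos + 3 <= len(allbytes):
    # ASSUMPTION: the two bytes after the marker form one big-endian field
    field = int.from_bytes(allbytes[pos + 1 : pos + 3], "big")
else:
    field = None

print(pos, field)
```

To find later occurrences, call `allbytes.find(marker, pos + 1)` in a loop. Note that a byte equal to `0xd3` can also appear inside a payload, so a match on its own does not prove that a message starts there.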
I am struggling to read in data from an Arduino and save this data as a csv file I could meddle with in Python later. Right now my code reads.
import serial
serial_port = '/dev/ttyUSB0'
baud_rate = 9600
file_path = "output.csv"
ser = serial.Serial(serial_port,baud_rate)
done = False
data = []
while done == False:
    raw_bytes = ser.readline()
    decoded_bytes = float(raw_bytes.decode("utf-8"))
    data.append(decoded_bytes)
    if (len(data) > 10) :
        done = True
import numpy as np
np.savetxt(file_path, data, delimiter = ',', fmt='%s')
but I'm running into the error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 1: invalid continuation byte
I want to decode into UTF-8 don't I? What is going wrong? I have checked the Serial Monitor on the Arduino IDE and I am getting correct outputs there. Thanks in advance.
If there's no other way to find out which encoding your Arduino IDE uses, you can check (or guess) it on the Arduino side by echoing back the codes of the characters in question via the Serial Monitor:
void loop () {
    int c = Serial.read();
    if ( c == -1 ) return; // nothing available
    Serial.println (c, HEX); // return the character code in hex notation
}
However, the characters that make up the text of a float should all be plain ASCII, so your
float(raw_bytes.decode("utf-8"))
would probably fail anyway.
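One way to make the reading loop tolerant of such lines is to skip anything that is not a clean ASCII float. This is a sketch under the assumption (not confirmed in the question) that the bad bytes come from an occasional garbled or partial line, for example the first line read right after the port is opened. `parse_float_line` is a hypothetical helper, not part of pyserial.

```python
def parse_float_line(raw_bytes):
    """Return the float contained in one received line, or None if unusable."""
    try:
        text = raw_bytes.decode("ascii").strip()  # non-ASCII bytes raise here
        return float(text)                        # empty or malformed text raises here
    except (UnicodeDecodeError, ValueError):
        return None

# Stand-ins for what ser.readline() might return
lines = [b"2\xf03\r\n", b"23.5\r\n", b"\r\n", b"24.0\r\n"]
data = [v for v in map(parse_float_line, lines) if v is not None]
print(data)
```

In the original loop, the call would replace `float(raw_bytes.decode("utf-8"))`, appending to `data` only when the result is not `None`. If bad lines keep appearing, a baud-rate mismatch between the sketch and the Python script is also worth ruling out.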
I'm trying to write a server in Python to communicate with a pre-existing client whose message packets are ASCII strings, but prepended by four-byte unsigned integer values representative of the length of the remaining string.
I've written a receiver, but I'm sure there's a more Pythonic way. Until I find it, I haven't written the sender. I can easily calculate the message length, convert it to bytes and transmit the message. The bit I'm struggling with is creating an integer which is an array of four bytes.
Let me clarify: If my string is 260 characters in length, I wish to prepend a big-endian four byte integer representation of 260. So, I don't want the ASCII string "0260" in front of the string, rather, I want four (non-ASCII) bytes representative of 0x00000104.
My code to receive the length prepended string from the client looks like this:
sizeBytes = 4 # size of the integer representing the string length
# receive big-endian 4 byte integer from client
data = conn.recv(sizeBytes)
if not data:
    break
dLen = 0
for i in range(sizeBytes):
    # each byte is worth 256 times the one to its right
    dLen = dLen + pow(256,i) * data[sizeBytes-i-1]
data = str(conn.recv(dLen),'UTF-8')
I could simply do the reverse. I'm new to Python and feel that what I've done is probably longhand!!
1) Is there a better way of receiving and decoding the length?
2) What's the "sister" method to encode the length for transmission?
Thanks.
The struct module is helpful here
for writing:
import struct
msg = 'some message containing 260 ascii characters'
length = len(msg)
encoded_length = struct.pack('>I', length)
For a 260-character message, encoded_length will be a bytes object of 4 bytes with the value b'\x00\x00\x01\x04'
for reading:
length = struct.unpack('>I', received_msg[:4])[0]
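Putting both directions together, a round trip might look like the sketch below. The helper names `frame` and `unframe` are made up for this example, and it works on an in-memory buffer rather than a socket.

```python
import struct

HEADER = struct.Struct(">I")  # big-endian unsigned 32-bit length prefix

def frame(text):
    """Prefix an ASCII message with its 4-byte big-endian length."""
    payload = text.encode("ascii")
    return HEADER.pack(len(payload)) + payload

def unframe(buffer):
    """Split one message off the front of buffer; return (text, remaining bytes)."""
    (length,) = HEADER.unpack_from(buffer, 0)
    start = HEADER.size
    payload = buffer[start : start + length]
    if len(payload) < length:
        raise ValueError("incomplete message")
    return payload.decode("ascii"), buffer[start + length :]

packet = frame("x" * 260)
print(packet[:4])             # the length prefix
text, rest = unframe(packet)
print(len(text), rest)
```

On a real socket, `conn.recv(n)` may return fewer than `n` bytes, so a receiver would keep calling it until the 4 header bytes, and then the full payload, have arrived.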
An example using asyncio:
import asyncio
import struct
def send_message(writer, message):
    data = message.encode()
    size = struct.pack('>L', len(data))
    writer.write(size + data)
async def receive_message(reader):
    data = await reader.readexactly(4)
    size = struct.unpack('>L', data)[0]
    data = await reader.readexactly(size)
    return data.decode()
The complete code is here
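The receiving side can be tried without any network connection by feeding bytes into an `asyncio.StreamReader` by hand. This is only a test harness sketched for illustration; `demo` is a made-up name, and a real server would get its reader from `asyncio.start_server` or `asyncio.open_connection`.

```python
import asyncio
import struct

async def receive_message(reader):
    # Read the 4-byte big-endian length, then exactly that many payload bytes
    header = await reader.readexactly(4)
    (size,) = struct.unpack(">L", header)
    body = await reader.readexactly(size)
    return body.decode()

async def demo():
    reader = asyncio.StreamReader()  # created inside the running event loop
    payload = "hello".encode()
    reader.feed_data(struct.pack(">L", len(payload)) + payload)
    reader.feed_eof()
    return await receive_message(reader)

result = asyncio.run(demo())
print(result)
```

`readexactly` raises `asyncio.IncompleteReadError` if the stream ends before the requested number of bytes has arrived, which is a convenient signal that the peer disconnected mid-message.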
I am having some trouble dissecting a UDP packet. I am receiving the packets and storing the data and sender-address in variables 'data' and 'addr' with:
data,addr = UDPSock.recvfrom(buf)
This parses the data as a string, that I am now unable to turn into bytes. I know the structure of the datagram packet which is a total of 28 bytes, and that the data I am trying to get out is in bytes 17:28.
I have tried doing this:
mybytes = data[16:19]
print struct.unpack('>I', mybytes)
--> struct.error: unpack str size does not match format
And this:
response = (0, 0, data[16], data[17], 6)
bytes = array('B', response[:-1])
print struct.unpack('>I', bytes)
--> TypeError: Type not compatible with array type
And this:
print "\nData byte 17:", str.encode(data[17])
--> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
More specifically I want to parse what I think is an unsigned int. And now I am not sure what to try next. I am completely new to sockets and byte-conversions in Python, so any advice would be helpful :)
Thanks,
Thomas
An unsigned int32 is 4 bytes long, so you have to feed 4 bytes into struct.unpack.
Replace
mybytes = data[16:19]
with
mybytes = data[16:20]
(the right-hand index of a slice is exclusive, i.e. range(16,19) = [16,17,18], so data[16:19] is only 3 bytes long) and you should be good to go.
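The slice arithmetic can be checked on a dummy datagram. The 28-byte buffer below is made-up test data, and reading the last 12 bytes as three unsigned 32-bit integers is only an assumption about the packet layout, which the question does not spell out. The sketch is written for Python 3, where socket data arrives as a bytes object.

```python
import struct

datagram = bytes(range(28))  # made-up 28-byte packet: 0x00, 0x01, ..., 0x1b

too_short = datagram[16:19]  # 3 bytes: indices 16, 17, 18
four = datagram[16:20]       # 4 bytes: indices 16..19

(first,) = struct.unpack(">I", four)  # one big-endian unsigned 32-bit int

# ASSUMPTION: bytes 17..28 (indices 16..27) hold three consecutive uint32 values
three = struct.unpack(">3I", datagram[16:28])

print(len(too_short), hex(first), [hex(v) for v in three])
```

`struct.calcsize(">I")` returns 4, which is a quick way to confirm how many bytes a format string expects before slicing.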