I'm working on an embedded system that sends commands via UART.
The UART runs at 115200 baud.
On the PC side I want to read these commands, parse them, and execute the related action.
I chose Python as the language to build a script.
This is a typical command received from the embedded system:
S;SEND;40;{"ID":"asg01","T":1,"P":{"T":180}};E
Each message starts with S and ends with E.
The command associated to the message is "SEND" and the payload length is 40.
My idea is to read the bytes coming from the UART and:
check if the message starts with S
check if the message ends with E
if the above assumptions are true, split the message in order to find the command and the payload.
What is the best way to parse all the bytes coming from an asynchronous UART?
My concern is losing messages due to wrong (or slow) parsing.
Thanks for the help!
BR,
Federico
In my day job, I wrote the software for an embedded system and a PC communicating with each other by a USB cable, using the UART protocol at 115,200 baud.
I see that you tagged your post with PySerial, so you already know about Python's most popular package for serial port communication. I will add that if you are using PyQt, there's a serial module included in that package as well.
115,200 baud is not fast for a modern desktop PC. I doubt that any parsing you do on the PC side will fail to keep up. I parse data streams and plot graphs of my data in real time using PyQt.
What I have noticed in my work with communication between an embedded system and a PC over a UART is that some data gets corrupted occasionally. A byte can be garbled, repeated, or dropped. Also, even if no bytes are added or dropped, you can occasionally perform a read while only part of a packet is in the buffer, and the read will terminate early. If you use a fixed read length of 40 bytes and trust that each read will always line up exactly with a data packet as you show above, you will frequently be wrong.
To solve these kinds of problems, I wrote a FIFO class in Python which consumes serial port data at the head of the FIFO, yields valid data packets at the tail, and discards invalid data. My FIFO holds 3 times as many bytes as my data packets, so if I am looking for packet boundaries using specific sequences, I have plenty of signposts.
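The class itself is part of my day-job code, but a rough sketch of the same idea, assuming the S;...;E framing from your example (the port name and function name below are made up for illustration), could look like this:

import serial

ser = serial.Serial('/dev/ttyUSB0', 115200, timeout=0.1)   # port name is just an example
fifo = bytearray()

def poll_packets(ser, fifo, maxlen=192):
    # Append whatever has arrived, then peel complete S;...;E packets off the FIFO
    fifo.extend(ser.read(ser.in_waiting or 1))
    packets = []
    while True:
        start = fifo.find(b'S;')
        end = fifo.find(b';E', start + 2)
        if start < 0 or end < 0:
            break
        packets.append(bytes(fifo[start:end + 2]))
        del fifo[:end + 2]              # consume the packet plus any junk before it
    if len(fifo) > maxlen:
        del fifo[:-maxlen]              # keep the FIFO bounded; discard stale bytes
    return packets

for packet in poll_packets(ser, fifo):
    print(packet)                       # e.g. b'S;SEND;40;{...};E', ready for parsing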
A few more recommendations: work in Python 3 if you have the choice, it's cleaner. Use bytes and bytearray objects. Don't use str, because you will find yourself converting back and forth between Unicode and ASCII.
This format is almost parseable as a csv, but not quite, because the fourth field is JSON, and you may not be able to guarantee that the JSON doesn't contain any strings with embedded semicolons. So, I think you probably want to just use string (or, rather, bytes) manipulation functions:
import json

def parsemsg(buf):
    s, cmd, length, rest = buf.split(b';', 3)
    j, _, e = rest.rpartition(b';')
    if s != b'S' or e != b'E':
        raise ValueError('must start with S and end with E')
    return cmd.decode('utf-8'), int(length), json.loads(j)
Then:
>>> parsemsg(b'S;SEND;40;{"ID":"asg01","T":1,"P":{"T":180}};E')
('SEND', 40, {'ID': 'asg01', 'T': 1, 'P': {'T': 180}})
The actual semicolon-parsing part takes 602ns on my laptop; the decode and int raise that to 902ns. The json.loads, on the other hand, takes 10us. So, if you're worried about performance, the JSON part is really the only part that matters (trying the third-party JSON libs I happen to have installed, the fastest one is still 8.1us, which isn't much better). You might as well keep everything else simple and robust.
Also, considering that you're reading this at 115200 baud, you can't get these messages any faster than about one every 6ms, so spending 11us parsing them is not even close to a problem in the first place.
Related
I'm new to Python and trying to send a list of floats from Max/MSP, but all I receive is some encrypted-looking jargon on the other side. For example, if I try to send
-64.463172 24.633138 10.054035 -2.445868 -7.855343 -8.22241 -7.066427 -5.288864 -2.530465 0.458666 2.289094 2.566208 1.953798 1.114607 0.296125 -0.339662 -0.604555 -0.518344 -0.328184 -0.239883 -0.265401 -0.312797 -0.300493 -0.189546
I receive
b'list\x00\x00\x00\x00,ffffffffffffffffffffffff\x00\x00\x00\xc2\x80\xed%A\xc5\x10\xabA \xddT\xc0\x1c\x89\x1a\xc0\xfb^\xf8\xc1\x03\x8e\xfe\xc0\xe2 ,\xc0\xa9>_\xc0!\xf3%>\xea\xd6B#\x12\x80\x86#$<\xc1?\xfa\x16\x0f?\x8e\xabt>\x97\x9d\xc1\xbe\xad\xe8=\xbf\x1a\xc4\x1e\xbf\x04\xb2*\xbe\xa8\x07\xc3\xbeu\xa3\xcb\xbe\x87\xe2\x96\xbe\xa0&\xe8\xbe\x99\xda.\xbeB\x18^'
A similar question was asked here
max/msp to ruby via udp message format
but that was dealing with integers and it was easy enough to parse and get the right number out, but I have no idea how to decode this. Any help?
I discovered that the object I was using ([udpsend]) uses the OSC protocol, and that's what was responsible for all the excess encoding. I switched to [MXJ net.udp.send] and the numbers came through cleanly, except for being surrounded by apostrophes with a b in front. This indicates a bytes object, which can be converted to a string with bytes.decode().
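For example, a minimal receive-side sketch (the port number is made up, and it assumes the floats arrive as a plain space-separated ASCII string):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('0.0.0.0', 7400))                 # port number is just an example

data, addr = sock.recvfrom(4096)             # data is a bytes object: b'-64.463172 24.633138 ...'
text = data.decode('ascii')                  # bytes.decode() turns it into a normal str
values = [float(x) for x in text.split()]    # back to a list of floats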
The Sadam object library contains several flawlessly working UDP objects. As you said the OSC objects are not made for raw UDP communication.
First time poster.
Before I start, I just want to say that I am a beginner programmer so bear with me, but I can still follow along quite well.
I have a wireless device called a Pololu Wixel, which can send and receive data wirelessly. I'm using two of them. One to send and one to receive. It's USB so it can plug straight into my Raspberry Pi or PC, so all I have to do is connect to the COM port through a terminal to read and write data to it. It comes with a testing terminal program that allows me to send 1-16 bytes of info. I've done this and I've sent and received 2 bytes (which is what I need) with no problem.
Now here's my problem: when I open up the Ubuntu terminal and use pyserial to connect to the correct sending Wixel COM port and write a value larger than 255, my receiving COM port, which is connected to another instance of the terminal also using pyserial, doesn't read the right value. So I think I'm not able to read and write two bytes, only one. After doing some reading online in the pyserial documentation, I believe (but don't know) that pyserial can only read and write 5, 6, 7, or 8 bits at a time.
I hope my problem is obvious now. How the heck can I write 2 bytes worth of info to the COM port to my device and send it to the other device which needs to read those 2 bytes through, all using pyserial?
I hope this all makes sense, and I would greatly appreciate any help.
Thanks
UPDATE
Okay, I think I've got this going now. So I did:
import serial
s = serial.Serial(3)   # device #1 at COM Port 4 (sending)
r = serial.Serial(4)   # device #4 at COM Port 5 (receiving)
s.timeout = 1
r.timeout = 1
s.write('0x80')
r.readline()
# Output: '0x80'
s.write('hh')
r.readline()
# Output: 'hh'
Honestly, I think this solves my problem. Maybe there never was a problem to begin with. Maybe I can take my 16-bit binary data from the program, for example "1101101011010101", turn it into characters (I've seen something called chr(); I think that's it), then use s.write('WHATEVER'), then use r.readline() and convert back to binary.
You'll likely need to pull your number apart into multiple bytes, and send the pieces in little endian or big endian order.
EG:
low_byte = number % 256
high_byte = number // 256
That should get you up to 65535. You can reconstruct the number on the other side with high_byte * 256 + low_byte.
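A quick sketch of that, assuming Python 3 and that the two ports are opened by name (the port names and the struct alternative are mine, not from the question):

import serial
import struct

s = serial.Serial('COM4', timeout=1)       # sending port, names are examples
r = serial.Serial('COM5', timeout=1)       # receiving port

number = 0b1101101011010101                # the 16-bit value from the question (0xDAD5)
low_byte = number % 256
high_byte = number // 256

s.write(bytes([high_byte, low_byte]))      # send high byte first (big endian)
# equivalently: s.write(struct.pack('>H', number))

data = r.read(2)                           # read exactly two bytes
reconstructed = data[0] * 256 + data[1]    # indexing bytes in Python 3 gives ints
print(hex(reconstructed))                  # 0xdad5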
I've set up a server reusing the code found in the documentation where I have self.data = self.request.recv(1024).strip().
But how do I go from this to deserializing it into a protobuf message (Message.proto/Message_pb2.py)? Right now it seems that it's receiving chunks of 1024 bytes, and more than one at a time... making it all rubbish :D
TCP is typically just a stream of data. Just because you sent each packet as a unit, doesn't mean the receiver gets that. Large messages may be split into multiple packets; small messages may be combined into a single packet.
The only way to interpret multiple messages over TCP is with some kind of "framing". With text-based protocols, a CR/LF/CRLF/zero-byte might signify the end of each frame, but that won't work with binary protocols like protobuf. In such cases, the most common approach is to simply prepend each message with the length, for example in a fixed-size (4 bytes?) network-byte-order chunk. Then the payload. In the case of protobuf, the API for your platform may also provide a mechanism to write the length as a "varint".
Then, reading is a matter of:
read an entire length-header
read (and buffer) that many bytes
process the buffered data
rinse and repeat
But keeping in mind that you might have (in a single packet) the end of one message, 2 complete messages, and the start of another message (maybe half of the length-header, just to make it interesting). So: keeping track of exactly what you are reading at any point becomes paramount.
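A rough sketch of that read loop, assuming the 4-byte network-byte-order length prefix described above and that the generated class in Message_pb2 is called Message (adjust the names to match your .proto):

import struct
import Message_pb2      # your generated module; the class name below is an assumption

def recv_exact(sock, n):
    # recv() can return fewer bytes than requested, so loop until we have exactly n
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-message')
        buf += chunk
    return buf

def read_message(sock):
    (length,) = struct.unpack('!I', recv_exact(sock, 4))   # fixed-size length header
    payload = recv_exact(sock, length)                     # then exactly that many bytes
    msg = Message_pb2.Message()
    msg.ParseFromString(payload)
    return msg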
I am reading data from a microcontroller via serial, at a baudrate of 921600. I'm reading a large amount of ASCII CSV data, and since it comes in so fast, the buffer gets filled and the rest of the data gets lost before I can read it. I know I could manually edit the pyserial source code for serialwin32 to increase the buffer size, but I was wondering if there is another way around it?
I can only estimate the amount of data I will receive, but it is somewhere around 200kB of data.
Have you considered reading from the serial interface in a separate thread that is running prior to sending the command to the uC to send the data?
This would remove some of the delay after the write command and starting the read. There are other SO users who have had success with this method, granted they weren't having buffer overruns.
If this isn't clear let me know and I can throw something together to show this.
EDIT
Thinking about it a bit more, if you're trying to read from the buffer and write it out to the file system, even the standalone thread might not save you. To minimize the processing time, you might consider reading, say, 100 bytes at a time with serial.read(size=100) and pushing that data onto a Queue, then processing it all after the transfer has completed.
Pseudo Code Example
import queue

def thread_main_loop(myserialobj, data_queue):
    # Read up to 100 bytes from the serial port and push them onto the queue
    data_queue.put_nowait(myserialobj.read(size=100))

def process_queue_when_done(data_queue):
    # Drain the queue once the transfer has completed
    while True:
        if not data_queue.empty():
            popped_data = data_queue.get_nowait()
            # Process the data as needed
        else:
            break
There's a "Receive Buffer" slider that's accessible from the com port's Properties Page in Device Manager. It is found by following the Advanced button on the "Port Settings" tab.
More info:
http://support.microsoft.com/kb/131016 under heading Receive Buffer
http://tldp.org/HOWTO/Serial-HOWTO-4.html under heading Interrupts
Try knocking it down a notch or two.
You do not need to manually change pyserial code.
If you run your code on Windows platform, you simply need to add a line in your code
ser.set_buffer_size(rx_size = 12800, tx_size = 12800)
Where 12800 is an arbitrary number I chose. You can make the receiving (rx) and transmitting (tx) buffers as big as 2147483647.
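For context, a minimal sketch (the port name and baud rate are examples; set_buffer_size only exists in pyserial's Windows backend):

import serial

ser = serial.Serial('COM4', 921600)                    # example port and baud rate
ser.set_buffer_size(rx_size=12800, tx_size=12800)      # ask Windows for larger driver buffers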
See also:
https://docs.python.org/3/library/ctypes.html
https://msdn.microsoft.com/en-us/library/system.io.ports.serialport.readbuffersize(v=vs.110).aspx
You might be able to set up the serial port from the DLL:
// Setup serial
mySerialPort.BaudRate = 9600;
mySerialPort.PortName = comPort;
mySerialPort.Parity = Parity.None;
mySerialPort.StopBits = StopBits.One;
mySerialPort.DataBits = 8;
mySerialPort.Handshake = Handshake.None;
mySerialPort.RtsEnable = true;
mySerialPort.ReadBufferSize = 32768;
Property Value
Type: System.Int32
The buffer size, in bytes. The default value is 4096; the maximum value is that of a positive int, or 2147483647
And then open and use it in Python
I am somewhat surprised that nobody has yet mentioned the correct solution to such problems (when available), which is effective flow control, either software (XON/XOFF) or hardware (RTS/CTS), between the microcontroller and its sink. The issue is well described by this web article.
It may be that the source device doesn't honour such protocols, in which case you are stuck with a series of solutions that delegate the problem upwards to where more resources are available (move it from the UART buffer to the driver and upwards towards your application code). If you are losing data, it would certainly seem sensible to try and implement a lower data rate if that's a possibility.
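In pyserial, for instance, flow control is just a constructor flag, provided the device side honours it as well (the port name and baud rate below are examples):

import serial

# Hardware (RTS/CTS) flow control; use xonxoff=True instead for software (XON/XOFF)
ser = serial.Serial('COM4', 921600, rtscts=True)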
For me the problem was it was overloading the buffer when receiving data from the Arduino.
All I had to do was mySerialPort.flushInput() and it worked.
I don't know why mySerialPort.flush() didn't work. flush() must only flush the outgoing data?
All I know is mySerialPort.flushInput() solved my problems.
I am sending encrypted audio through the network in Python. The app works momentarily, then breaks, saying the data must be a multiple of 16.
I'm not sure what I am doing wrong, or where to look in the code to solve this.
I would appreciate any help you have to offer.
EDIT: I believe I have it working now. If anyone is interested in taking a look, I made a Google Code project:
http://code.google.com/p/mii-chat/
msg = conn.recv(2024)
if msg:
    cmd, msg = ord(msg[0]), msg[1:]
    if cmd == CMD_MSG:
        listb1.insert(END, decrypt_my_message(msg.strip()) + "\n")
The snippet above from your code reads 2024 bytes of data (which isn't a multiple of 16) and then (if the "if" statements are True) calls decrypt_my_message with msg.strip() as the argument. Then decrypt_my_message complains that it was given a string whose length wasn't a multiple of 16. (I'm guessing that this is the problem. Have a look in the traceback to see if this is the line that causes the exception.)
You need to call decrypt_my_message with a string of length n*16.
You might need to rethink your logic for reading the stream - or have something in the middle to buffer the calls to decrypt_my_message into chunks of n*16.
I did a quick scan of the code. All messages are sent after being encrypted, so the total data you send is a multiple of 16, plus 1 for the command. So far, so good.
On the decrypting side, you strip off the command, which leaves you with a message that is a multiple of 16 again. However, you are calling msg.strip() before you call decrypt_my_message. It is possible that the call to strip corrupts your encrypted data by removing bytes from the beginning or the end.
I will examine the code further, and edit this answer if I find anything else.
EDIT:
You are using space character for padding, and I suppose you meant to remove the padding using the strip call. You should change decrypt_my_message(msg.strip()) to decrypt_my_message(msg).strip().
You are using TCP to send the data, so your protocol is bound to give you headaches in the long term. I always send the length of the payload in my messages with this sort of custom protocol, so the receiving end can determine if it received the message block correctly. For example, you could use: CMD|LEN(2)|PAYLOAD(LEN) as your data frame. It means, one byte for command, two more bytes to tell the server how many bytes to expect, and LEN bytes of actual message. This way, your recv call can loop until it reads the correct amount. And more importantly, it will not attempt to read the next packet when/if they are sent back-to-back.
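A hypothetical sketch of that framing with struct (one command byte, a two-byte big-endian length, then the payload; the function names are made up):

import struct

def send_frame(sock, cmd, payload):
    # CMD (1 byte) | LEN (2 bytes, big endian) | PAYLOAD (LEN bytes)
    sock.sendall(struct.pack('>BH', cmd, len(payload)) + payload)

def recv_frame(sock):
    cmd, length = struct.unpack('>BH', read_n(sock, 3))   # read the 3-byte header first
    return cmd, read_n(sock, length)                      # then exactly `length` payload bytes

def read_n(sock, n):
    # recv() may return fewer bytes than asked for, so keep reading until we have n
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('connection closed mid-frame')
        buf += chunk
    return buf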
Alternatively, if your packets are small enough, you could go for UDP. It opens up another can of worms, but you know that the recv will only receive a single packet with UDP.