parsing ERF capture files in python

What is the best way of parsing ERF (Endace) capture files in Python? I found a libpcap wrapper for Python, but I do not think that libpcap supports the ERF format.
Thanks!

Here's a simplistic ERF record parser which returns a dict per packet (I just hacked it together, so not extremely well tested. Not all flag fields are decoded, but the ones that aren't, aren't widely applicable):
NB:
ERF record types: 1 = HDLC, 2 = Ethernet, 3 = ATM, 4 = Reassembled AAL5, 5-7 multichannel variants with extra headers not processed here.
rlen can be less than wlen+len(header) if the snaplength is too short.
The interstitial loss counter is the number of packets lost between this packet and the previous captured packet as noted by the Dag packet processor when its input queue overflows.
Comment out the two scapy lines if you don't want to use scapy.
Code:
import struct
import scapy.layers.all as sl

def erf_records( f ):
    """
    Generator which parses ERF records from file-like ``f``
    """
    while True:
        # The ERF header is fixed length 16 bytes
        hdr = f.read( 16 )
        if hdr:
            rec = {}
            # The timestamp is in Intel byte-order
            rec['ts'] = struct.unpack( '<Q', hdr[:8] )[0]
            # The rest is in network byte-order
            rec.update( zip( ('type',   # ERF record type
                              'flags',  # Raw flags bit field
                              'rlen',   # Length of entire record
                              'lctr',   # Interstitial loss counter
                              'wlen'),  # Length of packet on wire
                             struct.unpack( '>BBHHH', hdr[8:] ) ) )
            rec['iface'] = rec['flags'] & 0x03
            rec['rx_err'] = rec['flags'] & 0x10 != 0
            rec['pkt'] = f.read( rec['rlen'] - 16 )
            if rec['type'] == 2:
                # ERF Ethernet has an extra two bytes of pad between ERF header
                # and beginning of MAC header so that IP-layer data are DWORD
                # aligned. From memory, none of the other types have pad.
                rec['pkt'] = rec['pkt'][2:]
                rec['pkt'] = sl.Ether( rec['pkt'] )
            yield rec
        else:
            return
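The 'ts' value yielded by the parser above is the raw 64-bit ERF timestamp. A minimal sketch of turning it into a datetime, assuming the usual ERF fixed-point layout (upper 32 bits are whole seconds since the Unix epoch, lower 32 bits are a binary fraction of a second). The helper name erf_ts_to_datetime is made up for illustration and is not part of any library:

```python
import datetime

def erf_ts_to_datetime(ts):
    """Convert a raw 64-bit ERF timestamp to an aware UTC datetime.

    Assumes the 32.32 fixed-point layout: high word = seconds since
    1970-01-01, low word = fraction of a second in units of 2**-32.
    """
    seconds = ts >> 32
    fraction = ts & 0xFFFFFFFF
    # Scale the binary fraction to microseconds (truncating).
    micros = (fraction * 1000000) >> 32
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(seconds=seconds, microseconds=micros)

# One and a half seconds after the epoch.
example = erf_ts_to_datetime((1 << 32) | (1 << 31))
```

Sub-microsecond precision is dropped here; keep the raw integer around if you need the full resolution.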

ERF records can contain optional Extension Headers, which are appended to the 16-byte ERF record header. The high bit of the 'type' field indicates the presence of an Extension Header. I've added a test for the Extension Header to strix's example, along with a decode of the Extension Header itself. Note that the test for an Ethernet frame also needs to change slightly if an Extension Header is present, because the high bit has to be masked off before comparing the type.
Caveat: I believe that ERF records can contain multiple Extension Headers, but I don't know how to test for these. The Extension Header structure is not particularly well documented and the only records I have in captivity just contain a single extension.
import struct
import scapy.layers.all as sl

def erf_records( f ):
    """
    Generator which parses ERF records from file-like ``f``
    """
    while True:
        # The ERF header is fixed length 16 bytes
        hdr = f.read( 16 )
        if hdr:
            rec = {}
            # The timestamp is in Intel byte-order
            rec['ts'] = struct.unpack( '<Q', hdr[:8] )[0]
            # The rest is in network byte-order
            rec.update( zip( ('type',   # ERF record type
                              'flags',  # Raw flags bit field
                              'rlen',   # Length of entire record
                              'lctr',   # Interstitial loss counter
                              'wlen'),  # Length of packet on wire
                             struct.unpack( '>BBHHH', hdr[8:] ) ) )
            rec['iface'] = rec['flags'] & 0x03
            rec['rx_err'] = rec['flags'] & 0x10 != 0
            #- Check if ERF Extension Header present.
            #  Each Extension Header is 8 bytes.
            if rec['type'] & 0x80:
                ext_hdr = f.read( 8 )
                rec.update( zip( (
                    'ext_hdr_signature',     # 1 byte
                    'ext_hdr_payload_hash',  # 3 bytes
                    'ext_hdr_filter_color',  # 1 byte
                    'ext_hdr_flow_hash'),    # 3 bytes
                    struct.unpack( '>B3sB3s', ext_hdr ) ) )
                #- get remaining payload, less ext_hdr
                rec['pkt'] = f.read( rec['rlen'] - 24 )
            else:
                rec['pkt'] = f.read( rec['rlen'] - 16 )
            # Mask off the Extension Header bit before comparing the type;
            # a plain "& 0x02" would also match type 3 (ATM).
            if rec['type'] & 0x7F == 2:
                # ERF Ethernet has an extra two bytes of pad between ERF header
                # and beginning of MAC header so that IP-layer data are DWORD
                # aligned. From memory, none of the other types have pad.
                rec['pkt'] = rec['pkt'][2:]
                rec['pkt'] = sl.Ether( rec['pkt'] )
            yield rec
        else:
            return
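On the caveat about multiple Extension Headers: as far as I understand the format, the top bit of the first byte of each Extension Header plays the same role as the top bit of the record 'type' field, i.e. it signals that one more 8-byte header follows. Treat that as an assumption to verify against your own captures. Under that assumption, a sketch of collecting every header (Python 3; read_ext_headers is a hypothetical helper name):

```python
import io

def read_ext_headers(f, rec_type):
    """Read all chained 8-byte ERF Extension Headers from file-like ``f``.

    ``rec_type`` is the raw 'type' byte from the 16-byte record header.
    Assumption: bit 7 of each header's first byte means "another follows".
    """
    headers = []
    more = rec_type & 0x80
    while more:
        ext = f.read(8)
        if len(ext) < 8:
            raise EOFError("truncated ERF extension header")
        headers.append(ext)
        more = ext[0] & 0x80  # indexing bytes yields an int in Python 3
    return headers

# Two chained headers followed by payload bytes (synthetic data).
stream = io.BytesIO(bytes([0x83]) + bytes(7) + bytes([0x03]) + bytes(7) + b'payload')
found = read_ext_headers(stream, 0x82)
rest = stream.read()
```

The packet payload length then becomes rlen - 16 - 8 * len(headers) instead of the fixed rlen - 24.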

Related

Exclude escaped byte char from serial.read_until()

I'm writing code to communicate back and forth with a module over serial which returns specific byte values to indicate the start/end of its communication. The length of the data returned can vary as can all content between the start header and end footer.
In an ideal scenario, I'd be able to use the following code to receive all data from the module:
import serial

start = b'\x5a'
end = b'\x5b'
max_size = 1024

def get_from_serial(ser: serial.Serial) -> bytes:
    with ser:
        _ = ser.read_until(expected=start, size=max_size)
        data = ser.read_until(expected=end, size=max_size)
        return start + data
Unfortunately, there are circumstances where the data sent by the module includes bytes that match either the start or end byte values. In these instances, the module prepends an escape character to them:
valid_start = b'\x5a'
valid_end = b'\x5b'
escaped_start = b'\x5c\x5a'
escaped_end = b'\x5c\x5b'
A valid start/end byte can be preceded by ANY byte value other than an escape one:
good_result = b'\x5a\xff\x5c\x5b\xff\x5b'
bad_result = b'\x5a\xff\x5c\x5b' # missed b'\xff\x5b'
Is there a way to configure ser.read_until() to ignore any escaped instance of a start/end byte and only return when encountering a valid start/end byte?
There's probably a way to do this with a loop that checks if data[-2] == b'\x5c': each time ser.read_until() returns something though I feel it could get complicated if the module returns multiple instances of an escaped start/end byte scattered throughout the data.
Any thoughts or suggestions would be greatly appreciated.
Edit:
Starting to think this isn't actually possible to do from inside ser.read_until() so have added a check before returning the data.
import serial

start = b'\x5a'
end = b'\x5b'
escape = b'\x5c'
max_size = 1024

def get_from_serial(ser: serial.Serial) -> bytes:
    with ser:
        _ = ser.read_until(expected=start, size=max_size)
        data = ser.read_until(expected=end, size=max_size)
    # The first read_until() consumed the start byte, so put it back
    # before validating; otherwise the header check can never pass.
    packet = start + data
    if valid_packet(packet):
        return packet
    else:
        raise Exception("Invalid packet")

def valid_packet(packet: bytes) -> bool:
    header = packet[:1]
    footer = packet[-1:]
    escape_check = packet[-2:-1]
    valid_header = header == start
    valid_footer = footer == end
    not_escaped = escape_check != escape
    return all([
        valid_header,
        valid_footer,
        not_escaped
    ])
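One possible alternative to checking after the fact is to read one byte at a time and track whether the previous byte was an escape. The sketch below is not a pyserial feature. It works on any object with a read(n) method (a serial.Serial with a timeout, or io.BytesIO for testing), the name read_frame is made up, and it assumes the escape byte also escapes itself (\x5c\x5c), which the question does not confirm:

```python
import io

START = b'\x5a'
END = b'\x5b'
ESCAPE = b'\x5c'

def read_frame(stream, max_size=1024):
    """Return one frame, START through the first unescaped END, inclusive."""
    # Phase 1: discard bytes until an unescaped START shows up.
    escaped = False
    while True:
        byte = stream.read(1)
        if not byte:
            raise TimeoutError("stream ended before a start byte")
        if escaped:
            escaped = False          # this byte was escaped: ignore it
        elif byte == ESCAPE:
            escaped = True
        elif byte == START:
            break
    # Phase 2: collect bytes until an unescaped END.
    frame = bytearray(START)
    escaped = False
    while len(frame) < max_size:
        byte = stream.read(1)
        if not byte:
            raise TimeoutError("stream ended before an end byte")
        frame += byte
        if escaped:
            escaped = False          # escaped start/end bytes are plain data
        elif byte == ESCAPE:
            escaped = True
        elif byte == END:
            return bytes(frame)
    raise ValueError("frame exceeds max_size")

good_result = b'\x5a\xff\x5c\x5b\xff\x5b'
frame = read_frame(io.BytesIO(b'\x00' + good_result + b'\x01'))
```

Reading a byte at a time costs more calls than read_until(), but it handles any number of escaped start/end bytes scattered through the data.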

Weird Python output when printing

I'm very new to python and I'm trying to read pressure data from a Honeywell differential pressure sensor using the LibMPSSE library. I'm reading it from a Adafruit FT232H chip and I'm using python 2.7.6 on ubuntu Linux.
#!/usr/bin/env python
from mpsse import *

SIZE = 2 # 2 bytes MSB first
WCMD = "\x50" # Write start address command
RCMD = "\x51" # Read command
FOUT = "eeprom.txt" # Output file

try:
    eeprom = MPSSE(I2C)
    # print "%s initialized at %dHz (I2C)" % (eeprom.GetDescription(), eeprom.GetClock())
    eeprom.Start()
    eeprom.Write(WCMD)
    # Send Start condition to i2c-slave (Pressure Sensor)
    if eeprom.GetAck() == ACK:
        # ACK received, resend START condition and set R/W bit to 1 for read
        eeprom.Start()
        eeprom.Write(RCMD)
        if eeprom.GetAck() == ACK:
            # ACK received, continue supplying the clock to slave
            data = eeprom.Read(SIZE)
            eeprom.SendNacks()
            eeprom.Read(1)
        else:
            raise Exception("Received read command NACK2!")
    else:
        raise Exception("Received write command NACK1!")
    eeprom.Stop()
    print(data)
    eeprom.Close()
except Exception, e:
    print "MPSSE failure:", e
According to the library, Read returns a string of size bytes and whenever I want to print the data, the only output I see is �����2��T_`ʋ�Q}�*/�eE�. I've tried encoding with utf-8 and still no luck.

Python may print "weird" output when bytes/characters in the string (data in your case) contain non-printable characters. To "view" the content of the individual bytes (as integers or hex), do the following:
print(','.join(['{:d}'.format(x) for x in map(ord, data)])) # decimal
or
print(','.join(['{:02X}'.format(x) for x in map(ord, data)]))
Since the length of your data buffer is set by SIZE=2, to extract each byte from this buffer as an integer you can do the following:
hi, lo = map(ord, data)
# or:
hi = ord(data[0])
lo = ord(data[1])
To read more about ord and what it does - see https://docs.python.org/2/library/functions.html#ord
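The snippets above are for Python 2, where Read returns a str. As a rough Python 3 equivalent, assuming the two bytes arrive as a bytes object (the sample value below is made up, not real sensor output), ord is no longer needed because iterating over bytes already yields integers:

```python
data = b'\x1f\x8b'  # hypothetical two-byte reading, MSB first

# Each element of a bytes object is already an int in Python 3.
as_decimal = ','.join(str(b) for b in data)
as_hex = data.hex()

# Split into high and low byte, or combine them into one integer.
hi, lo = data
value = int.from_bytes(data, 'big')

print(as_decimal, as_hex, hi, lo, value)
```

How the combined integer maps to a pressure depends on the sensor's data sheet, which this sketch does not cover.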

Python String Prefix by 4 Byte Length

I'm trying to write a server in Python to communicate with a pre-existing client whose message packets are ASCII strings, but prepended by four-byte unsigned integer values representative of the length of the remaining string.
I've done a receiver, but I'm sure there's a more pythonic way. Until I find it, I haven't done the sender. I can easily calculate the message length, convert it to bytes and transmit the message. The bit I'm struggling with is creating an integer which is an array of four bytes.
Let me clarify: If my string is 260 characters in length, I wish to prepend a big-endian four byte integer representation of 260. So, I don't want the ASCII string "0260" in front of the string, rather, I want four (non-ASCII) bytes representative of 0x00000104.
My code to receive the length prepended string from the client looks like this:
sizeBytes = 4 # size of the integer representing the string length
# receive big-endian 4 byte integer from client
data = conn.recv(sizeBytes)
if not data:
    break
dLen = 0
for i in range(sizeBytes):
    # each byte is one base-256 digit, so the weight is 256**i, not 2**i
    dLen = dLen + pow(256,i) * data[sizeBytes-i-1]
data = str(conn.recv(dLen),'UTF-8')
I could simply do the reverse. I'm new to Python and feel that what I've done is probably longhand!!
1) Is there a better way of receiving and decoding the length?
2) What's the "sister" method to encode the length for transmission?
Thanks.

The struct module is helpful here
for writing:
import struct
msg = 'some message containing 260 ascii characters'
length = len(msg)
encoded_length = struct.pack('>I', length)
encoded_length will be a string of 4 bytes with value '\x00\x00\x01\x04'
for reading:
length = struct.unpack('>I', received_msg[:4])[0]
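Putting both directions together, here is a minimal round-trip sketch over a local socket pair. The helper names (recv_exact, send_msg, recv_msg) are made up for illustration. The loop in recv_exact is there because a single recv() call may return fewer bytes than requested:

```python
import socket
import struct

def recv_exact(conn, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        part = conn.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed before the full message arrived")
        buf += part
    return bytes(buf)

def send_msg(conn, text):
    payload = text.encode('ascii')
    # '>I' = big-endian unsigned 32-bit length prefix
    conn.sendall(struct.pack('>I', len(payload)) + payload)

def recv_msg(conn):
    (length,) = struct.unpack('>I', recv_exact(conn, 4))
    return recv_exact(conn, length).decode('ascii')

# Local demonstration: two connected sockets in the same process.
a, b = socket.socketpair()
send_msg(a, 'x' * 260)
received = recv_msg(b)
a.close()
b.close()
```

The same recv_exact helper also covers the original receiver, where conn.recv(dLen) is not guaranteed to return the whole message in one call.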

An example using asyncio:
import asyncio
import struct

def send_message(writer, message):
    data = message.encode()
    size = struct.pack('>L', len(data))
    writer.write(size + data)

async def receive_message(reader):
    data = await reader.readexactly(4)
    size = struct.unpack('>L', data)[0]
    data = await reader.readexactly(size)
    return data.decode()
The complete code is here

read and stock various data from various usb devices in python

I am a beginner in python, and I am trying to read the data from several sensors (humidity, temperature, pressure sensors...) that I connect with a usb hub to my computer. My main goal is to record every five minutes the different values of those sensors and then store it to analyse it.
I have got all the data sheets and manuals of my sensors (which are from Hygrosens Instruments), I know how they work and what kind of data they are sending. But I do not know how to read them. Below is what I tried, using pyserial.
import serial #import the serial library
from time import sleep #import the sleep command from the time library
import binascii

output_file = open('hygro.txt', 'w') #create a file and allow you to write in it only. The name of this file is hygro.txt
ser = serial.Serial("/dev/tty.usbserial-A400DUTI", 9600) #load into a variable 'ser' the information about the usb you are listening. /dev/tty.usbserial.... is the port after plugging in the hygrometer, 9600 is for bauds, it can be diminished
count = 0
while 1:
    read_byte = ser.read(size=1)
So now I want to find the end of the line of the data, as the measurement information that I need is in a line that begins with 'V', and in the data sheet of my sensor it says that a line ends with <cr>, so I want to read one byte at a time and look for '<', then 'c', then 'r', then '>'. So I wanted to do this:
while 1:
    read_byte = ser.read(size=8) #read a byte
    read_byte_hexa =binascii.hexlify(read_byte) #convert the byte into hexadecimal
    trad_hexa = int(read_byte_hexa , 16) #convert the hexadecimal into an int in purpose to compare it with another int
    trad_firstcrchar = int('3c' , 16) #convert the hexadecimal of the '<' into a int to compare it with the first byte
    if (trad_hexa == trad_firstcrchar ): #compare the first byte with the '<'
        read_byte = ser.read(size=1) #read the next byte (I am not sure if that really works)
        read_byte_hexa =binascii.hexlify(read_byte)# from now I am doing the same thing as before
        trad_hexa = int(read_byte_hexa , 16)
        trad_scdcrchar = int('63' , 16)
        print(trad_hexa, end='/')# this just show me if it gets in the condition
        print(trad_scdcrchar)
        if (trad_hexa == trad_scdcrchar ):
            read_byte = ser.read(size=1) #read the next byte
            read_byte_hexa =binascii.hexlify(read_byte)
            trad_hexa = int(read_byte_hexa , 16)
            trad_thirdcrchar = int('72' , 16)
            print(trad_hexa, end='///')
            print(trad_thirdcrchar)
            if (trad_hexa == trad_thirdcrchar ):
                read_byte = ser.read(size=1) #read the next byte
                read_byte_hexa =binascii.hexlify(read_byte)
                trad_hexa = int(read_byte_hexa , 16)
                trad_fourthcrchar = int('3e' , 16)
                print(trad_hexa, end='////')
                print(trad_fourthcrchar)
                if (trad_hexa == trad_fourthcrchar ):
                    print ('end of the line')
But I am not sure that it works, I mean I think it does not have the time to read the second one, the second byte I am reading, it's not exactly the second one. So that's why I want to use a buffer, but I don't really get how I can do that. I am going to look for it, but if someone knows an easier way to do what I want, I am ready to try it!
Thank you

You seem to be under the impression that the end-of-line character for that sensor's communication protocol is 4 different characters: <, c, r and >. However, what is being referred to is the carriage return, often denoted by <cr> and in many programming languages just by \r (even though it looks like 2 characters, it represents just one character).
You could simplify your code greatly by reading in the data from the sensors line by line, as the protocol is structured. Here's something to help you get started:
import time

def parse_info_line(line):
    # implement to your own liking
    logical_channel, physical_probe, hardware_id, crc = [line[index:index+2] for index in (1, 3, 5, 19)]
    serialno = line[7:19]
    return physical_probe

def parse_value_line(line):
    channel, crc = [line[ind:ind+2] for ind in (1,7)]
    encoded_temp = line[3:7]
    return twos_comp(int(encoded_temp, 16), 16)/100.

def twos_comp(val, bits):
    """compute the 2's complement of int value `val`"""
    if (val & (1 << (bits - 1))) != 0: # if sign bit is set e.g., 8bit: 128-255
        val = val - (1 << bits)        # compute negative value
    return val                         # return positive value as is

def listen_on_serial(ser):
    ser.readline() # do nothing with the first line: you have no idea when you start listening to the data broadcast from the sensor
    while True:
        line = ser.readline()
        try:
            first_char = line[0]
        except IndexError: # got no data from sensor
            break
        else:
            if first_char == '#': # begins a new sensor record
                in_record = True
            elif first_char == '$':
                in_record = False
            elif first_char == 'I':
                parse_info_line(line)
            elif first_char == 'V':
                print(parse_value_line(line))
            else:
                print("Unexpected character at the start of the line:\n{}".format(line))
        time.sleep(2)
The twos_comp function was written by travc and you are encouraged to upvote his answer when you have enough reputation and if you intend to use his code (and even if you won't, it's still a good answer, I upvoted it just now). The listen_on_serial could be improved as well (many Python programmers will recognize the switch-structure and implement it with a dictionary rather than if... elif... elif...), but this is only intended to get you started.
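For what the dictionary-based variant mentioned above might look like, here is a small self-contained sketch. HANDLERS and handle_line are hypothetical names, the two parsers are trimmed copies of the ones in the answer, and the sample lines come from the sensor data used in the test below:

```python
def twos_comp(val, bits):
    """Interpret ``val`` as a signed two's-complement number of ``bits`` bits."""
    if val & (1 << (bits - 1)):
        val -= 1 << bits
    return val

def parse_info_line(line):
    return line[3:5]  # physical probe id

def parse_value_line(line):
    return twos_comp(int(line[3:7], 16), 16) / 100.0

# Map the first character of a line to the function that handles it.
HANDLERS = {
    '#': lambda line: None,  # start of a sensor record
    '$': lambda line: None,  # end of a sensor record
    'I': parse_info_line,
    'V': parse_value_line,
}

def handle_line(line):
    handler = HANDLERS.get(line[:1])
    if handler is None:
        raise ValueError("Unexpected character at the start of the line: %r" % line)
    return handler(line)

probe = handle_line('I0101010000000000001B')
temperature = handle_line('V0109470D')
```

Adding support for a new line type then means adding one dictionary entry rather than another elif branch.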
As a test, the following code extract simulates the sensor sending some data (which is line-delimited, using the carriage return as the end-of-line marker), which I copied from the pdf you linked to (FAQ_terminalfenster_E.pdf).
>>> import serial
>>> import io
>>>
>>> ser = serial.serial_for_url('loop://', timeout=1)
>>> serio = io.TextIOWrapper(io.BufferedRWPair(ser, ser), newline='\r', line_buffering=True)
>>> serio.write(u'A1A0\r' # simulation of starting to listen halfway between 2 records
... '$\r' # marks the end of the previous record
... '#\r' # marks the start of a new sensor record
... 'I0101010000000000001B\r' # info about a sensor's probe
... 'V0109470D\r' # data matching that probe
... 'I0202010000000000002B\r' # other probe, same sensor
... 'V021BB55C\r') # data corresponding with 2nd probe
73L
>>>
>>> listen_on_serial(serio)
23.75
70.93
>>>
Note that it is recommended by the pyserial docs to be using TextIOWrapper when the end-of-line character is not \n (the linefeed character), as was also answered here.

Content-Encoding: gzip + Transfer-Encoding: chunked with gzip/zlib gives incorrect header check

How do you manage chunked data with gzip encoding?
I have a server which sends data in the following manner:
HTTP/1.1 200 OK\r\n
...
Transfer-Encoding: chunked\r\n
Content-Encoding: gzip\r\n
\r\n
1f50\r\n\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec}\xebr\xdb\xb8\xd2\xe0\xef\xb8\xea\xbc\x03\xa2\xcc\x17\xd9\xc7\xba\xfa\x1e\xc9r*\x93\xcbL\xf6\xcc\x9c\xcc7\xf1\x9c\xf9\xb6r\xb2.H ... L\x9aFs\xe7d\xe3\xff\x01\x00\x00\xff\xff\x03\x00H\x9c\xf6\xe93\x00\x01\x00\r\n0\r\n\r\n
I've had a few different approaches to this but there's something i'm forgetting here.
data = b''
depleted = False
while not depleted:
    depleted = True
    for fd, event in poller.poll(2.0):
        depleted = False
        if event == select.EPOLLIN:
            tmp = sock.recv(8192)
            data += zlib.decompress(tmp, 15 + 32)
Gives (also tried decoding only data after \r\n\r\n obv):
zlib.error: Error -3 while decompressing data: incorrect header check
So I figured the data should be decompressed once it has been received in its entirety.
...
        if event == select.EPOLLIN:
            data += sock.recv(8192)
data = zlib.decompress(data.split(b'\r\n\r\n',1)[1], 15 + 32)
Same error. Also tried decompressing data[:-7] because of the chunk ID at the very end of the data and with data[2:-7] and other various combinations, but with the same error.
I've also tried the gzip module via:
with gzip.GzipFile(fileobj=BytesIO(data), mode='rb') as fh:
    fh.read()
But that gives me "Not a gzipped file".
Even after recording the data as received from the server (headers + data) into a file, and then creating a server socket on port 80 serving the data (again, as is) to the browser, it renders perfectly, so the data is intact.
I took this data, stripped off the headers (and nothing else) and tried gzip on the file:
Thanks to @mark-adler I produced the following code to un-chunk the chunked data:
unchunked = b''
pos = 0
while pos <= len(data):
    chunkLen = int(binascii.hexlify(data[pos:pos+2]), 16)
    unchunked += data[pos+2:pos+2+chunkLen]
    pos += 2+len('\r\n')+chunkLen
with gzip.GzipFile(fileobj=BytesIO(data[:-7])) as fh:
    data = fh.read()
This produces OSError: CRC check failed 0x70a18ee9 != 0x5666e236 which is one step closer. In short I clip the data according to these four parts:
<chunk length o' X bytes> \r\n <chunk> \r\n
I'm probably getting there, but not close enough.
Footnote: Yes, the socket is far from optimal, but it looks this way because I thought I didn't get all the data from the socket, so I implemented a huge timeout and an attempt at a fail-safe with depleted :)

You can't split on \r\n since the compressed data may contain, and if long enough, certainly will contain that sequence. You need to dechunk first using the length provided (e.g. the first length 1f50) and feed the resulting chunks to decompress. The compressed data starts with the \x1f\x8b.
The chunking is hex number, crlf, chunk with that many bytes, crlf, hex number, crlf, chunk, crlf, ..., last chunk (of zero length), [possibly some headers], crlf.
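A minimal sketch of that procedure, assuming the whole response body (everything after the header block) is already in memory. The function names are made up for illustration. It walks the chunks by their declared lengths and feeds each one to a streaming zlib decompressor configured for gzip (wbits of 16 + zlib.MAX_WBITS):

```python
import gzip
import zlib

def iter_chunks(body):
    """Yield the payload of each chunk in a chunked HTTP body."""
    pos = 0
    while True:
        line_end = body.index(b'\r\n', pos)
        # The size line is hex; ignore any chunk extension after ';'.
        size = int(body[pos:line_end].split(b';', 1)[0], 16)
        if size == 0:
            return  # last chunk; optional trailers are not parsed here
        start = line_end + 2
        yield body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk

def gunzip_chunked(body):
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip wrapper
    out = bytearray()
    for chunk in iter_chunks(body):
        out += decomp.decompress(chunk)
    out += decomp.flush()
    return bytes(out)

def make_chunked(payload, chunk_size):
    """Test helper: encode payload using chunks of at most chunk_size bytes."""
    parts = []
    for i in range(0, len(payload), chunk_size):
        piece = payload[i:i + chunk_size]
        parts.append(format(len(piece), 'x').encode('ascii') + b'\r\n' + piece + b'\r\n')
    parts.append(b'0\r\n\r\n')
    return b''.join(parts)

original = b'hello chunked world\r\n' * 50
body = make_chunked(gzip.compress(original), 16)
restored = gunzip_chunked(body)
```

Because the chunk boundaries come from the declared lengths, any \r\n bytes inside the compressed data are never mistaken for delimiters.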

@mark-adler gave me some good pointers on how chunked mode in the HTTP protocol works. Besides this, I fiddled around with different ways of unzipping the data.
You're supposed to stitch the chunks into one big heap
You're supposed to use gzip not zlib
You can only unzip the entire stitched chunks, doing it in parts will not work
Here's the solution for all three of the above problems:
import gzip
from io import BytesIO

# The original snippet showed only the body; the function name is chosen here.
def unchunk_and_gunzip(data):
    unchunked = b''
    pos = 0
    while pos <= len(data):
        chunkNumLen = data.find(b'\r\n', pos)-pos
        # print('Chunk length found between:',(pos, pos+chunkNumLen))
        chunkLen=int(data[pos:pos+chunkNumLen], 16)
        # print('This is the chunk length:', chunkLen)
        if chunkLen == 0:
            # print('The length was 0, we have reached the end of all chunks')
            break
        chunk = data[pos+chunkNumLen+len('\r\n'):pos+chunkNumLen+len('\r\n')+chunkLen]
        # print('This is the chunk (Skipping',pos+chunkNumLen+len('\r\n'),', grabing',len(chunk),'bytes):', [data[pos+chunkNumLen+len('\r\n'):pos+chunkNumLen+len('\r\n')+chunkLen]],'...',[data[pos+chunkNumLen+len('\r\n')+chunkLen:pos+chunkNumLen+len('\r\n')+chunkLen+4]])
        unchunked += chunk
        pos += chunkNumLen+len('\r\n')+chunkLen+len('\r\n')
    with gzip.GzipFile(fileobj=BytesIO(unchunked)) as fh:
        unzipped = fh.read()
    return unzipped
I left the debug output in there, commented out, for a reason.
Messy as it looks, it was extremely useful for seeing which data I was actually trying to decompress, which parts were fetched from where, and what value each calculation brings forth.
This code will walk through the chunked data with the following format:
<chunk length o' X bytes> \r\n <chunk> \r\n
I had to be careful when first extracting the X bytes, since they came in as 1f50. At first I had to use binascii.hexlify(data[0:4]) on them before passing the result to int(). I'm not sure why I don't need that anymore: I needed it to get a length of ~8000 before, but then it suddenly gave me a REALLY big number, which wasn't logical even though I didn't really give it any other data. Anyway.
After that it was just a matter of making sure the numbers were correct, then combining all the chunks into one huge pile of gzip data and feeding that into .GzipFile(...).
Edit 3 years later:
I'm aware that this was a client-side problem at first, but here's a server-side function to send a somewhat functional test:
import gzip

def http_gzip(data):
    compressed = gzip.compress(data)
    # format(49, 'x') returns `31` which is `\x31` but without the `\x` notation.
    # basically the same as `hex(49)` but meant for these kinds of things.
    return bytes(format(len(compressed), 'x'), 'UTF-8') + b'\r\n' + compressed + b'\r\n0\r\n\r\n'
