Python parsing serial hex string with fixed format

I am successfully communicating with a simple device over serial: I send a specific request packet and receive a fixed-format return packet. I'm starting with Python so I can use this on multiple devices; I'm only really used to PHP/C.
For example, I send the following as hex:
12 05 0b 03 1f
and in return I get
12 05 0b 03 1f 12 1D A0 03 18 00 22 00 00 CA D4 4F 00 00 22 D6 99 18 00 70 80 00 80 00 06 06 00 00 D9
I know how the packet is constructed: the first 5 bytes are the data that was sent. The next 3 bytes are an ID, the packet length, and a response code. It's commented in my code here:
import serial, time
ser = serial.Serial(port='COM1', baudrate=9600, timeout=0, parity=serial.PARITY_EVEN, stopbits=serial.STOPBITS_ONE, bytesize=serial.EIGHTBITS)
while True:
    # Send The Request - 0x12 0x05 0x0B 0x03 0x1F
    ser.write("\x12\x05\x0B\x03\x1F")
    # Clear first 5 bytes (Original Request is Returned)
    ser.read(5)
    # Response Header (3 bytes)
    # - ID (Always 12)
    # - Packet Length (inc these three)
    # - General Response (a0 = Success, a1 = Busy, ff = Bad Command)
    ResponseHeader = ser.read(3).encode('hex')
    PacketLength = int(ResponseHeader[2:4],16)
    if ResponseHeader[4:6] == "a0":
        # Response Success
        ResponseData = ser.read(PacketLength).encode('hex')
        # Read First Two Bytes
        Data1 = int(ResponseData[0:4],16)
        print Data1
    else:
        # Clear The Buffer
        RemainingBuffer = ser.inWaiting()
        ser.read(RemainingBuffer)
    time.sleep(0.12)
To keep it simple for now, I was just trying to read the first two bytes of the actual response (ResponseData), which should give me the hex 0318. I then want to output that as decimal (792). The program is meant to run in a continuous loop.
Some of the variables in the packet are one byte, some are two bytes. So far, though, I'm just getting an error:
ValueError: invalid literal for int() with base 16: ''
I'm guessing this is due to the format of the data/variables I have set, so not sure if I'm even going about this the right way. I just want to read the returned HEX data in byte form and be able to access them on an individual level, so I can format/output them as required.
Is there a better way to do this? Many thanks.

I recommend using the struct module to read binary data, instead of re-encoding it as hex with string functions and then parsing the hex strings.

As your code stands now, you send binary (not hex) data over the wire, and receive binary (not hex) data back from the device. Then you convert the binary data to hex, only to convert it again to Python variables.
Let's skip the extra conversion step by using struct.unpack:
# UNTESTED
import struct
...
while True:
    # Send The Request - 0x12 0x05 0x0B 0x03 0x1F
    ser.write("\x12\x05\x0B\x03\x1F")
    # Clear first 5 bytes (Original Request is Returned)
    ser.read(5)
    # Response Header (3 bytes)
    # - ID (Always 12)
    # - Packet Length (inc these three)
    # - General Response (a0 = Success, a1 = Busy, ff = Bad Command)
    ResponseHeader = ser.read(3)
    ID, PacketLength, Response = struct.unpack("!BBB", ResponseHeader)
    if Response == 0xa0:
        # Response Success
        ResponseData = ser.read(PacketLength)
        # Read First Two Bytes (unpack returns a tuple, so take element 0)
        Data1 = struct.unpack("!H", ResponseData[0:2])[0]
        print Data1
    else:
        # Clear The Buffer
        RemainingBuffer = ser.inWaiting()
        ser.read(RemainingBuffer)
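
The struct approach can be checked offline against the sample reply quoted in the question, without a serial port. The following is a minimal sketch in Python 3; the helper name parse_reply and the two constants are hypothetical, and it assumes (from the sample, where the length byte 0x1D equals the 29 bytes that follow the echo) that the length byte counts the three header bytes as well. It also raises on a short header, which is a likely source of the original ValueError: with timeout=0, pySerial reads return immediately, possibly with no data at all.

```python
import struct

REQUEST_LEN = 5   # the device echoes the 5 request bytes first
HEADER_LEN = 3    # ID, packet length, response code

def parse_reply(raw):
    """Return (packet_id, length, status, payload) from one raw reply."""
    header = raw[REQUEST_LEN:REQUEST_LEN + HEADER_LEN]
    if len(header) < HEADER_LEN:
        raise ValueError("short read: only %d header byte(s)" % len(header))
    packet_id, length, status = struct.unpack("!BBB", header)
    # Assumption: the length byte counts the 3 header bytes as well.
    payload = raw[REQUEST_LEN + HEADER_LEN:REQUEST_LEN + length]
    return packet_id, length, status, payload

# Sample reply from the question (echoed request + response packet).
sample = bytes.fromhex(
    "12 05 0b 03 1f 12 1D A0 03 18 00 22 00 00 CA D4 4F 00 00 22 "
    "D6 99 18 00 70 80 00 80 00 06 06 00 00 D9"
)
packet_id, length, status, payload = parse_reply(sample)
first_value = struct.unpack("!H", payload[:2])[0]  # big-endian unsigned 16-bit
print(hex(packet_id), length, hex(status), first_value)
```

One-byte fields can be read with the "B" format code and two-byte fields with "H", so a single format string can describe the whole payload once the field layout is known.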

Related

How to read double, float and int values from binary files in python?

I have a binary file that was created in C++. The first value is double and the second is integer. I am reading the values fine using the following code in C++.
double dob_value;
int integer_value;
fread(&dob_value, sizeof(dob_value), 1, fp);
fread(&integer_value, sizeof(integer_value), 1, fp);
I am trying to read the same file in Python but I am running into issues. My dob_value is 400000000.00 and my integer_value is 400000. I am using the following code in Python for the double.
def interpret_float(x):
    return struct.unpack('d',x[4:]+x[:4])
with open(file_name, 'rb') as readfile:
    dob = readfile.read(8)
    dob_value = interpret_float(dob)[0]
    val = readfile.read(4)
    test2 = readfile.read(4)
    integer_value = int.from_bytes(test2, "little")
My dob_value is 400000000.02384186. My question is: where are these extra decimals coming from? Also, how do I get the correct integer_value? With the above code, my integer_value is 1091122467. I also have float values after the integer but I haven't looked into that yet.

In case the link breaks: test.bin contains 00 00 00 00 84 D7 B7 41 80 1A 06 00 70 85 69 C0.
Your binary contains the correct hexadecimal representation of 400000000.0, 41B7D78400000000, in the first 8 bytes. Running
import binascii
import struct
fname = r'test.bin'
with open(fname, 'rb') as readfile:
    dob = readfile.read(8)
    print(struct.unpack('d', dob)[0])
    print(binascii.hexlify(dob))
outputs
>> 400000000.0
>> b'0000000084d7b741'
which is also the correct little-endian representation of the double. When you swap the halves, you get
print(binascii.hexlify(dob[4:]+dob[:4]))
>> b'84d7b74100000000'
and if you check the decimal value, it will give you 5.45e-315, not what you expect. Moreover,
struct.unpack('d', dob[4:]+dob[:4])[0]
>>5.44740625e-315
So I'm not sure how you could get 400000000.02384186 from the code above. However, to obtain 400000000.02384186 using your test.bin, just skip the four bytes in the beginning:
with open(fname, 'rb') as readfile:
    val = readfile.read(4)
    dob = readfile.read(8)
    dob = dob[4:]+dob[:4]
    print(binascii.hexlify(dob))
    print(struct.unpack('d', dob)[0])
>>b'801a060084d7b741'
>>400000000.02384186
The binary value 0x41B7D78400061A80 corresponds to 400000000.02384186. So you first read the wrong bytes, then incorrectly swap the halves, and end up with a result close to what you expect. As for the integer value, 400000 is 0x00061A80, which is also present in the binary, but you read past those bytes, since you used them for the double, so you get wrong values.
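
For comparison, the whole record can be decoded in one call, with no swapping and no skipped bytes. This is a small sketch in Python 3 that works on the 16 bytes quoted for test.bin. It assumes the C++ program wrote a little-endian 8-byte double followed by a 4-byte int, and that the last four bytes are one of the float values the question mentions; the trailing "f" is therefore a guess.

```python
import struct

# Contents of test.bin as quoted in the answer.
raw = bytes.fromhex("00 00 00 00 84 D7 B7 41 80 1A 06 00 70 85 69 C0")

# '<' selects little-endian with no padding between fields.
# d = 8-byte double, i = 4-byte signed int, f = 4-byte float (assumed).
dob_value, integer_value, float_value = struct.unpack("<dif", raw)
print(dob_value, integer_value, float_value)
```

Using an explicit "<" rather than the native default keeps the layout fixed regardless of the machine the script runs on.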

Converting broken byte string from unicode back to corresponding bytes

The following code retrieves an iterable of strings (rows) which together contain a PDF byte stream. Each row was of type str. The resulting file was a valid PDF and could be opened.
with open(fname, "wb") as fd:
    for row in rows:
        fd.write(row)
Due to a new C library and changes in the Python implementation, the row type changed from str to unicode. The corresponding content changed as well, so my PDF file is broken.
Starting bytes of first row object:
old row[0]: 25 50 44 46 2D 31 2E 33 0D 0A 25 E2 E3 CF D3 0D 0A ...
new row[0]: 25 50 44 46 2D 31 2E 33 0D 0A 25 C3 A2 C3 A3 C3 8F C3 93 0D 0A ...
I aligned the corresponding byte positions here, so it looks like a Unicode problem.
I think this is a good start but I still have a unicode string as input...
>>> "\xc3\xa2".decode('utf8') # but as input I have u"\xc3\xa2"
u'\xe2'
I already tried several calls of encode and decode so I need a more analytical way to fix this. I can't see the wood for the trees. Thank you.

When you find u"\xc3\xa2" in a Python unicode string, it often means that you have read a UTF-8 encoded file as if it were Latin-1 encoded. So the best thing to do is certainly to fix the initial read.
That being said, if you have to depend on broken code, the fix is still easy: you just encode the string as Latin-1 and then decode it as UTF-8:
fixed_u_str = broken_u_str.encode('Latin1').decode('UTF-8')
For example:
u"\xc3\xa2\xc3\xa3".encode('Latin1').decode('utf8')
correctly gives u"\xe2\xe3" which displays as âã
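
The same round trip can be checked against the byte dumps from the question. The sketch below uses Python 3, where str plays the role of Python 2's unicode; the variable names are illustrative only, and it looks only at the four bytes that differ between the old and new rows.

```python
# The four differing bytes of the old row, and their counterparts in the new row.
old_bytes = bytes.fromhex("E2 E3 CF D3")
new_bytes = bytes.fromhex("C3 A2 C3 A3 C3 8F C3 93")

# Reading UTF-8 bytes as if they were Latin-1 produces the broken string.
broken = new_bytes.decode("latin-1")

# Encode as Latin-1 to get the raw bytes back, then decode them as UTF-8.
fixed = broken.encode("latin-1").decode("utf-8")

# Every character is now below U+0100, so Latin-1 maps it to the original byte.
restored = fixed.encode("latin-1")
print(repr(fixed), restored == old_bytes)
```

Latin-1 is convenient here because it maps each of the 256 byte values to the code point with the same number, so the encode step never loses information.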

This looks like you should be doing
fd.write(row.encode('utf-8'))
assuming the type of row is now unicode (this is my understanding of how you presented things).

How to read hex values at specific addresses in Python?

Say I have a file and I'm interested in reading and storing hex values at certain addresses, like the snippet below:
22660 00 50 50 04 00 56 0F 50 25 98 8A 19 54 EF 76 00
22670 75 38 D8 B9 90 34 17 75 93 19 93 19 49 71 EF 81
I want to read the value at 0x2266D and be able to replace it with another hex value, but I can't understand how to do it. I've tried using open('filename', 'rb'); however, this reads it as the ASCII representation of the values, and I don't see how to pick and choose which addresses I want to change.
Thanks!
Edit: For an example, I have
rom = open("filename", 'rb')
for i in range(5):
    test = rom.next().split()
    print test
rom.close()
This outputs: ['NES\x1a', 'B\x00\x00\x00\x00\x00\x00\x00\x00\x00!\x0f\x0f\x0f(\x0f!!', '!\x02\x0f\x0f', '!\x0f\x01\x08', '!:\x0f\x0f\x03!', '\x0f', '\x0f\x0f', '!', '\x0f\x0f!\x0f\x03\x0f\x12', '\x0f\x0f\x0f(\x02&%\x0f', '\x0f', '#', '!\x0f\x0f1', '!"#$\x14\x14\x14\x13\x13\x03\x04\x0f\x0f\x03\x13#!!\x00\x00\x00\x00\x00!!', '(', '\x0f"\x0f', '#\x14\x11\x12\x0f\x0f\x0f#', '\x10', "5'4\x0270&\x02\x02\x02\x02\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f126&\x13\x0f\x0f\x0f\x13&6222\x0f", '\x1c,', etc etc.
Much more than 5 bytes, and while some of it is in hex, some has been replaced with ASCII.

There's no indication that some of the bytes were replaced by their ASCII representations. Some bytes happen to be printable.
With a binary file, you can simply seek to the offset and write the bytes in. Working with the line iterator is problematic for a binary file, as there are no meaningful "lines" in a binary blob.
You can do in-place editing as follows (in fake Python; your_function is a placeholder):
with open("filename", "rb+") as f:
    f.seek(0x2266D)
    the_byte = f.read(1)
    if len(the_byte) != 1:
        # something's wrong; bolt out ...
        raise IOError("could not read a byte at 0x2266D")
    else:
        transformed_byte = your_function(the_byte)
        f.seek(-1, 1)  # back one byte relative to the current position
        f.write(transformed_byte)
But of course, you may want to do the edit on a copy, either in memory (and commit later, as in the answer of @JosepValls) or on a file copy. The problem with gulping the whole file into memory is, of course, that sometimes the system may choke ;) For that purpose you may want to mmap part of the file.
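
A runnable version of the seek-and-overwrite idea might look like the sketch below (Python 3). The function name patch_byte is hypothetical, and the demo builds a throwaway file that is padded with zeros up to 0x22660 and then holds the two rows quoted in the question, so that offset 0x2266D contains EF.

```python
import os
import tempfile

def patch_byte(path, offset, new_value):
    """Overwrite one byte in place and return the previous value as an int."""
    with open(path, "rb+") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise EOFError("offset 0x%X is past the end of the file" % offset)
        f.seek(offset)                 # the read advanced the position
        f.write(bytes([new_value]))
    return old[0]

# Throwaway demo file: zero padding, then the two rows from the question.
rows = bytes.fromhex(
    "00 50 50 04 00 56 0F 50 25 98 8A 19 54 EF 76 00 "
    "75 38 D8 B9 90 34 17 75 93 19 93 19 49 71 EF 81"
)
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\x00" * 0x22660 + rows)

old_value = patch_byte(path, 0x2266D, 0xFF)
with open(path, "rb") as f:
    f.seek(0x2266D)
    new_value = f.read(1)[0]
os.remove(path)
print("0x%02X -> 0x%02X" % (old_value, new_value))
```

Only one byte is read and written, so the file size does not matter.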

Given that it is not a very big file (ROMs should fit fine in the memory of today's computers), just do data = open('filename', 'rb').read(). Now you can do whatever you want with the data (if you print it, it will show ASCII, but that is just data!). Unfortunately, string objects don't support item assignment; see this answer for more:
Change one character in a string in Python?
In your case, to replace the byte at 0x2266D, slice around it (note that str(0xFF) would insert the three characters "255", not a single byte):
data = data[:0x2266D] + b'\xff' + data[0x2266E:]
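
If the data is held in a bytearray instead of an immutable string, item assignment works directly and no slicing is needed. The sketch below (Python 3) uses only the 16-byte row quoted in the question as a stand-in for the whole file; base is the address of that row, an assumption made just to keep the example small.

```python
# Stand-in for the file contents: the row that starts at address 0x22660.
base = 0x22660
row = bytearray.fromhex("00 50 50 04 00 56 0F 50 25 98 8A 19 54 EF 76 00")

index = 0x2266D - base        # position of the target byte within the row
previous = row[index]         # indexing a bytearray yields an int
row[index] = 0xFF             # in-place replacement of a single byte
print(hex(previous), row.hex())
```

With the real file, bytearray(open('filename', 'rb').read()) gives the same kind of object, and the result can be written back with a file opened in 'wb' mode.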

Python rawkit how read metadata values from RAW file?

I'm writing a Python script and I need to obtain EXIF information from a raw photo file (.CR2, for example).
I found that Python rawkit offers the ability to do that.
with Raw(filename=image_path) as raw:
    print raw.metadata
Metadata(aperture=-1.2095638073643314e+38, timestamp=4273602232L,
shutter=-1.1962713245823862e+38, flash=True,
focal_length=-1.2228562901462766e+38, height=3753,
iso=-1.182978841800441e+38,
make='Canon', model='EOS 5D Mark II',
orientation=0, width=5634)
But I'm a little confused: how do I read these values? For example, I'm expecting an ISO value like 100/200/400, but what is -1.182978841800441e+38?
My question is not specific to ISO; it also applies to shutter, aperture, ...
I checked the libraw and rawkit docs but was not able to find how to read / convert these kinds of values.
This part of the doc is not very detailed:
float iso_speed;
ISO sensitivity.
float shutter;
Shutter speed.
Can someone help me understand how to read these values?
Thanks
[Update]
As neo suggests, I will use ExifRead. In fact it's a better choice since I'm writing a Python script: with ExifRead there is no need for an extra C library dependency.
I was able to open the Canon raw file and parse the EXIF data, but unfortunately I got a wrong value for the aperture:
EXIF ApertureValue (Ratio): 3
# My photo was taken at f/2.8 (maybe a rounded value in this tag?)
Quick answer: use the FNumber tag
EXIF FNumber (Ratio): 14/5
14/5 is in fact 2.8 (do the math)
Long answer (how I found / debugged that):
After reading this excellent page, Understanding What is stored in a Canon RAW .CR2 file, How and Why ( http://lclevy.free.fr/cr2/ ), I decided to decode the file myself to find out what is going on.
That page led me to the holy grail for decoding a raw file: cr2_poster.pdf
From that I thought the best value would be in the Canon-specific MakerNote section, in the FNumber value. (All value descriptions are here: canon_tags)
Tag Id : 3 (In fact 0x0003 that you write 0x3)
Name : FNumber
I opened my file with a hex editor (hexedit) and ... I was totally lost.
Key things:
An offset is an address in the file that contains your value.
Byte order: C8 05 in the file should be read as 05C8. For an offset, for example, the address is 0x5C8.
With that, finding the MakerNote section is easy.
The quick way is to search directly for the 0x927c MakerNote tag (7C 92 in the file), whose entry contains the address of the MakerNote section.
If you are not able to find it, go through the IFD section to find the EXIF subsection; in that subsection you will find the MakerNote entry.
Tag    Type   Count        Value
7C 92  07 00  B8 A0 00 00  84 03 00 00
Offset : 84 03 00 00 -> 00 00 03 84 (0x384 address)
Go to this address and search the MakerNote section for the FNumber tag, 0x3
Tag    Type   Count        Value
03 00  03 00  04 00 00 00  C8 05 00 00
Go to the offset 0x5C8 to find our value (count 4 x type 3 ushort, 16 bits)
0x5C8 : 00 00 00 00 00 00 00 00
And ... fail: in fact my Canon did not fill in this section.
According to http://www.exiv2.org/tags.html, the FNumber can be found in the EXIF subsection.
Repeat the same process to find the EXIF subsection and the tag "0x829d Exif.Image.FNumber type 5 Rational"
The Rational type is composed of 64 bits (numerator and denominator, both ulongs): Rational_data_type
Tag    Type   Count        Value
9D 82  05 00  01 00 00 00  34 03 00 00
And then read the 0x334 offset
1C 00 00 00 0A 00 00 00
Read as hex, that is 0x1C / 0xA
In decimal, do the math: 28/10 = 14/5 = 2.8
Verify that ExifRead reports this value:
EXIF.py 100EOS5D/IMG_8813.CR2 -vv | grep -i 14/5
EXIF FNumber (Ratio): 14/5
And voila!
I was looking for the float 2.8, but this value is stored as a fraction. So the library doesn't do the division; it just simplifies the fraction.
This is why we have 14/5 and not 2.8 as expected.
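
The manual decoding above can be reproduced with struct. This is a minimal sketch in Python 3 that works on the two byte sequences quoted in the walkthrough rather than on a real CR2 file; it assumes little-endian byte order, consistent with reading C8 05 as 0x05C8, and the variable names are illustrative.

```python
import struct
from fractions import Fraction

# 12-byte IFD entry: tag (2 bytes), type (2), count (4), value or offset (4).
entry = bytes.fromhex("9D 82 05 00 01 00 00 00 34 03 00 00")
tag, type_id, count, value_offset = struct.unpack("<HHII", entry)

# The 8 bytes found at offset 0x334: numerator and denominator, 32 bits each.
rational = bytes.fromhex("1C 00 00 00 0A 00 00 00")
numerator, denominator = struct.unpack("<II", rational)

f_number = Fraction(numerator, denominator)   # Fraction reduces 28/10 to 14/5
print(hex(tag), type_id, count, hex(value_offset), f_number, float(f_number))
```

Fraction performs the same simplification that ExifRead displays, and float() gives the 2.8 that was originally expected.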

I suggest you use a library that is focused on EXIF reading. The stuff available in libraw/rawkit is really just a nice extra. I can recommend the ExifRead library. It's pure Python and also damn fast, and it gives you values that are easier to understand.

If compatibility with many formats is more of an issue to you than performance you could call exiftool as a subprocess with -j option to give you a json string which you can turn into a dictionary.
That should set you up for most raw formats and even stuff that isn't images at all. And it is going to squeeze every last bit of exif info out of the file. However in comparison with other options it is rather sluggish (like 200x slower):
from PIL import Image
import PIL.ExifTags
import subprocess
import json
import datetime
import exifread
filePath = "someImage.jpg"
filePath = "someRawImage.CR2"
filePath = "someMovie.mov"
filePath = "somePhotoshopImage.psd"
try:
    start = datetime.datetime.now()
    img = Image.open(filePath)
    exif_0 = {
        PIL.ExifTags.TAGS[k]: v
        for k, v in img.getexif().items()
        if k in PIL.ExifTags.TAGS
    }
    end = datetime.datetime.now()
    print("Pillow time:")
    print(end-start)
    print(str(len(exif_0)), "tags retrieved")
    print (exif_0, "\n")
except:
    pass
try:
    start = datetime.datetime.now()
    exif_1 = json.loads(subprocess.run(["/usr/local/bin/exiftool", "-j", filePath], stdout=subprocess.PIPE).stdout.decode("utf-8"))
    end = datetime.datetime.now()
    print("subprocess time:")
    print(end-start)
    print(str(len(exif_1[0])), "tags retrieved")
    print(exif_1, "\n")
except:
    pass
try:
    start = datetime.datetime.now()
    f = open(filePath, "rb")
    exif_2 = exifread.process_file(f)
    end = datetime.datetime.now()
    print("Exifread time:")
    print(end-start)
    print(str(len(exif_2)), "tags retrieved")
    print(exif_2, "\n")
except:
    pass

How do I force recv() in Socket to NOT convert my hex values into ASCII if it can (python)

I am using the Python 3.4 socket interface with python-can. My problem is that when I receive data via recv() or recvfrom(), some of the hex data in the message appears to be converted to ASCII where possible; for example, '63' becomes 'c'. I do not want this, I want the raw hex data.
Here is a snippet part of the code:
def dissect_can_frame(frame):
    can_id, can_dlc, data = struct.unpack(can_frame_fmt, frame)
    global dataS
    dataS = data[:can_dlc]
    return (can_id, can_dlc, data[:can_dlc])
s = socket.socket(socket.AF_CAN,socket.SOCK_RAW,socket.CAN_RAW)
print(s)
s.bind((can_interface,))
#s.bind((sys.argv[1],)) #used for 'can0' as argument at initial execution
print(socket.AF_CAN,",",socket.SOCK_RAW,",",socket.CAN_RAW)
#while True:
cf, addr = s.recvfrom(4096)
print(cf,',',addr)
I get "b'\x18c\xd8\xd6\x1f\x01 \x18'" as the output section of the data instead of "18 63 D8 D6 1F 01 20 18". Do not care about the formatting but notice how '63' has become 'c' and '20' has inserted a space. Can I stop it doing this?
Is it common for socket to convert the data rather than producing the raw data?
Thank you for any help.

That's just how the data looks when it comes out of recv: the bytes themselves are unchanged, and Python only displays the printable ones as ASCII characters. If you want to convert it into a hex-looking string, you can use format on each byte (on Python 3, iterating over a bytes object already yields integers, so ord() is not needed):
>>> s = b'\x18c\xd8\xd6\x1f\x01 \x18'
>>> " ".join(["{:02X}".format(c) for c in s])
'18 63 D8 D6 1F 01 20 18'
Of course, this is an inconvenient format for actually doing any kind of analysis on the data. But it looks nice for display purposes.
Alternatively, there's hexlify, but that doesn't space out the values for you:
>>> import binascii
>>> binascii.hexlify(s)
b'1863d8d61f012018'
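
To confirm that nothing is converted on the way in, the received bytes can be compared with the expected values directly. The sketch below (Python 3) reuses the output quoted in the question; the helper name to_hex is hypothetical.

```python
def to_hex(data):
    """Format a bytes object as space-separated uppercase hex pairs."""
    return " ".join("{:02X}".format(b) for b in data)

# The data exactly as printed in the question.
frame_data = b'\x18c\xd8\xd6\x1f\x01 \x18'

# The same eight bytes written out numerically.
expected = bytes([0x18, 0x63, 0xD8, 0xD6, 0x1F, 0x01, 0x20, 0x18])

print(to_hex(frame_data))
print(frame_data == expected, frame_data[1] == 0x63, frame_data[6] == 0x20)
```

The 'c' and the space are only how repr() shows the byte values 0x63 and 0x20; indexing the bytes object returns the integers themselves.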
