Convert hex string to float - python

I am trying to read from a file a bunch of hex numbers.
lines ='4005297103CE40C040059B532A7472C440061509BB9597D7400696DBCF1E35CC4007206BB5B0A67B4007AF4B08111B87400840D4766460524008D47E0FFB4ABA400969A572EBAFE7400A0107CCFDF50E'
dummy = [lines[index][i:i+16] for i in range(0, len(lines[index]),16)]
rdummy=[]
for elem in dummy[:-1]:
rdummy.append(int(elem,16))
these are 10 number of 16 digits
in particular when reading the first one, I have:
print(dummy[0])
4005297103CE40C0
now I would like to convert it to float
I have an IDL script that when reading this number gives 2.64523509
the command used in IDL is
double(4613138958682833088,0)
where it appers 0 is an offset used when converting.
is there a way to do this in python?

you probably want to use the struct package for this, something like this seems to work:
import struct
lines ='4005297103CE40C040059B532A7472C440061509BB9597D7400696DBCF1E35CC4007206BB5B0A67B4007AF4B08111B87400840D4766460524008D47E0FFB4ABA400969A572EBAFE7400A0107CCFDF50E'
for [value] in struct.iter_unpack('>d', bytes.fromhex(lines)):
print(value)
results in 2.64523509 being printed first which seems about right

Related

How to perform SHA-256 on binary values with Hashlib?

I’m using Python 2 and am attempting to performing sha256 on binary values using hashlib.
I’ve become a bit stuck as I’m quite new to it all but have cobbled together:
hashlib.sha256('0110100001100101011011000110110001101111’.decode('hex')).hexdigest()
I believe it interprets the string as hex based on substituting the hex value (‘68656c6c6f’) into the above and it returning
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
And comparing to this answer in which ‘hello’ or ‘68656c6c6f’ is used.
I think the answer lies with the decode component but I can’t find an example for binary only ‘hex’ or ‘utf-8’
Is anyone able to suggest what needs to be changed so that the function interprets as binary values instead of hex?
Here is code that does each of the data conversions you are looking for. These steps can all be combined, but are separated here so you can see each value.
import hashlib
import binascii
binstr = '0110100001100101011011000110110001101111'
hexstr = "{0:0>4X}".format(int(binstr,2)) # '68656C6C6F'
data = binascii.a2b_hex(hexstr) # 'hello'
output = hashlib.sha256(data).hexdigest()
print output
OUTPUT:
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

hex header of file, magic numbers, python

I have a program in Python which analyses file headers and decides which file type it is. (https://github.com/LeoGSA/Browser-Cache-Grabber)
The problem is the following:
I read first 24 bytes of a file:
with open (from_folder+"/"+i, "rb") as myfile:
header=str(myfile.read(24))
then I look for pattern in it:
if y[1] in header:
shutil.move (from_folder+"/"+i,to_folder+y[2]+i+y[3])
where y = ['/video', r'\x47\x40\x00', '/video/', '.ts']
y[1] is the pattern and = r'\x47\x40\x00'
the file has it inside, as you can see from the picture below.
the program does NOT find this pattern (r'\x47\x40\x00') in the file header.
so, I tried to print header:
You see? Python sees it as 'G#' instead of '\x47\x40'
and if i search for 'G#'+r'\x00' in header - everything is ok. It finds it.
Question: What am I doing wrong? I want to look for r'\x47\x40\x00' and find it. Not for some strange 'G#'+r'\x00'.
OR
why python sees first two numbers as 'G#' and not as '\x47\x40', though the rest of header it sees in HEX? Is there a way to fix it?
with open (from_folder+"/"+i, "rb") as myfile:
header=myfile.read(24)
header = str(binascii.hexlify(header))[2:-1]
the result I get is:
And I can work with it
4740001b0000b00d0001c100000001efff3690e23dffffff
P.S. But anyway, if anybody will explain what was the problem with 2 first bytes - I would be grateful.
In Python 3 you'll get bytes from a binary read, rather than a string.
No need to convert it to a string by str.
Print will try to convert bytes to something human readable.
If you don't want that, convert your bytes to e.g. hex representations of the integer values of the bytes by:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (''.join ([hex (aByte) for aByte in aBytes]))
Output as redirected from the console:
b'\x00G#\x00\x13\x00\x00\xb0'
0x00x470x400x00x130x00x00xb0
You can't search in aBytes directly with the in operator, since aBytes isn't a string but an array of bytes.
If you want to apply a string search on '\x00\x47\x40', use:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (r'\x'.join ([''] + ['%0.2x'%aByte for aByte in aBytes]))
Which will give you:
b'\x00G#\x00\x13\x00\x00\xb0'
\x00\x47\x40\x00\x13\x00\x00\xb0
So there's a number of separate issues at play here:
print tries to print something human readable, which succeeds only for the first two chars.
You can't directly search for bytearrays in bytearrays with in, so convert them to a string containing fixed length hex representations as substrings, as shown.

reading char data from binary file with numpy

I have a binary file with some character data stuck in the middle of a bunch of ints and floats. I am trying to read with numpy. The farthest I have been able to get regarding the character data is:
strbits = np.fromfile(infile,dtype='int8',count=73)
(Yes, it's a 73-character string.)
Three questions: Is my data now stored without corruption or truncation in strbits? And, can I now convert strbits into a readable string? Finally, should I be doing this in some completely different way?
UPDATE:
Here's something that works, but I would think there would be a more elegant way.
strarr = np.zeros(73,dtype='c')
for n in range(73):
strarr[n] = np.fromfile(infile,dtype='c',count=1)[0]
So now I have an array where each element is a single character from the input file.
The way you're doing it is fine. Here's how you can convert it to a string.
strbits = np.fromfile(infile, dtype=np.int8, count=73)
a_string = ''.join([chr(item) for item in strbits])

Interpreting binary files as ASCII

I have a binary file (which I've created in C) and I would like to have a look inside the file. Obviously, I won't be able to "see" anything useful as it's in binary. However I do know that contains certain number of rows with numbers in double precision. I am looking for a script to just read some values and print them so I can verify the if they are in the right range. In other words, it would be like doing head or tail in linux on an text file.
Is there a way of doing it?
Right now I've got something in Python, but it does not do what I want:
CHUNKSIZE = 8192
file = open('eigenvalues.bin', 'rb')
data = list(file.read())
print data
Use the array module to read homogenous binary-representation numbers:
from array import array
data = array('d')
CHUNKSIZE = 8192
rowcount = CHUNKSIZE / data.itemsize # number of doubles we find in CHUNKSIZE bytes
with open('eigenvalues.bin', 'rb') as eg:
data.fromfile(eg, rowcount)
The array.array type otherwise behaves just like a list, only the type of values it can hold is constricted (in this case to float).
Depending on the input data, you may need to add a data.byteswap() call after reading to switch between little and big-endian. Use sys.byteorder to see what byteorder was used to read the data. If your data was written on a platform using little-endianess, swap if your platform uses the other form, and vice-versa:
import sys
if sys.byteorder == 'big':
# data was written in little-endian form, so swap the bytes to match
data.byteswap()
You can use struct.unpack to convert binary data into a specific data type.
For example, if you want to read the first double from the binary data. (not tested, but believe this is correct)
struct.unpack("d",inputData[0:7])
http://docs.python.org/2/library/struct.html
You can see each byte of your file represented in unsigned decimal with this shell command:
od -t u1 eigenvalues.bin | less
Should you want to see a particular area and decode floating point numbers, you can use dd to extract them and od -F option to decode them, eg:
dd status=noxfer if=eigenvalues.bin bs=1 skip=800 count=16 | od -F
will show two double precision numbers stored at offset 800 and 808 in the binary file.
Note that according to the Linux tag set to your question, I assume you are using Gnu versions of dd and od.

python 32 bit float conversion

Python 2.6 on Redhat 6.3
I have a device that saves 32 bit floating point value across 2 memory registers, split into most significant word and least significant word.
I need to convert this to a float.
I have been using the following code found on SO and it is similar to code I have seen elsewhere
#!/usr/bin/env python
import sys
from ctypes import *
first = sys.argv[1]
second = sys.argv[2]
reading_1 = str(hex(int(first)).lstrip("0x"))
reading_2 = str(hex(int(second)).lstrip("0x"))
sample = reading_1 + reading_2
def convert(s):
i = int(s, 16) # convert from hex to a Python int
cp = pointer(c_int(i)) # make this into a c integer
fp = cast(cp, POINTER(c_float)) # cast the int pointer to a float pointer
return fp.contents.value # dereference the pointer, get the float
print convert(sample)
an example of the register values would be ;
register-1;16282 register-2;60597
this produces the resulting float of
1.21034872532
A perfectly cromulent number, however sometimes the memory values are something like;
register-1;16282 register-2;1147
which, using this function results in a float of;
1.46726675314e-36
which is a fantastically small number and not a number that seems to be correct. This device should be producing readings around the 1.2, 1.3 range.
What I am trying to work out is if the device is throwing bogus values or whether the values I am getting are correct but the function I am using is not properly able to convert them.
Also is there a better way to do this, like with numpy or something of that nature?
I will hold my hand up and say that I have just copied this code from examples on line and I have very little understanding of how it works, however it seemed to work in the test cases that I had available to me at the time.
Thank you.
If you have the raw bytes (e.g. read from memory, from file, over the network, ...) you can use struct for this:
>>> import struct
>>> struct.unpack('>f', '\x3f\x9a\xec\xb5')[0]
1.2103487253189087
Here, \x3f\x9a\xec\xb5 are your input registers, 16282 (hex 0x3f9a) and 60597 (hex 0xecb5) expressed as bytes in a string. The > is the byte order mark.
So depending how you get the register values, you may be able to use this method (e.g. by converting your input integers to byte strings). You can use struct for this, too; this is your second example:
>>> raw = struct.pack('>HH', 16282, 1147) # from two unsigned shorts
>>> struct.unpack('>f', raw)[0] # to one float
1.2032617330551147
The way you've converting the two ints makes implicit assumptions about endianness that I believe are wrong.
So, let's back up a step. You know that the first argument is the most significant word, and the second is the least significant word. So, rather than try to figure out how to combine them into a hex string in the appropriate way, let's just do this:
import struct
import sys
first = sys.argv[1]
second = sys.argv[2]
sample = int(first) << 16 | int(second)
Now we can just convert like this:
def convert(i):
s = struct.pack('=i', i)
return struct.unpack('=f', s)[0]
And if I try it on your inputs:
$ python floatify.py 16282 60597
1.21034872532
$ python floatify.py 16282 1147
1.20326173306

Categories

Resources