Writing numpy array to file- byte-order issue? - python

I'm trying to write a numpy array to file, but the file format is such that every value must contain only the 8 bytes required to represent a 64-bit float.
As best I can tell, calling array.tofile(filename) on an array with array.dtype = 'float64' is not accomplishing this, so how can I do this quickly?

tofile already creates the binary file that you describe. Are you sure you are calling it correctly? If you're opening the file in your code, did you remember to open it in binary mode? Here is an example of tofile working as expected:
>>> import numpy as np
>>> a = np.array([1, 2, 3], dtype='float64')
>>> a
array([ 1., 2., 3.])
>>> a.tofile('foo')
Inspecting the file reveals it to be 24 bytes long, and with the contents corresponding to little-endian 64-bit IEEE 754 floats:
$ hexdump -C foo
00000000 00 00 00 00 00 00 f0 3f 00 00 00 00 00 00 00 40 |.......?.......@|
00000010 00 00 00 00 00 00 08 40 |.......@|
00000018
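If the values look wrong on the reading side, the likely culprit is byte order rather than tofile itself: tofile writes the array's raw in-memory bytes, so the byte order follows the array's dtype. A minimal sketch forcing a specific endianness before writing (tobytes() shows exactly what tofile would write):

```python
import struct
import numpy as np

a = np.array([1.0, 2.0, 3.0], dtype='float64')

# Convert to an explicit byte order, regardless of platform:
le = a.astype('<f8')   # little-endian 64-bit floats
be = a.astype('>f8')   # big-endian 64-bit floats

# Each value occupies exactly 8 bytes; tobytes() is what tofile writes.
assert le.tobytes() == struct.pack('<3d', 1.0, 2.0, 3.0)
assert be.tobytes() == struct.pack('>3d', 1.0, 2.0, 3.0)
```

Writing is then just le.tofile('foo') (or be.tofile('foo') for big-endian).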

Related

How to write n bytes to a binary file in python 2.7

I am trying to use f.write(struct.pack()) to write n bytes to a binary file, but I'm not quite sure how to do that. Any example or sample would be helpful.
You don't really explain your exact problem, what you tried, or which error messages you encountered.
The solution should look something like:
with open("filename", "wb") as fout:
    fout.write(struct.pack(format, data, ...))
If you explain exactly what data you want to dump, I can elaborate on the solution.
If your data is just a hex string, then you do not need struct; you can just use decode.
Please refer to SO question hexadecimal string to byte array in python
example for python 2.7:
hex_str = "414243444500ff"
bytestring = hex_str.decode("hex")
with open("filename", "wb") as fout:
    fout.write(bytestring)
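Note that str.decode("hex") only exists in Python 2. On Python 3, the equivalent is bytes.fromhex; a minimal sketch (the output filename is illustrative):

```python
hex_str = "414243444500ff"

# Python 3: str has no .decode(); use bytes.fromhex instead of .decode("hex")
bytestring = bytes.fromhex(hex_str)
assert bytestring == b"ABCDE\x00\xff"

with open("filename", "wb") as fout:
    fout.write(bytestring)
```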
The below worked for me:
reserved = "Reserved_48_Bytes"
f.write(struct.pack("48s", reserved))
Output:
hexdump -C output.bin
00000030 52 65 73 65 72 76 65 64 5f 34 38 5f 42 79 74 65 |Reserved_48_Byte|
00000040 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

Internet checksum -- Adding hex numbers together for checksum

I came across the following example of creating an Internet Checksum:
Take the example IP header 45 00 00 54 41 e0 40 00 40 01 00 00 0a 00 00 04 0a 00 00 05:
Adding the fields together yields the two’s complement sum 01 1b 3e.
Then, to convert it to one’s complement, the carry-over bits are added to the first 16-bits: 1b 3e + 01 = 1b 3f.
Finally, the one's complement of the sum is taken, resulting in the checksum value e4c0.
I was wondering how the IP header is added together to get 01 1b 3e?
Split your IP header into 16-bit parts.
45 00
00 54
41 e0
40 00
40 01
00 00
0a 00
00 04
0a 00
00 05
The sum is 01 1b 3e. You might want to look at how packet header checksums are being calculated here https://en.m.wikipedia.org/wiki/IPv4_header_checksum.
The IP header is added together, with carry, as 4-digit hexadecimal (16-bit) numbers.
I.e. the first three numbers that are added are 0x4500 + 0x0054 + 0x41e0 + ...
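The steps above can be checked with a short script (a sketch, assuming the header is given as a hex string with the checksum field zeroed):

```python
header = "45 00 00 54 41 e0 40 00 40 01 00 00 0a 00 00 04 0a 00 00 05"
data = bytes.fromhex(header.replace(" ", ""))

# Sum the header as big-endian 16-bit words.
total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
assert total == 0x11B3E   # the "01 1b 3e" from the example

# Fold the carry bits back into the low 16 bits (one's-complement addition)...
while total > 0xFFFF:
    total = (total & 0xFFFF) + (total >> 16)
assert total == 0x1B3F    # 1b 3e + 01 = 1b 3f

# ...and take the one's complement to get the checksum.
checksum = total ^ 0xFFFF
assert checksum == 0xE4C0
```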

pyside2-uic null bytes in output

I'm trying to convert Qt .ui files made using Qt Designer with pyside2-uic but the output starts with 2 garbage bytes then every other byte is a null.
Here's the start of the output:
FF FE 23 00 20 00 2D 00 2A 00 2D 00 20 00 63 00 6F 00 64 00 69 00 6E 00 67 00 3A 00 20 00 75 00 74 00 66 00 2D 00 38 00 20 00 2D 00 2A 00 2D 00 0D 00 0A 00 0D 00 0A 00 23 00 20 00 46 00 6F 00
If I remove the first 2 bytes and all the nulls, it works as expected.
I'm using Python 3.7 and the newest version of pyside2, is there any way to get pyside2-uic to output a valid file without having to run it through another script to pull out all the garbage?
FYI, the issue seems to be UTF-8 encoding (when using -o) vs. UTF-16 LE (output redirection in PowerShell); the leading FF FE is the UTF-16 LE byte-order mark.
This also matches the dump above: every byte is paired with a 00 (16-bit vs. 8-bit code units).
This bug(?) only occurs when pyside2-uic is run in PowerShell and the output is redirected to a file.
If using PowerShell, use the -o option to specify an output file instead of redirecting with >:
pyside2-uic mainwindow.ui -o MainWindow.py
Both methods work fine from a normal command prompt.

Decoding an IEEE double precision float (8 byte)

I'm decoding an AMF0 format file. The data I'm looking for is a timestamp, encoded as an array. [HH, MM, SS].
Since the data is AMF0, I can locate the start of the data by reading in the file as bytes, converting each byte to hex, and looking for the signal 08 00 00 00 03, an array of length 3.
My problem is that I don't know how to decode the 8-byte integer in each element of the array. I have the data in the same, hex-encoded format, e.g.:
08 00 00 00 03 *signals array length 3*
00 01 30 00 00 00 00 00 00 00 00 00 *signals integer*
00 01 31 00 00 00 00 00 00 00 00 00 *signals integer*
00 01 32 00 40 3C 00 00 00 00 00 00 *signals integer*
00 00 09 *signals object end*
This should be decoded as [0, 0, 28] (if minerva is to be believed).
I've been trying to use struct.unpack, but all the examples I see are for 4-byte (little endian) values.
The format specifier you are looking for is ">9xd4xd4xd3x":
>>> import struct
>>> from binascii import unhexlify
>>> struct.unpack(">9xd4xd4xd3x", unhexlify("080000000300013000000000000000000000013100000000000000000000013200403C000000000000000009"))
(0.0, 0.0, 28.0)
Broken down:
>: big endian format
1. 5x: 5 bytes begin-of-array marker + size (ignored)
2. 4x: 4 bytes begin-of-element marker (ignored)
3. d: 1 big-endian IEEE-754 double
4. Points 2 and 3 repeat for the other 2 elements
5. 3x: 3 bytes end-of-array marker (ignored)
Points 1 and 2 are merged together into 9x.
As you might have noticed, struct can only ignore extra bytes, not validate. If you need more flexibility in the input format, you could use a regex matching begin/end array markers in non-greedy mode.
To decode floats use the struct-module:
>>> struct.unpack('>d','403C000000000000'.decode('hex'))[0]
28.0
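The '...'.decode('hex') in the last answer is Python 2 only; on Python 3, bytes.fromhex does the same job. A sketch reproducing both answers:

```python
import struct

# Python 3 version: bytes.fromhex replaces the Python 2-only str.decode('hex').
raw = bytes.fromhex(
    "0800000003"                    # array marker (0x08) + length 3
    "00013000" "0000000000000000"   # element marker + double 0.0
    "00013100" "0000000000000000"   # element marker + double 0.0
    "00013200" "403C000000000000"   # element marker + double 28.0
    "000009"                        # object-end marker
)
assert struct.unpack(">9xd4xd4xd3x", raw) == (0.0, 0.0, 28.0)

# Single value, as in the last answer:
assert struct.unpack(">d", bytes.fromhex("403C000000000000"))[0] == 28.0
```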

Invalid characters for python output file

I have this little script:
from numpy import *
import numpy as np
import scipy.spatial as spt
X= np.loadtxt('edm')
myfile = open('edm.txt','w')
V= spt.distance.pdist(X.T,'sqeuclidean')
P = spt.distance.squareform(V)
print P
myfile.write(P)
And this matrix:
0 199.778354301
201.857546133 0
If I run my program, I get this in the terminal (from the print statement):
[[ 0. 80657.85977805]
[ 80657.85977805 0. ]]
But in my output file, I get invalid characters, as follows:
��������z°¶¡±Û#z°¶¡±Û#��������
Do you know why?
Thanks
You can use NumPy's savetxt method to save arrays without worrying about the encoding.
As in the Docs,
>>> np.savetxt('edm.txt', x) # x is an array
The file contains the binary representation of the numbers in the matrix.
$ od -t x1z edm.txt
0000000 00 00 00 00 00 00 00 00 79 a1 a6 c1 1d b1 f3 40 ........y......@
0000020 79 a1 a6 c1 1d b1 f3 40 00 00 00 00 00 00 00 00 y......@........
0000040
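A sketch contrasting the two forms (filenames are illustrative): savetxt/loadtxt round-trip the matrix as human-readable text, while the raw binary bytes that write(P) effectively dumped can be read back with np.fromfile:

```python
import numpy as np

P = np.array([[0.0, 80657.85977805],
              [80657.85977805, 0.0]])

# Text form: readable in any editor, round-trips with loadtxt.
np.savetxt('edm.txt', P)
loaded = np.loadtxt('edm.txt')
assert np.allclose(loaded, P)

# Binary form: the "invalid characters" are just raw float64 bytes.
P.tofile('edm.bin')
back = np.fromfile('edm.bin').reshape(P.shape)
assert np.allclose(back, P)
```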
