Matlab to Python conversion: binary file read - python

I am new to both MATLAB and Python, and I have to convert a MATLAB program to Python. I am not sure how to typecast the data after reading it from the file in Python. The file used is a binary file.
Below is the Matlab code:
fid = fopen (filename, 'r');
fseek (fid, 0, -1);
meta = zeros (n, 9, 'single');
v = zeros (n, 128, 'single');
d = 0;
for i = 1:n
    meta(i,:) = fread (fid, 9, 'float');
    d = fread (fid, 1, 'int');
    v(i,:) = fread (fid, d, 'uint8=>single');
end
I have written the below program in python:
fid = open(filename, 'r')
fid.seek(0 , 0)
meta = np.zeros((n,9),dtype = np.float32)
v = np.zeros((n,128),dtype = np.float32)
for i in range(n):
    data_str = fid.read(9)
    meta[1,:] = unpack('f', data_str)
For this unpack, I am getting the error
"unpack requires a string argument of length 4".
Please suggest some way to make it work.
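For comparison, here is a sketch of a working translation of the MATLAB loop (not from the original thread; the demo file, n, and the values are invented). The key fixes are opening the file in binary mode and reading 9 * 4 bytes, since unpack's 'f' format needs exactly 4 bytes per float:

```python
import struct
import numpy as np

# Build a small demo file in the layout the MATLAB loop expects:
# per record, 9 float32 values, an int32 count d, then d uint8 bytes.
n, d = 2, 128
with open('demo.dat', 'wb') as f:
    for i in range(n):
        f.write(struct.pack('<9f', *range(9)))
        f.write(struct.pack('<i', d))
        f.write(bytes(range(d)))

meta = np.zeros((n, 9), dtype=np.float32)
v = np.zeros((n, 128), dtype=np.float32)
with open('demo.dat', 'rb') as fid:          # 'rb': binary mode matters
    for i in range(n):
        # each float32 is 4 bytes, so 9 floats need 9 * 4 bytes
        meta[i, :] = struct.unpack('<9f', fid.read(9 * 4))
        (d,) = struct.unpack('<i', fid.read(4))
        v[i, :d] = np.frombuffer(fid.read(d), dtype=np.uint8)
```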

I looked into the problem a little, mainly because I will need this in the near future too. It turns out there is a very simple solution using numpy, assuming you have a MATLAB matrix stored the way I do.
import numpy as np

def read_matrix(file_name):
    # little-endian single-precision float
    return np.fromfile(file_name, dtype='<f')

arr = read_matrix(file_path)
print(arr[0:10])  # sample data
print(len(arr))   # number of elements
You must work out the data type (dtype) yourself; the NumPy dtype documentation covers the options. I used fwrite(fid,value,'single'); to store the data in MATLAB; if you did the same, the code above will work.
Note that the returned variable is a flat 1-D array; you'll have to reshape it to match the original shape of your data. In my case len(arr) is 307200, from a matrix of size 15360 x 20.
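That reshape step might look like this (a sketch using the shape from the example above; MATLAB writes matrices column-major, so order='F' restores the original layout):

```python
import numpy as np

# Simulate a 15360 x 20 single-precision matrix written the way
# MATLAB's fwrite(fid, M, 'single') lays it out: column-major.
rows, cols = 15360, 20
original = np.arange(rows * cols, dtype='<f4').reshape(rows, cols)
original.T.tofile('matrix.bin')               # transpose -> column-major bytes

arr = np.fromfile('matrix.bin', dtype='<f')   # flat array, len(arr) == 307200
matrix = arr.reshape(rows, cols, order='F')   # Fortran order == MATLAB order
```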

Related

Read a binary file using unpack in Python compared with a IDL method

I have an IDL procedure that reads a binary file, and I am trying to translate it into a Python routine.
The IDL code looks like:
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
It works perfectly. So I'm writing the same thing in Python, but I don't get the same results (even the value of 'a' is different). Here is my code:
x=np.zeros(nptx,float)
y=np.zeros(npty,float)
z=np.zeros(nptz,float)
with open(name_file_data, "rb") as fb:
    a, = struct.unpack("I", fb.read(4))
    b,c,d,e = struct.unpack("ffff", fb.read(16))
    x[:] = struct.unpack(str(nptx)+"d", fb.read(nptx*8))[:]
    y[:] = struct.unpack(str(npty)+"d", fb.read(npty*8))[:]
    z[:] = struct.unpack(str(nptz)+"d", fb.read(nptz*8))[:]
I hope this will help someone answer me.
Update : As suggested in the answers, I'm now trying the module "FortranFile", but I'm not sure I understood everything about its use.
from scipy.io import FortranFile
f=FortranFile(name_file_data, 'r')
a=f.read_record('H')
b=f.read_record('f','f','f','f')
However, instead of getting an integer for 'a', I got: array([0, 0], dtype=uint16).
And I got the following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)
According to a table of IDL data types, UINT(0) creates a 16 bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4 byte integer, and H denotes an unsigned 16 bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no guarantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.
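If FortranFile doesn't cooperate, the record framing can also be parsed by hand. This is a sketch assuming the common layout (used by gfortran and many other compilers): each record is wrapped in a 4-byte little-endian length marker before and after the payload. The demo file written here stands in for the real one:

```python
import struct

def read_f77_record(f):
    """Read one Fortran 'unformatted' record: a 4-byte length marker,
    the payload, then the same marker repeated (typical gfortran layout)."""
    head = f.read(4)
    if len(head) < 4:
        return None  # end of file
    (nbytes,) = struct.unpack('<i', head)
    payload = f.read(nbytes)
    (tail,) = struct.unpack('<i', f.read(4))
    assert tail == nbytes, "corrupt record markers"
    return payload

# Write a tiny file in that layout, mirroring readu,11,a and readu,11,b,c,d,e.
with open('demo_f77.bin', 'wb') as f:
    payload = struct.pack('<H', 7)                    # a = 16-bit unsigned int
    f.write(struct.pack('<i', len(payload)) + payload + struct.pack('<i', len(payload)))
    payload = struct.pack('<4f', 1.0, 2.0, 3.0, 4.0)  # b, c, d, e
    f.write(struct.pack('<i', len(payload)) + payload + struct.pack('<i', len(payload)))

with open('demo_f77.bin', 'rb') as f:
    (a,) = struct.unpack('<H', read_f77_record(f))
    b, c, d, e = struct.unpack('<4f', read_f77_record(f))

print(a, b, c, d, e)  # 7 1.0 2.0 3.0 4.0
```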

How to skip bytes after reading data using numpy fromfile

I'm trying to read noncontiguous fields from a binary file in Python using numpy fromfile function. It's based on this Matlab code using fread:
fseek(file, 0, 'bof');
q = fread(file, inf, 'float32', 8);
8 indicates the number of bytes I want to skip after reading each value. I was wondering if there was a similar option in fromfile, or if there is another way of reading specific values from a binary file in Python. Thanks for your help.
Henrik
Something like this should work, untested:
import struct
floats = []
with open(filename, 'rb') as f:
    while True:
        buff = f.read(4)  # 'f' is 4 bytes wide
        if len(buff) < 4:
            break
        x = struct.unpack('f', buff)[0]  # convert buffer to float (first of returned tuple)
        floats.append(x)                 # add float to list (for example)
        f.seek(8, 1)                     # whence=1: offset relative to current position
Using struct.unpack()
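A pure-NumPy alternative (a sketch, not from the original answer): np.fromfile accepts structured dtypes, so each float plus its 8 skipped bytes can be read as one 12-byte record and the padding field discarded. The demo file here is invented:

```python
import numpy as np

# Build a demo file: float32 values interleaved with 8 junk bytes each.
values = np.array([1.5, -2.0, 3.25], dtype='<f4')
with open('padded.bin', 'wb') as f:
    for v in values:
        f.write(v.tobytes() + b'\x00' * 8)

# One record = a little-endian float32 followed by 8 bytes of void padding.
rec = np.dtype([('q', '<f4'), ('pad', 'V8')])
q = np.fromfile('padded.bin', dtype=rec)['q']
print(q.tolist())  # [1.5, -2.0, 3.25]
```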

Read or construct an array from a binary file containing both integers and doubles in Python/NumPy

I have a binary file that contains both integers and doubles. I want to access that data either in one call (something like: x = np.fromfile(f, dtype=np.int)) or sequentially (value by value). However, NumPy doesn't seem to allow to read from a binary file without specifying a type. Should I convert everything to double, or forget about NumPy?
Edit. Let's say the format of the file is something like this:
int
int int int double double double double double double double
etc.
NumPy doesn't seem to allow to read from a binary file without specifying a type
No programming language I know of pretends to be able to guess the type of raw binary data, and for good reason. What exactly is the higher-level problem you are trying to solve?
I don't think you need numpy for this; the basic Python struct library does the job. Convert the list of tuples produced at the end into a numpy array if so desired.
For sources see https://docs.python.org/2/library/struct.html and #martineau
Reading a binary file into a struct in Python
from struct import pack,unpack

with open("foo.bin", "wb") as file:
    a = pack("<iiifffffff", 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    file.write(a)

with open("foo.bin", "rb") as file:  # "rb", not "r": the file is binary
    a = unpack("<iiifffffff", file.read())

print(a)
output:
(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)
Showing the binary file in a binary editor (Frhed):
# how to read the same structure repeatedly
import struct

fn = "foo2.bin"
struct_fmt = '<iiifffffff'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from

with open(fn, "wb") as file:
    a = struct.pack(struct_fmt, 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    for i in range(3):
        file.write(a)

results = []
with open(fn, "rb") as f:
    while True:
        data = f.read(struct_len)
        if not data:
            break
        s = struct_unpack(data)
        results.append(s)
print(results)
output:
[(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)]
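The same fixed record can also be read with NumPy directly (a sketch, not part of the original answer): a structured dtype with one subarray field per group maps each 40-byte record onto named fields:

```python
import struct
import numpy as np

# Write three records of the example layout: 3 x int32 followed by 7 x float32.
fmt = '<iiifffffff'
rec = struct.pack(fmt, 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
with open('foo3.bin', 'wb') as f:
    f.write(rec * 3)

# Matching structured dtype: named integer and float subarrays per record.
dt = np.dtype([('ints', '<i4', 3), ('floats', '<f4', 7)])
data = np.fromfile('foo3.bin', dtype=dt)
print(data['ints'].tolist())  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
```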

Convert string to ndarray in python

I am reading a stream of data from an A-D converter via a socket from Python; the data come in as raw bytes. I want to format these bytes as int32 and place them into an ndarray. The read process looks something like this:
def datarecv():
    global msgbuf
    binlen = BURSTLEN + 4
    while len(msgbuf) < binlen:
        msgbuf = msgbuf + socket.recv(4096)
    reply = msgbuf[0:binlen]
    msgbuf = msgbuf[binlen:]
    # each recv comes with a 4 byte header that I throw away...
    return reply[4:]
The following is used successfully to write the received data to a file:
with open(filename, "wb") as f:
    bytesremaining = framesize
    for i in range(lines):
        f.write(datarecv()[0:min(linesize, bytesremaining)])
        bytesremaining -= linesize
I can then read back the file with something like this:
>>> data = numpy.fromfile(filename, dtype='int32')
>>> type(data)
<type 'numpy.ndarray'>
So my data variable is in the format I'm looking for, i.e.
>>> data[1:10]
array([4214234234, 2342342342, 2342342342, 34534535, 345345353, 5675757,
2142423424, 35334535, 35353535, 4754745754], dtype=int32)
** BUT ** I want to omit the intermediate step of writing to a file. After I read in the raw stream of data I want to make it an ndarray so that I can manipulate the data. I can change the line from
f.write(datarecv()[0:min(linesize, bytesremaining)])
to
bigbuf = bigbuf + datarecv()[0:min(linesize, bytesremaining)]
and then I end up with a big string. It's a string of raw bytes (not ASCII) which I have to convert to 32 bit integers. I'm hung up on this last step. I hope this makes sense what I'm asking. Thanks.
You can convert bigbuf to an array with numpy.frombuffer (numpy.fromstring does the same but is deprecated for binary data).
For example:
In [21]: bigbuf = b"\1\0\0\0\2\0\0\0"
In [22]: np.frombuffer(bigbuf, dtype=np.int32)
Out[22]: array([1, 2], dtype=int32)
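Putting the pieces together without the intermediate file (a sketch with invented stand-in data; in the real code each chunk would come from datarecv()):

```python
import numpy as np

# Stand-in for the byte chunks returned by datarecv(); here, eight
# little-endian int32 values split across two "receives".
chunks = [np.arange(4, dtype='<i4').tobytes(),
          np.arange(4, 8, dtype='<i4').tobytes()]

bigbuf = bytearray()
for chunk in chunks:          # in the real code: one datarecv() per line
    bigbuf += chunk

# View the accumulated raw bytes as a 1-D int32 ndarray.
data = np.frombuffer(bytes(bigbuf), dtype='<i4')
print(data.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```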

Python Wave Library String to Bytes

So basically I am trying to read in the information of a wave file so that I can take the byte information and create an array of time->amplitude points.
import wave
class WaveFile:
    # `fileName` is the name of the wav file to open
    def __init__(self, fileName):
        self.wf = wave.open(fileName, 'r')
        self.soundBytes = self.wf.readframes(-1)
        self.timeAmplitudeArray = self.__calcTimeAmplitudeArray()

    def __calcTimeAmplitudeArray(self):
        self.internalTimeAmpList = []  # zero out the internal representation
        byteList = self.soundBytes
        if (byteList[i+1] & 0x080) == 0:
            amp = (byteList[i] & 0x0FF) + byteList[i+1] << 8
        # more code continues.....
Error:
if((int(byteList[i+1]) & 0x080) == 0):
TypeError: unsupported operand type(s) for &: 'str' and 'int'
I have tried using int() to convert to an integer type, but to no avail. I come from a Java background where this would be done using the byte type, but that does not appear to be a language feature of Python. Any direction would be appreciated.
Your problem comes from the fact that the wave library is just giving you raw binary data (as a byte string).
You'll probably need to check the form of the data with self.wf.getparams(). This returns (nchannels, sampwidth, framerate, nframes, comptype, compname). If you do have 1 channel, a sample width of 2, and no compression (a fairly common kind of wave file), you can use the following (with import numpy as np) to get the data:
byteList = np.frombuffer(self.soundBytes, '<h')
This returns a numpy array with the data, so you don't need to loop. (np.fromstring does the same but is deprecated for binary data.) You'll need a different second parameter if you have a different sample width. I've tested this with a simple .wav file, and plot(byteList); show() (pylab mode in iPython) worked.
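End to end, that might look like this (a sketch; the file name and sample values are invented, and the wav file is generated here so the example is self-contained):

```python
import wave
import numpy as np

# Write a tiny mono, 16-bit, uncompressed wav file.
samples = np.array([0, 1000, -1000, 32767, -32768], dtype='<i2')
with wave.open('tone.wav', 'wb') as wf:
    wf.setnchannels(1)     # mono
    wf.setsampwidth(2)     # 2 bytes = 16-bit samples
    wf.setframerate(8000)
    wf.writeframes(samples.tobytes())

# Read it back and view the raw frame bytes as signed 16-bit integers.
with wave.open('tone.wav', 'rb') as wf:
    assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
    frames = wf.readframes(wf.getnframes())

amplitudes = np.frombuffer(frames, dtype='<h')
print(amplitudes.tolist())  # [0, 1000, -1000, 32767, -32768]
```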
See Reading *.wav files in Python for other methods to do this.
Numpyless version
If you need to avoid numpy, you can do:
import array
byteList = array.array('h')
byteList.frombytes(self.soundBytes)
This works like before (tested with plot(byteList); show()). 'h' means signed short; len() and friends work as usual. This does load the whole wav file at once, but then .wav files are usually small (though not always).
I usually use the array module for this, with its frombytes method.
My standard-pattern for operating on chunks of data is this:
def bytesfromfile(f):
    while True:
        raw = array.array('B')
        raw.frombytes(f.read(8192))  # frombytes; fromstring is the deprecated spelling
        if not raw:
            break
        yield raw

with open(f_in, 'rb') as fd_in:
    for chunk in bytesfromfile(fd_in):
        # do stuff
Above 'B' denotes unsigned char, i.e. 1-byte.
If the file isn't huge, then you can just slurp it:
In [8]: f = open('foreman_cif_frame_0.yuv', 'rb')
In [9]: raw = array.array('B')
In [10]: raw.frombytes(f.read())
In [11]: raw[0:10]
Out[11]: array('B', [10, 40, 201, 255, 247, 254, 254, 254, 254, 254])
In [12]: len(raw)
Out[12]: 152064
Guido can't be wrong...
If you instead prefer numpy, I tend to use:
fd_i = open('file.bin', 'rb')
fd_o = open('out.bin', 'wb')
while True:
    # Read as uint8
    chunk = np.fromfile(fd_i, dtype=np.uint8, count=8192)
    if chunk.size == 0:  # end of file; testing .any() would also stop on an all-zero chunk
        break
    # use a wider int for calculations, since uint8 wraps around
    chunk = chunk.astype(np.int64)
    # do some calculations
    data = ...
    # convert back to uint8 prior to writing
    data = data.astype(np.uint8)
    data.tofile(fd_o)
fd_i.close()
fd_o.close()
or, to read the whole file:
In [18]: import numpy as np
In [19]: f = open('foreman_cif_frame_0.yuv', 'rb')
In [20]: data = np.fromfile(f, dtype=np.uint8)
In [21]: data[0:10]
Out[21]: array([ 10, 40, 201, 255, 247, 254, 254, 254, 254, 254], dtype=uint8)
