I'm trying to read noncontiguous fields from a binary file in Python using NumPy's fromfile function. It's based on this Matlab code using fread:
fseek(file, 0, 'bof');
q = fread(file, inf, 'float32', 8);
8 indicates the number of bytes I want to skip after reading each value. I was wondering if there was a similar option in fromfile, or if there is another way of reading specific values from a binary file in Python. Thanks for your help.
Henrik
Something like this should work, untested:
import struct

floats = []
with open(filename, 'rb') as f:
    while True:
        buff = f.read(4)                  # 'f' is 4 bytes wide
        if len(buff) < 4: break
        x = struct.unpack('f', buff)[0]   # Convert buffer to float (get from returned tuple)
        floats.append(x)                  # Add float to list (for example)
        f.seek(8, 1)                      # The second arg 1 specifies relative offset
Using struct.unpack()
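If NumPy is preferred over a byte-by-byte loop, one possible sketch is to describe each 12-byte record as a structured dtype: a 4-byte float followed by 8 padding bytes. The helper name read_strided_floats, the field names value and pad, and the temporary demo file are illustrative assumptions, not part of the original question. The sketch also assumes little-endian data and a file length that is a whole multiple of 12 bytes.

```python
import os
import tempfile

import numpy as np


def read_strided_floats(path, skip=8):
    """Read one float32, skip `skip` bytes, and repeat until end of file."""
    # Each record is a little-endian float32 plus `skip` raw padding bytes.
    record = np.dtype([("value", "<f4"), ("pad", "V%d" % skip)])
    records = np.fromfile(path, dtype=record)
    return records["value"].copy()  # contiguous float32 array


# Demo: nine consecutive float32 values 0..8, so every third one is kept.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
np.arange(9, dtype="<f4").tofile(demo_path)
q = read_strided_floats(demo_path)
os.remove(demo_path)
```

Here q holds the values 0, 3 and 6, which is what the Matlab call with a skip of 8 would pick out of the same bytes.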
Related
I have a binary file that contains both integers and doubles. I want to access that data either in one call (something like: x = np.fromfile(f, dtype=np.int)) or sequentially (value by value). However, NumPy doesn't seem to allow reading from a binary file without specifying a type. Should I convert everything to double, or forget about NumPy?
Edit. Let's say the format of the file is something like this:
int
int int int double double double double double double double
etc.
NumPy doesn't seem to allow to read from a binary file without specifying a type
No programming language I know of pretends to be able to guess the type of raw binary data; and for good reasons. What exactly is the higher level problem you are trying to solve?
I don't think you need NumPy for this. The standard-library struct module does the job. Convert the list of tuples produced at the end into a NumPy array if so desired.
For sources, see https://docs.python.org/2/library/struct.html and @martineau's answer to "Reading a binary file into a struct in Python".
from struct import pack, unpack

with open("foo.bin", "wb") as file:
    a = pack("<iiifffffff", 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    file.write(a)

with open("foo.bin", "rb") as file:  # binary mode, matching the write above
    a = unpack("<iiifffffff", file.read())
print(a)
output:
(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)
Showing the binary file in a binary editor (Frhed):
# how to read the same structure repeatedly
import struct

fn = "foo2.bin"
struct_fmt = '<iiifffffff'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from

with open(fn, "wb") as file:
    a = struct.pack("<iiifffffff", 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    for i in range(3):
        file.write(a)

results = []
with open(fn, "rb") as f:
    while True:
        data = f.read(struct_len)
        if not data: break
        s = struct_unpack(data)
        results.append(s)
print(results)
output:
[(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)]
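If a NumPy array is wanted in the end, a structured dtype can describe the same record in one step. The sketch below assumes the layout of the '<iiifffffff' format above (three little-endian int32 values followed by seven float32 values). The field names ids and vals are made up for illustration, and '<f8' would replace '<f4' if the file really holds doubles.

```python
import struct

import numpy as np

# One record: three int32 values followed by seven float32 values (40 bytes).
record = np.dtype([("ids", "<i4", (3,)), ("vals", "<f4", (7,))])

# Three identical records packed in memory, standing in for the file contents.
raw = struct.pack("<iiifffffff", 1, 2, 3,
                  1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7) * 3

records = np.frombuffer(raw, dtype=record)
ids = records["ids"]    # integer columns, shape (3, 3)
vals = records["vals"]  # float columns, shape (3, 7)
```

Calling np.fromfile(fn, dtype=record) should read the same records directly from a file such as foo2.bin.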
I am new to both Matlab and Python and I have to convert a program in Matlab to Python. I am not sure how to typecast the data after reading from the file in Python. The file used is a binary file.
Below is the Matlab code:
fid = fopen (filename, 'r');
fseek (fid, 0, -1);
meta = zeros (n, 9, 'single');
v = zeros (n, 128, 'single');
d = 0;
for i = 1:n
    meta(i,:) = fread (fid, 9, 'float');
    d = fread (fid, 1, 'int');
    v(i,:) = fread (fid, d, 'uint8=>single');
end
I have written the program below in Python:
fid = open(filename, 'r')
fid.seek(0 , 0)
meta = np.zeros((n,9),dtype = np.float32)
v = np.zeros((n,128),dtype = np.float32)
for i in range(n):
    data_str = fid.read(9);
    meta[1,:] = unpack('f', data_str)
For this unpack call, I get the error "unpack requires a string argument of length 4".
Please suggest a way to make it work.
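The error arises because the format 'f' describes a single 4-byte float, while fid.read(9) returns 9 bytes rather than 9 floats. Below is a minimal sketch of one way to mirror the Matlab loop. It assumes little-endian data and that Matlab's 'float' and 'int' are both 32 bits wide. The helper read_records is hypothetical, and an in-memory stream stands in for a file opened in 'rb' mode.

```python
import io
import struct


def read_records(stream, n):
    """Read n records: 9 float32 values, an int32 count d, then d uint8 values."""
    meta, v = [], []
    for _ in range(n):
        meta.append(struct.unpack("<9f", stream.read(9 * 4)))  # 36 bytes
        (d,) = struct.unpack("<i", stream.read(4))             # byte count
        raw = struct.unpack("<%dB" % d, stream.read(d))        # d unsigned bytes
        v.append([float(b) for b in raw])                      # like uint8=>single
    return meta, v


# Demo record: nine floats, then d = 3, then three bytes.
sample = struct.pack("<9fi3B", *([0.5] * 9 + [3, 10, 20, 30]))
meta, v = read_records(io.BytesIO(sample * 2), 2)
```

The resulting lists can be copied row by row into the preallocated arrays, for example meta_array[i, :] = meta[i], using the loop index i rather than the constant 1.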
I looked into the problem a little, mainly because I will need this in the near future too. It turns out there is a very simple solution using NumPy, assuming you have a Matlab matrix stored the way I do.
import numpy as np

def read_matrix(file_name):
    return np.fromfile(file_name, dtype='<f')  # little-endian single-precision float

arr = read_matrix(file_path)
print(arr[0:10])  # sample data
print(len(arr))   # number of elements
You must work out the data type (dtype) yourself. Help on this is here. I used fwrite(fid,value,'single'); to store the data in Matlab; if you did the same, the code above will work.
Note that the returned variable is a flat one-dimensional array, not a matrix; you'll have to reshape it to match the original shape of your data. In my case len(arr) is 307200, from a matrix of size 15360 x 20.
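Matlab's fwrite writes a matrix column by column, so restoring the shape in NumPy needs column-major ('F') order. Below is a small sketch, with a 3 x 2 array standing in for the 15360 x 20 matrix and np.arange standing in for the result of np.fromfile:

```python
import numpy as np

# Stand-in for the flat array returned by np.fromfile(file_path, dtype='<f').
flat = np.arange(6, dtype="<f4")

# Matlab stored the matrix column by column, so fill columns first.
matrix = flat.reshape((3, 2), order="F")
first_column = matrix[:, 0]
```

With the real data, the call would be arr.reshape((15360, 20), order="F").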
I am reading a stream of data from an A-D converter via a socket from Python; the data come in as raw bytes. I want to format these bytes as int32 and place them into an ndarray. The read process looks something like this:
def datarecv():
    global msgbuf
    binlen = BURSTLEN + 4
    while len(msgbuf) < binlen:
        msgbuf = msgbuf + socket.recv(4096)
    reply = msgbuf[0:binlen]
    msgbuf = msgbuf[binlen:]
    # each recv comes with a 4 byte header that I throw away...
    return reply[4:]
The following is used successfully to write the received data to a file:
with open(filename, "wb") as f:
    bytesremaining = framesize
    for i in range(lines):
        f.write(datarecv()[0:min(linesize, bytesremaining)])
        bytesremaining -= linesize
I can then read back the file with something like this:
>>> data = numpy.fromfile(filename, dtype='int32')
>>> type(data)
<type 'numpy.ndarray'>
So my data variable is in the format I'm looking for, i.e.
>>> data[1:10]
array([4214234234, 2342342342, 2342342342, 34534535, 345345353, 5675757,
2142423424, 35334535, 35353535, 4754745754], dtype=int32)
** BUT ** I want to omit the intermediate step of writing to a file. After I read in the raw stream of data I want to make it an ndarray so that I can manipulate the data. I can change the line from
f.write(datarecv()[0:min(linesize, bytesremaining)])
to
bigbuf = bigbuf + datarecv()[0:min(linesize, bytesremaining)]
and then I end up with a big string. It's a string of raw bytes (not ASCII) which I have to convert to 32-bit integers. I'm hung up on this last step. I hope what I'm asking makes sense. Thanks.
You can convert bigbuf to an array with numpy.fromstring.
For example:
In [21]: bigbuf = "\1\0\0\0\2\0\0\0"
In [22]: fromstring(bigbuf, dtype=np.int32)
Out[22]: array([1, 2], dtype=int32)
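In recent NumPy versions, numpy.fromstring is deprecated for binary input in favour of numpy.frombuffer. Below is a sketch of the same conversion, assuming little-endian int32 samples. The chunks list is a stand-in for the pieces returned by datarecv(); collecting them in a list and joining once avoids repeatedly copying a growing buffer.

```python
import numpy as np

# Stand-in for the byte strings returned by successive datarecv() calls.
chunks = [b"\x01\x00\x00\x00", b"\x02\x00\x00\x00"]

bigbuf = b"".join(chunks)                  # one concatenation at the end
data = np.frombuffer(bigbuf, dtype="<i4")  # read-only view over the bytes
editable = data.copy()                     # copy if the values must be modified
```

The array returned by frombuffer shares memory with bigbuf and is read-only when the source is a bytes object, which is why the copy is shown.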
Please let me know the best way to write 8bit values to a file in python. The values can be between 0 and 255.
I open the file as follows:
f = open('temp', 'wb')
Assume the value I am writing is an int between 0 and 255 assigned to a variable, e.g.
x = 13
This does not work:
f.write(x)
As you experts know, Python complains that it cannot write an int through the buffer interface. As a workaround I am writing it as a hex string:
f.write(hex(x))
That works, but it is space-inefficient and clearly not the right Python way. Please help. Thanks.
Try explicitly creating a bytes object:
f.write(bytes([x]))
You can also output a series of bytes as follows:
f.write(bytes([65, 66, 67]))
As an alternative you can use the struct module...
import struct

x = 13
with open('temp', 'wb') as f:
    f.write(struct.pack('>I', x))  # Big-endian, unsigned int
To read x from the file...
with open('temp', 'rb') as f:
    x, = struct.unpack(">I", f.read())
You just need to write an item of bytes data type, not integer.
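Note that the struct format '>I' writes four bytes per value. If exactly one byte per value is wanted, the format 'B' (unsigned char) or a bytearray does that. Below is a small sketch, with io.BytesIO standing in for the file opened in 'wb' mode; the variable names are illustrative.

```python
import io
import struct

values = [13, 0, 255]

f = io.BytesIO()                       # stand-in for open('temp', 'wb')
f.write(struct.pack("B", values[0]))   # one value, exactly one byte
f.write(bytearray(values[1:]))         # several values at once, one byte each

written = f.getvalue()
read_back = list(bytearray(written))   # back to a list of ints in 0..255
```

Both struct.pack('B', ...) and bytearray(...) raise an error for values outside 0 to 255, which catches out-of-range input early.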
I am writing a float32 to a file with numpy's tofile().
float_num = float32(3.4353)
float_num.tofile('float_test.bin')
It can be read with numpy's fromfile(), however that doesn't suit my needs and I have to read it as a raw binary with the help of the bitstring module.
So I do the following:
my_file = open('float_test.bin', 'rb')
raw_data = ConstBitStream(my_file)
float_num_ = raw_data.readlist('float:32')
print float_num
print float_num_
Output:
3.4353
-5.56134659129e+32
What could be the cause? The second output should also be 3.4353 or close.
The problem is that NumPy's float32 is stored as little-endian here, while bitstring's default interpretation is big-endian. The solution is to specify little-endian in the data type.
my_file = open('float_test.bin', 'rb')
raw_data = ConstBitStream(my_file)
float_num_ = raw_data.readlist('floatle:32')
print float_num
print float_num_
Output:
3.4353
3.43530011177
Reference on bitstring datatypes, here.
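The same effect can be reproduced with only the standard library. In this sketch the explicit '<f' packing is an assumption that mirrors what tofile() produces on a little-endian machine; the variable names are illustrative.

```python
import struct

raw = struct.pack("<f", 3.4353)           # four bytes, little-endian float32

(as_little,) = struct.unpack("<f", raw)   # correct byte order
(as_big,) = struct.unpack(">f", raw)      # same bytes read in the wrong order
```

as_little comes back close to 3.4353 (within float32 precision), while as_big is an unrelated number, just as in the bitstring output above.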