I have a binary reader in C++ and I am trying to write the same thing in Python (I am a beginner in both languages). I read online that I should use struct, but I am having trouble getting it done.
My C++ code is the following:
struct databin {
    float prixc[60];
    float volume[60];
};
databin qt;
//get the data for day d, minute i//
int numsteps = m_nbsteps * sizeof(qt);
int step = d*numsteps+i*sizeof(qt);
m_rf.seekg(step, ios::beg);
m_rf.read((char*) &qt, sizeof(qt));
// at the end we have the data in the object qt
I would really appreciate some help to do the same in Python.
Thank you!!
Update:
Sorry Mark, I did not want my message to be perceived that way. I really appreciate the time you spent.
Actually, I was looking more for a starting point with struct: my struct in C++ is made of two arrays, and I can't find how to build the same kind of structure in Python so that I can use unpack on it.
What I have done so far:
# get the data for day d, minute i
import struct

d = 5000
i = 15
numsteps = 391 * 480
xx = []
yy = []
step = d * numsteps + i * 480
with open(file, "rb") as of:
    of.seek(step, 0)
    couple_bytes = of.read(480)
    for j in range(0, 240, 4):       # first 60 floats
        [x] = struct.unpack('f', couple_bytes[j:j+4])
        xx.append(x)
    for j in range(240, 480, 4):     # last 60 floats
        [y] = struct.unpack('f', couple_bytes[j:j+4])
        yy.append(y)
Now this works, and in xx and yy I have my two arrays. But my goal was a more direct approach: defining a structure and reading it directly.
Thank you again!
The struct module has the ability to unpack many values at a time by including a count in the format string.
with open(file, "rb") as of:
    of.seek(step, 0)
    couple_bytes = of.read(60 * 4)
    prixc = list(struct.unpack('60f', couple_bytes))
    couple_bytes = of.read(60 * 4)
    volume = list(struct.unpack('60f', couple_bytes))
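If you want something closer to reading the whole structure in one go, a possible sketch (untested, assuming the same 480-byte record layout as the C++ struct; the Databin name is just an illustration) is to unpack all 120 floats at once and wrap them in a namedtuple:
import struct
from collections import namedtuple

Databin = namedtuple('Databin', ['prixc', 'volume'])  # mirrors the C++ struct databin
record = struct.Struct('60f60f')                      # 120 floats = 480 bytes, like sizeof(databin)

with open(file, "rb") as of:
    of.seek(step, 0)
    values = record.unpack(of.read(record.size))
    qt = Databin(prixc=list(values[:60]), volume=list(values[60:]))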
I created a binary file in C# to store floats.
BinaryFormatter formatter = new BinaryFormatter();
FileStream saveFile = File.Create(my_path);
formatter.Serialize(saveFile, data_to_store);
saveFile.Close();
The data can be read back fine in C#, but I cannot read it from Python.
import os
import struct
import numpy as np

f = open(path, 'rb')
nums = int(os.path.getsize(path) / 4)
data = struct.unpack('f' * nums, f.read(4 * nums))
print(data)
f.close()
data = np.array(data)
The code above did not work.
It looks like you're trying to serialise a binary array of floats.
If you do that using BinaryFormatter.Serialize(), it will write some header information too. I'm guessing you don't want that.
Also note that BinaryFormatter.Serialize() is obsolete, and Microsoft recommends that you don't use it.
If you just want to write a float array as binary to a file, with no header or other extraneous data, you can do it like this in .NET Core 3.1 or later:
float[] data = new float[10];
using FileStream saveFile = File.Create(@"d:\tmp\test.bin");
saveFile.Write(MemoryMarshal.AsBytes(data.AsSpan()));
If you're using .NET 4.x, you have to do it like this:
float[] data = new float[10];
using FileStream saveFile = File.Create(@"d:\tmp\test.bin");
foreach (float f in data)
{
saveFile.Write(BitConverter.GetBytes(f), 0, sizeof(float));
}
i.e. you have to convert each float to an array of bytes using BitConverter.GetBytes(), and write that array to the file using Stream.Write().
To read an array of floats from a binary file containing just the float bytes using .NET 4.x:
Firstly you need to know how many floats you will be reading. Let's assume that the entire file is just the bytes from writing an array of floats. In that case you can determine the number of floats by dividing the file length by sizeof(float).
Then you can read each float individually and put it into the array like so (using a BinaryReader to simplify things):
using var readFile = File.OpenRead(#"d:\tmp\test.bin");
using var reader = new BinaryReader(readFile);
int nFloats = (int)readFile.Length / sizeof(float);
float[] input = new float[nFloats];
for (int i = 0; i < nFloats; ++i)
{
input[i] = reader.ReadSingle();
}
Actually you could use BinaryWriter to simplify writing the file in the first place. Compare this with the code using BitConverter:
float[] data = new float[10];
using var saveFile = File.Create(@"d:\tmp\test.bin");
using var writer = new BinaryWriter(saveFile);
foreach (float f in data)
{
writer.Write(f);
}
Note that - unlike BinaryFormatter - BinaryReader and BinaryWriter do not read or write any extraneous header data. They only read and write what you ask them to.
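On the Python side, once the file contains only the raw float bytes, the struct-based code from the question should work unchanged. Alternatively, a shorter sketch with numpy (untested, assuming little-endian 32-bit floats and the path variable from the question):
import numpy as np

data = np.fromfile(path, dtype='<f4')  # read the whole file as 32-bit floats
print(data)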
I have a large file which is output by my C++ code.
It saves structs to the file in binary format.
For example:
struct A {
    char name[32];
    int age;
    double height;
};
The output code is like:
std::fstream f;
for (int i = 0; i < 10000000; ++i) {
    A a;
    f.write(reinterpret_cast<const char*>(&a), sizeof(a));
}
I want to handle it in Python with a pandas DataFrame.
Is there any good method that can read it elegantly?
Searching for read_bin I found this
issue that suggests using np.fromfile to load the data into a numpy array, then converting to a dataframe:
import numpy as np
import pandas as pd
dt = np.dtype(
    [
        ("name", "S32"),   # 32-length zero-terminated bytes
        ("age", "i4"),     # 32-bit signed integer
        ("height", "f8"),  # 64-bit floating-point number
    ],
)
records = np.fromfile("filename.bin", dt)
df = pd.DataFrame(records)
Please note that I have not tested this code, so there could be some problems with the data types I picked:
the byte order might be different (big/little endian, e.g. dt = np.dtype([('big', '>i4'), ('little', '<i4')]))
the type for the char array is a null-padded byte array, which I think will give you bytes objects in Python, so you might want to convert them to strings (using df['name'] = df['name'].str.decode('utf-8'))
the C++ compiler may also insert alignment padding into the struct (typically 4 bytes between age and height here); if sizeof(A) turns out to be larger than 44, pass align=True to np.dtype or add an explicit padding field
More info on the data types can be found in the numpy docs.
Cheers!
Untested, based on a quick review of the Python struct module's documentation.
import struct
def reader(filehandle):
    """
    Accept an open filehandle; read and yield tuples according to the
    specified format (see the source) until the filehandle is exhausted.
    """
    mystruct = struct.Struct("32sid")
    while True:
        buf = filehandle.read(mystruct.size)
        if len(buf) == 0:
            break
        name, age, height = mystruct.unpack(buf)
        yield name, age, height
Usage:
with open(filename, 'rb') as data:
    for name, age, height in reader(data):
        # do things with those values
        pass
I don't know enough about C++ endianness conventions to decide if you should add a modifier to swap around the byte order somewhere. I'm guessing if C++ and Python are both running on the same machine, you would not have to worry about this.
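If the goal is a pandas DataFrame, the tuples from the generator above can be fed straight to the DataFrame constructor; a minimal sketch (untested, column names taken from the struct in the question):
import pandas as pd

with open(filename, 'rb') as data:
    df = pd.DataFrame(list(reader(data)), columns=['name', 'age', 'height'])

# The name column holds null-padded bytes; decode and strip it if you want strings.
df['name'] = df['name'].str.decode('utf-8').str.rstrip('\x00')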
I have an IDL procedure that reads a binary file and I am trying to translate it into a Python routine.
The IDL code looks like:
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
It works perfectly. So I'm writing the same thing in Python, but I can't get the same results (even the value of 'a' is different). Here is my code:
import struct
import numpy as np

x = np.zeros(nptx, float)
y = np.zeros(npty, float)
z = np.zeros(nptz, float)
with open(name_file_data, "rb") as fb:
    a, = struct.unpack("I", fb.read(4))
    b, c, d, e = struct.unpack("ffff", fb.read(16))
    x[:] = struct.unpack(str(nptx) + "d", fb.read(nptx * 8))[:]
    y[:] = struct.unpack(str(npty) + "d", fb.read(npty * 8))[:]
    z[:] = struct.unpack(str(nptz) + "d", fb.read(nptz * 8))[:]
I hope this is enough information for someone to help.
Update: As suggested in the answers, I'm now trying the FortranFile module, but I'm not sure I understand everything about its use.
from scipy.io import FortranFile
f=FortranFile(name_file_data, 'r')
a=f.read_record('H')
b=f.read_record('f','f','f','f')
However, instead of having an integer for 'a', I got : array([0, 0], dtype=uint16).
And I had this following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)
According to a table of IDL data types, UINT(0) creates a 16 bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4 byte integer, and H denotes an unsigned 16 bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no guarantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.
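For what it's worth, if the file does follow the usual Fortran record structure (one record per READU call), a FortranFile sketch might look like this. It is untested; the dtypes and byte order are assumptions that need to be checked against the actual file:
from scipy.io import FortranFile
import numpy as np

f = FortranFile(name_file_data, 'r')
a = f.read_record('<u2')           # record from readu,11,a (16-bit unsigned integer)
b, c, d, e = f.read_record('<f4')  # record from readu,11,b,c,d,e (four 32-bit floats)
x = f.read_record('<f8')           # nptx doubles
y = f.read_record('<f8')           # npty doubles
z = f.read_record('<f8')           # nptz doubles
f.close()

If a comes back with more elements than expected (as in your update), the first record is probably larger than two bytes, and you will have to inspect the raw record to work out its real layout.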
I am new to both Matlab and Python and I have to convert a program in Matlab to Python. I am not sure how to typecast the data after reading from the file in Python. The file used is a binary file.
Below is the Matlab code:
fid = fopen (filename, 'r');
fseek (fid, 0, -1);
meta = zeros (n, 9, 'single');
v = zeros (n, 128, 'single');
d = 0;
for i = 1:n
meta(i,:) = fread (fid, 9, 'float');
d = fread (fid, 1, 'int');
v(i,:) = fread (fid, d, 'uint8=>single');
end
I have written the program below in Python:
fid = open(filename, 'r')
fid.seek(0 , 0)
meta = np.zeros((n,9),dtype = np.float32)
v = np.zeros((n,128),dtype = np.float32)
for i in range(n):
    data_str = fid.read(9);
    meta[1,:] = unpack('f', data_str)
For this unpack, I am getting the error "unpack requires a string argument of length 4".
Please suggest some way to make it work.
I looked into the problem a little, mainly because I need this in the near future too. It turns out there is a very simple solution using numpy, assuming you have a Matlab matrix stored the way I do.
import numpy as np

def read_matrix(file_name):
    return np.fromfile(file_name, dtype='<f')  # little-endian single-precision float

arr = read_matrix(file_path)
print(arr[0:10])  # sample data
print(len(arr))   # number of elements
You must work out the data type (dtype) yourself; there is help on this here. I used fwrite(fid, value, 'single'); to store the data in Matlab, and if you did the same, the code above will work.
Note that the returned variable is a flat 1-D array, not the original matrix; you'll have to reshape it to match the original shape of your data. In my case len(arr) is 307200, coming from a matrix of size 15360 x 20.
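That said, the file in the original question is not a plain matrix: each record holds 9 floats, then an int count d, then d uint8 values. A per-record sketch with struct (untested; it assumes native byte order and that d never exceeds 128) could look like this:
import struct
import numpy as np

meta = np.zeros((n, 9), dtype=np.float32)
v = np.zeros((n, 128), dtype=np.float32)

with open(filename, 'rb') as fid:
    for i in range(n):
        meta[i, :] = struct.unpack('9f', fid.read(9 * 4))    # 9 single-precision floats
        d, = struct.unpack('i', fid.read(4))                 # number of uint8 values that follow
        v[i, :d] = struct.unpack(str(d) + 'B', fid.read(d))  # d unsigned bytes, stored as float32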
Please let me know the best way to write 8-bit values to a file in Python. The values can be between 0 and 255.
I open the file as follows:
f = open('temp', 'wb')
Assume the value I am writing is an int between 0 and 255 assigned to a variable, e.g.
x = 13
This does not work:
f.write(x)
..as you experts know, it does not work; Python complains about not being able to write an int to a buffer interface. As a workaround I am writing it as a hex string. Thus:
f.write(hex(x))
..and that works, but it is not only space-inefficient but clearly not the idiomatic Python way. Please help. Thanks.
Try explicitly creating a bytes object:
f.write(bytes([x]))
You can also output a series of bytes as follows:
f.write(bytes([65, 66, 67]))
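To read such a value back, note that indexing a bytes object in Python 3 gives you an int again, e.g. x = f.read(1)[0].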
As an alternative you can use the struct module...
import struct

x = 13
with open('temp', 'wb') as f:
    f.write(struct.pack('B', x))  # unsigned char: one byte, range 0-255
To read x from the file...
with open('temp', 'rb') as f:
    x, = struct.unpack('B', f.read(1))
You just need to write an item of the bytes data type, not an integer.
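For example, a minimal sketch using int.to_bytes (Python 3):
x = 13
with open('temp', 'wb') as f:
    f.write(x.to_bytes(1, 'big'))  # one byte; byte order does not matter for a single byte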