I created a binary file in C# to store floats.
BinaryFormatter formatter = new BinaryFormatter();
FileStream saveFile = File.Create(my_path);
formatter.Serialize(saveFile, data_to_store);
saveFile.Close();
The data reads back fine in C#, but I cannot read it from Python.
f = open(path,'rb')
nums=int(os.path.getsize(fpath)/4)
data = struct.unpack('f'*nums,f.read(4*nums))
print(data)
f.close()
data = np.array(data)
The code above did not work.
It looks like you're trying to serialise a binary array of floats.
If you do that using BinaryFormatter.Serialize(), it will write some header information too. I'm guessing you don't want that.
Also note that BinaryFormatter.Serialize() is obsolete, and Microsoft recommends that you don't use it.
If you just want to write a float array as binary to a file, with no header or other extraneous data, you can do it like this in .NET Core 3.1 or later:
float[] data = new float[10];
using FileStream saveFile = File.Create(@"d:\tmp\test.bin");
saveFile.Write(MemoryMarshal.AsBytes(data.AsSpan()));
If you're using .NET Framework 4.x, you have to do it like this:
float[] data = new float[10];
using FileStream saveFile = File.Create(@"d:\tmp\test.bin");
foreach (float f in data)
{
    saveFile.Write(BitConverter.GetBytes(f), 0, sizeof(float));
}
i.e. you have to convert each float to an array of bytes using BitConverter.GetBytes(), and write that array to the file using Stream.Write().
To read an array of floats from a binary file containing just the float bytes using .NET Framework 4.x:
Firstly you need to know how many floats you will be reading. Let's assume that the entire file is just the bytes from writing an array of floats. In that case you can determine the number of floats by dividing the file length by sizeof(float).
Then you can read each float individually and put it into the array like so (using a BinaryReader to simplify things):
using var readFile = File.OpenRead(@"d:\tmp\test.bin");
using var reader = new BinaryReader(readFile);
// Divide first, then narrow to int, so the cast applies to the float count.
int nFloats = (int)(readFile.Length / sizeof(float));
float[] input = new float[nFloats];
for (int i = 0; i < nFloats; ++i)
{
    input[i] = reader.ReadSingle();
}
Actually you could use BinaryWriter to simplify writing the file in the first place. Compare this with the code using BitConverter:
float[] data = new float[10];
using var saveFile = File.Create(@"d:\tmp\test.bin");
using var writer = new BinaryWriter(saveFile);
foreach (float f in data)
{
    writer.Write(f);
}
Note that - unlike BinaryFormatter - BinaryReader and BinaryWriter do not read or write any extraneous header data. They only read and write what you ask them to.
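Once the file holds nothing but the raw float bytes, the Python side of the original question becomes straightforward. The following is a minimal sketch using only the standard library; the helper name read_floats and the temporary file path are made up for illustration, and it assumes the writer produced little-endian 32-bit floats (which BinaryWriter does).

```python
import os
import struct
import tempfile

def read_floats(path):
    """Read a file that contains nothing but little-endian 32-bit floats."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // 4                      # 4 bytes per float32
    # "<" means little-endian with no padding (an assumption about the writer).
    return list(struct.unpack("<%df" % count, raw[:count * 4]))

# Demo: write three floats as raw bytes, then read them back.
path = os.path.join(tempfile.mkdtemp(), "test.bin")   # hypothetical location
with open(path, "wb") as f:
    f.write(struct.pack("<3f", 1.0, 2.5, -0.25))

values = read_floats(path)
print(values)
```

The same call on a file written with BinaryFormatter would return garbage, because the header bytes would be interpreted as floats too.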
I have a large file that is output by my C++ code.
It saves structs to the file in binary format.
For example:
struct A {
    char name[32];
    int age;
    double height;
};
The output code looks like this:
std::fstream f;
for (int i = 0; i < 10000000; ++i) {
    A a;
    f.write(reinterpret_cast<const char*>(&a), sizeof(a));
}
I want to handle it in Python with a pandas DataFrame.
Is there a good method to read it elegantly?
Searching for read_bin, I found this issue, which suggests using np.fromfile to load the data into a NumPy array and then converting that to a DataFrame:
import numpy as np
import pandas as pd
dt = np.dtype(
    [
        ("name", "S32"),    # 32-byte fixed-length, zero-padded bytes
        ("age", "i4"),      # 32-bit signed integer
        ("height", "f8"),   # 64-bit floating-point number
    ],
    align=True,  # add C-style padding; the struct is typically 48 bytes, not 44
)
records = np.fromfile("filename.bin", dt)
df = pd.DataFrame(records)
Please note that I have not tested this code, so there could be some problems in the data types I picked:
the byte order might be different (big- or little-endian, e.g. dt = np.dtype([('big', '>i4'), ('little', '<i4')]))
the char array is a null-terminated byte array, which I think will come out as a bytes object in Python, so you might want to convert it to a string (using df['name'] = df['name'].str.decode('utf-8'))
More info on the data types can be found in the numpy docs.
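To make the two caveats concrete, here is a small standard-library sketch (no NumPy needed) showing how the same four bytes decode under each byte order, and how a zero-padded name field can be turned into a string. The sample bytes and the name "Alice" are made up for illustration.

```python
import struct

# Four bytes as they might sit in the file.
raw = bytes([0x01, 0x00, 0x00, 0x00])

little = struct.unpack("<i", raw)[0]   # least significant byte first
big = struct.unpack(">i", raw)[0]      # most significant byte first
print(little, big)

# A char[32] field as C would store it: text followed by NUL padding.
name_field = b"Alice".ljust(32, b"\x00")
# Keep only what comes before the first NUL, then decode.
name = name_field.split(b"\x00", 1)[0].decode("utf-8")
print(name)
```

If the ages read from the file look absurdly large, a byte-order mismatch like the one above is a likely suspect.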
Cheers!
Untested, based on a quick review of the Python struct module's documentation.
import struct
def reader(filehandle):
    """
    Accept an open filehandle; read and yield tuples according to the
    specified format (see the source) until the filehandle is exhausted.
    """
    mystruct = struct.Struct("32sid")
    while True:
        buf = filehandle.read(mystruct.size)
        if len(buf) == 0:
            break
        name, age, height = mystruct.unpack(buf)
        yield name, age, height
Usage:
with open(filename, 'rb') as data:
    for name, age, height in reader(data):
        # do things with those values
        pass
I don't know enough about C++ endianness conventions to decide if you should add a modifier to swap around the byte order somewhere. I'm guessing if C++ and Python are both running on the same machine, you would not have to worry about this.
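Besides byte order, the format prefix also controls padding, which matters for this struct because a C++ compiler usually inserts padding before the double. The sketch below compares the native format with an explicitly little-endian, unpadded one; the sample record values are made up, and the native size depends on the platform (48 bytes is typical on 64-bit systems, but that is an assumption about the machine).

```python
import struct

native = struct.Struct("32sid")    # native byte order AND native alignment
packed = struct.Struct("<32sid")   # little-endian, no padding bytes at all

# The unpadded size is always 32 + 4 + 8; the native size may be larger.
print(native.size, packed.size)

# Round-trip one record using the native layout.
record = native.pack(b"Alice", 30, 1.75)
name, age, height = native.unpack(record)

# '32s' returns all 32 bytes, so strip the NUL padding before decoding.
name = name.split(b"\x00", 1)[0].decode("utf-8")
print(name, age, height)
```

If sizeof(A) printed from the C++ side matches native.size, the unprefixed format used in the reader above should line up with the file.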
I'm trying to read noncontiguous fields from a binary file in Python using numpy fromfile function. It's based on this Matlab code using fread:
fseek(file, 0, 'bof');
q = fread(file, inf, 'float32', 8);
8 indicates the number of bytes I want to skip after reading each value. I was wondering if there was a similar option in fromfile, or if there is another way of reading specific values from a binary file in Python. Thanks for your help.
Henrik
Something like this should work, untested:
import struct
floats = []
with open(filename, 'rb') as f:
    while True:
        buff = f.read(4)                 # 'f' is 4 bytes wide
        if len(buff) < 4: break
        x = struct.unpack('f', buff)[0]  # Convert buffer to float (get from returned tuple)
        floats.append(x)                 # Add float to list (for example)
        f.seek(8, 1)                     # The second arg 1 specifies relative offset
Using struct.unpack()
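An alternative sketch that avoids one read and one seek per value: the struct module's 'x' pad code can describe the skipped bytes, so the whole buffer is decoded in one pass. The function name read_strided_floats is made up, little-endian data is assumed, and unlike Matlab's fread this version drops a trailing float that is not followed by the full 8 skipped bytes.

```python
import struct

def read_strided_floats(data, skip=8):
    """Return each float32 that is followed by `skip` ignored bytes."""
    record = struct.Struct("<f%dx" % skip)       # one float, then pad bytes
    usable = len(data) - len(data) % record.size # iter_unpack needs whole records
    return [value for (value,) in record.iter_unpack(data[:usable])]

# Demo data: three floats, each followed by 8 bytes of filler.
filler = b"\xff" * 8
data = b"".join(struct.pack("<f", v) + filler for v in (1.0, 2.0, 3.0))

q = read_strided_floats(data)
print(q)
```

For a real file, data would come from open(filename, 'rb').read().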
I have a binary file that contains both integers and doubles. I want to access that data either in one call (something like: x = np.fromfile(f, dtype=np.int)) or sequentially (value by value). However, NumPy doesn't seem to allow to read from a binary file without specifying a type. Should I convert everything to double, or forget about NumPy?
Edit. Let's say the format of the file is something like this:
int
int int int double double double double double double double
etc.
NumPy doesn't seem to allow to read from a binary file without specifying a type
No programming language I know of pretends to be able to guess the type of raw binary data; and for good reasons. What exactly is the higher level problem you are trying to solve?
I don't think you need NumPy for this. The standard-library struct module does the job. Convert the list of tuples produced at the end into a NumPy array if so desired.
For sources see https://docs.python.org/2/library/struct.html and @martineau
Reading a binary file into a struct in Python
from struct import pack, unpack
with open("foo.bin", "wb") as file:
    a = pack("<iiifffffff", 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    file.write(a)
with open("foo.bin", "rb") as file:  # binary mode is required for reading too
    a = unpack("<iiifffffff", file.read())
print(a)
output:
(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)
[Screenshot of the binary file in a binary editor (Frhed) omitted]
# How to read the same structure repeatedly
import struct
fn = "foo2.bin"
struct_fmt = '<iiifffffff'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
with open(fn, "wb") as file:
    a = struct.pack("<iiifffffff", 1, 2, 3, 1.1, 2.2e-2, 3.3e-3, 4.4e-4, 5.5e-5, 6.6e-6, 7.7e-7)
    for i in range(3):
        file.write(a)
results = []
with open(fn, "rb") as f:
    while True:
        data = f.read(struct_len)
        if not data: break
        s = struct_unpack(data)
        results.append(s)
print(results)
output:
[(1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07), (1, 2, 3, 1.100000023841858, 0.02199999988079071, 0.0032999999821186066, 0.0004400000034365803, 5.500000042957254e-05, 6.599999778700294e-06, 7.699999855503847e-07)]
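The question mentions doubles rather than 32-bit floats, so the format code would be 'd' instead of 'f'. Here is a sketch under that assumption, with each record being three ints followed by seven doubles, little-endian and unpadded; the sample values are made up. It uses iter_unpack to decode all records from one buffer.

```python
import struct

# 3 ints then 7 doubles per record; "<" means little-endian, no padding.
record = struct.Struct("<3i7d")

# Build three sample records in memory (stand-in for the file contents).
blob = b"".join(
    record.pack(i, i + 1, i + 2, *[0.5 * i] * 7) for i in range(3)
)

# Decode every record in the buffer; each row is a 10-element tuple.
rows = list(record.iter_unpack(blob))
for row in rows:
    print(row)
```

Because ints and doubles end up in the same tuple, converting rows to a plain NumPy array would promote everything to float; a structured dtype keeps the types separate if that matters.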
I have a little problem concerning string generation in C.
The following code snippet is part of a C Extension for a Python/Tkinter app which generates images (mandelbrot, gradients and such). Before anyone asks: I don't want to power up Photoshop for such a simple task - overkill...
The problem I'm having is at the end of the snippet in the last for-loop.
This function generates a PPM image file for further processing. The main goal is to generate a string containing the raster data in binary format and pass that string back to Python and then to Tkinter image data to have a preview of the result.
At the moment I write a file to disk which is pretty slow.
The iterator-function returns a pointer to a RGB-array.
If I now write every single color-value to the file using
fputc(col[0], outfile)
it works (the section which is commented out).
To get closer to my main goal I tried to merge the three color values into a string and write that into the file.
When I run that code from my Python app, I end up with a file containing just the header.
Could anyone please point me in the right direction? The whole C thing is pretty new to me, so I'm pretty much stuck here...
static PyObject* py_mandelbrotppm(PyObject* self, PyObject* args)
{
    //get filename from argument
    char *filename;
    PyArg_ParseTuple(args, "s", &filename);
    //---------- open file for writing and create header
    FILE *outfile = NULL;
    outfile = fopen(filename, "w");
    //---------- create ppm header
    char header[17];
    sprintf(header,"P6\n%d %d\n255\n", dim_x, dim_y);
    fputs(header, outfile);
    //---------- end of header generation
    for(int y = 0;y<dim_y;y++)
    {
        for(int x = 0;x<dim_x;x++)
        {
            int *col = iterator(x,y);
            char pixel[3] = {col[0], col[1], col[2]};
            fputs(pixel, outfile);
            /*
            for(int i = 0;i<3;i++)
            {
                fputc(pixel[i], outfile);
            }
            */
        }
    }
    fclose(outfile);
    Py_RETURN_NONE;
}
You have a couple of problems with your new code.
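pixel is missing a null terminator (and space for it). fputs expects a null-terminated string, so it writes bytes until it reaches a zero. Even with the terminator added, a color value of 0 ends the string early, so for raw binary data fwrite(pixel, 1, 3, outfile) is the safer call. To add the terminator, fix it like this: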
char pixel[4] = {col[0], col[1], col[2], '\0'};
But I'll let you in on a little secret. Putting a bunch of ints into an array of chars is going to truncate them and do all sorts of weird, squirrelly things. Maybe not for char-length numbers, but in terms of general style I wouldn't recommend it. Consider this (note that it writes the values as decimal text rather than raw bytes, so it does not match the binary P6 header above):
...
for(int x = 0;x<dim_x;x++){
    int *col = iterator(x,y);
    fprintf(outfile, "%d, %d, %d", col[0], col[1], col[2]);
}
...
On the other hand, I'm a little confused as to why iterator returns ints when RGB values are from 0-255, which is precisely the range an unsigned char has:
unsigned char *col = iterator(x,y);
fprintf(outfile, "%u, %u, %u", col[0], col[1], col[2]);
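Since the stated goal is to hand the raster data back to Python as one in-memory string rather than going through a file, it may help to see what the finished P6 buffer looks like. This is only a Python sketch of the layout, not the C extension itself; make_ppm and gradient are hypothetical names, with gradient standing in for the question's iterator() function.

```python
def make_ppm(width, height, pixel_at):
    """Build a binary (P6) PPM image as a single bytes object."""
    header = b"P6\n%d %d\n255\n" % (width, height)
    body = bytearray()
    for y in range(height):
        for x in range(width):
            # Three channel values in 0..255; zero bytes are fine here
            # because nothing treats the buffer as a C string.
            body.extend(pixel_at(x, y))
    return header + bytes(body)

def gradient(x, y):
    # Hypothetical stand-in for the iterator() function in the question.
    return (x % 256, y % 256, 0)

ppm = make_ppm(4, 2, gradient)
print(len(ppm), ppm[:11])
```

The C equivalent would allocate a buffer of header length plus 3 * dim_x * dim_y bytes and return it with an explicit length, since the pixel data can contain zero bytes.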
So I asked everything in the title:
I have a WAV file (written by PyAudio from an audio input) and I want to convert it to float data corresponding to the sound level (amplitude) so I can do some Fourier transforms, etc.
Does anyone have an idea how to convert WAV data to float?
I have identified two decent ways of doing this.
Method 1: using the wavefile module
Use this method if you don't mind installing some extra libraries, which involved a bit of messing around on my Mac but was easy on my Ubuntu server.
https://github.com/vokimon/python-wavefile
import sys
import wavefile
# returns the contents of the wav file as a double precision float array
def wav_to_floats(filename='file1.wav'):
    w = wavefile.load(filename)
    return w[1][0]
signal = wav_to_floats(sys.argv[1])
print("read " + str(len(signal)) + " frames")
print("in the range " + str(min(signal)) + " to " + str(max(signal)))
Method 2: using the wave module
Use this method if you want fewer module installation hassles.
It reads a WAV file from the filesystem and converts it into floats in the range -1 to 1. It works with 16-bit files, and if they have more than one channel, the samples stay interleaved in the same order they are found in the file. For other bit depths, change the 'h' in the argument to struct.unpack according to the table at the bottom of this page:
https://docs.python.org/2/library/struct.html
It will not work for 24-bit files, because struct has no 24-bit format code, so there is no way to tell struct.unpack what to do.
import wave
import struct
import sys
def wav_to_floats(wave_file):
    w = wave.open(wave_file)
    astr = w.readframes(w.getnframes())
    # convert binary chunks to short
    a = struct.unpack("%ih" % (w.getnframes() * w.getnchannels()), astr)
    a = [float(val) / pow(2, 15) for val in a]
    return a
# read the wav file specified as first command line arg
signal = wav_to_floats(sys.argv[1])
print("read " + str(len(signal)) + " frames")
print("in the range " + str(min(signal)) + " to " + str(max(signal)))
I spent hours trying to find the answer to this. The solution turns out to be really simple: struct.unpack is what you're looking for. The final code will look something like this:
rawdata = stream.read()              # The raw PCM data in need of conversion
from struct import unpack            # Import unpack -- this is what does the conversion
npts = len(rawdata) // 2             # Number of 16-bit samples (2 bytes each)
formatstr = '%ih' % npts             # The format to convert the data; for unsigned 8-bit PCM use '%iB' % len(rawdata)
int_data = unpack(formatstr, rawdata)  # Convert from raw PCM to integer tuple
Most of the credit goes to Interpreting WAV Data. The only trick is getting the format right for unpack: it has to be the right number of bytes and the right format (signed or unsigned).
Most wave files are in PCM 16-bit integer format.
What you will want to do:
Parse the header to know which format it is (check the link from Xophmeister)
Read the data, take the integer values and convert them to float
Integer values range from -32768 to 32767, and you need to convert to values from -1.0 to 1.0 in floating points.
I don't have the code in Python; however, here is a C++ excerpt that converts 16-bit integer PCM data to 32-bit float:
short* pBuffer = (short*)pReadBuffer;
const float ONEOVERSHORTMAX = 3.0517578125e-5f; // 1/32768
unsigned int uFrameRead = dwRead / m_fmt.Format.nBlockAlign; // frames in this read
for ( unsigned int i = 0; i < uFrameRead * m_fmt.Format.nChannels; ++i )
{
    short i16In = pBuffer[i];
    out_pBuffer[i] = (float)i16In * ONEOVERSHORTMAX;
}
Be careful with stereo files, as the stereo PCM data in wave files is interleaved, meaning the data looks like LRLRLRLRLRLRLRLR (instead of LLLLLLLLRRRRRRRR). You may or may not need to de-interleave depending what you do with the data.
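If the channels do need to be separated, slicing with a step is usually enough in Python. A minimal sketch, assuming the samples are already in a flat list in file order; the function name deinterleave and the sample values are made up.

```python
def deinterleave(samples, nchannels):
    """Split an interleaved sample list into one list per channel."""
    # Channel ch owns every nchannels-th sample, starting at index ch.
    return [samples[ch::nchannels] for ch in range(nchannels)]

# Stereo data in file order: L R L R L R
interleaved = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]
left, right = deinterleave(interleaved, 2)
print(left)
print(right)
```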
This version reads a wav file from the filesystem and converts it into floats in the range -1 to 1. It works with files of all sample widths and it will interleave the samples in the same way they are found in the file.
import wave
def read_wav_file(filename):
    def get_int(bytes_obj):
        an_int = int.from_bytes(bytes_obj, 'little', signed=sampwidth!=1)
        return an_int - 128 * (sampwidth == 1)
    with wave.open(filename, 'rb') as file:
        sampwidth = file.getsampwidth()
        frames = file.readframes(-1)
    bytes_samples = (frames[i : i+sampwidth] for i in range(0, len(frames), sampwidth))
    return [get_int(b) / pow(2, sampwidth * 8 - 1) for b in bytes_samples]
Also here is a link to the function that converts floats back to ints and writes them to desired wav file:
https://gto76.github.io/python-cheatsheet/#writefloatsamplestowavfile
The Microsoft WAVE format is fairly well documented. See https://ccrma.stanford.edu/courses/422/projects/WaveFormat/ for example. It wouldn't take much to write a file parser to open and interpret the data to get the information you require... That said, it's almost certainly been done before, so I'm sure someone will give an "easier" answer ;)
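As a rough illustration of how little parsing is involved, the sketch below decodes the first 36 bytes of a WAV file with struct. It assumes the canonical layout described on that page, where the "fmt " chunk comes immediately after the RIFF header; real files can contain extra chunks, which this does not handle. The function name parse_wav_header is made up, and the demo file is generated in memory with the standard wave module.

```python
import io
import struct
import wave

def parse_wav_header(header_bytes):
    """Decode the RIFF header and 'fmt ' chunk of a canonical WAV file."""
    fields = struct.unpack("<4sI4s4sIHHIIHH", header_bytes[:36])
    (riff, _size, wave_id, fmt_id, _fmt_size,
     audio_format, channels, sample_rate,
     _byte_rate, _block_align, bits) = fields
    if riff != b"RIFF" or wave_id != b"WAVE" or fmt_id != b"fmt ":
        raise ValueError("not a canonical RIFF/WAVE header")
    return {"format": audio_format, "channels": channels,
            "sample_rate": sample_rate, "bits_per_sample": bits}

# Demo: create a tiny 16-bit stereo file in memory and parse its header.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)          # 2 bytes = 16 bits per sample
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 4)

info = parse_wav_header(buf.getvalue())
print(info)
```

A format value of 1 means uncompressed PCM, which is the case the conversion code in the other answers assumes.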