I'm trying to convert binary files written in C to and from HDF5 files using Python.
To read the binary file in Python, this works:
pos=np.fromfile(f, count=npt*3, dtype='f4').reshape((npt, 3))
To write the same thing I've tried array.tofile(), without success, and now I'm trying to use ctypes like this (stitched together from different answers found on the web):
import sys
import numpy as np
import ctypes as c

print "Loading C libraries with ctypes"
libc = c.CDLL("libc.so.6")  # Linux

# fopen()
libc.fopen.restype = c.c_void_p

def errcheck(res, func, args):
    if not res:
        raise IOError
    return res

libc.fopen.errcheck = errcheck
# errcheck() could be similarly defined for `fwrite`, `fclose`

c_int_p = c.POINTER(c.c_int)
c_float_p = c.POINTER(c.c_float)
c_double_p = c.POINTER(c.c_double)

def c_write(data, f, numpy_type, c_type_p, nbyte, count):
    data = data.astype(numpy_type)
    data_p = data.ctypes.data_as(c_type_p)
    nitems = libc.fwrite(data_p, nbyte, count, f)
    if nitems != data.size:  # not all data were written
        print "Not all data were written, exit..."
        sys.exit()
c_write(pos, f, np.int32, c_int_p, 4, npart.size)
You should probably look into the struct module; it's great for packing and unpacking data at the lowest byte-by-byte level.
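For example, here is a minimal sketch of writing the array with struct (the output filename is just a placeholder; "<f" matches the little-endian 'f4' dtype used for reading):

import struct

# Pack the (npt, 3) float array as little-endian 4-byte floats and write it raw.
flat = pos.astype('<f4').ravel()
with open("pos.bin", "wb") as out:
    out.write(struct.pack("<%df" % flat.size, *flat))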
I have an IDL procedure reading a binary file and I'm trying to translate it into a Python routine.
The IDL code looks like:
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
It works perfectly. So I'm writing the same thing in Python, but I can't reproduce the results (even the value of 'a' is different). Here is my code:
import struct
import numpy as np

x = np.zeros(nptx, float)
y = np.zeros(npty, float)
z = np.zeros(nptz, float)
with open(name_file_data, "rb") as fb:
    a, = struct.unpack("I", fb.read(4))
    b, c, d, e = struct.unpack("ffff", fb.read(16))
    x[:] = struct.unpack(str(nptx)+"d", fb.read(nptx*8))[:]
    y[:] = struct.unpack(str(npty)+"d", fb.read(npty*8))[:]
    z[:] = struct.unpack(str(nptz)+"d", fb.read(nptz*8))[:]
I hope this is enough information for someone to answer.
Update: as suggested in the answers, I'm now trying the FortranFile module, but I'm not sure I understood everything about its use.
from scipy.io import FortranFile
f=FortranFile(name_file_data, 'r')
a=f.read_record('H')
b=f.read_record('f','f','f','f')
However, instead of getting an integer for 'a', I got: array([0, 0], dtype=uint16).
And I got the following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)
According to a table of IDL data types, UINT(0) creates a 16 bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4 byte integer, and H denotes an unsigned 16 bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no guarantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.
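If you want to decode the records by hand instead, here is a rough sketch. It rests on an assumption, not a guarantee: many compilers (e.g. gfortran) frame each unformatted record with a 4-byte byte count before and after the payload.

import struct
import numpy as np

def read_record(fb, fmt):
    # Read one unformatted record: leading marker, payload, trailing marker.
    nbytes, = struct.unpack('i', fb.read(4))
    payload = struct.unpack(fmt, fb.read(struct.calcsize(fmt)))
    fb.read(nbytes - struct.calcsize(fmt))  # skip any remaining bytes in the record
    fb.read(4)                              # trailing marker
    return payload

with open(name_file_data, "rb") as fb:
    a, = read_record(fb, 'H')
    b, c, d, e = read_record(fb, '4f')
    x = np.array(read_record(fb, '%dd' % nptx))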
I want to create two binary files and append numpy arrays to each of them in a loop. I wrote the following (I use Python 2.7):
import numpy as np

for _ in range(5):
    C = np.random.rand(1, 5)
    r = np.random.rand(1, 5)
    with open("C.bin", "ab") as file1, open("r.bin", "ab") as file2:
        # Append to binary files
        np.array(C).tofile(file1)
        np.array(r).tofile(file2)
    # Now printing to check if appending is successful
    C = np.load("C.bin")
    r = np.load("r.bin")
    print (C)
    print (r)
However, I keep getting this error:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    C = np.load("C.bin")
  File "/anaconda/lib/python2.7/site-packages/numpy/lib/npyio.py", line 429, in load
    "Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'C.bin' as a pickle
I tried to fix it but I can't see what's wrong. Any help is appreciated.
NOTE: I intentionally want to use np.load because later on I will be loading the dataset from the disk into a numpy array for further processing.
You should use the save function built into numpy to store the arrays in the files. Here is what your code should look like:
import numpy as np

for _ in range(5):
    C = np.random.rand(1, 5)
    r = np.random.rand(1, 5)
    np.save('C', C)
    np.save('r', r)
    # Now printing to check if appending is successful
    C = np.load("C.npy")
    r = np.load("r.npy")
    print (C)
    print (r)
    del C, r
Please refer to the documentation https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.load.html
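As a side note (not part of the answer above): if you really want a single raw file that grows across iterations, ndarray.tofile does append correctly, but it writes no header, so it has to be read back with np.fromfile (supplying the dtype and shape yourself) rather than np.load. A sketch:

import numpy as np

# np.random.rand writes float64 values; reshape to rows of 5.
C = np.fromfile("C.bin", dtype=np.float64).reshape(-1, 5)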
Trying to read items from the Keychain using Python (novice at Mac Python).
This is where I've got to after hacking together several things found via Google:
from ctypes import CDLL, byref, Structure, POINTER
from Foundation import NSDictionary

class OpaqueObject(Structure):
    pass

OpaquePtr = POINTER(OpaqueObject)
Security = CDLL('/System/Library......../Security')
query = NSDictionary.dictionaryWithDictionary({<still working on this part>})
items = OpaquePtr()
Security.SecItemCopyMatching(query, byref(items))
The {<still working on this part>} placeholder currently reads {"foo":"bar"}, which is of course an invalid query, but it should at least run.
Anyway it fails on the call to SecItemCopyMatching, saying it doesn't know how to convert param 1. I know the function is defined to take a CFDictionary, but I expected toll-free bridging to accept an NSDictionary.
Anyway, I suspect this is all very bad code that mixes two Mac Python mechanisms, ctypes and PyObjC.
Toll-free bridging isn't applicable to Python ctypes. Consider using a CFDictionary instead:
CoreFoundation = CDLL('/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation')
CoreFoundation.CFDictionaryCreateMutable.restype = OpaquePtr
CoreFoundation.CFStringCreateWithBytes.restype = OpaquePtr
def CFDictionaryAddStringKeyValue(d, k, v):
    ck = CoreFoundation.CFStringCreateWithBytes(None, k, len(k), 0, 0)
    cv = CoreFoundation.CFStringCreateWithBytes(None, v, len(v), 0, 0)
    CoreFoundation.CFDictionaryAddValue(d, ck, cv)
    CoreFoundation.CFRelease(ck)
    CoreFoundation.CFRelease(cv)
query = CoreFoundation.CFDictionaryCreateMutable(None, 0, None, None)
CFDictionaryAddStringKeyValue(query, "foo", "bar")
Then you can just pass query into SecItemCopyMatching.
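A minimal sketch of that final call (the restype here is an assumption based on the C signature, which returns an OSStatus):

from ctypes import c_int32

Security.SecItemCopyMatching.restype = c_int32  # OSStatus
items = OpaquePtr()
status = Security.SecItemCopyMatching(query, byref(items))
if status != 0:
    print "SecItemCopyMatching failed with OSStatus", status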
I am new to both Matlab and Python, and I have to convert a Matlab program to Python. I am not sure how to cast the data after reading it from the file in Python. The file is a binary file.
Below is the Matlab code:
fid = fopen (filename, 'r');
fseek (fid, 0, -1);
meta = zeros (n, 9, 'single');
v = zeros (n, 128, 'single');
d = 0;
for i = 1:n
    meta(i,:) = fread (fid, 9, 'float');
    d = fread (fid, 1, 'int');
    v(i,:) = fread (fid, d, 'uint8=>single');
end
I have written the program below in Python:
from struct import unpack
import numpy as np

fid = open(filename, 'r')
fid.seek(0 , 0)
meta = np.zeros((n,9),dtype = np.float32)
v = np.zeros((n,128),dtype = np.float32)
for i in range(n):
    data_str = fid.read(9);
    meta[1,:] = unpack('f', data_str)
For this unpack, I am getting the error:
"unpack requires a string argument of length 4"
Please suggest some way to make it work.
I looked into the problem a little, mainly because I will need this in the near future too. It turns out there is a very simple solution using numpy, assuming you have a Matlab matrix stored like I do.
import numpy as np
def read_matrix(file_name):
return np.fromfile(file_name, dtype='<f') # little-endian single precision float
arr = read_matrix(file_path)
print arr[0:10] #sample data
print len(arr) # number of elements
You must find out the data type (dtype) yourself; help on this is here. I used fwrite(fid,value,'single'); to store the data in Matlab. If you have the same, the code above will work.
Note that the returned variable is a flat (1-D) array; you'll have to reshape it to match the original shape of your data. In my case len(arr) is 307200, from a matrix of size 15360 x 20.
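If your file has the mixed record layout from the Matlab loop above rather than one flat array, a sketch along these lines (assuming little-endian data and that d never exceeds 128) should map across directly:

import numpy as np

with open(filename, 'rb') as fid:
    meta = np.zeros((n, 9), dtype=np.float32)
    v = np.zeros((n, 128), dtype=np.float32)
    for i in range(n):
        meta[i, :] = np.fromfile(fid, dtype='<f4', count=9)   # 9 floats
        d = int(np.fromfile(fid, dtype='<i4', count=1)[0])    # one int
        v[i, :d] = np.fromfile(fid, dtype=np.uint8, count=d)  # d bytes, cast to single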
I've been using Audiolab to import sound files in the past, and it worked quite well. However:
It doesn't support some formats, like mp3, because the underlying libsndfile refuses to support them
It doesn't work in Python 2.6 under Windows, and the author is not around to fix it
In [2]: from scikits import audiolab
--------------------------------------------------------------------
ImportError Traceback (most recent call last)
C:\Python26\Scripts\<ipython console> in <module>()
C:\Python26\lib\site-packages\scikits\audiolab\__init__.py in <module>()
23 __version__ = _version
24
---> 25 from pysndfile import formatinfo, sndfile
26 from pysndfile import supported_format, supported_endianness, \
27 supported_encoding, PyaudioException, \
C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py in <module>()
----> 1 from _sndfile import Sndfile, Format, available_file_formats, available_encodings
2 from compat import formatinfo, sndfile, PyaudioException, PyaudioIOError
3 from compat import supported_format, supported_endianness, supported_encoding
ImportError: DLL load failed: The specified module could not be found.
So I would like to either:
Figure out why it's not working in 2.6 (something wrong with _sndfile.pyd?) and maybe find a way to extend it to work with unsupported formats
Find a complete replacement for audiolab
Audiolab is working for me on Ubuntu 9.04 with Python 2.6.2, so it might be a Windows problem. In your link to the forum, the author also suggests that it is a Windows error.
In the past, this option has worked for me, too:
from scipy.io import wavfile
fs, data = wavfile.read(filename)
Just beware that data may have int data type, so it is not scaled within [-1,1). For example, if data is int16, you must divide data by 2**15 to scale within [-1,1).
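For example, a quick sketch of that scaling for 16-bit data:

import numpy as np

if data.dtype == np.int16:
    data = data / float(2**15)  # scale int16 samples into [-1, 1)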
Sox (http://sox.sourceforge.net/) can be your friend for this. It can read many, many different formats and output them as raw in whatever datatype you prefer. In fact, I just wrote the code to read a block of data from an audio file into a numpy array.
I decided to go this route for portability (sox is very widely available) and to maximize the flexibility of input audio types I can use. Actually, from initial testing it seems it isn't noticeably slower for what I'm using it for, which is reading short segments (a few seconds) of audio from very long (hours) files.
Variables you need:
SOX_EXEC # the sox / sox.exe executable filename
filename # the audio filename of course
num_channels # duh... the number of channels
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8
start_samp # sample number to start reading at
len_samp # number of samples to read
The actual code is really simple. If you want to extract the whole file, you can remove the start_samp, len_samp, and 'trim' stuff.
import subprocess  # need the subprocess module
import numpy as NP  # I'm lazy and call numpy NP

cmd = [SOX_EXEC,
       filename,                # input filename
       '-t', 'raw',             # output file type raw
       '-e', 'signed-integer',  # output encoded as signed ints
       '-L',                    # output little endian
       '-b', str(out_byps*8),   # output bytes per sample
       '-',                     # output to stdout
       'trim', str(start_samp)+'s', str(len_samp)+'s']  # only extract requested part

data = NP.fromstring(subprocess.check_output(cmd), '<i%d' % (out_byps))
data = data.reshape(len(data)/num_channels, num_channels)  # make samples x channels
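For reference, a purely hypothetical way of filling in the variables above (none of these values come from sox itself):

SOX_EXEC = 'sox'                 # assumes the sox executable is on the PATH
filename = 'long_recording.wav'  # placeholder input file
num_channels = 2
out_byps = 2                     # ask sox for 16-bit samples
start_samp = 44100 * 60 * 10     # start 10 minutes in (assuming 44.1 kHz)
len_samp = 44100 * 5             # read 5 seconds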
PS: Here is code to read stuff from audio file headers using sox...
import sys

info = subprocess.check_output([SOX_EXEC, '--i', filename])
reading_comments_flag = False
comments = ''
other = ''
output_unhandled = False  # set True to report unparsed lines on stderr
for l in info.splitlines():
    if( not l.strip() ):
        continue
    if( reading_comments_flag and l.strip() ):
        if( comments ):
            comments += '\n'
        comments += l
    else:
        if( l.startswith('Input File') ):
            input_file = l.split(':',1)[1].strip()[1:-1]
        elif( l.startswith('Channels') ):
            num_channels = int(l.split(':',1)[1].strip())
        elif( l.startswith('Sample Rate') ):
            sample_rate = int(l.split(':',1)[1].strip())
        elif( l.startswith('Precision') ):
            bits_per_sample = int(l.split(':',1)[1].strip()[0:-4])
        elif( l.startswith('Duration') ):
            tmp = l.split(':',1)[1].strip()
            tmp = tmp.split('=',1)
            duration_time = tmp[0]
            duration_samples = int(tmp[1].split(None,1)[0])
        elif( l.startswith('Sample Encoding') ):
            encoding = l.split(':',1)[1].strip()
        elif( l.startswith('Comments') ):
            comments = ''
            reading_comments_flag = True
        else:
            if( other ):
                other += '\n'+l
            else:
                other = l
            if( output_unhandled ):
                print >>sys.stderr, "Unhandled:", l
FFmpeg supports mp3s and works on Windows (http://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/).
Reading an mp3 file:
import subprocess as sp

FFMPEG_BIN = "ffmpeg.exe"
command = [ FFMPEG_BIN,
            '-i', 'mySong.mp3',
            '-f', 's16le',
            '-acodec', 'pcm_s16le',
            '-ar', '44100',  # output will have a 44100 Hz sample rate
            '-ac', '2',      # stereo (set to '1' for mono)
            '-']
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
Format data into numpy array:
import numpy

raw_audio = pipe.stdout.read(88200*4)
audio_array = numpy.fromstring(raw_audio, dtype="int16")
audio_array = audio_array.reshape((len(audio_array)/2, 2))
In case you want to do this for MP3
Here's what I'm using; it uses pydub and scipy.
Full setup (on Mac, may differ on other systems):
import tempfile
import os
import pydub
import scipy
import scipy.io.wavfile

def read_mp3(file_path, as_float = False):
    """
    Read an MP3 file into numpy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1]
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1]
    """
    path, ext = os.path.splitext(file_path)
    assert ext == '.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    _, path = tempfile.mkstemp()
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data / (2.0 ** 15)  # float division, so int16 data actually ends up in [-1, 1]
    return rate, data
Credit to James Thompson's blog
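A hypothetical usage example (the filename is just a placeholder):

rate, data = read_mp3('some_song.mp3', as_float=True)
print rate, data.shape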
I've been using PySoundFile lately instead of Audiolab. It installs easily with conda.
Like most of these libraries, it does not support mp3. MP3 is no longer patented, so there's no reason it can't be supported; someone just has to write support into libsndfile.
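For what it's worth, a minimal PySoundFile read looks roughly like this (the filename is a placeholder):

import soundfile as sf

data, samplerate = sf.read('existing_file.wav')  # data is a float64 numpy array by default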