Unable to read MAT file with scipy

I am trying to read a MATLAB file using scipy:
import scipy.io as sio
data = sio.loadmat(filepath)
but I get the error
ValueError: Did not fully consume compressed contents of an miCOMPRESSED element. This can indicate that the .mat file is corrupted.
In MATLAB I can open this file without any problem. I also tried saving it again, but nothing changed.
Can you help me?
Here: https://drive.google.com/drive/folders/0B3vXKJ_zYaCJanZfOUVIcGJyR0E
you can find 2 files saved in the same way.
I can open part_000, but not part_001. Why? :(

The problem seems to be caused by the compression. .mat files are compressed automatically from version 7 onward.
Therefore, I suggest trying to save the file in the earlier, uncompressed .mat file version 6:
save(filename, 'data', '-v6');

The problem is with scipy.io.loadmat's verify_compressed_data_integrity keyword argument, which defaults to True. It tries to do some error checking of the headers, but it can raise an error even when the data is extracted just fine. See this related GitHub issue. I'm unsure of the implications of switching this off permanently, but the following should resolve your issue in the meantime. (I can't check it against your data, because it is no longer available at the provided URL.)
import scipy.io as sio
data = sio.loadmat(filepath, verify_compressed_data_integrity=False)
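If you would rather keep the check on by default and only relax it for files that trip it, a small wrapper can do the retry. This is a minimal sketch, not part of scipy: the name load_mat_lenient and its optional loadmat parameter are made up for illustration (the parameter only exists so the helper can be tried with a stand-in reader), and matching on the text "miCOMPRESSED" assumes the error message quoted in the question.

```python
import warnings


def load_mat_lenient(path, loadmat=None):
    """Try a strict load first; retry without the integrity check if it fails."""
    if loadmat is None:
        # Imported lazily so the helper can be exercised with a stand-in reader.
        from scipy.io import loadmat
    try:
        return loadmat(path)
    except ValueError as exc:
        # Only retry for the specific miCOMPRESSED complaint; re-raise anything else.
        if "miCOMPRESSED" not in str(exc):
            raise
        warnings.warn(f"{path}: integrity check failed, retrying without it")
        return loadmat(path, verify_compressed_data_integrity=False)
```

The warning keeps a record of which files needed the relaxed read, so they can be re-saved later if that turns out to matter.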

I can load both files with Octave, and rewrite the one that causes problems
>> data1 = load('part_0001.mat');
>> save -v7 part_0002.mat -struct data1
In Python the rewritten file loads fine, just like your 0000.mat file.
In [8]: data2=loadmat('part_0002.mat')
In [10]: data2.keys()
Out[10]: dict_keys(['RealTime', 'AccNorm', 'Alt', 'FsP', 'DeviceTime', 'FsA', 'Acc', 'imatemp', 'Time', '__version__', '__globals__', '__header__'])
The rewritten file is actually a bit smaller. A V6 file is 13M, and also loadable.
>> save -v6 part_0003.mat -struct data1
So there must be some glitch in loadmat's handling of the V7 format.

Sometimes a MAT-file becomes corrupt, so MATLAB cannot recognize the data type and is unable to load it. When saving the file from Python, try setting long_field_names=True (mdict below stands for the dictionary of variables you are saving):
scipy.io.savemat(filename, mdict, long_field_names=True)

Related

Scipy IO Loadmat error: ValueError: Mat 4 mopt wrong format

I have seen this question floating around without any definite answers such as here. I have .mat data converted from a different data structure and am trying to load it in python using scipy.io.loadmat. For some files, this approach works perfectly fine, but for others I get this error:
mat = sio.loadmat(i, verify_compressed_data_integrity=False)
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio.py", line 226, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio4.py", line 390, in get_variables
hdr, next_position = self.read_var_header()
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio4.py", line 346, in read_var_header
hdr = self._matrix_reader.read_header()
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio4.py", line 108, in read_header
raise ValueError('Mat 4 mopt wrong format, byteswapping problem?')
ValueError: Mat 4 mopt wrong format, byteswapping problem?
I'm not sure what is causing this issue. I save the .mat files the same way every time so they should all be readable. I also tried h5py and get a similar error. Are there any suggestions on how I can read my data files?

You might be saving the .mat files in different "version" formats without realizing it. If your code calls save(...) without explicitly specifying a format, it uses the default version for your MATLAB session, which is a persisted per-user preference that you can set inside the MATLAB GUI. And if you don't set a default format in Preferences, the default version that save(...) uses varies with the version of MATLAB.
The differences between MAT-file versions are significant. In particular, v7.3 completely changed the format to an HDF5-based one, which scipy.io.loadmat does not read; such files need an HDF5 reader such as h5py. See https://www.mathworks.com/help/matlab/import_export/mat-file-versions.html.
Check your actual .mat file versions. And if you want your code to be really portable, change the save(...) calls in your code to explicitly specify the MAT-file version using a '-v<whatever>' argument.
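One way to check the version from Python is to look at the first bytes of each file. The sketch below rests on assumptions about the header layout: v5-family files (-v6 and -v7) start with the text "MATLAB 5.0 MAT-file", v7.3 files start with "MATLAB 7.3 MAT-file", and Level 4 files have no text header, so at least one of their first four bytes is zero. The function name sniff_mat_version is made up for illustration. As far as I can tell, scipy uses a similar first-four-bytes test to decide that a file is Level 4, which would explain why a damaged or non-MAT file ends up in the "Mat 4 mopt" code path.

```python
import io


def sniff_mat_version(stream):
    """Guess the MAT-file flavour from the first 128 bytes of a binary stream."""
    header = stream.read(128)
    stream.seek(0)
    # Level 4 files have no text header; their first four bytes encode an
    # integer and therefore contain at least one zero byte.
    if len(header) < 128 or 0 in header[:4]:
        return "v4 (or not a MAT-file)"
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3 (HDF5-based)"
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5 family (-v6 or -v7)"
    return "unknown"


# Synthetic 128-byte header, only for demonstration.
fake_v5 = b"MATLAB 5.0 MAT-file, Platform: demo".ljust(124) + b"\x00\x01IM"
print(sniff_mat_version(io.BytesIO(fake_v5)))
```

For real files, open them in binary mode, for example `with open(path, "rb") as f: print(path, sniff_mat_version(f))`. The header alone does not distinguish -v6 from -v7, since both share the same header text.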

RedVox Python SDK | Not Reading in .rdvxz Files

I'm attempting to read in a series of files for processing contained in a single directory using RedVox:
input_directory = "/home/ben/Documents/Data/F1D1/21"  # file location
rdvx_data = DataWindow(input_dir=input_directory, apply_correction=False, debug=True)  # using RedVox to read in the files
print(os.listdir(input_directory))  # verifying the files actually exist...
# returns "['file1.rdvxz', 'file2.rdvxz', 'file3.rdvxz', ...etc]", they exist
# write audio portion to file
rdvx_data.to_json_file(base_dir=output_rpd_directory,
                       file_name=output_filename)
# this never runs, because rdvx_data.stations = [] (verified through debugging)
for station in rdvx_data.stations:
    pass  # some code here
Enabling debugging through arguments as seen above does not provide any extra details. In fact, there is no error message whatsoever. It writes the JSON file and pickle to disk, but the JSON file is full of null values and the pickle object is just a shell with no contents. So the files definitely exist and os.listdir() sees them, but RedVox does not.
I assume this is some very silly error or lack of understanding on my part. Any help is greatly appreciated. I have not worked with RedVox previously, nor do I have much understanding of what these files contain other than some audio data and some other data. I've simply been tasked with opening them to work on a model to analyze the data within.

SOLVED: Not sure why the previous code doesn't work (it was handed to me), however, I worked around the DataWindow call and went straight to calling the "redvox.api900.reader" object:
from glob import glob
from redvox.api900 import reader
dataset_dir = "/home/*****/Documents/Data/F1D1/21/"
rdvx_files = glob(dataset_dir+"*.rdvxz")
for file in rdvx_files:
    wrapped_packet = reader.read_rdvxz_file(file)
From here I can view all of the sensor data within:
if wrapped_packet.has_microphone_sensor():
    microphone_sensor = wrapped_packet.microphone_sensor()
    print("sample_rate_hz", microphone_sensor.sample_rate_hz())
Hope this helps anyone else who's confused.

Trouble reading npy file

I am new to python and am having trouble reading a *.npy file that somebody else saved. If I use the following commands:
import numpy as np
np.load('lat.npy')
I get the following error:
ValueError: Cannot load file containing pickled data when allow_pickle=False
So, I set allow_pickle=True:
np.load('lat.npy',allow_pickle=True)
Then, I get a different error:
OSError: Failed to interpret file 'lat.npy' as a pickle
Maybe it is relevant that I am on a PC, and the other file was written on a Mac.
Am I doing something wrong? (I am sorry if this question has been asked already.) Thank you!

I learned that my colleague's data file was written in Python 2, while I am using Python 3. Using the np.load command with the following options works:
np.load('lat.npy',allow_pickle=True,fix_imports=True,encoding='latin1')
It seems I need to set all of those options, but the 'encoding' argument seems especially important. The doc for numpy.load says about the encoding argument, "Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays."
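To see why the encoding matters, here is a small illustration using the standard pickle module directly, which is where numpy.load hands the encoding argument according to the doc quoted above. The byte string below is a hand-written protocol-2 pickle of a Python 2 str containing a non-ASCII byte; it is an example I constructed, not data from the original file.

```python
import pickle

# A protocol-2 pickle of the Python 2 byte string 'caf\xe9' (hand-written here
# for illustration; object arrays saved from Python 2 embed pickles like this).
py2_pickle = b"\x80\x02U\x04caf\xe9q\x00."

try:
    pickle.loads(py2_pickle)  # the default encoding is 'ASCII'
except UnicodeDecodeError as exc:
    print("default encoding fails:", exc.reason)

# 'latin1' maps every byte value to a character, so decoding cannot fail.
as_text = pickle.loads(py2_pickle, encoding="latin1")
# 'bytes' skips decoding and returns the raw Python 2 string as bytes.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")
print(repr(as_text), repr(as_bytes))
```

Because latin1 accepts every byte, it is a common choice when the original encoding is unknown, at the cost of possibly mislabelled characters in any text fields.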

Unseekable HTTP Response

I am accessing some .mat files hosted on a SimpleHTTP server and trying to load them using Python's loadmat:
from scipy.io import loadmat
import requests
r = requests.get(requestURL,stream=True)
print loadmat(r.raw)
However, I am getting an error
io.UnsupportedOperation: seek
After looking around it seems that r.raw is a file object which is not seekable, and therefore I can't use loadmat on it. I tried the solution here (modified slightly to work with Python 2.7), but maybe I did something wrong because when I try
seekfile = SeekableHTTPFile(requestURL,debug=True)
print 'closed', seekfile.closed
It tells me that the file is closed, and therefore I can't use loadmat on that either. I can provide the code for my modified version if that would be helpful, but I was hoping that there is some better solution to my problem here.
I can't copy/download the .mat file because I don't have write permission on the server where my script is hosted. So I'm looking for a way to get a seekable file object that loadmat can use.
Edit: attempt using StringIO
import StringIO
stringfile = StringIO.StringIO()
stringfile.write(repr(r.raw.read())) # repr needed because of null characters
loaded = loadmat(stringfile)
stringfile.close()
r.raw.close()
results in:
ValueError: Unknown mat file type, version 48, 92
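One possible approach, sketched here for Python 3 rather than the Python 2.7 used above, is to read the whole response body into an in-memory buffer, which is seekable and needs no write permission on disk. The raw bytes must be kept unchanged: repr() turns the payload into escaped text, and 48 and 92 are the ASCII codes for "0" and "\", which is consistent with the reader seeing escape sequences where the binary version field should be. The names to_seekable and ForwardOnly are hypothetical; ForwardOnly is only a stand-in for a non-seekable body such as r.raw.

```python
import io


def to_seekable(stream):
    """Read a forward-only binary stream fully into an in-memory, seekable buffer."""
    return io.BytesIO(stream.read())


class ForwardOnly(io.RawIOBase):
    """Hypothetical stand-in for a non-seekable HTTP body such as r.raw."""

    def __init__(self, payload):
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buf):
        return self._inner.readinto(buf)


raw = ForwardOnly(b"\x00\x01binary payload with NUL bytes\x00")
buf = to_seekable(raw)
buf.seek(0)  # works on the buffer, unlike on the raw stream
print(buf.seekable(), len(buf.getvalue()))
```

With requests, the equivalent would be something like loadmat(io.BytesIO(r.content)), since loadmat accepts an open file-like object. The trade-off is that the whole file is held in memory.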

Read matlab file (*.mat) from zipped file without extracting to directory in Python

This specific question stems from an attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with Python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in Python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, if I try to load the entire *.mat file in Python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a Python array. If I divide up the *.mat by frame, I now have 15,000 *.mat files, which quickly becomes a pain to work with (at least in Windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, windows xp
Current code:
import scipy.io
import zipfile
import numpy as np
def readZip(zfilename,dim,frames):
    data=np.zeros((dim[0],dim[1],dim[2],frames),dtype=np.float32)
    zfile = zipfile.ZipFile( zfilename, "r" )
    i=0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat=scipy.io.loadmat(fname)
        data[:,:,:,i]=mat['export']
        mat.clear()
        i=i+1
    return data
Tried code:
mat=scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat=scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!

I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
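A rough size estimate suggests why loading everything at once fails. The shape is the one given in the question. The element sizes are assumptions: 4 bytes for float32 (as in the question's np.zeros call) and 8 bytes for double precision, which is MATLAB's default numeric type.

```python
from math import prod

shape = (20, 20, 40, 15000)  # [i, j, k, frame], from the question
elements = prod(shape)

for name, itemsize in (("float32", 4), ("float64", 8)):
    gib = elements * itemsize / 2**30
    print(f"{name}: {elements} elements, {gib:.2f} GiB")
```

On a 32-bit system a single process typically has only about 2 GiB of address space, so one contiguous array of this size leaves little or no headroom.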
h5py FTW!

In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
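Building on that idea, the bytes returned by read() can be wrapped in io.BytesIO to get a seekable file-like object without extracting anything to disk. This is a sketch in Python 3 under the assumption that each archive member fits in memory. The helper name open_member_seekable and the member name frame_0001.mat are made up, and the payload is a placeholder rather than a real MAT-file. It also explains the two failures in the question: zfile.read(fname) returns the file contents, which loadmat then treats as a file name, and zfile.open(fname) returns a stream that could not seek.

```python
import io
import zipfile


def open_member_seekable(zf, name):
    """Return a seekable in-memory copy of one archive member."""
    return io.BytesIO(zf.read(name))


# Build a tiny archive in memory so the sketch runs without any files on disk.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("frame_0001.mat", b"placeholder bytes, not a real MAT-file")

with zipfile.ZipFile(archive) as zf:
    for info in zf.infolist():
        member = open_member_seekable(zf, info.filename)
        member.seek(0)  # seeking works on the in-memory copy
        print(info.filename, len(member.getvalue()))
```

With real data, member could be passed to scipy.io.loadmat in place of a file name, since loadmat accepts an open file-like object.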
