Detecting .mat version using Python

I am aware of two alternatives for reading .mat files in Python. For .mat files prior to version 7.3, scipy.io.loadmat works perfectly. For .mat files of version 7.3 and later, you need to use an HDF5 reader like h5py.
My question is: is there a way to find, for a given file, its .mat version from within Python? This way I can create a function that reads any .mat file.

EAFP (it is easier to ask forgiveness than permission): scipy.io.loadmat does check the version and raises an error if it is not supported.
import scipy.io as sio

try:
    test = sio.loadmat('test.mat')
except NotImplementedError:
    # v7.3 files are HDF5-based; scipy raises NotImplementedError for them
    import h5py
    with h5py.File('test.mat', 'r') as hf:
        data = hf['name-of-dataset'][:]
except Exception:
    raise ValueError('could not read at all...')
If you want to do the checking yourself, you can use get_matfile_version() in scipy.io.matlab.miobase.
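Alternatively, a small sketch that sniffs the descriptive text header directly (the helper name is mine; it relies on the documented MAT-file header, which for v5+ files begins with text such as "MATLAB 7.3 MAT-file"):

```python
def is_mat_v73(path):
    # v5+ .mat files start with a 116-byte descriptive text header,
    # e.g. b'MATLAB 7.3 MAT-file ...'; v4 files lack this header.
    with open(path, 'rb') as f:
        head = f.read(19)
    return head.startswith(b'MATLAB 7.3')
```

When this returns True, dispatch to h5py; otherwise try scipy.io.loadmat.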


Is it possible to read binary MATLAB .mat files in Python?
I've seen that SciPy has alleged support for reading .mat files, but I'm unsuccessful with it. I installed SciPy version 0.7.0, and I can't find the loadmat() method.
An import is required, import scipy.io...
import scipy.io
mat = scipy.io.loadmat('file.mat')
Neither scipy.io.savemat nor scipy.io.loadmat works for MATLAB arrays of version 7.3. But the good part is that MATLAB version 7.3 files are HDF5 datasets, so they can be read using a number of tools, including NumPy.
For Python, you will need the h5py extension, which requires HDF5 on your system.
import numpy as np
import h5py
f = h5py.File('somefile.mat','r')
data = f.get('data/variable1')
data = np.array(data) # For converting to a NumPy array
First save the .mat file as:
save('test.mat', '-v7')
After that, in Python, use the usual loadmat function:
import scipy.io as sio
test = sio.loadmat('test.mat')
There is a nice package called mat4py which can easily be installed using
pip install mat4py
It is straightforward to use (from the website):
Load data from a MAT-file
The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python’s dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
from mat4py import loadmat
data = loadmat('datafile.mat')
The variable data is a dict with the variables and values contained in the MAT-file.
Save a Python data structure to a MAT-file
Python data can be saved to a MAT-file, with the function savemat. Data has to be structured in the same way as for loadmat, i.e. it should be composed of simple data types, like dict, list, str, int, and float.
Example: Save a Python data structure to a MAT-file:
from mat4py import savemat
savemat('datafile.mat', data)
The parameter data shall be a dict with the variables.
If you have MATLAB R2014b or newer installed, the MATLAB Engine for Python can be used:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat", nargout=1)
Reading the file
import scipy.io
mat = scipy.io.loadmat(file_name)
Inspecting the type of MAT variable
print(type(mat))
#OUTPUT - <class 'dict'>
The keys inside the dictionary are MATLAB variables, and the values are the objects assigned to those variables.
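Since loadmat also stores metadata under dunder keys, a small sketch for keeping only the actual MATLAB variables (the in-memory round-trip stands in for loading a real file; the variable name myvar is hypothetical):

```python
import io
import numpy as np
import scipy.io

# Round-trip in memory just to have something to load
# (stands in for scipy.io.loadmat('file.mat')).
buf = io.BytesIO()
scipy.io.savemat(buf, {'myvar': np.eye(2)})
buf.seek(0)
mat = scipy.io.loadmat(buf)

# loadmat mixes metadata keys ('__header__', '__version__', '__globals__')
# in with the MATLAB variables; filter them out by the dunder prefix.
variables = {k: v for k, v in mat.items() if not k.startswith('__')}
```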
There is a great library for this task called pymatreader.
Just do as follows:
Install the package: pip install pymatreader
Import the relevant function of this package: from pymatreader import read_mat
Use the function to read the matlab struct: data = read_mat('matlab_struct.mat')
use data.keys() to locate where the data is actually stored.
The keys will usually look like: dict_keys(['__header__', '__version__', '__globals__', 'data_opp']), where data_opp is the actual key which stores the data. The name of this key can of course differ between files.
Last step - create your dataframe (with pandas imported as pd): my_df = pd.DataFrame(data['data_opp'])
That's it :)
There is also the MATLAB Engine for Python from MathWorks itself. If you have MATLAB, this might be worth considering (I haven't tried it myself, but it offers a lot more functionality than just reading MATLAB files). However, I don't know whether it may be distributed to other users (it is probably not a problem if those people have MATLAB; otherwise, maybe NumPy is the right way to go?).
Also, if you want to do all the basics yourself, MathWorks provides detailed documentation on the structure of the file format (if the link changes, try googling for matfile_format.pdf or its title, MAT-FILE Format). It's not as complicated as I personally thought, but obviously this is not the easiest way to go. It also depends on how many features of the .mat files you want to support.
I've written a "small" (about 700 lines) Python script which can read some basic .mat-files. I'm neither a Python expert nor a beginner and it took me about two days to write it (using the MathWorks documentation linked above). I've learned a lot of new stuff and it was quite fun (most of the time). As I've written the Python script at work, I'm afraid I cannot publish it... But I can give some advice here:
First read the documentation.
Use a hex editor (such as HxD) and look into a reference .mat-file you want to parse.
Try to figure out the meaning of each byte by saving the bytes to a .txt file and annotate each line.
Use classes to save each data element (such as miCOMPRESSED, miMATRIX, mxDOUBLE, or miINT32)
The .mat-files' structure is optimal for saving the data elements in a tree data structure; each node has one class and subnodes
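As a taste of that documentation, the Level 5 file header alone can be parsed in a few lines. This is a sketch following the MAT-File Format document's layout (116 bytes of descriptive text, 8 bytes subsystem offset, a 2-byte version field, and a 2-byte endian indicator); the function name is my own:

```python
import struct

def read_mat5_header(path):
    # The Level 5 MAT-file header is 128 bytes:
    #   bytes 0-115  : descriptive text
    #   bytes 116-123: subsystem data offset
    #   bytes 124-125: version (0x0100)
    #   bytes 126-127: endian indicator, b'IM' (little) or b'MI' (big)
    with open(path, 'rb') as f:
        header = f.read(128)
    endian = header[126:128]
    order = '<' if endian == b'IM' else '>'
    (version,) = struct.unpack(order + 'H', header[124:126])
    text = header[:116].rstrip(b'\x00 ').decode('latin-1')
    return text, version, endian
```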
To read a mat file into a pandas DataFrame with mixed data types:
import pandas as pd
import scipy.io as sio

mat = sio.loadmat('file.mat')  # load mat-file
mdata = mat['myVar']  # variable in mat file
ndata = {n: mdata[n][0, 0] for n in mdata.dtype.names}
columns = [n for n, v in ndata.items() if v.size == 1]
d = dict((c, ndata[c][0]) for c in columns)
df = pd.DataFrame.from_dict(d)
display(df)  # in a Jupyter notebook; use print(df) in a plain script
Apart from scipy.io.loadmat for v4 (Level 1.0), v6, and v7 to v7.2 matfiles and h5py.File for v7.3 matfiles, there is another type of matfile stored in text format instead of binary, usually created by Octave, which can't even be read in MATLAB.
Neither scipy.io.loadmat nor h5py.File can load them (tested on scipy 1.5.3 and h5py 3.1.0); the only solution I found is numpy.loadtxt.
import numpy as np
mat = np.loadtxt('xxx.mat')
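This works because Octave's text format prefixes its metadata lines with '#', which np.loadtxt skips as comments by default. A sketch with sample contents held in memory (reading from a real file works the same way):

```python
import io
import numpy as np

# Octave's text format prefixes metadata lines with '#', which
# np.loadtxt treats as comments by default, so only the numeric
# rows are parsed. Sample contents of such a file:
octave_text = """\
# name: x
# type: matrix
# rows: 2
# columns: 2
1 2
3 4
"""
mat = np.loadtxt(io.StringIO(octave_text))  # or np.loadtxt('xxx.mat')
```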
You can also use the hdf5storage library; see its official documentation for details on MATLAB version support.
import hdf5storage
label_file = "./LabelTrain.mat"
out = hdf5storage.loadmat(label_file)
print(type(out)) # <class 'dict'>
from os.path import dirname, join as pjoin
import scipy.io as sio
data_dir = pjoin(dirname(sio.__file__), 'matlab', 'tests', 'data')
mat_fname = pjoin(data_dir, 'testdouble_7.4_GLNX86.mat')
mat_contents = sio.loadmat(mat_fname)
You can use the above code to read one of the .mat files bundled with SciPy's test data in Python.
After struggling with this problem myself and trying other libraries (I have to say mat4py is a good one as well, but with a few limitations), I built this library ("matdata2py"), which can handle most variable types and, most importantly for me, the "string" type. The .mat file needs to be saved in the -v7.3 version. I hope this can be useful for the community.
Installation:
pip install matdata2py
How to use this lib:
import matdata2py as mtp
To load the Matlab data file:
Variables_output = mtp.loadmatfile(file_Name, StructsExportLikeMatlab = True, ExportVar2PyEnv = False)
print(Variables_output.keys()) # with ExportVar2PyEnv = False the variables are as elements of the Variables_output dictionary.
with ExportVar2PyEnv = True you can see each variable separately as python variables with the same name as saved in the Mat file.
Flag descriptions
StructsExportLikeMatlab = True/False structures are exported in dictionary format (False) or dot-based format similar to Matlab (True)
ExportVar2PyEnv = True/False export all variables in a single dictionary (True) or as separate individual variables into the python environment (False)
scipy will work perfectly to load .mat files, and we can use the dictionary's get() method to retrieve a variable (loadmat already returns it as a NumPy array):
import scipy.io
import matplotlib.pyplot as plt

mat = scipy.io.loadmat('point05m_matrix.mat')
x = mat.get("matrix")
print(type(x))
print(len(x))
plt.imshow(x, extent=[0, 60, 0, 55], aspect='auto')
plt.show()
To upload and read mat files in Python:
Install mat4py (on successful installation you get: Successfully installed mat4py-0.5.0).
Import loadmat from mat4py.
Save the file's actual location in a variable.
Load the mat-file contents into a data variable.
pip install mat4py
from mat4py import loadmat
boston = r"E:\Downloads\boston.mat"
data = loadmat(boston, meta=False)

Read OpenAir File using Python GDAL

I need to read OpenAir files in Python.
According to the following vector driver description, GDAL has built-in OpenAir functionality:
https://gdal.org/drivers/vector/openair.html
However there is no example code for reading such OpenAir files.
So far I have tried to read a sample file using the following lines:
from osgeo import gdal
airspace = gdal.Open('export.txt')
However, it returns the following error:
ERROR 4: `export.txt' not recognized as a supported file format.
I already looked at vectorio; however, no OpenAir functionality has been implemented there.
Why do I get the error above?
In case anyone wants to reproduce the problem: sample OpenAir files can easily be generated using XContest:
https://airspace.xcontest.org/
Since you're dealing with vector data, you need to use ogr instead of gdal (it's normally packaged along with gdal)
So you can do:
from osgeo import ogr
ds = ogr.Open('export.txt')
layer = ds.GetLayer(0)
featureCount = layer.GetFeatureCount()
print(featureCount)
There's plenty of info out there on using ogr, but this cookbook might be helpful.

How to save Python dataset (previously exported from IDL) back to IDL format

I have an IDL file. I import it into Python using readsav from scipy, change a parameter, and now I want to export/save it back to the original, IDL-readable format.
This is how I import it:
from scipy.io.idl import readsav
input = readsav('Original_file.inp')
I haven't tested any of this, but here are a few options to try:
Python-to-IDL/IDL-to-Python Bridge
The Python to IDL bridge provides a way to run IDL routines within python. You could try the following
from idlpy import *
from scipy.io.idl import readsav

data = readsav('Original_file.inp')  # renamed from `input`, which shadows a built-in
**change parameter**
IDL.run("SAVE, /VARIABLES, FILENAME = 'New_file.sav'")
There is also an IDL to Python bridge, which might allow you to perform your desired Python operation within IDL, and skip all the loading and saving of files...
Read/Write JSON
It looks like readsav() just returns a dictionary of the contents of the IDL save file. I'm not sure of the contents of your file, so I don't know if this would work, but perhaps you could just write it as a JSON string,
import json
from scipy.io.idl import readsav

data = readsav('Original_file.inp')  # renamed from `input`, which shadows a built-in
**change parameter**
with open('New_file.txt', 'w') as outfile:
    json.dump(data, outfile)  # dump the modified dictionary
and then read it back into IDL with JSON_PARSE() (documentation here).
Write your own hack
If all else fails, you could look at Craig Markwardt's Unofficial Format Specification
of the IDL "SAVE" File, and write some custom code to write an IDL save file directly from Python. If nothing else, it would be an interesting exercise.

Python gdal to read HDF5 with enbedded compression

I am trying to access an HDF5 file with a compressed image datablock. I use the classical gdal command
f = gdal.Open(path+product)
but this does not seem to work, since the returned object is None, as you can see below:
Starting processing proba database
processing PROBAV_L1C_20131009_092303_2_V001.HDF5
None
processing PROBAV_L1C_20130925_092925_2_V001.HDF5
None
Processing complete
I would like to ask if someone can give me some indication of how to handle HDF5 with GDAL, without using h5py, which does not support compressed datablocks either.
Thanks
It couldn't open the file, either because it couldn't see the path or because you don't have an HDF5 driver for GDAL. Returning None is the expected behaviour, but it can be modified to raise an exception if the file cannot be opened:
from osgeo import gdal
gdal.UseExceptions()
if not gdal.GetDriverByName('HDF5'):
    raise Exception('HDF5 driver is not available')
I think you are missing the protocol prefix before Open.
This works for me with other Proba images:
from osgeo import gdal
path="PROBAV_L2A_20140321_031709_2_333M_V001.HDF5"
product="LEVEL2A/GEOMETRY/SAA"
f = gdal.Open("HDF5:\"{}\"://{}".format(path,product))
f.ReadAsArray()
You could also read the complete name using GetSubDatasets which returns a list of tuples:
ds = gdal.Open(path)
subdataset_read = ds.GetSubDatasets()[0]
print("Subdataset: ",subdataset_read)
ds_sub = gdal.Open(subdataset_read[0], gdal.GA_ReadOnly)
ds_sub.ReadAsArray()

Read matlab file (*.mat) from zipped file without extracting to directory in Python

This specific questions stems from the attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, a problem arises in that if I try to load the entire *.mat file in python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a python array. If I divide up the *.mat by frame, I now have 15,000 *.mat files which quickly becomes a pain to work with (at least in windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, Windows XP
Current code:
import scipy.io
import zipfile
import numpy as np
def readZip(zfilename, dim, frames):
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    zfile = zipfile.ZipFile(zfilename, "r")
    i = 0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat = scipy.io.loadmat(fname)
        data[:, :, :, i] = mat['export']
        mat.clear()
        i = i + 1
    return data
Tried code:
mat=scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat=scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!
I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
h5py FTW!
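The frame-by-frame loop described above might look like this (a sketch; the small demo file and its name stand in for the real 20x20x40x15000 dataset):

```python
import numpy as np
import h5py

# Create a small stand-in for the real dataset (hypothetical file name).
with h5py.File('demo.h5', 'w') as w:
    w['data'] = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

with h5py.File('demo.h5', 'r') as r:
    dset = r['data']                 # stays on disk until sliced
    out = np.empty(dset.shape, dtype=np.float32)
    for i in range(dset.shape[-1]):  # copy one frame at a time
        out[..., i] = dset[..., i]
```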
In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
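One more hedged option: newer SciPy versions (the asker's 0.7 may well predate this) accept any seekable file-like object in loadmat, so wrapping the zip member's bytes in io.BytesIO may avoid extracting to disk entirely. A sketch, with the archive and member names hypothetical and a tiny demo archive built first:

```python
import io
import zipfile
import numpy as np
import scipy.io

# Build a small demo archive (stands in for the real 15,000-frame zip).
buf = io.BytesIO()
scipy.io.savemat(buf, {'export': np.ones((2, 2), dtype=np.float32)})
with zipfile.ZipFile('frames.zip', 'w') as zf:
    zf.writestr('frame_0001.mat', buf.getvalue())

# Read a member straight from the archive: BytesIO is seekable,
# so loadmat can parse it without any file being extracted.
with zipfile.ZipFile('frames.zip') as zf:
    mat = scipy.io.loadmat(io.BytesIO(zf.read('frame_0001.mat')))
```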
