Compare Matlab and Python variables - python

I am working on a task converting .m files to .py. To test the converted code, I currently dump or log the values of every variable, for both Python and Matlab, into log files.
Then I compare them by opening the files in an Excel sheet and inspecting them column by column: what the array index is, what each row/column value is, and so on. This is very tiresome, and I am not sure how to compare the output of a specific variable or statement programmatically, given that this is just a .m to .py conversion.

You can run the program in Matlab and save all the variables using the save command. This saves to a .mat file. Then you can load the variables from that file into python using scipy.io.loadmat and compare them in python.
First, in matlab:
save 'data.mat' var1 var2 var3
Then in python (in the same directory, or provide a full path):
import scipy.io
mat_vars = scipy.io.loadmat('data.mat', squeeze_me=True)
var1_matlab = mat_vars['var1']
var2_matlab = mat_vars['var2']
var3_matlab = mat_vars['var3']
Note that numpy has true 1D arrays, while Matlab does not (what looks like a 1D array in Matlab is actually a 2D array where one dimension has length 1). This may mean that the number of dimensions of a variable differs between the Python and Matlab versions. squeeze_me addresses this by eliminating dimensions of length 1, but it may, for example, take a genuinely 2D Matlab array that happens to have length 1 in some dimension and squeeze it down to a 1D Python array. So you may have to do some manual dimension matching no matter what.
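For illustration, here is the kind of shape difference you can see (a sketch; the (1, 5) shape stands for a hypothetical Matlab row vector saved in data.mat):
import scipy.io
raw = scipy.io.loadmat('data.mat')                        # no squeezing
print(raw['var1'].shape)                                  # e.g. (1, 5): Matlab row vector
squeezed = scipy.io.loadmat('data.mat', squeeze_me=True)
print(squeezed['var1'].shape)                             # (5,): length-1 dimension removed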
To get this to work, make sure Matlab is configured to save files in the "MATLAB Version 5 or later" file format (in 2014b this is under Preferences > General > MAT-Files).
If you absolutely must use version 7.3 files, you can try hdf5storage, which says it supports them. However, you probably already have scipy installed, and I have personally used the scipy approach and confirmed it works, but I have not done the same with hdf5storage.
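Once the variables are loaded, the comparison itself can be done programmatically with numpy, for example like this (a sketch; var1_python stands for the value computed by your converted .py code):
import numpy as np
# var1_python is the hypothetical result from your Python port
np.testing.assert_allclose(var1_matlab, var1_python, rtol=1e-10)  # raises on mismatch
# or as a simple boolean check:
if np.allclose(var1_matlab, var1_python):
    print('var1 matches')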

Related

How to load .mat file into workspace using Matlab Engine API for Python?

I have a .mat workspace file containing 4 character variables. These variables contain paths to various folders I need to be able to cd to and from relatively quickly. Usually, when using only Matlab I can load this workspace as follows (provided the .mat file is in the current directory).
load paths.mat
Currently I am experimenting with the Matlab Engine API for Python. The Matlab help docs recommend using the following Python formula to send variables to the current workspace in the desktop app:
import matlab.engine
eng = matlab.engine.start_matlab()
x = 4.0
eng.workspace['y'] = x
a = eng.eval('sqrt(y)')
print(a)
This works well. However, the whole point of the .mat file is that it can quickly load an entire set of variables the user is comfortable with, so the approach above is not efficient for loading the workspace.
I have also tried two different variations in Python:
eng.load("paths.mat")
eng.eval("load paths.mat")
The first variation successfully loads a dict variable in Python containing all four keys and values, but this does not propagate to the Matlab workspace. The second variation throws an error:
File "", line unknown SyntaxError: Error: Unexpected MATLAB
expression.
How do I load up a workspace through the engine without having to manually do it in Matlab? This is an important part of my workflow....
You didn't specify the number of output arguments from the MATLAB engine, which is a possible reason for the error.
I would expect the error from eng.load("paths.mat") to read something like
TypeError: unsupported data type returned from MATLAB
The difference in error messages may arise from different versions of MATLAB or of the engine API.
In any case, try specifying the number of output arguments like so,
eng.load("paths.mat", nargout=0)
This was giving me fits for a while. A few things to try. I was able to get this working with Matlab 2019a and Python 3.7. I had the most trouble when trying to build a string and use that string as an argument for load and eval/evalin, so there may be some trickiness with single versus double quotes, or a need for an additional set of quotes inside the string.
Make sure the MAT file is on the Matlab Path. You can use addpath and rmpath really easily with pathlib objects:
from pathlib import Path
mat_file = Path('local/path/from/cwd/example.mat').resolve()  # get absolute path
eng.addpath(str(mat_file.parent))
# Execute other commands
eng.rmpath(str(mat_file.parent))
Per dML's answer, make sure to specify nargout=0 when there are no outputs from the function, and always when calling a script. If there are one or more outputs you don't have to capture them in Python, and if there is more than one they are returned as a tuple.
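For example, with MATLAB's max, which can return one or two outputs (a sketch assuming the engine is already started as eng):
import matlab
# one output: returned directly
m = eng.max(matlab.double([1.0, 5.0, 3.0]))                   # 5.0
# two outputs: returned as a tuple
m, idx = eng.max(matlab.double([1.0, 5.0, 3.0]), nargout=2)   # (5.0, 2.0)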
You can also turn your script into a function (just won't have access to base workspace without using evalin/assignin):
function load_example_matfile()
evalin('base','load example.mat')
end
eng.feval('load_example_matfile', nargout=0)
And it does seem to work with plain vanilla eval and load as well, but if you leave off the nargout=0 it either errors out or gives you the contents of the file directly in Python.
Both of these work.
eng.eval('load example.mat', nargout=0)
eng.load('example.mat', nargout=0)

Reading 1-D Variables in WRF NetCDF files with GDAL python

My question is simple.
Take a wrfout file "out.nc" for example.
The file contains Geo2D, Geo3D and 1D variables.
Using the GDAL package in Python 2.7, I can extract the Geo2D variables easily, like this:
## T2 is a 2-D variable: temperature 2 m above the ground
temp = gdal.Open('NETCDF:"'+"out.nc"+'":T2')
But when I use the same code to extract a 1D array, it fails.
## Time is a 1-D array representing the time series throughout the simulation period
time = gdal.Open('NETCDF:"'+"out.nc"+'":Time')
Nothing happens! I would appreciate advice on how to read WRF output variables of any dimension easily!
You can also use the NetCDF reader in scipy.io:
import scipy.io.netcdf as nc
# Open a netcdf file object and assign the data values to a variable
time = nc.netcdf_file('out.nc', 'r').variables['Time'][:]
It has the benefit that scipy is a very popular and widely installed package, and its interface is similar to ordinary file handling in some respects.
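The same pattern reads variables of any dimensionality. A short sketch (note that scipy's reader only handles classic NetCDF-3 files, so this assumes out.nc is not in NetCDF-4 format):
import scipy.io.netcdf as nc
f = nc.netcdf_file('out.nc', 'r')
time = f.variables['Time'][:]   # 1-D time series
temp = f.variables['T2'][:]     # multi-dimensional fields read the same way
f.close()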

Saving python data for an application

I need to save multiple numpy arrays, along with the user input that was used to compute the data these arrays contain, in a single file. I'm having a hard time finding a good procedure, or even a good file type, for this. The only thing I can think of is to put the computed arrays along with the user input into one single array and then save it using numpy.save. Does anybody know of better alternatives or good file types for my use case?
You could try using Pickle to serialize your arrays.
How about using pickle and then storing the pickled array objects in a storage of your choice, like a database or files?
I had this problem long ago, so I don't have the code handy to show you, but I used a binary write to a tmp file to get it done.
EDIT: That's it, pickle is what I used. Thanks SpankMe and RoboInventor.
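A minimal sketch of the pickle approach, bundling the arrays and the user input into one dict (the file name run.pkl and the parameter names are made up):
import pickle
import numpy as np

user_input = {'n': 10, 'freq': 2.5}           # hypothetical parameters
x = np.arange(user_input['n'])
data = {'input': user_input, 'x': x, 'y': np.sin(user_input['freq'] * x)}

with open('run.pkl', 'wb') as f:
    pickle.dump(data, f)                      # everything goes into one file

with open('run.pkl', 'rb') as f:
    restored = pickle.load(f)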
Numpy provides functions to save arrays to files, e.g. savez():
import numpy as np

outfile = '/tmp/data.npz'  # savez appends .npz to the name if it is missing
x = np.arange(10)
y = np.sin(x)
np.savez(outfile, x=x, y=y)

npzfile = np.load(outfile)
print(npzfile['x'])
print(npzfile['y'])
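Since the question also asks about saving the user input alongside the arrays, one option is to store it as a JSON string in the same .npz file (a sketch; the parameter names are made up):
import json
import numpy as np

params = {'n': 10, 'freq': 2.5}               # hypothetical user input
x = np.arange(params['n'])
np.savez('/tmp/run.npz', x=x, y=np.sin(params['freq'] * x),
         params=json.dumps(params))           # the string is stored as a 0-d array

data = np.load('/tmp/run.npz')
params_back = json.loads(data['params'].item())  # recover the original dict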

copy netcdf file using python

I would like to make a copy of a netcdf file using Python.
There are very nice examples of how to read or write a netcdf file, but perhaps there is also a good way to read variables in from one file and then write them out to another file.
A simple method that carries the dimensions and dimension variables over to the output file at the lowest cost would be nice.
I found the answer to this question at python netcdf: making a copy of all variables and attributes but one, but I needed to change it to work with my versions of Python/netCDF4 (Python 2.7.6, netCDF4 1.0.4). If you need to add or remove elements, make the appropriate modifications.
import netCDF4 as nc

def create_file_from_source(src_file, trg_file):
    src = nc.Dataset(src_file)
    trg = nc.Dataset(trg_file, mode='w')

    # Create the dimensions of the file
    for name, dim in src.dimensions.items():
        trg.createDimension(name, len(dim) if not dim.isunlimited() else None)

    # Copy the global attributes
    trg.setncatts({a: src.getncattr(a) for a in src.ncattrs()})

    # Create the variables in the file
    for name, var in src.variables.items():
        trg.createVariable(name, var.dtype, var.dimensions)

        # Copy the variable attributes
        trg.variables[name].setncatts({a: var.getncattr(a) for a in var.ncattrs()})

        # Copy the variables values (as 'f4' eventually)
        trg.variables[name][:] = src.variables[name][:]

    # Save the file
    trg.close()

create_file_from_source('in.nc', 'out.nc')
This snippet has been tested.
If you want to only use the netCDF-4 API to copy any netCDF-4 file, even those with variables that use arbitrary user-defined types, that's a difficult problem. The netCDF4 module at netcdf4-python.googlecode.com currently lacks support for compound types that have variable-length members or variable-length types of a compound base type, for example.
The nccopy utility that is available with the netCDF-4 C distribution shows it is possible to copy an arbitrary netCDF-4 file using only the C netCDF-4 API, but that's because the C API fully supports the netCDF-4 data model. If you limit your goal to copying netCDF-4 files that only use flat types supported by the googlecode module, the algorithm used in nccopy.c should work fine and should be well-suited to a more elegant implementation in Python.
A less ambitious project that would be even easier is a Python program that would copy any netCDF "classic format" file, because the classic model supported by netCDF-3 has no user-defined types or recursive types. This program would even work for netCDF-4 classic model files that also use performance features such as compression and chunking.
Since I discovered xarray, this has been my go-to tool for everything python+netCDF related
You can easily copy a netcdf file, for example:
import xarray as xr
ds = xr.open_dataset('ncfile.nc')
ds.to_netcdf('copy_of_ncfile.nc')
If you are on Linux or macOS, this can be achieved easily with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/installing.html).
import nctoolkit as nc
data = nc.open_data("infile.nc")
data.to_nc("outfile.nc")
See How do I copy a file in Python?: a netCDF file is no different from any other file, so a plain file copy should fit your needs.
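For completeness, a minimal sketch of that approach (file names assumed from the question):
import shutil
shutil.copyfile('in.nc', 'out.nc')  # plain byte-for-byte copy, no netCDF library needed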

Wrap a C program in Python that reads custom file into a 2d array

I have a stand-alone C program that takes a char* file name, opens the file, and reads and decodes it into a 2D array. We do not know the length of the array until the file is read. The program mallocs memory.
I would like to have a python extension that returns a 2d numpy integer array, given the file name:
a = readFile("theFileName.dat")
I would like to have python manage the memory once the array is returned.
Is there a directive defined in numpy.i that I can use with %apply?
Is Cython better suited for this?
Any other suggestions?
Copying data is OK since the files are not very large.
SIP (here too) can be used to create Python bindings for C libraries.
But that's probably overkill; it would probably be easier to read and decode your .dat file in Python itself.
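For instance, if the file layout is simple enough, numpy can read it directly (a sketch assuming a hypothetical layout: two int32 header fields giving rows and cols, followed by rows*cols int32 values):
import numpy as np

def read_file(file_name):
    # Hypothetical format: int32 rows, int32 cols, then the row-major data
    with open(file_name, 'rb') as f:
        rows, cols = np.fromfile(f, dtype=np.int32, count=2)
        data = np.fromfile(f, dtype=np.int32, count=rows * cols)
    return data.reshape(rows, cols)

a = read_file('theFileName.dat')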

Categories

Resources