I created a 1x20 struct array in MATLAB. This struct has 9 fields. The struct is saved in the -v7.3 format because of its size (about 3 GB). One of the fields contains a 4D matrix, others contain cell arrays, so it is a complex struct.
I would like to know if there is a way to load this struct into Python?
MATLAB v7.3 uses HDF5 storage, which scipy.io.loadmat cannot handle. See:
MATLAB: Differences between .mat versions
Instead you have to use numpy plus h5py. See:
How to read a v7.3 mat file via h5py?
How to read MAT v7.3 files in Python?
and a scattering of more recent questions.
Try that, and come back with a new question if you still have problems sorting out the results.
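To get started, here is a minimal exploration sketch; the filename and variable names ('mydata.mat', 'myStruct', 'myField') are placeholders for your actual ones:

import h5py

# A v7.3 .mat file is plain HDF5, so h5py can open and explore it.
with h5py.File('mydata.mat', 'r') as f:
    f.visit(print)                        # print every group/dataset path
    field = f['myStruct']['myField'][()]  # read one field into numpy
    # Note: for struct arrays and cell arrays this may return HDF5
    # object references, which you dereference with f[ref].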
Since Numpy arrays map to C arrays and MonetDB is using C arrays as its storage model, is it possible to load data from in-memory Numpy arrays into MonetDB? This would save a round-trip to disk, i.e. writing the data from the Numpy array to disk and bulk loading it from disk into MonetDB. I'm aware of embedded Python in MonetDB but I'd rather have embedded MonetDB in Python.
The official MonetDBLite for Python implementation supports this. See the examples for inserting data. https://www.monetdb.org/blog/monetdblite-for-python
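For reference, a minimal sketch along the lines of the examples in that post; the table and column names are illustrative, and exact API details may differ across versions:

import numpy as np
import monetdblite

monetdblite.init('/tmp/dbfarm')                  # embedded database directory
monetdblite.sql('CREATE TABLE vals (x DOUBLE)')

# Insert straight from an in-memory numpy array, no disk round-trip
monetdblite.insert('vals', {'x': np.random.rand(100000)})

print(monetdblite.sql('SELECT COUNT(*) AS n FROM vals'))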
I have a list of 42000 numpy arrays (each array is 240x240) that I want to save to a file for use in another python script.
I've tried using pickle and numpy.savez_compressed, and I run into memory errors (I have 16 GB DDR3). I read that HDF5, which is commonly used for deep learning, cannot save lists, so I'm kind of stuck.
Does anyone have any idea how I can save my data?
EDIT: I previously saved this data to disk as a single numpy array using np.save, and it was around 2.3 GB, but my computer couldn't always handle loading it, so it would sometimes crash when I tried to process it. I read that lists might be better, so I have moved to using lists of numpy arrays.
Assume we have a list of numpy arrays, A, and wish to save these sequentially to an HDF5 file.
We can use the h5py library to create datasets, with each dataset corresponding to an array in A.
import h5py, numpy as np

A = [arr1, arr2, arr3]  # each arrX is a 240x240 numpy array

with h5py.File('file.h5', 'w', libver='latest') as f:  # use 'latest' for performance
    for idx, arr in enumerate(A):
        dset = f.create_dataset(str(idx), shape=(240, 240), data=arr,
                                chunks=(240, 240),
                                compression='gzip', compression_opts=9)
I use gzip compression here for compatibility reasons, since it ships with every HDF5 installation. You may also wish to consider blosc & lzf filters. I also set chunks equal to shape, under the assumption you intend to read entire arrays rather than partial arrays.
The h5py documentation is an excellent resource to improve your understanding of the HDF5 format, as the h5py API follows the C API closely.
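Reading the arrays back is symmetric; a short sketch, assuming the integer dataset names used above:

with h5py.File('file.h5', 'r') as f:
    A_restored = [f[str(idx)][()] for idx in range(len(f))]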
I am working with output from big models, which means I have a lot of big ASCII files, each with two float columns (let's say X and Y). However, reading these files takes a long time, so I thought converting them to binary files might make the reading process much faster.
I converted my ASCII files into binary files using the uu.encode(ascii_file, binary_file) command, and it worked quite well (I actually tested the decode part and recovered the same files).
My question is: is there any way to read the binary files directly into Python and get the data into two variables (x and y)?
Thanks!
You didn't specify how your float columns are represented in Python. The cPickle module is a fast general solution, with the drawback that it creates files readable only from Python, and that it should never be allowed to read untrusted data (received from the network). It is likely to just work with all regular datatypes, including numpy arrays.
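A minimal sketch with the pickle module (cPickle in Python 2); the filename and variables are placeholders:

import pickle

data = {'x': x, 'y': y}  # x, y: your two float columns (lists or arrays)

with open('model.pkl', 'wb') as fh:
    pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)

with open('model.pkl', 'rb') as fh:
    data = pickle.load(fh)  # only unpickle files you trust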
If you can use numpy and store your data in numpy arrays, look into numpy.save and numpy.savetxt and the corresponding loading functions, which should offer performance superior to manually extracting the data.
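For example, converting one ASCII file once and reloading it in binary thereafter might look like this (filenames are placeholders):

import numpy as np

xy = np.loadtxt('model.txt')   # one-time parse of the two-column ASCII file
np.save('model.npy', xy)       # binary .npy, much faster to load

xy = np.load('model.npy')
x, y = xy[:, 0], xy[:, 1]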
array.array also has methods for writing array data to file, with the drawback that the data is written in the machine's native format and cannot portably be read on a different architecture.
Check out Python's struct module. It's probably what you'd want to use for reading and writing your data.
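A hedged sketch of that approach, packing one X/Y pair per record ('d' is a C double; the filename is a placeholder):

import struct

record = struct.Struct('dd')  # two doubles per record

with open('model.bin', 'wb') as fh:
    for xi, yi in zip(x, y):
        fh.write(record.pack(xi, yi))

x, y = [], []
with open('model.bin', 'rb') as fh:
    chunk = fh.read(record.size)
    while chunk:
        xi, yi = record.unpack(chunk)
        x.append(xi)
        y.append(yi)
        chunk = fh.read(record.size)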
Instead of the suggested struct module, if your model data is just floats/doubles (coordinates), consider the array module; it should be much faster than the equivalent operations in struct. The downside is that the collection is homogeneous, so you need to interleave the two columns (X values at even indexes, Y values at odd ones, or vice versa) or store them sequentially (all X, then all Y); see the sketch below.
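A sketch of the interleaved layout with array.array (the filename is a placeholder; note that tofile/fromfile use the machine's native byte order):

import os
from array import array

a = array('d')                 # homogeneous array of C doubles
for xi, yi in zip(x, y):
    a.append(xi)               # X at even indexes...
    a.append(yi)               # ...Y at odd indexes
with open('model.bin', 'wb') as fh:
    a.tofile(fh)

b = array('d')
with open('model.bin', 'rb') as fh:
    b.fromfile(fh, os.path.getsize('model.bin') // b.itemsize)
x, y = b[0::2], b[1::2]        # de-interleave the two columns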
I'm trying to load a MAT file that is a cell array of structs. Each of those structs has many fields, some of which are themselves cells.
A typical call would be:
myCell{1}.myStructField{1}.myStructField
How do I load such a nested structure into Python?
Thanks for your thoughts.
scipy.io.loadmat will load the mat file if it's pre-v7.3; you can then access it like matfile['myCell'][0]['myStructField'][0]['myStructField'].
If it's v7.3 or higher, you can use h5py; after opening it, I think it'll also be f['myCell'][0]['myStructField'][0]['myStructField'], though you'll need to worry about possibly transposing the matrices because of column-major / row-major differences.
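One caveat worth flagging: in v7.3 files, cell elements are usually stored as HDF5 object references that must be dereferenced through the file handle. A hedged sketch of what that navigation can look like (the filename is a placeholder, and the exact layout varies):

import h5py

with h5py.File('myfile.mat', 'r') as f:
    ref = f['myCell'][0][0]               # reference to myCell{1}
    struct_group = f[ref]                 # dereference: the struct's group
    field_ref = struct_group['myStructField'][0][0]
    inner = f[field_ref]['myStructField'][()]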
I have an existing HDF5 file with three arrays; I want to extract one of the arrays using h5py.
h5py reads HDF5 datasets straight into numpy arrays, so just:
with h5py.File('the_filename', 'r') as f:
    my_array = f['array_name'][()]
The [()] means to read the entire array in; if you don't do that, it doesn't read the whole data but instead gives you lazy access to sub-parts (very useful when the array is huge but you only need a small part of it).
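For example, the lazy access looks like this (assuming a 2-D array):

with h5py.File('the_filename', 'r') as f:
    dset = f['array_name']         # no data read yet
    first_row = dset[0, :]         # reads only one row from disk
    block = dset[10:20, 10:20]     # or any other sub-slice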
For this question it is way overkill, but if you have a lot of things like this to do, I use a package, SpacePy, that makes some of this easier.
datamodel.fromHDF5() (see its documentation) returns a dictionary of arrays, stored in a similar way to how h5py handles data.
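A short sketch of that usage, assuming SpacePy is installed:

from spacepy import datamodel

data = datamodel.fromHDF5('the_filename')  # dict-like container of arrays
my_array = data['array_name']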