"Converting" Numpy arrays to Matlab and vice versa - python

I am looking for a way to pass NumPy arrays to Matlab.
I've managed to do this by storing the array into an image using scipy.misc.imsave and then loading it using imread, but this of course causes the matrix to contain values between 0 and 256 instead of the 'real' values.
Multiplying this matrix (divided by 256) by the maximum value in the original NumPy array gives me back the correct matrix, but I feel that this is a bit tedious.
Is there a simpler way?

Sure, just use scipy.io.savemat
As an example:
import numpy as np
import scipy.io
x = np.linspace(0, 2 * np.pi, 100)
y = np.cos(x)
scipy.io.savemat('test.mat', dict(x=x, y=y))
Similarly, there's scipy.io.loadmat.
You then load this in matlab with load test.
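To get the data back into Python, a minimal loadmat sketch (assuming the test.mat written above):
import scipy.io
mat = scipy.io.loadmat('test.mat')
x = mat['x'].squeeze()  # loadmat returns 2-D arrays; squeeze restores the 1-D shape
y = mat['y'].squeeze()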
Alternatively, as @JAB suggested, you could just save things to an ASCII tab-delimited file (e.g. numpy.savetxt). However, you'll be limited to 2 dimensions if you go this route. On the other hand, ASCII is the universal exchange format. Pretty much anything will handle a delimited text file.
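If you go the text route, a short sketch (using the x and y from above; MATLAB can read the result with readmatrix or dlmread):
import numpy as np
x = np.linspace(0, 2 * np.pi, 100)
y = np.cos(x)
data = np.column_stack((x, y))                # savetxt handles at most 2-D data
np.savetxt('test.txt', data, delimiter='\t')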

A simple solution, without passing data by file or external libs.
Numpy has a method to transform ndarrays to lists, and matlab data types can be defined from lists. So we can transform like:
np_a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mat_a = matlab.double(np_a.tolist())
Going from matlab to python requires more attention. There is no built-in function to convert the type directly to a list, but we can access the raw data, which is flat rather than shaped. So we reshape it in Fortran (column-major) order, because MATLAB and numpy store data differently. That's really important to stress: test it in your project, especially if you are using matrices with more than 2 dimensions. It works for MATLAB 2015a and 2 dims.
np_a = np.array(mat_a._data.tolist())
np_a = np_a.reshape(mat_a.size, order='F')  # column-major, matching MATLAB's memory layout
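For a concrete round trip through the engine (a sketch, assuming the MATLAB Engine API for Python is installed; eng.transpose is just a stand-in for any MATLAB call):
import numpy as np
import matlab
import matlab.engine

eng = matlab.engine.start_matlab()
np_a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mat_a = matlab.double(np_a.tolist())                          # Python -> MATLAB
mat_b = eng.transpose(mat_a)                                  # some MATLAB operation
np_b = np.array(mat_b._data).reshape(mat_b.size, order='F')   # MATLAB -> Python
eng.quit()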

Here's a solution that avoids iterating in python or using file IO, at the expense of relying on (ugly) matlab internals:
import numpy as np
import matlab
# This is actually `matlab._internal`, but matlab/__init__.py
# mangles the path making it appear as `_internal`.
# Importing it under a different name would be a bad idea.
from _internal.mlarray_utils import _get_strides, _get_mlsize

def _wrapper__init__(self, arr):
    assert arr.dtype == type(self)._numpy_type
    self._python_type = type(arr.dtype.type().item())
    self._is_complex = np.issubdtype(arr.dtype, np.complexfloating)
    self._size = _get_mlsize(arr.shape)
    self._strides = _get_strides(self._size)[:-1]
    self._start = 0
    if self._is_complex:
        self._real = arr.real.ravel(order='F')
        self._imag = arr.imag.ravel(order='F')
    else:
        self._data = arr.ravel(order='F')

_wrappers = {}
def _define_wrapper(matlab_type, numpy_type):
    t = type(matlab_type.__name__, (matlab_type,), dict(
        __init__=_wrapper__init__,
        _numpy_type=numpy_type
    ))
    # this tricks matlab into accepting our new type
    t.__module__ = matlab_type.__module__
    _wrappers[numpy_type] = t

_define_wrapper(matlab.double, np.double)
_define_wrapper(matlab.single, np.single)
_define_wrapper(matlab.uint8, np.uint8)
_define_wrapper(matlab.int8, np.int8)
_define_wrapper(matlab.uint16, np.uint16)
_define_wrapper(matlab.int16, np.int16)
_define_wrapper(matlab.uint32, np.uint32)
_define_wrapper(matlab.int32, np.int32)
_define_wrapper(matlab.uint64, np.uint64)
_define_wrapper(matlab.int64, np.int64)
_define_wrapper(matlab.logical, np.bool_)

def as_matlab(arr):
    try:
        cls = _wrappers[arr.dtype.type]
    except KeyError:
        raise TypeError("Unsupported data type")
    return cls(arr)
The observations necessary to get here were:
Matlab seems to only look at type(x).__name__ and type(x).__module__ to determine if it understands the type
It seems that any indexable object can be placed in the ._data attribute
Unfortunately, matlab is not using the _data attribute efficiently internally, and is iterating over it one item at a time rather than using the python memoryview protocol :(. So the speed gain is marginal with this approach.
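A quick usage sketch, assuming the wrapper code above has been run and an engine session exists (eng is illustrative):
import numpy as np
a = np.arange(12, dtype=np.double).reshape(3, 4)
m = as_matlab(a)   # no per-element Python loop, just a Fortran-order ravel
# m can now be passed to engine calls, e.g. eng.sum(m, 1)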

scipy.io.savemat and scipy.io.loadmat do NOT work for MATLAB v7.3 files. But the good part is that MATLAB v7.3 files are HDF5 datasets, so they can be read using a number of tools, including numpy.
For python, you will need the h5py package, which requires HDF5 on your system.
import numpy as np, h5py
f = h5py.File('somefile.mat','r')
data = f.get('data/variable1')
data = np.array(data) # For converting to numpy array
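One caveat: MATLAB stores arrays column-major, while h5py presents datasets in C order, so an m-by-n MATLAB matrix typically comes back with shape (n, m). A sketch of the usual fix (same hypothetical variable path as above):
import numpy as np, h5py
with h5py.File('somefile.mat', 'r') as f:
    data = np.array(f['data/variable1']).T  # transpose back to MATLAB's orientation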

Some time ago I faced the same problem and wrote the following scripts to allow easy copy and pasting of arrays back and forth from interactive sessions. Obviously only practical for small arrays, but I found it more convenient than saving/loading through a file every time:
Matlab -> Python
Python -> Matlab

Not sure if it counts as "simpler", but I found a way to move data quite fast from a numpy array created in a python script that is called by matlab:
dump_reader.py (python source):
import numpy

def matlab_test2():
    np_a = numpy.random.uniform(low=0.0, high=30000.0, size=(1000, 1000))
    return np_a
dump_read.m (matlab script):
clear classes
mod = py.importlib.import_module('dump_reader');
py.importlib.reload(mod);
if count(py.sys.path,'') == 0
    insert(py.sys.path, int32(0), '');
end
tic
A = py.dump_reader.matlab_test2();
toc
shape = cellfun(@int64, cell(A.shape));
ls = py.array.array('d', A.flatten('F').tolist());
p = double(ls);
toc
C = reshape(p, shape);
toc
It relies on the fact that matlab's double seems to work efficiently on arrays, compared to cells/matrices. The second trick is to pass the data to matlab's double in an efficient way (via python's native array.array).
P.S. Sorry for necroposting, but I struggled a lot with this and this topic was one of the closest hits. Maybe it helps someone to shorten the time of struggling.
P.P.S. tested with Matlab R2016b + python 3.5.4 (64bit)

The python library Darr allows you to save your Python numpy arrays in a self-documenting and widely readable format, consisting of just binary and text files. When saving your array, it will include code to read that array in a variety of languages, including Matlab. So in essence, it is just one line to save your numpy array to disk in Python, and then copy-paste the code from the README.txt to load it into Matlab.
Disclosure: I wrote the library.
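A minimal sketch of the save step (assuming the darr package is installed and its asarray function behaves as documented; check the project docs):
import numpy as np
import darr

x = np.linspace(0, 2 * np.pi, 100)
d = darr.asarray('x.darr', x)  # writes binary data plus a README.txt
# the README.txt includes copy-pasteable MATLAB code to read the array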

From MATLAB R2022a on, matlab.double (and matlab.int8, matlab.uint8, etc.) objects implement the buffer protocol. This means that you can pass them into NumPy array constructors. Construction in the opposite direction (which is the subject of the question here) is supported as well. That is, matlab objects can be constructed from objects that implement the buffer protocol. Thus, for instance, a matlab.double can be constructed from a NumPy double array.
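A minimal sketch of both directions (assuming R2022a or later with the MATLAB Engine API for Python installed):
import numpy as np
import matlab

md = matlab.double([[1.0, 2.0], [3.0, 4.0]])
na = np.array(md)          # MATLAB -> NumPy via the buffer protocol
md2 = matlab.double(na)    # NumPy -> MATLAB, no tolist() detour needed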
UPDATE: Furthermore, from MATLAB R2022b on, objects that implement the buffer protocol (such as NumPy objects) can be passed directly into MATLAB functions that are called via Python. From the MATLAB Release Notes for R2022b, under the "External Language Interfaces" section:
import matlab.engine
import numpy
eng = matlab.engine.start_matlab()
buf = numpy.array([[1, 2, 3], [4, 5, 6]], dtype='uint16')
# Supported in R2022a and earlier: must initialize a matlab.uint16 from
# the numpy array and pass it to the function
array_as_matlab_uint16 = matlab.uint16(buf)
res = eng.sum(array_as_matlab_uint16, 1, 'native')
print(res)
# Supported as of R2022b: can pass the numpy array
# directly to the function
res = eng.sum(buf, 1, 'native')
print(res)

Let us say you have 2D daily data with shape (365,10); for five years, saved in the np array np3Darray, it will have shape (5,365,10). In python, save your np array:
import scipy.io as sio  # SciPy module to load and save mat-files
m = {}                       # dict of variables to save
m['np3Darray'] = np3Darray   # shape (5,365,10)
sio.savemat('file.mat', m)   # save np 3D array
Then in MATLAB, convert the np 3D array to a MATLAB 3D matrix:
load('file.mat','np3Darray')
M3D=permute(np3Darray, [2 3 1]); %Convert numpy array with shape (5,365,10) to MATLAB matrix with shape (365,10,5)

As of R2021a, you can pass a python numpy ndarray to double() and it will convert to a native matlab matrix. Even when displaying the numpy array in the console, it suggests at the bottom: "Use double function to convert to a MATLAB array".

Related

Save 3D array into a stack of 2D images in Python

I made a 3D array, which consists of numbers (0~4). What I want is to save the 3D array as a stack of 2D images (if possible, as a *.tiff file). What am I supposed to do?
import numpy as np
a = np.random.randint(0,5, size=(100,100,100))
a = a.astype('int8')
Actually, I made it. This is my code.
With this code, I don't need to stack a series of 2D images (arrays).
I just make a 3D array and save it. That is all I did here.
import numpy as np
from skimage.external import tifffile as tif
a = np.random.randint(0,5, size=(100,100,100))
a = a.astype('int8')
tif.imsave('a.tif', a, bigtiff=True)
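Note that skimage.external.tifffile was removed in later scikit-image releases; the standalone tifffile package offers the same functionality. A sketch (imwrite replaces imsave in newer tifffile versions):
import numpy as np
import tifffile

a = np.random.randint(0, 5, size=(100, 100, 100)).astype('int8')
tifffile.imwrite('a.tif', a, bigtiff=True)
b = tifffile.imread('a.tif')  # round-trip check
assert (a == b).all()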
This should work. I haven't tested it, but I have separated color images into RGB slices using this method, and it should work pretty much the same way here, assuming you don't want to do anything with those pixel values first. (They will be very close to the same color in an image.)
import imageio
import numpy as np

a = np.random.randint(0, 5, size=(100, 100, 100))
a = a.astype('int8')
for i in range(100):
    newimage = a[:, :, i]
    imageio.imwrite("path/to/image%d.tiff" % i, newimage)
What exactly do you mean by "stack"? As you refer to tiff as output format, I assume here you want your data in one file as a multiframe-tiff.
This can easily be done with imageio's mimwrite() function:
# import numpy as np
# a = np.random.randint(0,5, size=(100,100,100))
# a = a.astype('int8')
import imageio
imageio.mimwrite("image.tiff", a)
Note that this function relies on having the counter for your several frames as the first axis, with x and y following. See also its documentation.
However, if I'm wrong and you want to have n (e.g. 100) separate tif-files, you can also use the normal imwrite() function in a loop:
n = len(a)
for i in range(n):
    imageio.imwrite(f'image_{i:03}.tiff', a[i])

Convert 3D numpy Array in Python to 3D Matrix in Matlab [duplicate]


Result of 3D FFT using pyculib is wrong

I use pyculib to perform a 3D FFT on a matrix in Anaconda 3.5. I just followed the example code posted on the website. But I found something interesting and don't understand why.
Performing a 3D FFT on matrix with pyculib is correct only when using numpy.arange to create the matrix.
Here is the code:
from pyculib.fft.binding import Plan, CUFFT_C2C
import numpy as np
from numba import cuda
data = np.random.rand(26, 256, 256).astype(np.complex64)
orig = data.copy()
d_data = cuda.to_device(data)
fftplan = Plan.three(CUFFT_C2C, *data.shape)
fftplan.forward(d_data, d_data)
fftplan.inverse(d_data, d_data)
d_data.copy_to_host(data)
n = data.size  # cuFFT's inverse transform is unnormalized, so divide by the number of elements
result = data / n
np.allclose(orig, result.real)
Finally, it turns out to be False. And the difference between orig and result is not a small number, not negligible.
I tried some other data sets (not random numbers) and got the same wrong results.
Also, I tested without the inverse FFT:
from pyculib.fft.binding import Plan, CUFFT_C2C
import numpy as np
from numba import cuda
from scipy.fftpack import fftn,ifftn
data = np.random.rand(26,256,256).astype(np.complex64)
orig = data.copy()
orig_fft = fftn(orig)
d_data = cuda.to_device(data)
fftplan = Plan.three(CUFFT_C2C, *data.shape)
fftplan.forward(d_data, d_data)
d_data.copy_to_host(data)
np.allclose(orig_fft, data)
The result is also wrong.
The test code on the website uses numpy.arange to create the matrix. And I tried:
n = 26*256*256
data = np.arange(n, dtype=np.complex64).reshape(26,256,256)
And the FFT result of this matrix is right.
Could anyone help to point out why?
I don't use CUDA, but I think your problem is numerical in nature. The difference lies in the two data sets you are using. random.rand has a dynamic range of 0-1, and arange of 0 to 26*256*256. The FFT attempts to resolve spatial frequency components on the order of (range of values) / (number of points). For arange, this becomes unity, and the FFT is numerically accurate. For rand, this is 1/(26*256*256) ~ 5.8e-7.
Just running FFT/IFFT on your numpy arrays without using CUDA shows similar differences.
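A quick CPU-only sketch of the same effect (no CUDA required), comparing single and double precision round trips with numpy's FFT:
import numpy as np

data64 = np.random.rand(26, 256, 256)
data32 = data64.astype(np.complex64)
rt32 = np.fft.ifftn(np.fft.fftn(data32)).real
rt64 = np.fft.ifftn(np.fft.fftn(data64)).real
print(np.abs(rt32 - data64).max())  # single precision: visibly larger error
print(np.abs(rt64 - data64).max())  # double precision: near machine epsilon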

sort/lexsort/bincount/etc. for arrays that don't fit in memory

I'm trying to scale up a library written in numpy so that it can process arrays that don't fit in memory (~10 arrays of 10 billion elements)
hdf5 (h5py) was a temporary solution, but I rely heavily on sorting and indexing (b = a[a>5]), which are both not available in h5py and are a pain to write.
Is there a library that would make these tools available?
Specifically I need basic math, sort, lexsort, argsort, bincount, np.diff, and indexing (both boolean and with the array of indices).
PyTables is designed precisely for this (it is also based on hdf5). First, store your array to disk:
import numpy as np
import tables as tb

# Write big numpy array to disk
rows, cols = 80000000, 2
h5file = tb.open_file('test.h5', mode='w', title="Test Array")
root = h5file.root
array_on_disk = h5file.create_carray(root, 'array_on_disk',
                                     tb.Float64Atom(), shape=(rows, cols))

# Fill part of the array
rand_array = np.random.rand(1000)
array_on_disk[10055:11055] = rand_array
array_on_disk[12020:13020] = 2. * rand_array
h5file.close()
Then perform your computation directly on the array (or part of it) contained in the file:
h5file = tb.open_file('test.h5', mode='r')
print(h5file.root.array_on_disk[10050:10065, 0])
# sort a slice (note: slicing reads the slice into memory as a numpy array,
# so this sorts an in-memory copy rather than the data on disk)
h5file.root.array_on_disk[100000:10000000, :].sort(axis=0)
h5file.close()
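For one of the requested operations, e.g. bincount, a chunked pass over the on-disk array works without ever loading it whole. A sketch under stated assumptions (NBINS and the binning rule are hypothetical, and test.h5 is the file written above):
import numpy as np
import tables as tb

NBINS = 100  # hypothetical bin count for values in [0, 1)
h5file = tb.open_file('test.h5', mode='r')
arr = h5file.root.array_on_disk
counts = np.zeros(NBINS, dtype=np.int64)
chunk_size = 1_000_000
for start in range(0, arr.nrows, chunk_size):
    chunk = arr[start:start + chunk_size, 0]          # read one chunk into memory
    idx = np.clip((chunk * NBINS).astype(np.int64), 0, NBINS - 1)
    counts += np.bincount(idx, minlength=NBINS)       # accumulate partial histograms
h5file.close()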

The equivalence of Matlab sprand() in Python?

I am trying to translate a Matlab code snippet into a Python one. However, I am not very sure how to correctly implement the sprand() function.
This is how the Matlab code use sprand():
% n_z is an integer, n_dw is a matrix
n_p_z_dw = cell(n_z, 1); % n(d,w) * p(z|d,w)
for z = 1:n_z
    n_p_z_dw{z} = sprand(n_dw);
end
And this is how I implement the above logic in Python:
n_p_z_dw = [None] * n_z  # n(d,w) * p(z|d,w)
density = np.count_nonzero(n_dw) / float(n_dw.size)
for i in range(0, n_z):
    n_p_z_dw[i] = scipy.sparse.rand(n_d, n_w, density=density)
It seems to work, but I am not very sure about this. Any comment or suggestion?
The following should be a relatively fast way, I think, for a sparse array A:
import scipy.sparse as sparse
import numpy as np
sparse.coo_matrix((np.random.rand(A.nnz),A.nonzero()),shape=A.shape)
This will construct a COO format sparse matrix: it uses A.nonzero() as the coordinates, and A.nnz (the number of nonzero entries in A) to find the number of random numbers to generate.
I wonder, though, whether this might be a useful addition to the scipy.sparse.rand function.
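For a concrete check of the pattern-preserving behavior (a small sketch; scipy.sparse.random just generates an example input):
import numpy as np
import scipy.sparse as sparse

A = sparse.random(5, 4, density=0.3, format='csr')
B = sparse.coo_matrix((np.random.rand(A.nnz), A.nonzero()), shape=A.shape)
assert B.nnz == A.nnz  # same sparsity pattern as A, fresh uniform values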
