Matlab cell2mat function in Python Numpy? - python

Does numpy have the cell2mat function? Here is the link to matlab. I found an implementation of something similar but it only works when we can split it evenly. Here is the link.

In a sense, Python has had 'cells' a lot longer than MATLAB: the list. A Python list is a direct substitute for a 1d cell (or rather, a cell with one dimension of size 1). A 2d cell could be represented as a nested list. numpy arrays with dtype object also work. I believe that is what scipy.io.loadmat uses to render cells in .mat files.
np.array() converts a list, or a list of lists, etc., to an ndarray. Sometimes it needs help specifying the dtype. It also tries to render the input as an array with as many dimensions as possible.
np.array([1,2,3])
np.array(['1',2,'abc'],dtype=object)
np.array([[1,2,3],[1,2],[3]],dtype=object)  # ragged input needs dtype=object in recent numpy
np.array([[1,2],[3,4]])
And MATLAB structures map onto Python dictionaries or objects.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
loadmat can also represent structures as numpy structured (record) arrays.
There is np.concatenate that takes a list of arrays, and its convenience derivatives vstack, hstack, dstack. Mostly they tweak the dimensions of the arrays, and then concatenate on one axis.
Here's a rough approximation to the MATLAB cell2mat example:
C = {[1], [2 3 4];
[5; 9], [6 7 8; 10 11 12]}
Construct ndarrays with the same shapes:
In [61]: c11=np.array([[1]])
In [62]: c12=np.array([[2,3,4]])
In [63]: c21=np.array([[5],[9]])
In [64]: c22=np.array([[6,7,8],[10,11,12]])
Join them with a combination of hstack and vstack - i.e. concatenate along the matching axes.
In [65]: A=np.vstack([np.hstack([c11,c12]),np.hstack([c21,c22])])
# or A=np.hstack([np.vstack([c11,c21]),np.vstack([c12,c22])])
producing:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
Or more generally (and compactly)
In [75]: C=[[c11,c12],[c21,c22]]
In [76]: np.vstack([np.hstack(c) for c in C])
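The hstack/vstack recipe above can also be sketched with np.block, which accepts the nested list directly. The helper name cell2mat below is hypothetical (numpy does not ship a function by that name), and the sketch assumes every block is at most 2-D, that block heights agree within a row, and that total widths agree across rows.

```python
import numpy as np

def cell2mat(cells):
    """Join a nested list of blocks, in the spirit of MATLAB's cell2mat.

    `cells` is a list of rows; each row is a list of arrays that are
    promoted to 2-D and joined left to right, then rows are stacked.
    """
    return np.block([[np.atleast_2d(c) for c in row] for row in cells])

c11 = np.array([[1]])
c12 = np.array([[2, 3, 4]])
c21 = np.array([[5], [9]])
c22 = np.array([[6, 7, 8], [10, 11, 12]])

A = cell2mat([[c11, c12], [c21, c22]])
print(A)
```

Unlike the evenly-split implementations mentioned in the question, this only needs the block shapes to be compatible, not equal.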

I usually use object arrays as a replacement for Matlab's cell arrays. For example:
cell_array = np.array([[np.arange(10)],
[np.arange(30,40)] ],
dtype='object')
This is meant as a 2x1 cell holding length-10 vectors. (Because the inner arrays all have the same length, numpy actually builds a (2, 1, 10) object array here.) I can perform the cell2mat functionality by:
arr = np.concatenate(cell_array).astype('int')
This returns a 2x10 int array. You can change .astype('int') to be whatever data type you need, or you could grab it from one of the objects in your cell_array,
arr = np.concatenate(cell_array).astype(cell_array[0].dtype)
Good luck!
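If the container really should be a 2x1 object array whose entries are whole vectors, one possible approach is to allocate it first and fill it element by element. This is only a sketch under that assumption; the variable names follow the answer above.

```python
import numpy as np

# Allocate the 2x1 "cell" first, then fill it, so numpy cannot merge
# the inner arrays into extra dimensions.
cell_array = np.empty((2, 1), dtype=object)
cell_array[0, 0] = np.arange(10)
cell_array[1, 0] = np.arange(30, 40)

# Stack the stored vectors row by row; the result takes its dtype
# from the vectors themselves, so no astype() call is needed.
arr = np.vstack(list(cell_array.ravel()))
print(cell_array.shape, arr.shape, arr.dtype)
```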

Related

Calling .values on a column of lists returns an object array

I have a pandas data frame where the entries of one column are numpy arrays, all of length N. For some operations (eg. masking certain values before averaging) I would like to extract the values into an array, so that I have an object of shape (len(indices), N).
However, when I look at the .values object, it has shape (len(indices),), and then its first element has length N.
import numpy as np
import pandas as pd
f = pd.DataFrame([[1,np.array([1,3,4])],[2,np.array([1,2,4])]], index=[2,5], columns=['sth','sth else'])
print(np.shape(f['sth else'].values))
I presume this is a numpy question because there should be a way to reshape this array, but I don't know how to address this. I can of course write a for-loop and extract all individual subarrays, but was wondering if there was something more elegant that works.
Edit:
I would like to perform mask operations on the set of values for a certain key, i.e. something like this:
import numpy.ma as ma
print(ma.masked_equal(f['sth else'].values,1))
which doesn't work, presumably because the array structure of f['sth else'].values is not good for it. The following does work:
ma.masked_equal(np.array([np.array([ 1., 3., 4.]) ,np.array([ 1., 2., 4.])]),1)
Listify your column and then convert. Otherwise, you have an array of arrays with dtype=object and it's a little hard to come back from there.
np.array(f['sth else'].values.tolist())
array([[1, 3, 4],
[1, 2, 4]])
If this doesn't work, that means you have ragged lists (unequal length) and numpy cannot construct a contiguous integer/float array in memory for you (so will fall back to a slower, python implementation).
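As a rough end-to-end sketch of the question's goal (masking before averaging), the column can be stacked into a regular 2-D array and then masked. The frame below is rebuilt from a dict for brevity, and np.stack is used in place of tolist(); both are choices made here rather than part of the answer above, and they assume every row holds an array of the same length.

```python
import numpy as np
import numpy.ma as ma
import pandas as pd

f = pd.DataFrame(
    {"sth": [1, 2], "sth else": [np.array([1, 3, 4]), np.array([1, 2, 4])]},
    index=[2, 5],
)

# Each cell holds an array, so the column itself is a 1-D object array.
col = f["sth else"].to_numpy()
print(col.shape)

# np.stack joins the equal-length rows into one (len(index), N) array.
values = np.stack(list(col))
masked = ma.masked_equal(values, 1)

# Row-wise mean that ignores the masked entries.
row_means = masked.mean(axis=1)
print(row_means)
```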

Scipy CSR sparse matrix is actually COO?

I've been recently dealing with sparse matrices. My aim is to somehow convert an adjacency list for a graph into the CSR format, defined here: http://devblogs.nvidia.com/parallelforall/wp-content/uploads/2014/07/CSR.png.
One possible option I see, is that I simply first construct a NumPy matrix and convert it using scipy.sparse.csr_matrix. The problem is, that the CSR in SciPy is somewhat different to the one discussed in the link. My question is, is this just a discrepancy, and I need to write my own parser, or can SciPy in fact convert into CSR defined in the link.
A bit more about the problem, let's say I have a matrix:
matrix([[1, 1, 0],
[0, 0, 1],
[1, 0, 1]])
The CSR format I am aiming for consists of two arrays, column (C) and row (R), and looks like:
C: [0,1,2,0,2]
R: [0,2,3,5]
SciPy returns:
(0, 0) 1
(0, 1) 1
(1, 2) 1
(2, 0) 1
(2, 2) 1
where the second column is the same as my C, yet this is, to my understanding, the COO format, not CSR. (This was done using the csr_matrix(adjacency_matrix) function.)
There is a difference in what is stored internally and what you see when you simply print the matrix via print(A) (where A is a csr_matrix).
In the documentation the attributes are listed. Among others there are the following three attributes:
data CSR format data array of the matrix
indices CSR format index array of the matrix
indptr CSR format index pointer array of the matrix
You can access (and manipulate) them through A.data, A.indices and A.indptr.
Bottom line: The CSR format in scipy is a "real" CSR format and you do not need to write your own parser (as long as you don't mind the data array, which is unnecessary in your case).
Also note: A matrix in CSR format is always represented by three arrays, not two.
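The three attributes can be inspected on the matrix from the question, as in the sketch below. The second half goes the other way and builds the same matrix from an adjacency list; the dict layout of adj_list is a hypothetical input format chosen for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

adjacency = np.array([[1, 1, 0],
                      [0, 0, 1],
                      [1, 0, 1]])
A = csr_matrix(adjacency)

print(A.indices)  # column index of each stored entry (the "C" array)
print(A.indptr)   # where each row starts in indices (the "R" array)
print(A.data)     # stored values, all ones for an unweighted graph

# Going the other way: build the matrix straight from an adjacency list.
adj_list = {0: [0, 1], 1: [2], 2: [0, 2]}  # hypothetical input format
nodes = sorted(adj_list)
indices = np.concatenate([adj_list[r] for r in nodes])
indptr = np.cumsum([0] + [len(adj_list[r]) for r in nodes])
data = np.ones(len(indices), dtype=int)
B = csr_matrix((data, indices, indptr), shape=(3, 3))
```

Building from (data, indices, indptr) avoids creating the dense matrix first, which matters for large graphs.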

Insert numpy array to an empty numpy array

I am trying to create an empty numpy array and then insert newly created arrays into that one. It is important for me not to shape the first numpy array; it has to be empty, and then I should be able to add new numpy arrays with different sizes to it. Something like the following:
A = numpy.array([])
B = numpy.array([1,2,3])
C = numpy.array([5,6])
A.append(B, axis=0)
A.append(C, axis=0)
and I want A to look like this:
[[1,2,3],[5,6]]
When I do the append command I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'append'
Any idea how this can be done?
PS: This is not similar to the questions asked before because I am not trying to concatenate two numpy arrays. I am trying to insert a numpy array to another empty numpy array. I know how to do this using lists but it has to be numpy array.
Thanks
You can't do that with numpy arrays, because a real 2D numpy array is rectangular. For example, np.arange(6).reshape(2,3) returns array([[0, 1, 2],[3, 4, 5]]).
If you really want to do that, try np.array([np.array([1,2,3]), np.array([5,6])], dtype=object), which creates array([array([1, 2, 3]), array([5, 6])], dtype=object). But you will lose all the numpy power with misaligned data.
You can do this by converting the arrays to lists:
In [21]: a = list(A)
In [22]: a.append(list(B))
In [24]: a.append(list(C))
In [25]: a
Out[25]: [[1, 2, 3], [5, 6]]
My intuition is that there's a much better solution (either more pythonic or more numpythonic) than this, which might be gleaned from a more complete description of your problem.
Taken from here. Maybe search for existing questions first.
numpy.append(M, a)
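Putting the answers above together, one possible pattern is to collect the pieces in a plain list and only convert to an object array at the end, if an ndarray container is truly required. This is a sketch, not a recommendation to store ragged data this way; the name pieces is made up for the example.

```python
import numpy as np

B = np.array([1, 2, 3])
C = np.array([5, 6])

# Collect the pieces in a plain list; list.append is cheap and in place.
pieces = []
pieces.append(B)
pieces.append(C)

# If an ndarray container is required, fill a pre-allocated object array.
# Element-wise assignment keeps each piece as one entry, whatever its length.
A = np.empty(len(pieces), dtype=object)
for i, piece in enumerate(pieces):
    A[i] = piece

print(A.shape)
print([p.tolist() for p in A])
```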

Numpy - 'nested' array operations and invalid slice error

I am trying to use indices stored in one set of arrays (indexPositions) to perform a simple array operation using a matrix. It is easier to explain with an example
u[(indexPositions[:,1]):(indexPositions[:,2]),(indexPositions[:,0])]=0
The object u is a big matrix whose values I want to set to zero for a given region of space. indexPositions[:,1] contains the 'lower bound' indices and indexPositions[:,2] contains the 'upper bound' indices. This reflects the fact that I want to set to zero anything in between them and therefore want to iterate between these indices.
indexPositions[:,0] contains the column index for which the aforementioned range of rows must be set to zero.
I do not understand why it is not possible to do this (I hope it's clear what I'm trying to achieve). I'm sure it has something to do with Python not understanding what order it's supposed to do these operations in. Is there a way of specifying this? The matrix is quite large and these operations happen many times, so I really don't want to use a slow Python loop.
Just to make sure we are talking about the same thing, I'll create a simple example:
In [77]: u=np.arange(16).reshape(4,4)
In [78]: I=np.array([[0,2,3],[1,4,2]])
In [79]: i=0
In [80]: u[I[i,0]:I[i,1],I[i,2]]
Out[80]: array([3, 7])
In [85]: i=1
In [86]: u[I[i,0]:I[i,1],I[i,2]]
Out[86]: array([ 6, 10, 14])
I'm using different column order for I, but that doesn't matter.
I'm selecting 2 elements from the 4th column, and 3 from the 3rd. The different lengths of the results suggest that I'll have problems operating on both rows of I at once. I might have to operate on a flattened view of u.
In [93]: [u[slice(x,y),z] for x,y,z in I]
Out[93]: [array([3, 7]), array([ 6, 10, 14])]
If the lengths of the slices are all the same, it's more likely that I'd be able to do it all without a loop over the rows of I.
I'll think about this some more, but I just want to make sure I understood the problem, and why it might be difficult.
u[I[:,0]:I[:,1],I[:,2]] with : in the slice is definitely going to be a problem.
In [90]: slice(I[:,0],I[:,1])
Out[90]: slice(array([0, 1]), array([2, 4]), None)
Abstractly a slice object accepts arrays or lists, but the numpy indexing does not. So instead of one complex slice, you have to create 2 or more simple ones.
In [91]: [slice(x,y) for x,y in I[:,:2]]
Out[91]: [slice(0, 2, None), slice(1, 4, None)]
I've answered a similar question, one where the slice starts came from a list, but all slices had the same length. i.e. 0:3 from the 1st row, 2:5 from the 2nd, 4:7 from the 3rd etc.
Access multiple elements of an array
How can I select values along an axis of an nD array with an (n-1)D array of indices of that axis?
If the slices are all the same length, then it is possible to use broadcasting to construct the indexing arrays. But in the end the indexing will still be with arrays, not slices.
Fast slicing of numpy array multiple times
Numpy Array Slicing
deal with taking multiple slices from a 1d array, slices with differing offsets and lengths. Your problem could, I think, be cast that way. The alternatives considered all require a list comprehension to construct the slice indexes. The indexes can then be concatenated, followed by one indexing operation, or alternatively, index multiple times and concatenate the results. Timings vary with the number and length of the slices.
An example, adapted from those earlier questions, of constructing a flat index list is:
In [130]: il=[np.arange(v[0],v[1])+v[2]*u.T.shape[1] for v in I]
# [array([12, 13]), array([ 9, 10, 11])]
In [132]: u.T.flat[np.concatenate(il)]
# array([ 3, 7, 6, 10, 14])
Same values as my earlier examples, but in 1 list, not 2.
If the slice arrays have same length, then we can get back an array
In [145]: I2
Out[145]:
array([[0, 2, 3],
[1, 3, 2]])
In [146]: il=np.array([np.arange(v[0],v[1]) for v in I2])
In [147]: u[il,I2[:,2]]
Out[147]:
array([[ 3, 6],
[ 7, 10]])
In this case, il = I2[:,[0]]+np.arange(2) could be used to construct the 1st indexing array instead of the list comprehension (this is the broadcasting I mentioned earlier).
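For the original goal of setting the selected regions to zero, the same idea can be sketched with paired row and column index arrays and a single assignment. It follows the (start_row, stop_row, column) layout of I used in this answer, not the column order from the question, and it still relies on a list comprehension to expand the ranges.

```python
import numpy as np

u = np.arange(16).reshape(4, 4)
# Each row of I is (start_row, stop_row, column), as in the examples above.
I = np.array([[0, 2, 3],
              [1, 4, 2]])

lengths = I[:, 1] - I[:, 0]
# Row index of every element to clear, and the matching column for each.
rows = np.concatenate([np.arange(start, stop) for start, stop in I[:, :2]])
cols = np.repeat(I[:, 2], lengths)

# One advanced-indexing assignment clears all ranges at once.
u[rows, cols] = 0
print(u)
```

The loop runs once per range rather than once per element, so the cost is dominated by the single indexed assignment when the ranges are long.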

Difference in shapes of numpy array

For the array:
import numpy as np
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> arr2d
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> arr2d[2].shape
(3,)
>>> arr2d[2:,:].shape
(1, 3)
Why do I get different shapes when both statements return the 3rd row? and shouldn't the result be (1,3) in both cases since we are returning a single row with 3 columns?
Why do I get different shapes when both statements return the 3rd row?
Because with the first operation you are indexing the rows, and selecting just ONE element, which -as mentioned in the single-element indexing paragraph of a multidimensional array- returns an array with a lower dimension (a 1D array).
In the 2nd example, you are using a slice, as is evident from the colon. Slicing operations do not reduce the dimensions of an array. This is also logical: imagine the array had 4 rows instead of 3. Then arr2d[2:,:].shape would be (2,3). The developers of numpy made slicing operations consistent, and therefore slices never reduce the number of dimensions of the array.
and shouldn't the result be (1,3) in both cases since we are returning a single row with 3 columns?
No, just because of the previous reasons.
When doing arr2d[2], you are taking a row out of the array;
While when doing arr2d[2:, :], you are taking a subset of rows out of the array ('slicing'), in this case being the rows starting from the 3rd to the end, which is only the 3rd, but it didn't change that you are taking a subset, not an element.
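The difference can be checked side by side with a few more indexing forms. The last three lines are additions for illustration; they show common ways to keep the result 2-D when a single row is wanted.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[2]                        # integer index: the row axis is dropped
tail = arr2d[2:, :]                   # slice: the row axis is kept
one_row_slice = arr2d[2:3]            # a length-one slice also keeps it
list_index = arr2d[[2]]               # so does indexing with a list
restored = arr2d[2][np.newaxis, :]    # or add the axis back explicitly

for name, a in [("row", row), ("tail", tail), ("one_row_slice", one_row_slice),
                ("list_index", list_index), ("restored", restored)]:
    print(name, a.shape)
```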
