Accessing columns of matlab matrix and its realization in numpy - python

I am trying to find a realization of accessing elements of numpy arrays corresponding to a feature of Matlab.
Suppose given a (2,2,2) Matlab matrix m in the form
m(:,:,1) = [1,2;3,4]
m(:,:,2) = [5,6;7,8]
Even though this is a 3-d array, Matlab allows accessing its column in the fashion like
m(:,1) = [1;3]
m(:,2) = [2;4]
m(:,3) = [5;7]
m(:,4) = [6;8]
I am curious to know that if numpy supports such indexing so that given the following array
m = array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
One can also access columns in the fashion as Matlab listed above.

My answer to this question is as following, suppose given the array listed as in the question
m = array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
One can create a list, I call it m_list in the form such that
m_list = [m[i][:,j] for i in range(m.shape[0]) for j in range(m.shape[-1])]
This will output m_list in the form such that
m_list = [array([1, 3]), array([2, 4]), array([7, 9]), array([ 8, 10])]
Now we can access elements of m_list exactly as the fashion as Matlab as listed in the question.

In [41]: m = np.arange(1,9).reshape(2,2,2)
In [42]: m
Out[42]:
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
Indexing the equivalent blocks:
In [47]: m[0,:,0]
Out[47]: array([1, 3])
In [48]: m[0,:,1]
Out[48]: array([2, 4])
In [49]: m[1,:,0]
Out[49]: array([5, 7])
In [50]: m[1,:,1]
Out[50]: array([6, 8])
We can reshape, to "flatten" one pair of dimensions:
In [84]: m = np.arange(1,9).reshape(2,2,2)
In [85]: m.reshape(2,4)
Out[85]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In [87]: m.reshape(2,4)[:,2]
Out[87]: array([3, 7])
and throw in a transpose:
In [90]: m.transpose(1,0,2).reshape(2,4)
Out[90]:
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
MATLAB originally was strictly 2d. Then sometime around v3.9 (2000) they allowed for more, but in a kludgy way. They added a way to index the trailing dimension as though it was multidimensional. In another recent SO I noticed that when reshaping to (2,2,1,1) the result remained (2,2). Trailing size 1 dimensions are squeeze out.
I suspect the m(:,3) is a consequence of that as well.
Testing a 4d MATLAB
>> m=reshape(1:36,2,3,3,2);
>> m(:,:,1)
ans =
1 3 5
2 4 6
>> reshape(m,2,3,6)(:,:,1)
ans =
1 3 5
2 4 6
>> m(:,17)
ans =
33
34
>> reshape(m,2,18)(:,17)
ans =
33
34

Related

numpy array slicing index

import numpy as np
a=np.array([ [1,2,3],[4,5,6],[7,8,9]])
How can I get zeroth index column? Expecting output [[1],[2],[3]] a[...,0] gives 1D array. Maybe next question answers this question.
How to get last 2 columns of a? a[...,1:2] gives second column only, a[...,2:3] gives last 2 columns, but a[...,3] is invalid dimension. So, how does it work?
By the way, operator ... and : have same meaning? a[...,0] and a[:,0] give same output. Can someone comment here?
numpy indexing is built on python list conventions, but extended to multi-dimensions and multi-element indexing. It is powerful, but complex, but sooner or later you should read a full indexing documentation, one that distinguishes between 'basic' and 'advanced' indexing.
Like range and arange, slice index has a 'open' stop value
In [111]: a = np.arange(1,10).reshape(3,3)
In [112]: a
Out[112]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Indexing with a scalar reduces the dimension, regardless of where:
In [113]: a[1,:]
Out[113]: array([4, 5, 6])
In [114]: a[:,1]
Out[114]: array([2, 5, 8])
That also means a[1,1] returns 5, not np.array([[5]]).
Indexing with a slice preserves the dimension:
In [115]: a[1:2,:]
Out[115]: array([[4, 5, 6]])
so does indexing with a list or array (though this makes a copy, not a view):
In [116]: a[[1],:]
Out[116]: array([[4, 5, 6]])
... is a generalized : - use as many as needed.
In [117]: a[...,[1]]
Out[117]:
array([[2],
[5],
[8]])
You can adjust dimensions with newaxis or reshape:
In [118]: a[:,1,np.newaxis]
Out[118]:
array([[2],
[5],
[8]])
Note that trailing : are automatic. a[1] is the same as a[1,:]. But leading ones must be explicit.
List indexing also removes a 'dimension/nesting layer'
In [119]: alist = [[1,2,3],[4,5,6]]
In [120]: alist[0]
Out[120]: [1, 2, 3]
In [121]: alist[0][0]
Out[121]: 1
In [122]: [l[0] for l in alist] # a column equivalent
Out[122]: [1, 4]
import numpy as np
a=np.array([ [1,2,3],[4,5,6],[7,8,9]])
a[:,0] # first colomn
>>> array([1, 4, 7])
a[0,:] # first row
>>> array([1, 2, 3])
a[:,0:2] # first two columns
>>> array([[1, 2],
[4, 5],
[7, 8]])
a[0:2,:] # first two rows
>>> array([[1, 2, 3],
[4, 5, 6]])

Numpy sort two arrays together with one array as the keys in axis 1 [duplicate]

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[9,3,1]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
i = np.argsort(a, axis=1)
return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1 2 3]
[2 8 9]])
I found the answer here, with someone having the same problem. They key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Rumtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop

Asymmetric slicing python

Consider the following matrix:
X = np.arange(9).reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Let say I want to subset the following array
array([[0, 4, 2],
[3, 7, 5]])
It is possible with some indexing of rows and columns, for instance
col=[0,1,2]
row = [[0,1],[1,2],[0,1]]
Then if I store the result in a variable array I can do it with the following code:
array=np.zeros([2,3],dtype='int64')
for i in range(3):
array[:,i]=X[row[i],col[i]]
Is there a way to broadcast this kind of operation ? I have to do this as a data cleaning stage for a large file ~ 5 Gb, and I would like to use dask to parallelize it. But in a first time if I could avoid using a for loop I would feel great.
For arrays with NumPy's advanced-indexing, it would be -
X[row, np.asarray(col)[:,None]].T
Sample run -
In [9]: X
Out[9]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [10]: col=[0,1,2]
...: row = [[0,1],[1,2],[0,1]]
In [11]: X[row, np.asarray(col)[:,None]].T
Out[11]:
array([[0, 4, 2],
[3, 7, 5]])

numpy vectorize dimension increasing function

I would like to create a function that has input: x.shape==(2,2), and outputs y.shape==(2,2,3).
For example:
#np.vectorize
def foo(x):
#This function doesn't work like I want
return x,x,x
a = np.array([[1,2],[3,4]])
print(foo(a))
#desired output
[[[1 1 1]
[2 2 2]]
[[3 3 3]
[4 4 4]]]
#actual output
(array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]))
Or maybe:
#np.vectorize
def bar(x):
#This function doesn't work like I want
return np.array([x,2*x,5])
a = np.array([[1,2],[3,4]])
print(bar(a))
#desired output
[[[1 2 5]
[2 4 5]]
[[3 6 5]
[4 8 5]]]
Note that foo is just an example. I want a way to map over a numpy array (which is what vectorize is supposed to do), but have that map take a 0d object and shove a 1d object in its place. It also seems to me that the dimensions here are arbitrary, as one might wish to take a function that takes a 1d object and returns a 3d object, vectorize it, call it on a 5d object, and get back a 7d object.... However, my specific use case only requires vectorizing a 0d to 1d function, and mapping it appropriately over a 2d array.
It would help, in your question, to show both the actual result and your desired result. As written that isn't very clear.
In [79]: foo(np.array([[1,2],[3,4]]))
Out[79]:
(array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]))
As indicated in the vectorize docs, this has returned a tuple of arrays, corresponding to the tuple of values that your function returned.
Your bar returns an array, where as vectorize expected it to return a scalar (or single value):
In [82]: bar(np.array([[1,2],[3,4]]))
ValueError: setting an array element with a sequence.
vectorize takes an otypes parameter that sometimes helps. For example if I say that bar (without the wrapper) returns an object, I get:
In [84]: f=np.vectorize(bar, otypes=[object])
In [85]: f(np.array([[1,2],[3,4]]))
Out[85]:
array([[array([1, 2, 5]), array([2, 4, 5])],
[array([3, 6, 5]), array([4, 8, 5])]], dtype=object)
A (2,2) array of (3,) arrays. The (2,2) shape matches the shape of the input.
vectorize has a relatively new parameter, signature
In [90]: f=np.vectorize(bar, signature='()->(n)')
In [91]: f(np.array([[1,2],[3,4]]))
Out[91]:
array([[[1, 2, 5],
[2, 4, 5]],
[[3, 6, 5],
[4, 8, 5]]])
In [92]: _.shape
Out[92]: (2, 2, 3)
I haven't used this much, so am still getting a feel for how it works. When I've tested it, it is slower than the original scalar version of vectorize. Neither offers any speed advantage of explicit loops. However vectorize does help when 'broadcasting', allowing you to use a variety of input shapes. That's even more useful when your function takes several inputs, not just one as in this case.
In [94]: f(np.array([1,2]))
Out[94]:
array([[1, 2, 5],
[2, 4, 5]])
In [95]: f(np.array(3))
Out[95]: array([3, 6, 5])
For best speed, you want to use existing numpy whole-array functions where possible. For example your foo case can be done with:
In [97]: np.repeat(a[:,:,None],3, axis=2)
Out[97]:
array([[[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4]]])
np.stack([a]*3, axis=2) also works.
And your bar desired result:
In [100]: np.stack([a, 2*a, np.full(a.shape, 5)], axis=2)
Out[100]:
array([[[1, 2, 5],
[2, 4, 5]],
[[3, 6, 5],
[4, 8, 5]]])
2*a takes advantage of the whole-array multiplication. That's true 'numpy-onic' thinking.
Just repeating the value into another dimension is quite simple:
import numpy as np
x = a = np.array([[1,2],[3,4]])
y = np.repeat(x[:,:,np.newaxis], 3, axis=2)
print y.shape
print y
(2L, 2L, 3L)
[[[1 1 1]
[2 2 2]]
[[3 3 3]
[4 4 4]]]
This seems to work for the "f R0 -> R1 mapped over a nd array giving a (n+1)d one"
def foo(x):
return np.concatenate((x,x))
np.apply_along_axis(foo,2,x.reshape(list(x.shape)+[1]))
doesn't generalize all that well, though

How to return some column items in a NumPy array?

I want print some items in 2D NumPy array.
For example:
a = [[1, 2, 3, 4],
[5, 6, 7, 8]]
a = numpy.array(a)
My questions:
How can I return just (1 and 2)? As well as (5 and 6)?
And how can I keep the dimension as [2, 2]
The following:
a[:, [0, 1]]
will select only the first two columns (with index 0 and 1). The result will be:
array([[1, 2],
[5, 6]])
You can use slicing to get necessary parts of the numpy array.
To get 1 and 2 you need to select 0's row and the first two columns, i.e.
>>> a[0, 0:2]
array([1, 2])
Similarly for 5 and 6
>>> a[1, 0:2]
array([5, 6])
You can also select a 2x2 subarray, e.g.
>>> a[:,0:2]
array([[1, 2],
[5, 6]])
You can do like this,
In [44]: a[:, :2]
Out[44]:
array([[1, 2],
[5, 6]])

Categories

Resources