Iterate over columns of array as column vectors - python

Is there a way to iterate over the columns of a 2D numpy array such that the iterators remain column vectors?
i.e.
>>> A = np.arange(9).reshape((3,3))
[[0 1 2]
[3 4 5]
[6 7 8]]
>>> np.hstack([a for a in some_way_of_iterating(A)])
[[0 1 2]
[3 4 5]
[6 7 8]]
This is useful, for example, when I want to pass the column vectors into a function that transforms each column vector individually, without cluttering the code with reshapes.

How about a simple transpose:
B = np.hstack([a.reshape(-1,1) for a in A.T])
You need .reshape(-1,1) to get shape (n, 1) instead of just (n,).
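A minimal sketch of what the question calls some_way_of_iterating, assuming a one-column slice per step is acceptable. The helper name iter_column_vectors is hypothetical and not part of NumPy.

```python
import numpy as np

def iter_column_vectors(A):
    """Yield each column of a 2-D array as an (n, 1) view (hypothetical helper)."""
    for j in range(A.shape[1]):
        # A one-column slice keeps both dimensions, so no reshape is needed.
        yield A[:, j:j + 1]

A = np.arange(9).reshape(3, 3)
columns = list(iter_column_vectors(A))
rebuilt = np.hstack(columns)

print(columns[0].shape)            # (3, 1)
print(np.array_equal(rebuilt, A))  # True
```

Because each item is produced by basic slicing, it is a view of A rather than a copy.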

In [39]: A = np.arange(1,10).reshape(3,3)
In [40]: A
Out[40]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Iteration on an array operates on the first dimension. It's much like iterating on a nested list - but slower. And like the list case it too reduces the dimension.
You could iterate over a range and index with a list, [i] (advanced indexing), to keep the 2-D "column vector" shape:
In [41]: [A[:,[i]] for i in range(3)]
Out[41]:
[array([[1],
[4],
[7]]),
array([[2],
[5],
[8]]),
array([[3],
[6],
[9]])]
Or iterate on the transpose - but this still requires some form of reshape. I prefer the None/newaxis syntax.
In [42]: [a[:,None] for a in A.T]
Out[42]:
[array([[1],
[4],
[7]]),
array([[2],
[5],
[8]]),
array([[3],
[6],
[9]])]
Indexing and reshape can be combined with:
In [43]: A[:,0,None]
Out[43]:
array([[1],
[4],
[7]])
Or with slicing:
In [44]: A[:,1:2]
Out[44]:
array([[2],
[5],
[8]])
There is a difference that may matter. A[:,[i]] makes a copy, A[:,i,None] is a view.
This may be the time to reread the basic numpy indexing docs.
https://numpy.org/doc/stable/reference/arrays.indexing.html
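The copy-versus-view difference mentioned above can be checked with np.shares_memory. This is only an illustrative sketch; the variable names are made up for the example.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

col_copy = A[:, [0]]      # list (advanced) indexing: new data
col_view = A[:, 0, None]  # scalar index plus newaxis (basic indexing): view

copy_shares = np.shares_memory(A, col_copy)
view_shares = np.shares_memory(A, col_view)
print(copy_shares, view_shares)  # False True

col_view[0, 0] = 100  # writes through to A
col_copy[1, 0] = -1   # leaves A untouched
print(A[0, 0], A[1, 0])  # 100 4
```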

Another possible, if ugly, way uses indexing and transpose:
np.hstack([A[:,i][np.newaxis].T for i in range(len(A.T))])
I am using np.newaxis to facilitate the transpose. Based on @hpaulj's suggestion, this can be cleaned up significantly:
np.hstack([A[:,i,np.newaxis] for i in range(A.shape[1])])
Output:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])

Related

Multi-dimensional array notation in Python

I have two arrays A and i with dimensions (1, 3, 3) and (1, 3, 2) respectively. I want to define a new array I which gives the elements of A based on i. The current and desired outputs are shown below.
import numpy as np
i=np.array([[[0,0],[1,2],[2,2]]])
A = np.array([[[1,2,3],[4,5,6],[7,8,9]]], dtype=float)
I=A[0,i]
print([I])
The current output is
[array([[[[1.000000000, 2.000000000, 3.000000000],
[1.000000000, 2.000000000, 3.000000000]],
[[4.000000000, 5.000000000, 6.000000000],
[7.000000000, 8.000000000, 9.000000000]],
[[7.000000000, 8.000000000, 9.000000000],
[7.000000000, 8.000000000, 9.000000000]]]])]
The desired output is
[array([[[1],[6],[9]]])]
In [131]: A.shape, i.shape
Out[131]: ((1, 3, 3), (1, 3, 2))
That leading size 1 dimension just adds a [] layer, and complicates indexing (a bit):
In [132]: A[0]
Out[132]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
This is the indexing that I think you want:
In [133]: A[0,i[0,:,0],i[0,:,1]]
Out[133]: array([1, 6, 9])
If you really need a trailing size 1 dimension, add it after:
In [134]: A[0,i[0,:,0],i[0,:,1]][:,None]
Out[134]:
array([[1],
[6],
[9]])
From the desired numbers, I deduced that you wanted to use the 2 columns of i as indices to two different dimensions of A:
In [135]: i[0]
Out[135]:
array([[0, 0],
[1, 2],
[2, 2]])
Another way to do the same thing:
In [139]: tuple(i.T)
Out[139]:
(array([[0],
[1],
[2]]),
array([[0],
[2],
[2]]))
In [140]: A[0][tuple(i.T)]
Out[140]:
array([[1],
[6],
[9]])
You must enter
I=A[0,:1,i[:,1]]
You can use numpy's take for that.
However, take works with a flat index, so you will need to use [0, 5, 8] for your indexes instead.
Here is an example:
>>> I = [A.shape[2] * x + y for x,y in i[0]] # Convert to flat indexes
>>> I = np.expand_dims(I, axis=(1,2))
>>> A.take(I)
array([[[1.]],
[[6.]],
[[9.]]])
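As a hedged alternative to computing the flat indexes by hand, np.ravel_multi_index can do the conversion. The sketch below assumes, as in the answers above, that the two columns of i hold row and column indices into A[0].

```python
import numpy as np

A = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=float)
i = np.array([[[0, 0], [1, 2], [2, 2]]])

rows, cols = i[0, :, 0], i[0, :, 1]

# Convert (row, col) pairs to flat positions within the 3x3 block A[0].
flat = np.ravel_multi_index((rows, cols), A.shape[1:])
picked = A[0].take(flat)   # same values as A[0][rows, cols]
column = picked[:, None]   # add a trailing size-1 axis

print(flat)          # [0 5 8]
print(column.shape)  # (3, 1)
```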

numpy array slicing index

import numpy as np
a=np.array([ [1,2,3],[4,5,6],[7,8,9]])
How can I get the zeroth column? I expect the output [[1],[4],[7]], but a[...,0] gives a 1D array. Maybe the next question answers this one.
How do I get the last 2 columns of a? a[...,1:2] gives the second column only and a[...,1:3] gives the last 2 columns, but a[...,3] is an invalid index. So, how does it work?
By the way, do the operators ... and : have the same meaning? a[...,0] and a[:,0] give the same output. Can someone comment on this?
numpy indexing is built on Python list conventions, but extended to multiple dimensions and multi-element indexing. It is powerful but complex; sooner or later you should read the full indexing documentation, which distinguishes between 'basic' and 'advanced' indexing.
Like range and arange, a slice index has an 'open' (exclusive) stop value:
In [111]: a = np.arange(1,10).reshape(3,3)
In [112]: a
Out[112]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Indexing with a scalar reduces the dimension, regardless of where:
In [113]: a[1,:]
Out[113]: array([4, 5, 6])
In [114]: a[:,1]
Out[114]: array([2, 5, 8])
That also means a[1,1] returns 5, not np.array([[5]]).
Indexing with a slice preserves the dimension:
In [115]: a[1:2,:]
Out[115]: array([[4, 5, 6]])
so does indexing with a list or array (though this makes a copy, not a view):
In [116]: a[[1],:]
Out[116]: array([[4, 5, 6]])
... is a generalized : that expands to as many : as needed.
In [117]: a[...,[1]]
Out[117]:
array([[2],
[5],
[8]])
You can adjust dimensions with newaxis or reshape:
In [118]: a[:,1,np.newaxis]
Out[118]:
array([[2],
[5],
[8]])
Note that trailing : are automatic. a[1] is the same as a[1,:]. But leading ones must be explicit.
List indexing also removes a 'dimension/nesting layer'
In [119]: alist = [[1,2,3],[4,5,6]]
In [120]: alist[0]
Out[120]: [1, 2, 3]
In [121]: alist[0][0]
Out[121]: 1
In [122]: [l[0] for l in alist] # a column equivalent
Out[122]: [1, 4]
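The rules above (a scalar index drops an axis, a slice or list keeps it) can be summarized by comparing result shapes. This is just a sketch for checking the behaviour on a 3x3 array.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

shapes = {
    "a[:, 1]": a[:, 1].shape,                          # scalar index drops the axis
    "a[..., 1]": a[..., 1].shape,                      # same as a[:, 1] for a 2-D array
    "a[:, 1:2]": a[:, 1:2].shape,                      # slice keeps the axis (view)
    "a[:, [1]]": a[:, [1]].shape,                      # list keeps the axis (copy)
    "a[:, 1, np.newaxis]": a[:, 1, np.newaxis].shape,  # drop, then re-add an axis
    "a[1]": a[1].shape,                                # trailing : is implied
}
for expr, shape in shapes.items():
    print(f"{expr:22s} -> {shape}")
```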
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a[:, 0]  # first column
array([1, 4, 7])
>>> a[0, :]  # first row
array([1, 2, 3])
>>> a[:, 0:2]  # first two columns
array([[1, 2],
       [4, 5],
       [7, 8]])
>>> a[0:2, :]  # first two rows
array([[1, 2, 3],
       [4, 5, 6]])
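For the "last 2 columns" part of the question, a short sketch of how the exclusive stop value plays out: 3 is fine as a slice stop but not as a scalar index on an axis of length 3.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

last_two = a[:, 1:3]       # stop value 3 is exclusive: columns 1 and 2
also_last_two = a[:, -2:]  # negative start counts from the end
first_col = a[:, :1]       # one-column slice, shape (3, 1)

try:
    a[:, 3]                # a scalar index must be an existing position
    scalar_index_ok = True
except IndexError:
    scalar_index_ok = False

print(last_two.shape, first_col.shape, scalar_index_ok)  # (3, 2) (3, 1) False
```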

Selecting specific groups of rows from numpy array [duplicate]

Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array (loosely, a row vector). If you just want to loop over it, that's fine, but if you want to hstack it with some other array of dimension 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector, so that you can do concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
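The difference between the two forms can be reproduced in a few lines. The sketch below catches the ValueError instead of letting it propagate, purely for illustration.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

try:
    np.hstack((test, test[:, 0]))  # 2-D with 1-D: dimension mismatch
    stacked_1d_ok = True
except ValueError:
    stacked_1d_ok = False

stacked_2d = np.hstack((test, test[:, [0]]))  # 2-D with shape (3, 1)

print(stacked_1d_ok)     # False
print(stacked_2d.shape)  # (3, 3)
```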
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in one column of the array, here the column at index 1:
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it as a 1-D array of shape (3,) (loosely, a "row vector"), you use indexing:
arr_col1_view = arr[:, 1] # creates a view of column 1 (zero-based) of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 (zero-based) of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect arr), the stride, i.e. the number of bytes to step to reach the next element, differs:
arr_col1_view.strides[0] # 8 bytes (assuming a 4-byte integer dtype such as int32)
arr_col1_copy.strides[0] # 4 bytes (assuming a 4-byte integer dtype such as int32)
see strides and this answer.
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of that column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different stride lengths mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, it is not always true for the reason that making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create the A_col1_copy). However if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are ok with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally, let me note that transposing an array and using row-slicing is the same as using column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array. For the row-major (C-ordered) A from before:
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
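The stride values depend on the dtype and memory order, so here is a small sketch that pins the dtype to int32 (an assumption made so the byte counts are reproducible on any platform) and compares a column view, a column copy, and a column of a Fortran-ordered array.

```python
import numpy as np

# dtype fixed to int32 so the itemsize is 4 bytes everywhere
arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

col_view = arr[:, 1]                         # view into row-major data
col_copy = arr[:, 1].copy()                  # contiguous copy
col_fortran = np.asfortranarray(arr)[:, 1]   # view into column-major data

print(col_view.strides)     # (8,)  one full row of two int32 per step
print(col_copy.strides)     # (4,)
print(col_fortran.strides)  # (4,)
print(arr.T[1].strides)     # (8,)  the transpose only swaps shape and strides

print(np.shares_memory(arr, col_view))  # True
print(np.shares_memory(arr, col_copy))  # False
```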
To get several independent columns, just use:
>>> test[:,[0,2]]
This returns columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is not multidimensional; it is a 2-dimensional array, and you can slice out the columns you want:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b

How does numpy three dimensional slicing and indexing and ellipsis work?

I'm having a hard time understanding how some of numpy's slicing and indexing works
First one is the following:
>>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x.shape
(2, 3, 1)
>>> x[1:2]
array([[[4],
[5],
[6]]])
According to the documentation,
If the number of objects in the selection tuple is less than N , then
: is assumed for any subsequent dimensions.
So does that mean [[1], [2], [3]] , [[4], [5], [6]] is a 2x3 array itself?
And how does
x[1:2]
return
array([[[4],
[5],
[6]]])
?
The second is ellipsis,
>>> x[...,0]
array([[1, 2, 3],
[4, 5, 6]])
Ellipsis expand to the number of : objects needed to make a selection
tuple of the same length as x.ndim. There may only be a single
ellipsis present.
What does [...,0] mean?
For your first question, it means that x of shape (2, 3, 1) has 2 slices of 3x1 arrays.
In [40]: x
Out[40]:
array([[[1],
[2], # <= slice 1 of shape 3x1
[3]],
[[4],
[5], # <= slice 2 of shape 3x1
[6]]])
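Now, when you execute x[1:2], it hands you only the slice at index 1 (slice 2 above) and stops before index 2, since in Python & NumPy a slice is always left inclusive and right exclusive (a half-open interval, i.e. [1,2) ). Because a slice rather than a scalar index is used, the leading dimension is kept, so the result has shape (1, 3, 1).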
In [42]: x[1:2]
Out[42]:
array([[[4],
[5],
[6]]])
This is why you get only that one slice.
For your second question,
In [45]: x.ndim
Out[45]: 3
So, when you use an ellipsis, it expands to as many : as are needed to index all 3 dimensions; x[...,0] is the same as x[:,:,0].
In [47]: x[...,0]
Out[47]:
array([[1, 2, 3],
[4, 5, 6]])
The above code takes both slices of x and picks index 0 along the last axis, which drops that axis and leaves a 2x3 array.
But instead, if you do
In [49]: x[0, ..., 0]
Out[49]: array([1, 2, 3])
Here, you take only the first slice of x and again pick index 0 along the last axis, which leaves a 1-D array of length 3.
Note that x[1:2] returns the second slice (the one at index 1), as the output shows:
In [42]: x[1:2]
Out[42]:
array([[[4],
[5],
[6]]])
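The slicing and ellipsis behaviour discussed above can be checked by comparing shapes. This is a small sketch using the same x as in the question.

```python
import numpy as np

x = np.array([[[1], [2], [3]], [[4], [5], [6]]])  # shape (2, 3, 1)

# The ellipsis stands for as many ':' as needed to reach x.ndim indices.
same_as_full_slices = np.array_equal(x[..., 0], x[:, :, 0])
print(same_as_full_slices)  # True

print(x[..., 0].shape)     # (2, 3)
print(x[0, ..., 0].shape)  # (3,)
print(x[1:2].shape)        # (1, 3, 1)  slice keeps the first axis
print(x[1].shape)          # (3, 1)     scalar index drops it
```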

Best way to iterate through a numpy array returning the columns as 2d arrays

EDIT: Thank you all for the good solutions, I think if I'd had to pick one, it would be A[:,[0]]
I have now collected 7 approaches and put them into an IPython notebook. The timeit benchmarks are not surprising: they are all roughly the same in terms of speed.
Thanks a lot for your suggestion!
I am looking for a good way to iterate through the columns of a matrix and return them as dx1 column vectors. I have some ideas, but I don't think that those are good solutions. I think I am missing something here. Which way would you recommend? E.g., let's say I have the following matrix and want to return the first column as a column vector:
A = np.array([ [1,2,3], [4,5,6], [7,8,9] ])
>>> A
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
By default, numpy returns it like this:
>>> A[:,0]
array([1, 4, 7])
>>> A[:,0].shape
(3,)
And what I want is this:
array([[1],
[4],
[7]])
with .shape = (3,1)
Transpose doesn't work to return it as a column vector.
>>> A[:,0].T
array([1, 4, 7])
>>> A[:,0]
array([1, 4, 7])
I would have to create a new axis every time
>>> A[:,0][:,np.newaxis].shape
(3, 1)
>>> A[:,0][:,np.newaxis]
array([[1],
[4],
[7]])
Or after doing some experimenting, I came up with other workarounds like this:
>>> A[:,0:1]
array([[1],
[4],
[7]])
>>> A[:,0].reshape(A.shape[0],1)
array([[1],
[4],
[7]])
My favorite solution is slicing. There are several variants:
A[:,0:1] # not so clear
A[:,:1] # black magic
A[:,[0]] # clearest syntax imho
Concerning the reshape solution, you can simplify the syntax like this:
A[:,0].reshape(A.shape[0],1) # ->
A[:,0].reshape(-1,1)
You can also merge the two indexing steps:
A[:,0][:,np.newaxis] # ->
A[:,0,np.newaxis] # or
A[:,np.newaxis,0]
One way would be to use numpy.row_stack or numpy.vstack (row_stack is an alias of vstack and is deprecated as of NumPy 2.0, so prefer vstack in new code):
In [91]: np.row_stack(A[:,0])
Out[91]:
array([[1],
[4],
[7]])
In [92]: np.vstack(A[:,0])
Out[92]:
array([[1],
[4],
[7]])
You can use column_stack:
>>> A
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> np.column_stack((A[:,0],))
array([[1],
[4],
[7]])
>>> # ^^^^^^^ a tuple
Just make sure that you feed it a 1-element tuple for a single column, or you will get something different:
>>> np.column_stack(A[:,0])
array([[1, 4, 7]])
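To compare the approaches from this thread side by side, here is a sketch that builds the first column several ways and checks that they agree. A non-square array is used on purpose (an assumption beyond the original 3x3 example) so that mixing up the row and column counts in a reshape would show up.

```python
import numpy as np

A = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns

candidates = {
    "A[:, [0]]": A[:, [0]],
    "A[:, 0:1]": A[:, 0:1],
    "A[:, 0, np.newaxis]": A[:, 0, np.newaxis],
    "A[:, 0].reshape(-1, 1)": A[:, 0].reshape(-1, 1),
    "np.column_stack((A[:, 0],))": np.column_stack((A[:, 0],)),
}

expected = np.array([[0], [3], [6], [9]])
for expr, col in candidates.items():
    print(f"{expr:30s} {col.shape} {np.array_equal(col, expected)}")
```

All five produce the same (4, 1) array; they differ in whether the result is a view or a copy of A.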
