Appending numpy arrays by column [duplicate] - python

I have a 60000 by 200 numpy array. I want to make it 60000 by 201 by adding a column of 1's to the right (so every row is [prev, 1]).
Concatenating with axis=1 doesn't work, because concatenate seems to require all input arrays to have the same number of dimensions.
How should I do this?

Let me just throw in a very simple example with much smaller size. The principle should be the same.
a = np.zeros((6,2))
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
b = np.ones((6,1))
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
np.hstack((a,b))
array([[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.]])

Use the numpy index trick np.c_ to append a 1D vector to a 2D array:
a = np.zeros((6,2))
# array([[ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.]])
b = np.ones(6) # or np.ones((6,1))
#array([1., 1., 1., 1., 1., 1.])
np.c_[a,b]
# array([[0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.]])

Under the covers, all the stack variants (including append and insert) end up calling concatenate. They just precede it with some sort of array reshape.
In [60]: A = np.arange(12).reshape(3,4)
In [61]: np.concatenate([A, np.ones((A.shape[0],1),dtype=A.dtype)], axis=1)
Out[61]:
array([[ 0, 1, 2, 3, 1],
[ 4, 5, 6, 7, 1],
[ 8, 9, 10, 11, 1]])
Here I made a (3,1) array of 1s, to match the (3,4) array. If I wanted to add a new row, I'd make a (1,4) array.
While the variations are handy, if you are learning, you should become familiar with concatenate and the various ways of constructing arrays that match in number of dimensions and necessary shapes.
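To make the shape-matching point concrete, here is a minimal sketch using the same (3,4) array. The variable names (ones_1d, new_col, new_row, and so on) are only illustrative. It shows a 1D array being given a second axis before concatenation, the row case, and the error raised when the number of dimensions does not match.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

# A 1-D array of ones has shape (3,), so it needs a second axis
# before concatenate will accept it next to the (3, 4) array.
ones_1d = np.ones(A.shape[0], dtype=A.dtype)
new_col = ones_1d[:, None]                        # shape (3, 1)
with_col = np.concatenate([A, new_col], axis=1)   # shape (3, 5)

# Adding a row instead needs a (1, 4) array and axis=0.
new_row = np.ones((1, A.shape[1]), dtype=A.dtype)
with_row = np.concatenate([A, new_row], axis=0)   # shape (4, 4)

# Passing the 1-D array directly fails because the inputs
# differ in number of dimensions.
error_message = None
try:
    np.concatenate([A, ones_1d], axis=1)
except ValueError as exc:
    error_message = str(exc)

print(with_col.shape, with_row.shape)
print(error_message)
```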

The first thing to think about is that numpy arrays are really not meant to change size. So ask yourself: can you create your original matrix as 60k x 201 and then fill the last column afterwards? This is usually best.
If you really must do this, see
How to add column to numpy array
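A minimal sketch of the pre-allocation idea, assuming the original values are available before the final array is built. The small sizes and the name data are stand-ins for the 60000 x 200 case, not part of the original question.

```python
import numpy as np

n_rows, n_features = 6, 2                # stand-ins for 60000 and 200
data = np.zeros((n_rows, n_features))    # hypothetical source of the original values

# Allocate the final shape once, then fill it in place.
out = np.empty((n_rows, n_features + 1))
out[:, :n_features] = data               # copy the existing columns
out[:, n_features] = 1.0                 # fill the extra column with ones

print(out.shape)
```

This avoids creating an intermediate array and copying everything a second time, which is what any concatenate-based approach has to do.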

I think the numpy function column_stack is more interesting because you do not need to create a column (2D) array to stack onto the matrix of interest. With column_stack you can just pass a normal 1D numpy array.
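For illustration, a short sketch of column_stack on the small example used in the other answers (a, b and stacked are just example names):

```python
import numpy as np

a = np.zeros((6, 2))
b = np.ones(6)                      # plain 1-D array, no reshape needed

# column_stack turns 1-D inputs into columns before joining them.
stacked = np.column_stack((a, b))

print(stacked.shape)
```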

Related

In a pytorch tensor, return an array of indices of the rows of specific value

Given the below tensor that has vectors of all zeros and vectors with ones and zeros:
tensor([[0., 0., 0., 0.],
[0., 1., 1., 0.],
[0., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 0.],
[0., 0., 1., 0.],
[1., 0., 0., 1.],
[0., 0., 0., 0.],...])
How can I have an array of indices of the vectors with ones and zeros so the output is like this:
indices = tensor([ 1, 3, 5, 6,...])
Update
A way to do it is:
indices = torch.unique(torch.nonzero(y>0,as_tuple=True)[0])
But I'm not sure if there's a better way to do it.
An alternative way is to use torch.Tensor.any coupled with torch.Tensor.nonzero:
>>> x.any(1).nonzero()[:,0]
tensor([1, 3, 5, 6])
Otherwise, since the tensor contains only non-negative values, you can sum each row and keep the rows whose sum is nonzero:
>>> x.sum(1).nonzero()[:,0]
tensor([1, 3, 5, 6])

How to interpret numpy advanced indexing solution

I have a piece of numpy code that I know works. I know this because I have tested it in my generic case successfully. However, I arrived at the solution after two hours of back and forth referencing the docs and trial and error. I can't grasp how I would know to do this intuitively.
The setup:
a = np.zeros((5,5,3))
The goal: set to 1 indices 0,1 of the first axis, 0,1 of the second axis, and all of the third axis; likewise indices 3,4 of the first axis, 3,4 of the second axis, and all of the third axis.
Clearer goal: set the first two rows of blocks 0 and 1 to 1, and the last two rows of blocks 3 and 4 to 1.
The result:
ax1 =np.array([np.array([0,1]),np.array([3,4])])
ax1 =np.array([x[:,np.newaxis] for x in ax1])
ax2 = np.array([[[0,1]],[[3,4]]])
a[ax1,ax2,:] = 1
a
Output:
array([[[1., 1., 1.],
[1., 1., 1.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[1., 1., 1.],
[1., 1., 1.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[1., 1., 1.],
[1., 1., 1.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[1., 1., 1.],
[1., 1., 1.]]])
I'm inclined to believe I should be able to look at the shape of the matrix in question, the shape of the indices, and the index operation to intuitively know the output. However, I can't put the story together in my head. Like, what's the final shape of the subspace it is altering? How would you explain how this works?
The shapes:
input: (5, 5, 3)
ind1: (2, 2, 1)
ind2: (2, 1, 2)
final_op: input[ind1, ind2, :]
With shapes
ind1: (2, 2, 1)
ind2: (2, 1, 2)
they broadcast together to select a (2,2,2) space
In [4]: ax1
Out[4]:
array([[[0],
[1]],
[[3],
[4]]])
In [5]: ax2
Out[5]:
array([[[0, 1]],
[[3, 4]]])
So for the 1st dimension (blocks) it is selecting blocks 0, 1, 3, and 4. In the second dimension it is selecting rows with the same indices.
Together that's the first 2 rows of the first 2 blocks, and the last 2 rows of the last 2 blocks. That's where the 1s appear in your result.
A simpler way of creating the index arrays:
In [7]: np.array([[0,1],[3,4]])[:,:,None] # (2,2) expanded to (2,2,1)
In [8]: np.array([[0,1],[3,4]])[:,None,:] # expand to (2,1,2)
This is how broadcasting expands them:
In [10]: np.broadcast_arrays(ax1,ax2)
Out[10]:
[array([[[0, 0], # block indices
[1, 1]],
[[3, 3],
[4, 4]]]),
array([[[0, 1], # row indices
[0, 1]],
[[3, 4],
[3, 4]]])]
This may make the pattern clearer:
In [15]: a[ax1,ax2,:] = np.arange(1,5).reshape(2,2,1)
In [16]: a[:,:,0]
Out[16]:
array([[1., 2., 0., 0., 0.],
[3., 4., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 1., 2.],
[0., 0., 0., 3., 4.]])
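As a sanity check on the explanation above, this sketch compares the advanced-indexing assignment with two plain slice assignments and prints the shape of the selected subspace. The array b and the names selected_shape and same are introduced here only for the comparison.

```python
import numpy as np

idx = np.array([[0, 1], [3, 4]])
ax1 = idx[:, :, None]    # shape (2, 2, 1): block indices
ax2 = idx[:, None, :]    # shape (2, 1, 2): row indices

a = np.zeros((5, 5, 3))

# The two index arrays broadcast to (2, 2, 2); the trailing slice
# keeps the last axis, so the selection has shape (2, 2, 2, 3).
index_shape = np.broadcast(ax1, ax2).shape
selected_shape = a[ax1, ax2, :].shape

a[ax1, ax2, :] = 1

# The same cells written with ordinary slices.
b = np.zeros((5, 5, 3))
b[:2, :2, :] = 1
b[3:, 3:, :] = 1

same = np.array_equal(a, b)
print(index_shape, selected_shape, same)
```

Reading the index shapes this way (broadcast the index arrays first, then append the sliced axes) is one way to predict the result without trial and error.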

Selection of elements from numpy array columns based on row index

I have a two-dimensional array A and a list of indices idx, for example:
A = np.array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., -1., 0.],
[ 0., 0., 5.]])
idx = np.array([2, 1, 0, 1, 2])
and I'm trying to select the elements of A indexed by idx along the column axis (in this example: array([0., 1., 0., -1., 5.])). How can I do this without loops?
Thank you !
A[np.arange(np.size(idx)), idx]
gives array([ 0., 1., 0., -1., 5.])
From the Advanced Indexing part of the documentation:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. [...] This is best understood with an example.
Indexing 2D numpy arrays can be a bit confusing.
You need: A[np.arange(0, A.shape[0]), idx]
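The sketch below runs the indexing expression from the answers on the question's data and, as an alternative not mentioned above, shows np.take_along_axis producing the same values. The names picked and picked_alt are illustrative.

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [0., -1., 0.],
              [0., 0., 5.]])
idx = np.array([2, 1, 0, 1, 2])

# Pair row i with column idx[i]: one integer array per axis.
picked = A[np.arange(A.shape[0]), idx]

# Alternative: take_along_axis expects idx to have the same number
# of dimensions as A, so give it a trailing axis and drop it afterwards.
picked_alt = np.take_along_axis(A, idx[:, None], axis=1)[:, 0]

print(picked)
print(picked_alt)
```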

Initialize empty vector with 3 dimensions

I want to initialize an empty vector with 3 columns that I can add to. I need to perform some l2 norm distance calculations on the rows after I have added to it, and I'm having the following problem.
I start with an initial empty array:
accepted_clusters = np.array([])
Then I add my first 1x3 set of values to this:
accepted_clusters = np.append(accepted_clusters, X_1)
returning:
[ 0.47843416 0.50829221 0.51484499]
Then I add a second set of 1x3 values in the same way, and I get the following:
[ 0.47843416 0.50829221 0.51484499 0.89505277 0.8359252 0.21434642]
However, what I want is something like this:
[ 0.47843416 0.50829221 0.51484499]
[ 0.89505277 0.8359252 0.21434642]
.. and so on
This would enable me to calculate distances between the rows. Ideally, the initial empty vector would be of undefined length, but something like a 10x3 of zeros would also work if the code for that is easy.
The most straightforward way is to use np.vstack:
In [9]: arr = np.array([1,2,3])
In [10]: x = np.arange(20, 23)
In [11]: arr = np.vstack([arr, x])
In [12]: arr
Out[12]:
array([[ 1, 2, 3],
[20, 21, 22]])
Note that your entire approach has a major code smell: doing the above in a loop will give you quadratic complexity, because each vstack call copies the whole array. Perhaps you should work with a list and then convert to an array at the end (which will at least be linear-time). Or maybe rethink your approach entirely.
Or, as you imply, you could pre-allocate your array:
In [18]: result = np.zeros((10, 3))
In [19]: result
Out[19]:
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
In [20]: result[0] = x
In [21]: result
Out[21]:
array([[ 20., 21., 22.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
You can try using vstack to add rows.
accepted_clusters=np.vstack([accepted_clusters,(0.89505277, 0.8359252, 0.21434642)])
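One caveat: vstack needs every input to have 3 columns, so starting from np.array([]) (shape (0,)) would not stack with a 3-element row. A minimal sketch of two ways around that, assuming each new entry is a 3-element sequence. The names rows, empty and grown are hypothetical.

```python
import numpy as np

# Option 1: collect rows in a plain list and convert once at the end.
rows = []
rows.append([0.47843416, 0.50829221, 0.51484499])
rows.append([0.89505277, 0.8359252, 0.21434642])
accepted_clusters = np.array(rows)        # shape (2, 3)

# Option 2: start from an empty array that already has 3 columns,
# so vstack can match the shapes.
empty = np.empty((0, 3))
grown = np.vstack([empty, rows[0], rows[1]])

# L2 distance between the two rows.
dist = np.linalg.norm(accepted_clusters[0] - accepted_clusters[1])

print(accepted_clusters.shape, grown.shape, dist)
```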

Get corner values in Python numpy ndarray

I'm trying to access the corner values of a numpy ndarray. I'm absolutely stumped as to the methodology. Any help would be greatly appreciated.
For example, from the below array I'd like a return value of array([1,0,0,5]) or array([[1,0],[0,5]]).
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 5.],
[ 0., 0., 5., 5.]])
To add variety to the answers, you can get a view (not a copy) of the corner items doing:
corners = a[::a.shape[0]-1, ::a.shape[1]-1]
Or, for a generic n-dimensional array:
corners = a[tuple(slice(None, None, j-1) for j in a.shape)]
Doing this, you can modify the original array by modifying the view:
>>> a = np.arange(9).reshape(3, 3)
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> corners = a[tuple(slice(None, None, j-1) for j in a.shape)]
>>> corners
array([[0, 2],
[6, 8]])
>>> corners += 1
>>> a
array([[1, 1, 3],
[3, 4, 5],
[7, 7, 9]])
EDIT: Ah, you want a flat list of corner values... That cannot in general be achieved with a view, so @IanH's answer is what you are looking for.
How about
A[[0,0,-1,-1],[0,-1,0,-1]]
where A is the array.
Use np.ix_ to construct the indices.
>>> a
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 5.],
[0., 0., 5., 5.]])
>>> corners = np.ix_((0,-1),(0,-1))
>>> a[corners]
array([[1., 0.],
[0., 5.]])
You can manually specify the corners (using negative indexes):
a = numpy.array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 5.],
[ 0., 0., 5., 5.]])
result = numpy.array([a[0][0],a[0][-1],a[-1][0],a[-1][-1]])
# result will contain array([ 1., 0., 0., 5.])
result = numpy.array([[a[0][0],a[0][-1]],[a[-1][0],a[-1][-1]]])
# result will contain array([[ 1., 0.],
# [ 0., 5.]])
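To compare the approaches above side by side, this sketch applies them to the question's array and uses np.shares_memory to check which results are views of the original. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 5.],
              [0., 0., 5., 5.]])

# Strided slice: a 2x2 view onto the corners (each axis needs length > 1,
# otherwise the slice step would be zero).
view = a[::a.shape[0] - 1, ::a.shape[1] - 1]

# Advanced indexing: both of these return copies.
grid = a[np.ix_((0, -1), (0, -1))]            # 2x2 result
flat = a[[0, 0, -1, -1], [0, -1, 0, -1]]      # flat result

view_shares = np.shares_memory(a, view)
grid_shares = np.shares_memory(a, grid)

print(view, grid, flat, view_shares, grid_shares)
```

The practical difference is that writing to the view changes a, while writing to the copies does not.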
