Selection of elements from numpy array columns based on row index - python

I have a two-dimensional array A and an array of indexes idx, for example:
A = np.array([[ 1.,  0.,  0.],
              [ 0.,  1.,  0.],
              [ 0.,  0.,  1.],
              [ 0., -1.,  0.],
              [ 0.,  0.,  5.]])
idx = np.array([2, 1, 0, 1, 2])
and I'm trying to select the elements of A indexed by idx along the column axis (in this example: array([0., 1., 0., -1., 5.])). How can I do this without loops?
Thank you!

A[np.arange(np.size(idx)), idx]
gives array([ 0., 1., 0., -1., 5.])
From the Advanced Indexing part of the documentation:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. [...] This is best understood with an example.

Indexing 2D numpy arrays can be a bit confusing.
You need: A[np.arange(0, A.shape[0]), idx]
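As a minimal sketch of the two answers above: the row numbers 0..4 are paired element by element with the column numbers in idx. The np.take_along_axis variant is not mentioned in the answers; it is shown only as an alternative and assumes NumPy 1.15 or later.

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [0., -1., 0.],
              [0., 0., 5.]])
idx = np.array([2, 1, 0, 1, 2])

# Pair row i with column idx[i]: picks A[0, 2], A[1, 1], A[2, 0], ...
rows = np.arange(A.shape[0])
picked = A[rows, idx]

# Alternative (not from the answers): idx must be a column of shape (5, 1)
picked_alt = np.take_along_axis(A, idx[:, None], axis=1).ravel()

print(picked)
```

Both forms return a new array rather than a view of A.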

Related

From list of dataframe to array of array in python

As the title says, I have a list called "list" containing multiple DataFrames (shape 120 x 120) with some numeric data, added from a previous list.
...
df_sum = list_dataframe[0]
for i in range(1, len(list_dataframe)):
    df_sum = df_sum.add(list_dataframe[i])
list.append(df_sum)
Let's say that "list" contains 800 dataframes, so every index of this list contains a dataframe. I want to:
create an array with the same length of "list"
take every dataframe in "list", one by one, convert it into a Numpy array (120 x 120, so a matrix)
add every Numpy array (120 x 120) into the array created (800).
So I want to obtain an array (with a length of 800, the same as list), where every index contains one of the 800 NumPy arrays (matrices).
I have already used .to_numpy() function applied to the list with a for loop,
for i in range(len(list)):
    list[i] = list[i].to_numpy()
but it generates a strange structure, like an array of array of array where the second one contains only one element, that is the dataframe converted into an array:
>>> list
>>>[array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]),
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
How can I do that?
You're on the right track. If you call np.array on your resulting list, it will create one large array that has shape (800, 120, 120). An example using a list comprehension instead of a for-loop:
import numpy as np
import pandas as pd
my_list = [pd.DataFrame(np.random.randint(10, size=(120, 120))) for _ in range(800)]
out = np.array([df.to_numpy() for df in my_list])
>>> out.shape
(800, 120, 120)
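A related sketch, assuming every DataFrame has the same shape: np.stack does the same job as np.array here, but raises an error if one frame has a different shape. The frame count and sizes below are small stand-ins chosen for illustration, not the 800 frames of 120 x 120 from the question.

```python
import numpy as np
import pandas as pd

# Small stand-in data: 4 frames of shape (3, 3), frame k filled with the value k
frames = [pd.DataFrame(np.full((3, 3), k, dtype=float)) for k in range(4)]

# Convert each frame to a 2-D array and stack along a new first axis
stacked = np.stack([df.to_numpy() for df in frames])

print(stacked.shape)        # one (3, 3) matrix per frame
first_matrix = stacked[0]   # the matrix that came from frames[0]
```

Indexing the first axis (stacked[i]) then gives back the matrix for the i-th DataFrame.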

Best way to convert a tensor from a condensed representation

I have a Tensor that is in a condensed format representing a sparse 3-D matrix. I need to convert it to a normal matrix (the one that it is actually representing).
So, in my case, each row of any 2-D slice of my matrix can only contain one non-zero element. As data, then, I have for each of these rows, the value, and the index where it appears. For example, the tensor
inp = torch.tensor([[ 1, 2],
                    [ 3, 4],
                    [-1, 0],
                    [45, 1]])
represents a 4x5 matrix (first dimension comes from the first dimension of the tensor, second comes from the metadata) A, where A[0][2] = 1, A[1][4] = 3, A[2][0] = -1, A[3][1] = 45.
This is just one 2-D slice of my Matrix, and I have a variable number of these.
I was able to do this for a 2-D slice as described above in the following way using sparse_coo_tensor:
>>> torch.sparse_coo_tensor(torch.stack([torch.arange(0, 4), inp.t()[1]]), inp.t()[0], [4,5]).to_dense()
tensor([[ 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 3],
[-1, 0, 0, 0, 0],
[ 0, 45, 0, 0, 0]])
Is this the best way to accomplish this? Is there a simpler, more readable alternative?
How do I extend this to a 3-D matrix without looping?
For a 3-D matrix, you can imagine the input to be something like
inp_list = torch.stack([inp, inp, inp, inp])
and the desired output would be the above output stacked 4 times.
I feel like I should be able to do something if I create an index array correctly, but I cannot think of a way to do this without using some kind of looping.
OK, after a lot of experiments with different types of indexing, I got this to work. It turns out the answer was in advanced indexing. Unfortunately, the PyTorch documentation doesn't go into the details of advanced indexing, but the NumPy documentation covers it.
For the problem described above, this command did the trick:
>>> k_lst = torch.zeros([4,4,5])
>>> k_lst[torch.arange(4).unsqueeze(1), torch.arange(4), inp_list[:,:,1]] = inp_list[:,:,0].float()
>>> k_lst
tensor([[[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 3.],
[-1., 0., 0., 0., 0.],
[ 0., 45., 0., 0., 0.]],
[[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 3.],
[-1., 0., 0., 0., 0.],
[ 0., 45., 0., 0., 0.]],
[[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 3.],
[-1., 0., 0., 0., 0.],
[ 0., 45., 0., 0., 0.]],
[[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 3.],
[-1., 0., 0., 0., 0.],
[ 0., 45., 0., 0., 0.]]])
Which is exactly what I wanted.
I learned quite a few things searching for this, and I want to share this for anyone who stumbles on this question. So, why does this work? The answer lies in the way Broadcasting works. If you look at the shapes of the different index tensors involved, you'd see that they are (of necessity) broadcastable.
>>> torch.arange(4).unsqueeze(1).shape, torch.arange(4).shape, inp_list[:,:,1].shape
(torch.Size([4, 1]), torch.Size([4]), torch.Size([4, 4]))
Clearly, to access an element of a 3-D tensor such as k_lst here, we need 3 indexes - one for each dimension. If you give 3 tensors of same shapes to the [] operator, it can get a bunch of legal indexes by matching corresponding elements from the 3 tensors.
If the 3 tensors are of different shapes, but broadcastable (as is the case here), it copies the relevant rows/columns of the lacking tensors the requisite number of times to get tensors with the same shapes.
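The broadcasting step described above can be made visible with a NumPy analogue. This is a sketch only: it uses NumPy arrays in place of the PyTorch tensors, on the assumption that both libraries apply the same broadcasting rules to index arrays, and the names batch_idx, row_idx and col_idx are made up for illustration.

```python
import numpy as np

inp = np.array([[1, 2], [3, 4], [-1, 0], [45, 1]])
inp_list = np.stack([inp, inp, inp, inp])   # shape (4, 4, 2)

batch_idx = np.arange(4)[:, None]   # shape (4, 1), like arange(4).unsqueeze(1)
row_idx = np.arange(4)              # shape (4,)
col_idx = inp_list[:, :, 1]         # shape (4, 4)

# What the indexing operation does implicitly: expand all three to (4, 4)
b, r, c = np.broadcast_arrays(batch_idx, row_idx, col_idx)
print(b.shape, r.shape, c.shape)

# Element (i, j) of the three arrays forms one index triple (i, j, column)
dense = np.zeros((4, 4, 5))
dense[batch_idx, row_idx, col_idx] = inp_list[:, :, 0]
print(dense[0])
```

After broadcasting, b[i, j] is i and r[i, j] is j, so each position (i, j) supplies one complete index into the 3-D array.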
Ultimately, in my case, if we go into how the different values got assigned, this would be equivalent to doing
k_lst[0,0,inp_list[0,0,1]] = inp_list[0,0,0].float()
k_lst[0,1,inp_list[0,1,1]] = inp_list[0,1,0].float()
k_lst[0,2,inp_list[0,2,1]] = inp_list[0,2,0].float()
k_lst[0,3,inp_list[0,3,1]] = inp_list[0,3,0].float()
k_lst[1,0,inp_list[1,0,1]] = inp_list[1,0,0].float()
k_lst[1,1,inp_list[1,1,1]] = inp_list[1,1,0].float()
.
.
.
k_lst[3,3,inp_list[3,3,1]] = inp_list[3,3,0].float()
This format reminds me of torch.Tensor.scatter(), but if it can be used to solve this problem, I haven't figured out how yet.
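For comparison, NumPy has a scatter-like function, np.put_along_axis. The sketch below is a NumPy analogue and not the PyTorch call itself; it relies on keeping the last axis with length 1 so that the index and value arrays have the same number of dimensions as the target.

```python
import numpy as np

inp = np.array([[1, 2], [3, 4], [-1, 0], [45, 1]])
inp_list = np.stack([inp, inp, inp, inp])       # shape (4, 4, 2)

values = inp_list[:, :, 0:1].astype(float)      # shape (4, 4, 1)
columns = inp_list[:, :, 1:2]                   # shape (4, 4, 1)

dense = np.zeros((4, 4, 5))
# For every (i, j): dense[i, j, columns[i, j, 0]] = values[i, j, 0]
np.put_along_axis(dense, columns, values, axis=2)
print(dense[0])
```

If torch.Tensor.scatter_ follows the rule self[i][j][index[i][j][k]] = src[i][j][k] for dim=2, the PyTorch counterpart would presumably be torch.zeros(4, 4, 5).scatter_(2, inp_list[:, :, 1:2], inp_list[:, :, 0:1].float()); that line is an untested assumption.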
I believe what you're saying is that you have a sparse tensor and want to convert it. Start with tf.sparse.to_dense and follow that with tensorflow.Tensor.eval()

Appending numpy arrays by column [duplicate]

I have a 60000 by 200 numpy array. I want to make it 60000 by 201 by adding a column of 1's to the right (so every row is [prev, 1]).
Concatenate with axis = 1 doesn't work because it seems like concatenate requires all input arrays to have the same dimension.
How should I do this?
Let me just throw in a very simple example with much smaller size. The principle should be the same.
a = np.zeros((6,2))
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
b = np.ones((6,1))
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
np.hstack((a,b))
array([[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.]])
Using numpy index trick to append a 1D vector to a 2D array
a = np.zeros((6,2))
# array([[ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.]])
b = np.ones(6) # or np.ones((6,1))
#array([1., 1., 1., 1., 1., 1.])
np.c_[a,b]
# array([[0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.]])
Under the covers, all the stack variants (including append and insert) end up doing a concatenate. They just precede it with some sort of array reshape.
In [60]: A = np.arange(12).reshape(3,4)
In [61]: np.concatenate([A, np.ones((A.shape[0],1),dtype=A.dtype)], axis=1)
Out[61]:
array([[ 0, 1, 2, 3, 1],
[ 4, 5, 6, 7, 1],
[ 8, 9, 10, 11, 1]])
Here I made a (3,1) array of 1s, to match the (3,4) array. If I wanted to add a new row, I'd make a (1,4) array.
While the variations are handy, if you are learning, you should become familiar with concatenate and the various ways of constructing arrays that match in number of dimensions and necessary shapes.
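A short sketch of the shape matching described in this answer, covering the new-column case, the new-row case, and a 1-D array that first needs a second dimension. The variable names are illustrative.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

# New column: a (3, 1) array matches A's 3 rows
with_col = np.concatenate([A, np.ones((A.shape[0], 1), dtype=A.dtype)], axis=1)

# New row: a (1, 4) array matches A's 4 columns
with_row = np.concatenate([A, np.ones((1, A.shape[1]), dtype=A.dtype)], axis=0)

# A 1-D array of shape (3,) must be turned into (3, 1) before concatenating
ones_1d = np.ones(A.shape[0], dtype=A.dtype)
with_col_again = np.concatenate([A, ones_1d[:, None]], axis=1)

print(with_col.shape, with_row.shape)
```

Passing ones_1d directly with axis=1 fails because concatenate requires all inputs to have the same number of dimensions, which is the error described in the question.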
The first thing to think about is that numpy arrays are really not meant to change size. So you should ask yourself, can you create your original matrix as 60k x 201 and then fill the last column afterwards. This is usually best.
If you really must do this, see
How to add column to numpy array
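The preallocation idea from this answer could look like the sketch below. The sizes are small stand-ins for the 60000 x 200 array, and the variable names are made up for illustration.

```python
import numpy as np

n_rows, n_features = 6, 2   # stand-ins for 60000 and 200

# Allocate the final shape once, with room for the extra column
out = np.empty((n_rows, n_features + 1))

# Fill the feature columns (here with sample data) and then the last column
out[:, :n_features] = np.arange(n_rows * n_features).reshape(n_rows, n_features)
out[:, -1] = 1.0

print(out.shape)
```

This avoids building a second full-size array, which is what every stacking or concatenating approach has to do.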
I think the numpy method column_stack is more interesting because you do not need to create a column numpy array to stack it in the matrix of interest. With the column_stack you just need to create a normal numpy array.
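A minimal sketch of the column_stack approach: the column to append is a plain 1-D array, with no reshape to (6, 1) needed.

```python
import numpy as np

a = np.zeros((6, 2))
b = np.ones(6)                  # 1-D array, shape (6,)

# column_stack treats each 1-D input as one column
out = np.column_stack((a, b))

print(out.shape)
```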

numpy 2d and 1d addition flat

While starting out with NumPy, I came across this example in the NumPy Book:
a = zeros((4, 5))
b = ones(6)
add(b, b, a[1:3, 0:3].flat)
print(a)
returns
array([[0, 0, 0, 0, 0]
[2, 2, 2, 0, 0]
[2, 2, 2, 0, 0]
[0, 0, 0, 0, 0]])
However, when I try this code, it results in the following error:
add(b, b, a[1:3, 0:3].flat)
TypeError: return arrays must be of ArrayType
Could anyone please shed some light on this problem?
If you pass 2 arguments to numpy.add, they are taken as the two operands to be added. If you pass 3 arguments, the first two are added and the third is the output: not the result itself, but the array in which the result should be stored.
So you added b with b and wanted to store it in a[1:3, 0:3].flat.
Let's just try to np.add(b, b) which gives
import numpy as np
a = np.zeros((4, 5))
b = np.ones(6)
np.add(b, b)
# returns array([ 2., 2., 2., 2., 2., 2.])
So now I tried a[1:3, 0:3].flat, which returns <numpy.flatiter at 0x22204e80c10>. That is an iterator, not an array, and the output argument needs an array. There is a method called ravel(), and a[1:3, 0:3].ravel() returns:
array([ 0., 0., 0., 0., 0., 0.])
so we have an array. In particular, it has the right shape (6 elements) to store the result. So I tried:
np.add(b, b, a[1:3, 0:3].ravel())
# array([ 2., 2., 2., 2., 2., 2.])
But let's see if a has changed:
a
#array([[ 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 0., 0.]])
So a hasn't changed. That's because ravel() returns a view (through which assignment would propagate to the original array) only when possible; otherwise it returns a copy. Saving the result in a copy is rather pointless, because the whole point of the out parameter is that the operation is done in place. I'm only guessing why a copy is made, but I think it's because we take a portion out of a bigger array and that portion is not contiguous in memory.
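The guess about contiguity can be checked with np.shares_memory. This is a small sketch comparing the slice from the question with a slice of full rows; the variable names are illustrative.

```python
import numpy as np

a = np.zeros((4, 5))

block = a[1:3, 0:3]             # a view, but its two rows are not adjacent in memory
flat_block = block.ravel()      # so ravel() has to make a copy
print(block.flags['C_CONTIGUOUS'])
print(np.shares_memory(a, flat_block))

rows = a[1:3]                   # full rows form one contiguous chunk
flat_rows = rows.ravel()        # here ravel() can return a view
print(np.shares_memory(a, flat_rows))

flat_rows[:] = 7.0              # writes through to a
print(a[1:3])
```

Writing into flat_block, by contrast, leaves a untouched, which is what happened with the out argument.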
So I would propose that you don't use the out parameter in this case but use the return of the np.add and store it inside the specified region in a:
a[1:3, 0:3] = np.add(b, b).reshape(2,3) # You need to reshape here!
a
#array([[ 0., 0., 0., 0., 0.],
# [ 2., 2., 2., 0., 0.],
# [ 2., 2., 2., 0., 0.],
# [ 0., 0., 0., 0., 0.]])
Also a[1:3, 0:3].flat = np.add(b, b) works.
I think the book is either outdated and it worked with an older numpy version or it never worked at all and it was a mistake in the book.
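One more option, not taken from the answer above: the out parameter does accept the 2-D slice itself, since slicing returns a view. The sketch below reshapes the operands to the slice's shape (2, 3) and compares the result with the .flat assignment mentioned in the answer.

```python
import numpy as np

a = np.zeros((4, 5))
b = np.ones(6)

# Give out= the 2-D view directly; the operands must match its shape (2, 3)
b2 = b.reshape(2, 3)
np.add(b2, b2, out=a[1:3, 0:3])

# Alternative from the answer: assign through the flat iterator
c = np.zeros((4, 5))
c[1:3, 0:3].flat = np.add(b, b)

print(a)
```

In both cases the sum is written into the original array, with no intermediate copy of the slice.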

Inserting a row at a specific location in a 2d array in numpy?

I have a 2d array in numpy into which I want to insert a new row. The question "Numpy - add row to array" helps somewhat: we can use numpy.vstack, but it only stacks at the start or at the end. Can anyone please help with this?
You are probably looking for numpy.insert
>>> import numpy as np
>>> a = np.zeros((2, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.]])
# In the following line 1 is the index before which to insert, 0 is the axis.
>>> np.insert(a, 1, np.array((1, 1)), 0)
array([[ 0., 0.],
[ 1., 1.],
[ 0., 0.]])
>>> np.insert(a, 1, np.array((1, 1)), 1)
array([[ 0., 1., 0.],
[ 0., 1., 0.]])
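A small sketch with distinguishable values, to make the insertion position easier to see. The vstack variant is an equivalent added for comparison and is not part of the answer.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)      # rows: [0 1], [2 3], [4 5]
new_row = np.array([10, 11])

# Insert new_row before row index 1 (axis=0 selects rows)
inserted = np.insert(a, 1, new_row, axis=0)

# Same result by splitting at the index and stacking the pieces
stacked = np.vstack([a[:1], new_row, a[1:]])

print(inserted)
```

np.insert returns a new array; a itself keeps its original 3 rows.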
