This question already has answers here:
NumPy selecting specific column index per row by using a list of indexes
(7 answers)
Closed 1 year ago.
Given an array like below:
np.arange(12).reshape(4,3)
Out[119]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
I want to select a single element from each of the rows using a list of indices [0, 2, 1, 2] to create a 4x1 array of [0, 5, 7, 11].
Is there any easy way to do this indexing. The closest I could see was the gather method in pytorch.
>>> import torch
>>> import numpy as np
>>> s = np.arange(12).reshape(4,3)
>>> s = torch.tensor(s)
>>> s
tensor([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
>>> idx = torch.tensor([0, 2, 1, 2])
>>> torch.gather(s,-1 ,idx.unsqueeze(-1))
tensor([[ 0],
[ 5],
[ 7],
[11]])
torch.gather(s,-1 ,idx.unsqueeze(-1))
arr[[0,1,2,3], [0,2,1,2]]
or if you prefer np.arange(4) for the 1st indexing array.
Please try to run the following code.
import numpy as np
x = [[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]
index_array = [0, 2, 1, 2]
index = 0
result = []
for item in x:
result.append(item[index_array[index]])
index += 1
print (result)
Here is the result.
[0, 5, 7, 11]
>
Related
This question already has answers here:
Resizing and stretching a NumPy array
(3 answers)
Closed 2 years ago.
Suppose I have the following feature matrix X (ie. with 4 rows and 3 features):
X = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
(array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]]),
How do I duplicate say, the 1st and 2nd row twice, the 3rd row 3 times and no duplication in the 4th row, ie. I want to get this:
(array([[ 1, 2, 3],
[ 1, 2, 3]
[ 4, 5, 6],
[ 4, 5, 6]
[ 7, 8, 9],
[ 7, 8, 9]
[ 7, 8, 9]
[10, 11, 12]]),
Is there a way for me to multiply the features matrix X with an array of weights, say something like this:
np.array([2,2,3,1])
Thanks in advance.
You can do it through indexing:
repeats = np.array([2,2,3,1])
indices = np.repeat(range(len(X)), repeats)
X[indices]
Where indices is
array([0, 0, 1, 1, 2, 2, 2, 3])
np.repeat(X, [2, 2, 3, 1], axis=0)
Suppose I have a 2D numpy array, say 5*3. Now I would like to map each element i in it to a new array [i, i*i], so the resulting array is 5*3*2.
What is the most efficient (and elegant) way to achieve this purpose?
A naive solution using for:
a = np.arange(15).reshape(5, 3)
r = []
for row in a:
_row = []
for i in row:
_row.append([i, i*i])
r.append(_row)
return np.array(r)
You could use np.dstack to stack both arrays depth wise:
np.dstack([a, a**2])
a = np.arange(15).reshape(5, 3)
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
np.dstack([a, a**2])
array([[[ 0, 0],
[ 1, 1],
[ 2, 4]],
[[ 3, 9],
[ 4, 16],
[ 5, 25]],
...
I would like to know if there is any fast way to sum each row of a first array with all rows of a second array. In this case both arrays have the same number of colulmns. For instance if array1.shape = (n,c) and array2.shape = (m,c), the resulting array would be an array3.shape = ((n*m), c)
Look at the example below:
array1 = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
array2 = np.array([[0, 1, 2],
[3, 4, 5]])
The result would be:
array3 = np.array([[0, 2, 4],
[3, 5, 7]
[3, 5, 7]
[6, 8, 10]
[6, 8, 10]
[9, 11, 13]])
The only way I see I can do this is to repeat each row of one of the arrays the number of rows of the other array. For instance, by doing np.repeat(array1, len(array2), axis=0) and then sum this array with array2. This is not very practical however if the number of rows is too big. The other way would be with a for loop but this is too slow.
Any other better way to do it..?
Thanks in advance.
Extend array1 to 3D so that it becomes broadcastable against 2D array2 and then perform broadcasted addition and a final reshape is needed for desired output -
In [30]: (array1[:,None,:] + array2).reshape(-1,array1.shape[1])
Out[30]:
array([[ 0, 2, 4],
[ 3, 5, 7],
[ 3, 5, 7],
[ 6, 8, 10],
[ 6, 8, 10],
[ 9, 11, 13]])
You could try the following inline code if you haven't already. This is the simplest and probably also the quickest on a single thread.
>>> import numpy as np
>>> array1 = np.array([[0, 1, 2],
... [3, 4, 5],
... [6, 7, 8]])
>>>
>>> array2 = np.array([[0, 1, 2],
... [3, 4, 5]])
>>> array3 = np.array([i+j for i in array1 for j in array2])
>>> array3
array([[ 0, 2, 4],
[ 3, 5, 7],
[ 3, 5, 7],
[ 6, 8, 10],
[ 6, 8, 10],
[ 9, 11, 13]])
>>>
If you are looking for speed up by treading, you could consider using CUDA or multithreading. This suggestion goes a bit out of scope of your question but gives you an idea of what can be done to speed up matrix operations.
How can I isolate rows in a 2d numpy matrix that match a specific criteria? For example if I have some data and I only want to look at the rows where the 0 index has a value of 5 or less how would I retrieve those values?
I tried this approach:
import numpy as np
data = np.matrix([
[10, 8, 2],
[1, 4, 5],
[6, 5, 7],
[2, 2, 10]])
#My attempt to retrieve all rows where index 0 is less than 5
small_data = (data[:, 0] < 5)
The output is:
matrix([
[False],
[ True],
[False],
[ True]], dtype=bool)
However I'd like the output to be:
[[1, 4, 5],
[2, 2, 10]]
Another approach may be for me to loop through the matrix rows and if the 0 index is smaller than 5 append the row to a list but I am hoping there is a better way than that.
Note: I'm using Python 2.7.
First: Don't use np.matrix, use normal np.arrays.
import numpy as np
data = np.array([[10, 8, 2],
[1, 4, 5],
[6, 5, 7],
[2, 2, 10]])
Then you can always use boolean indexing (based on the boolean array you get when you do comparisons) to get the desired rows:
>>> data[data[:, 0] < 5]
array([[ 1, 4, 5],
[ 2, 2, 10]])
or integer array indexing:
>>> data[np.where(data[:, 0] < 5)]
array([[ 1, 4, 5],
[ 2, 2, 10]])
That way you got a logical array that you can use to select the desired rows.
>>> data = np.matrix([
[10, 8, 2],
[1, 4, 5],
[6, 5, 7],
[2, 2, 10]])
>>> data = np.array(data)
>>> data[(data[:, 0] < 5), :]
array([[ 1, 4, 5],
[ 2, 2, 10]])
You can also use np.squeeze to filter the rows.
>>> ind = np.squeeze(np.asarray(data [:,0]))<5
>>> data[ind,:]
array([[ 1, 4, 5],
[ 2, 2, 10]])
use the following code.
data[data[small_data,:]]
That would work
I'm having a bit of a difficulty. I'm trying to vectorize some code in python in order to make it faster. I have an array which I sort (A) and get the index list (Ind). I have another array (B) which I would like to sort by the index list, without using loops which I think bottlenecks the computation.
A = array([[2, 1, 9],
[1, 1, 5],
[7, 4, 1]])
Ind = np.argsort(A)
This is the result of Ind:
Ind = array([[1, 0, 2],
[0, 1, 2],
[2, 1, 0]], dtype=int64)
B is the array i would like to sort by Ind:
B = array([[ 6, 3, 9],
[ 1, 5, 3],
[ 2, 7, 13]])
I would like to use Ind to rearrange my elements in B as such (B rows sorted by A rows indexes):
B = array([[ 3, 6, 9],
[ 1, 5, 3],
[13, 7, 2]])
Any Ideas? I would be glad to get any good suggestion. I want to mention I am using millions of values, I mean arrays of 30000*5000.
Cheers,
Robert
I would do something like this:
import numpy as np
from numpy import array
A = array([[2, 1, 9],
[1, 1, 5],
[7, 4, 1]])
Ind = np.argsort(A)
B = array([[ 3, 6, 9],
[ 1, 5, 3],
[13, 7, 2]])
# an array of the same shape as A and B with row numbers for each element
rownums = np.tile(np.arange(3), (3, 1)).T
new_B = np.take(B, rownums * 3 + Ind)
print(new_B)
# [[ 6 3 9]
# [ 1 5 3]
# [ 2 7 13]]
You can replace the magic number 3 with the array shape.