a and b are torch tensors with no repeating elements.
a has shape [n, 2], like:
[[1,2]
[2,3]
[4,6]
...]
b has shape [m, 2], like:
[[1,2]
[4,6]
....
]
How do I get the indices of b's rows in a? Example:
a = [[1,2]
[2,4]
[6,7]
]
b = [[1,2]
[6,7]]
the indices should be (0, 2). We can use the GPU.
I can think of the following trick that can work for you.
Since we have two tensors with different numbers of rows (n and m), first we transform them into the same shape (m x n x 2) and then subtract. If two rows match, then after subtraction, the entire row will be zero. Then, we need to identify the indices of those rows.
n = a.shape[0]  # 3
m = b.shape[0]  # 2

_a = a.unsqueeze(0).repeat(m, 1, 1)  # m x n x 2
_b = b.unsqueeze(1).repeat(1, n, 1)  # m x n x 2
# abs() is needed so mixed-sign differences within a row cannot cancel to zero
match = (_a - _b).abs().sum(-1)  # m x n
indices = (match == 0).nonzero()

if indices.nelement() > 0:  # empty tensor check
    row_indices = indices[:, 1]
else:
    row_indices = []

print(row_indices)
Sample Input/Output

Example 1
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 7]])
Output: tensor([0, 2])

Example 2
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 3], [6, 7]])
Output: tensor([2])

Example 3
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 5], [8, 9]])
Output: tensor([0])

Example 4
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 3], [6, 5], [8, 9]])
Output: []
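Since PyTorch broadcasts automatically, the explicit repeat calls are not strictly required; a lighter sketch of the same idea (my variant, not part of the original answer):

import torch

a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 7]])

# Broadcasting a (1 x n x 2) against b (m x 1 x 2) produces the same m x n grid
# of row comparisons without materializing the repeated copies.
match = (a.unsqueeze(0) - b.unsqueeze(1)).abs().sum(-1)  # m x n
row_indices = (match == 0).nonzero()[:, 1]               # tensor([0, 2])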
Here, @jpp's NumPy solution is almost your answer. After that, you just need to get the indices using nonzero and flatten the tensor using flatten to get the expected shape.
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 7]])
(a[:, None] == b).all(-1).any(-1).nonzero().flatten()
tensor([0, 2])
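For clarity, the intermediate shapes of that one-liner (with the same a and b) break down as in this sketch:

eq = a[:, None] == b            # [3, 2, 2]: element-wise comparison of every pair of rows
row_eq = eq.all(-1)             # [3, 2]: True where row i of a equals row j of b
in_b = row_eq.any(-1)           # [3]: True where row i of a appears anywhere in b
idx = in_b.nonzero().flatten()  # tensor([0, 2])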
Related
Let the 2-dimensional array be as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2]]
        a = np.reshape(a, (int(a.size/2), 2))  # needed because the boolean indexing above flattens the array to 1-D: [3, 4, 5, 6, 7, 8]
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
import numpy as np
import pandas as pd
from sklearn import datasets

data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:,[0,1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that returns two one-dimensional NumPy arrays of size 2 (lp and rp). For instance, lp = [1, 3], rp = [2, 4]; the parameter is a 2-dimensional NumPy array.
There may be a more elegant approach, but here is what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
[[3, 4], [5, 6], [7, 8]]
Note that I wouldn't recommend list comprehensions on large arrays, since they would be highly time-consuming.
Your approach is correct, but the mask needs to be one-dimensional:
a[(a != [1, 2]).all(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the dimension with -1:
a[a != [1, 2]].reshape(-1, 2)
The boolean condition creates a 2D array of True/False. You have to apply an AND operation across the columns to make sure the match is not a partial one. Consider a row [5, 2] in your array above: the script you wrote would keep the 5 and drop the 2 in the resulting 1D array. It can be done as follows:
a[np.all(a != [1, 2],axis=1)]
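As a side note (my addition, not part of the answers above): if the intent is to drop only rows that equal [1, 2] exactly, while keeping partial matches such as [1, 5], the mask can be built from equality instead:

a[~(a == [1, 2]).all(axis=1)]  # removes only exact [1, 2] rows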
I have a 2d numpy array of integers, and now I want to change all the elements greater than 5 to 5.
For example,
[[2, 6],
[7, 3]]
to
[[2, 5],
[5, 3]]
Now, my current approach is to access all the elements using two for loops and then check whether each element is greater than 5, like this:
h, w = arr.shape[:2]
for x in range(h):
    for y in range(w):
        if arr[x, y] > 5:
            arr[x, y] = 5
Is there any more pythonic approach for doing this?
Use np.clip(). It can clip on both the lower and the upper value; we just pass None as the lower bound to clip only on the upper one.
>>> import numpy as np
>>> arr = np.array([[2, 6], [7, 3]])
>>> arr
array([[2, 6],
[7, 3]])
>>> np.clip(arr, None, 5)
array([[2, 5],
[5, 3]])
>>>
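Two other common spellings of the same capping operation, for comparison (either should give the identical result here):

np.minimum(arr, 5)  # element-wise minimum against the cap, returns a new array
arr[arr > 5] = 5    # in-place variant using a boolean mask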
I have a list of lists of indices for a NumPy array, but I do not quite get the wanted result when using them.
n = 3
a = np.array([[8, 1, 6],
[3, 5, 7],
[4, 9, 2]])
np.random.seed(7)
idx = np.random.choice(np.arange(n), size=(n, n-1))
# array([[0, 1],
# [2, 0],
# [1, 2]])
In this case I want:
element 0 and 1 of row 0
element 2 and 0 of row 1
element 1 and 2 of row 2
My list has n sublists and all of those lists have the same length.
I want each sublist to be used only once, not for all axes.
# Wanted result
# b = array([[8, 1],
# [7, 3],
# [9, 2]])
I can achieve this but it seems rather cumbersome with a lot of repeating and reshaping.
# Possibility 1
b = a[:, idx]
# array([[[8, 1],  |  [[3, 5],  |  [[4, 9],
#         [6, 8],  |   [7, 3],  |   [2, 4],
#         [1, 6]], |   [5, 7]], |   [9, 2]])
b = b[np.arange(n), np.arange(n), :]
# Possibility 2
b = a[np.repeat(range(n), n-1), idx.ravel()]
# array([8, 1, 7, 3, 9, 2])
b = b.reshape(n, n-1)
Are there easier ways?
You can use np.take_along_axis here:
np.take_along_axis(a, idx, 1)
array([[8, 1],
[7, 3],
[9, 2]])
Or using broadcasting:
a[np.arange(a.shape[0])[:,None], idx]
array([[8, 1],
[7, 3],
[9, 2]])
Note that you're using integer array indexing here; you need to specify over which axis and rows you want to index using idx.
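To make the broadcasting in the second variant explicit, the row index array has shape (3, 1) and broadcasts against idx of shape (3, 2), as in this small sketch:

rows = np.arange(a.shape[0])[:, None]  # shape (3, 1): [[0], [1], [2]]
# rows broadcasts against idx (shape (3, 2)), pairing row i with both
# column indices in idx[i] -- exactly what a[rows, idx] computes.
b = a[rows, idx]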
Suppose I have an array [1,2,3,4,5,6,7,8], and the array is composed of two samples, [1,2,3,4] and [5,6,7,8]. For each sample, I want to take a sliding window of size n; if there are not enough elements, pad the result with the last element. Each row in the return value should be the window starting from the element in that row.
For example:
if n=3, then the result should be:
[[1,2,3],
[2,3,4],
[3,4,4],
[4,4,4],
[5,6,7],
[6,7,8],
[7,8,8],
[8,8,8]]
How can I achieve this with efficient slicing instead of a for loop? Thanks.
A similar approach to @hpaulj's, using some NumPy built-in functionality:
import numpy as np
samples = [[1,2,3,4],[5,6,7,8]]
ws = 3 #window size
# add padding
samples = [s + [s[-1]]*(ws-1) for s in samples]
# rolling window function for arrays
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

result = sum([rolling_window(np.array(s), ws).tolist() for s in samples], [])
result
[[1, 2, 3],
[2, 3, 4],
[3, 4, 4],
[4, 4, 4],
[5, 6, 7],
[6, 7, 8],
[7, 8, 8],
[8, 8, 8]]
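On NumPy 1.20 and later, np.lib.stride_tricks.sliding_window_view can replace the hand-rolled as_strided helper; a sketch using the same padding step as above:

from numpy.lib.stride_tricks import sliding_window_view

padded = [s + [s[-1]] * (ws - 1) for s in [[1, 2, 3, 4], [5, 6, 7, 8]]]
# Stack the per-sample windows into the requested 8 x 3 array.
result = np.vstack([sliding_window_view(np.array(s), ws) for s in padded])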
A python list approach:
In [201]: order = [1,3,2,3,5,8]
In [202]: samples = [[1,2,3,4],[5,6,7,8]]
expand samples to take care of the padding issue:
In [203]: samples = [row+([row[-1]]*n) for row in samples]
In [204]: samples
Out[204]: [[1, 2, 3, 4, 4, 4, 4], [5, 6, 7, 8, 8, 8, 8]]
define a function:
def foo(i, samples):
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue
        return row[j:j+n]
In [207]: foo(3,samples)
Out[207]: [3, 4, 4]
In [208]: foo(9,samples) # non-found case isn't handled well
for all the order elements:
In [209]: [foo(i,samples) for i in order]
Out[209]: [[1, 2, 3], [3, 4, 4], [2, 3, 4], [3, 4, 4], [5, 6, 7], [8, 8, 8]]
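A possible guard for the non-found case (my assumption about the desired behaviour, not part of the original answer):

def foo_safe(i, samples, fill=None):
    # Same lookup as foo, but return `fill` when i is absent from every row.
    for row in samples:
        if i in row:
            j = row.index(i)
            return row[j:j+n]
    return fill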
I have a simple one-liner:
import numpy as np
samples = np.array([[1,2,3,4],[5,6,7,8]])
n,d = samples.shape
ws = 3
result = samples[:,np.minimum(np.arange(d)[:,None]+np.arange(ws)[None,:],d-1)]
No loop, only broadcasting, which probably makes this the most efficient way of doing it. The dimensions of the output are not exactly what you asked for, but that is easy to correct with a simple np.reshape.
The output is:
[[[1 2 3]
  [2 3 4]
  [3 4 4]
  [4 4 4]]

 [[5 6 7]
  [6 7 8]
  [7 8 8]
  [8 8 8]]]
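For reference, flattening the first two dimensions recovers the exact 8 x 3 layout requested in the question:

result.reshape(-1, ws)  # shape (8, 3)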
I have a NumPy 2D array with duplicate values.
I am searching the array like this:
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
...: [1, 2, 3],
...: [2, 5, 6],
...: [3, 8, 9],
...: [4, 8, 9],
...: [4, 2, 3],
...: [5, 2, 3]])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list:
...:     print(a[np.where(a[:,0] == i)])
...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is a list of numbers matching the column 0 values.
The end result I want is the matching rows, in any format (array, list, or tuple), for example:
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine, but it doesn't seem Pythonic. Is there any better search strategy with multiple values,
like a[np.where(a[:,0] == l)], where only a single lookup is done to get all the values?
My real array is large.
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
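On newer NumPy versions, np.isin is the recommended spelling of the same membership test:

a[np.isin(a[:, 0], num_list)]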
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
You can do
a[numpy.in1d(a[:, 0], num_list), :]
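which, with the a and num_list from the question, should produce the requested rows:

array([[1, 2, 3],
       [1, 2, 3],
       [4, 8, 9],
       [4, 2, 3],
       [5, 2, 3]])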