I'm new to numpy and trying to understand how to search for a 2d array inside another 2d array. I don't need the indexes, just True/False.
For example, I have an array with shape 10x10, all ones, and somewhere in it a 2x2 block of zeroes:
ar = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
               [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
               [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
               [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
               [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
               [1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
               [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
               [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
               [1, 1, 0, 1, 1, 0, 1, 1, 1, 2],
               [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]])
and I have another array that I want to find:
ar2 = np.zeros((2,2))
I tried functions like isin and where, but they search for individual elements, not for the whole sub-array.
Here's what I've come up with: iterate over rows and columns, slice out a 2x2 array, and compare it with the array of zeroes:
for r, c in np.ndindex(ar.shape):
    if r - 1 >= 0 and c - 1 >= 0 and np.array_equal(ar[r - 1:r + 1, c - 1:c + 1], ar2):
        print(f'found it {r}:{c}')
I'm not sure if this is the best solution, but at least it works. Is there an easier and faster way to search for the 2x2 block of zeroes?
I think using the scikit-image library is one of the best ways to do this:
from skimage.util import view_as_windows
view_ = view_as_windows(ar, (2, 2))
res_temp = np.all((view_ == ar2[None, ...]), (-2, -1))
result = np.nonzero(res_temp)
# (array([4], dtype=int64), array([4], dtype=int64))
This gives the indices of the upper-left corner of each match. To get the same result as your code, add one to the indices.
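Since the question only asks for True/False, here is a short follow-up (a sketch reusing res_temp and result from above):
print(res_temp.any())                # True -- a 2x2 block of zeroes exists
print(result[0] + 1, result[1] + 1)  # [5] [5], matching the 5:5 printed by the loop in the question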
Based on this answer by Brenlla, I made this function which works with 2d arrays:
def find_array_in_array_2d(ar, ar2):
    # Find all matches with first element of ar2
    match_idx = np.nonzero(ar[:-ar2.shape[0]+1, :-ar2.shape[1]+1] == ar2[0, 0])
    # Check remaining indices of ar2
    for i, j in list(np.ndindex(ar2.shape))[1:]:
        # End if no possible matches left
        if len(match_idx[0]) == 0:
            break
        # Index into ar offset by i, j
        nz2 = (match_idx[0] + i, match_idx[1] + j)
        # Find remaining matches with selected element
        to_keep = np.nonzero(ar[nz2] == ar2[i, j])[0]
        match_idx = match_idx[0][to_keep], match_idx[1][to_keep]
    return match_idx
print(find_array_in_array_2d(ar, ar2))
(array([4]), array([4]))
I think it will be faster than your method if ar is big and ar2 is small, especially when ar does not contain many values that also occur in ar2.
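For completeness, newer numpy versions can do the windowed comparison without scikit-image; a minimal sketch, assuming numpy >= 1.20 for sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view

windows = sliding_window_view(ar, ar2.shape)         # shape (9, 9, 2, 2)
found = np.all(windows == ar2, axis=(-2, -1)).any()  # True if any 2x2 block of zeroes exists
print(found)  # True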
Let there be a numpy array of shape [M], dtype int32 and (random) values in range [0, N), e.g.:
M = 8
N = 5
a = np.random.randint(0, N, [M]) # a = [1, 1, 2, 4, 0, 1, 1, 3]
From this array I need to create a matrix m of shape [M, N], dtype int32 and values 0 or 1, where m[i,j] = 0 if j < a[i], otherwise 1. Following the example:
m = some_magic(a) # m = [[0, 1, 1, 1, 1],
                  #      [0, 1, 1, 1, 1],
                  #      [0, 0, 1, 1, 1],
                  #      [0, 0, 0, 0, 1],
                  #      [1, 1, 1, 1, 1],
                  #      [0, 1, 1, 1, 1],
                  #      [0, 1, 1, 1, 1],
                  #      [0, 0, 0, 1, 1]]
My dysfunctional version of some_magic starts by initializing the matrix to zeros (using np.zeros) and then proceeds to set the appropriate members to 1.
m = np.zeros([M, N])
This next part, though, I cannot properly figure out. Accessing single members, for example every second member or a fixed slice, is easy and achievable with
m[np.arange(M), C1:C2]
where C1 and C2 are integer constants. However, the per-row variant
m[np.arange(M), a:]
which, as far as I can tell, should yield the correct result, fails with the error
Only integer scalar arrays can be converted to a scalar index.
Can you please point me in the right direction? Thank you very much.
Here's a solution using broadcasting.
(a[:, None] <= np.arange(N)).view('i1')
# np.less_equal.outer(a, np.arange(N)).view('i1')
array([[0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 0, 1],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]], dtype=int8)
I'm not sure if slicing like that is possible. I suggest you create indices instead and then operate on those:
M = 8
N = 5
#a = np.random.randint(0, N, [M])
a = np.array([1, 1, 2, 4, 0, 1, 1, 3])
from0toN = np.expand_dims(np.arange(N),0) # [[0,1,2,3,4]]
m = np.repeat(from0toN, M, axis=0)
# array([[0, 1, 2, 3, 4],
#        ...,
#        [0, 1, 2, 3, 4]])
boolean = m >= np.expand_dims(a,1)
onesAndZeroes = boolean.astype(int)
"""
array([[0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 0, 1],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]])
"""
I have a 2d numpy array: arr = np.array([[2,5,10],[6,2,9]]). Now I want to convert this into a 3d numpy array by placing, along the z-axis (the 3rd dimension), as many 1's as the value of each element, with zeros for the rest. For example, in place of 2 we place two 1's and eight 0's, since the resulting array will be of size 2x3x10.
Is it possible? If yes, how can we achieve this?
You can try something like this:
arr3d = np.zeros((arr.shape[0], arr.shape[1], max(map(max, arr))))
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        print(i, j, arr[i][j])
        for k in range(arr[i][j]):
            arr3d[i, j, k] = 1
I know 3 loops :\
edited after suggestions from #hpaulj
This may be what you're asking...
Use numpy.reshape. This takes the array and reshapes it like so:
array = numpy.array([[1,4,1], [3, 1, 4]])
numpy.reshape(array, (array.shape[0], array.shape[1], 1))
The result is numpy.array([[[1], [4], [1]], [[3], [1], [4]]]), i.e. it has shape (2, 3, 1).
The 1 at the end is basically adding an extra dimension to the array. Shape just means the length of an X-Y-Z-whatever dimensional thing...
See the numpy docs for reshape at https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html.
Hope I helped!
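As a side note, the same trailing axis can be added without spelling out the full shape; a small sketch of equivalent spellings, reusing array from above:
arr3 = array[..., None]              # shape (2, 3, 1)
arr3 = numpy.expand_dims(array, -1)  # same result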
Use broadcasting like so:
>>> x = np.array([[2,5,10],[6,2,9]])
>>>
>>> (x[..., None] > np.arange(10)).view('i1')
array([[[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]]], dtype=int8)
I have a very large numpy.array of integers, where each integer is in the range [0, 31].
I would like to count, for every pair of integers (a, b) in the range [0, 31] (e.g. [0, 1], [7, 9], [18, 0]) how often b occurs right after a.
This would give me a (32, 32) matrix of counts.
I'm looking for an efficient way to do this with numpy. Raw python loops would be too slow.
Here's one way...
To make the example easier to read, I'll use a maximum value of 9 instead of 31:
In [178]: maxval = 9
Make a random input for the example:
In [179]: np.random.seed(123)
In [180]: x = np.random.randint(0, maxval+1, size=100)
Create the result, initially all 0:
In [181]: counts = np.zeros((maxval+1, maxval+1), dtype=int)
Now add 1 to each coordinate pair, using numpy.add.at to ensure that duplicates are counted properly:
In [182]: np.add.at(counts, (x[:-1], x[1:]), 1)
In [183]: counts
Out[183]:
array([[2, 1, 1, 0, 1, 0, 1, 1, 1, 1],
       [2, 1, 1, 3, 0, 2, 1, 1, 1, 1],
       [0, 2, 1, 1, 4, 0, 2, 0, 0, 0],
       [1, 1, 1, 3, 3, 3, 0, 0, 1, 2],
       [1, 1, 0, 1, 1, 0, 2, 2, 2, 0],
       [1, 0, 0, 0, 0, 0, 1, 1, 0, 2],
       [0, 4, 2, 3, 1, 0, 2, 1, 0, 1],
       [0, 1, 1, 1, 0, 0, 2, 0, 0, 3],
       [1, 2, 0, 1, 0, 0, 1, 0, 0, 0],
       [2, 0, 2, 2, 0, 0, 2, 2, 0, 0]])
For example, the number of times 6 is followed by 1 is
In [188]: counts[6, 1]
Out[188]: 4
We can verify that with the following expression:
In [189]: ((x[:-1] == 6) & (x[1:] == 1)).sum()
Out[189]: 4
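As a follow-up on why numpy.add.at is needed here: a plain fancy-indexed += buffers the additions, so repeated (a, b) pairs are counted only once. A small sketch of the difference, reusing x, counts and maxval from above:
naive = np.zeros((maxval + 1, maxval + 1), dtype=int)
naive[x[:-1], x[1:]] += 1           # duplicate index pairs collapse to a single increment
print(naive.sum(), counts.sum())    # naive.sum() < counts.sum() == len(x) - 1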
You can use numpy's built-in diff routine together with boolean arrays.
import numpy as np
test_array = np.array([1, 2, 3, 1, 2, 4, 5, 1, 2, 6, 7])
a, b = (1, 2)
sum(np.bitwise_and(test_array[:-1] == a, np.diff(test_array) == b - a))
# 3
If your array is multi-dimensional, you will need to flatten it first or make some small modifications to the code above.
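For example, one possible 2-D adaptation (just a sketch of what those small modifications might look like: flatten, then drop pairs that straddle a row boundary), reusing a and b from above:
arr2d = np.array([[1, 2, 3, 1], [2, 4, 1, 2]])
flat = arr2d.ravel()
hits = np.bitwise_and(flat[:-1] == a, np.diff(flat) == b - a)
hits[arr2d.shape[1] - 1::arr2d.shape[1]] = False   # ignore pairs crossing a row boundary
print(hits.sum())   # 2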
I have an MxN array. I want to zero out all the values after an element in a row is zero or less.
For example the 2x12 array
111110011111
112321341411
should turn into
111110000000
112321341411
Thanks!
It may not be the most efficient method, but I've used np.cumsum for these types of things.
>>> import numpy as np
>>> dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
...                 [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
>>> dat[np.cumsum(dat <= 0, 1, dtype='bool')] = 0
>>> dat
array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
@Jaime just pointed out that np.logical_or.accumulate(dat <= 0, axis=1) is probably better than np.cumsum.
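For reference, that accumulate variant looks like this (a small sketch on a fresh copy of the data; it builds the same mask without the dtype trick):
dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
                [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
dat[np.logical_or.accumulate(dat <= 0, axis=1)] = 0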
Maybe you or someone else needs an alternative solution that doesn't use numpy.
>>> dat = ['111110011111', '112321341411', '000000000000', '123456789120']
>>> def zero(dat):
...     result = []
...     for row in dat:
...         pos = row.find('0')
...         if pos != -1:
...             result.append(row[0:pos] + ('0' * (len(row) - pos)))
...         else:
...             result.append(row)
...     return result
...
>>> res = zero(dat)
>>> res
['111110000000', '112321341411', '000000000000', '123456789120']
>>> dat
['111110011111', '112321341411', '000000000000', '123456789120']
I have a matrix named xs:
array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
       [2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
Now I want to replace each zero with the nearest previous nonzero element in the same row (assuming that the first column is always nonzero).
My rough solution is as follows:
In [55]: row, col = xs.shape
In [56]: for r in xrange(row):
    ....:     for c in xrange(col):
    ....:         if xs[r, c] == 0:
    ....:             xs[r, c] = xs[r, c-1]
    ....:
In [57]: xs
Out[57]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1],
       [2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2]])
Any help will be greatly appreciated.
If you can use pandas, replace does the replacement explicitly in a single instruction:
import pandas as pd
import numpy as np
a = np.array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
              [2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
df = pd.DataFrame(a, dtype=np.float64)
df.replace(0, method='pad', axis=1)
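A possible equivalent for newer pandas versions, where the method argument of replace has been deprecated (an assumption that masking zeros and then forward-filling along the rows fits your use case):
df.mask(df == 0).ffill(axis=1)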
My version, based on step-by-step rolling and masking of the initial array; no additional libraries required (except numpy):
import numpy as np
a = np.array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
              [2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
for i in xrange(a.shape[1]):
    a[a == 0] = np.roll(a, i)[a == 0]
    if not (a == 0).any():  # when all of zeros
        break               # are filled
print a
## [[1 1 1 1 1 1 1 1 1 2 1]
##  [2 1 1 1 1 1 2 1 1 2 2]]
Without going crazy with complicated indexing tricks that figure out runs of consecutive zeros, you could use a while loop that runs for as many iterations as the longest run of consecutive zeros in your array:
zero_rows, zero_cols = np.where(xs == 0)
while zero_cols.size:
    xs[zero_rows, zero_cols] = xs[zero_rows, zero_cols - 1]
    zero_rows, zero_cols = np.where(xs == 0)
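For reference, the indexing trick the loop avoids can be written in a few lines; a sketch, starting from the original xs and assuming (as the question states) that the first column is nonzero:
idx = np.where(xs != 0, np.arange(xs.shape[1]), 0)   # own column index where nonzero, else 0
np.maximum.accumulate(idx, axis=1, out=idx)          # carry forward the column of the last nonzero
xs = xs[np.arange(xs.shape[0])[:, None], idx]        # gather: zeros replaced by the previous value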