Searching numpy array for pattern

I'd like to find a value in a numpy array given a search pattern. For instance, for the array a below and the search pattern s, I want to retrieve 1, because 1 is the element at index 0 of a[:,1] (= array([1, 0, 0, 1])) and the remaining elements a[1:,1] match s (i.e. (a[1:,1] == s).all() is True, so return a[0,1]).
Another example: for s=[1, 0, 1] I would expect a result of 2 (a match in the 4th column, counting from 1). 2 would also be the result for s=[2, 0, 0], and so on.
>>> import numpy as np
>>> a = np.asarray([[0, 1, 2, 2, 2, 2, 2, 2], [0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0, 1]])
>>> a
array([[0, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 1, 1, 2, 2, 3, 3],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
>>> s = np.asarray([0, 0, 1])
I came up with a[0, np.where((a[1:,:].transpose() == s).all(axis=-1))[0][0]], but thought there must be something more elegant...
Additionally, it would be great if I could do this with a single call for multiple search patterns, retrieving the row-0 element of every column whose values at index 1 to index 3 match a pattern.

Single search pattern
Here's one approach with help from broadcasting and slicing -
a[0,(a[1:] == s[:,None]).all(0)]
Multiple search patterns
For multiple search patterns (stored as a 2D array), we just need to broadcast as before and look for ANY match at the end -
a[0,((a[1:] == s[...,None]).all(1)).any(0)]
Here's a sample run -
In [327]: a
Out[327]:
array([[0, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 1, 1, 2, 2, 3, 3],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
In [328]: s
Out[328]:
array([[1, 0, 1],
[2, 0, 0]])
In [329]: a[0,((a[1:] == s[...,None]).all(1)).any(0)]
Out[329]: array([2, 2])
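The boolean-mask expression above returns one value per matching column, in column order, so it does not tell you which pattern matched and silently drops patterns with no match. If you need exactly one result per pattern, in pattern order, a minimal sketch is to keep the (patterns x columns) match matrix and take the first matching column of each row with argmax. The function name lookup_patterns and the fill value -1 for "no match" are assumptions for illustration.

```python
import numpy as np

def lookup_patterns(a, s, fill=-1):
    """Hypothetical helper: for each pattern in s, return a[0, j] for the first
    column j whose entries a[1:, j] equal the pattern, or `fill` if none match."""
    s = np.atleast_2d(s)                       # accept a single 1D pattern too
    matches = (a[1:] == s[..., None]).all(1)   # shape (n_patterns, n_columns)
    first = matches.argmax(1)                  # first True per row (0 if none)
    return np.where(matches.any(1), a[0, first], fill)

a = np.asarray([[0, 1, 2, 2, 2, 2, 2, 2],
                [0, 0, 1, 1, 2, 2, 3, 3],
                [0, 0, 0, 0, 0, 0, 0, 0],
                [0, 1, 0, 1, 0, 1, 0, 1]])
res = lookup_patterns(a, [[1, 0, 1], [2, 0, 0], [0, 0, 1], [9, 9, 9]])
print(res.tolist())  # [2, 2, 1, -1]; the last pattern has no match
```

Because argmax returns 0 for an all-False row, the any(1) check is what distinguishes "matched in column 0" from "no match at all".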

Related

How to create arrays with combinations between certain indexes of a fixed length and fixed sum

For example:
array = [4,3,2,0,0,0,0,0,0]
The 0th index should only have combinations with 3rd index and 6th index.
The 1st index should only have combinations with 4th index and 7th index.
The 2nd index should only have combinations with 5th index and 8th index.
(The sum within each of these groups of indexes should stay the same.)
Then output should be:
[1,2,2,1,1,0,2,0,0]
[2,1,1,1,1,1,1,1,0]...
In both of these combinations, the sum within each group of indexes (listed above) remains the same.

Using the findPairs function resulting from the answer to your previous question:
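from itertools import chain, product

def findPairs(sum_value, len_value):
    # All len_value-tuples of values in 0..sum_value that add up to sum_value
    lst = range(sum_value + 1)
    return [
        pair
        for pair in product(lst, repeat=len_value)
        if sum(pair) == sum_value
    ]

array = [4, 3, 2, 0, 0, 0, 0, 0, 0]
# One triple per group: positions (0, 3, 6), (1, 4, 7), (2, 5, 8)
combinations = product(findPairs(array[0], 3), findPairs(array[1], 3), findPairs(array[2], 3))
# zip interleaves the three triples back into positions 0..8
result = [list(chain(*zip(p1, p2, p3))) for p1, p2, p3 in combinations]
print(result[0:10])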
[[0, 0, 0, 0, 0, 0, 4, 3, 2], [0, 0, 0, 0, 0, 1, 4, 3, 1],
[0, 0, 0, 0, 0, 2, 4, 3, 0], [0, 0, 1, 0, 0, 0, 4, 3, 1],
[0, 0, 1, 0, 0, 1, 4, 3, 0], [0, 0, 2, 0, 0, 0, 4, 3, 0],
[0, 0, 0, 0, 1, 0, 4, 2, 2], [0, 0, 0, 0, 1, 1, 4, 2, 1],
[0, 0, 0, 0, 1, 2, 4, 2, 0], [0, 0, 1, 0, 1, 0, 4, 2, 1]]
...
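findPairs enumerates all (sum_value + 1) ** len_value tuples and then filters them, which grows quickly for larger sums. As an alternative (a different technique from the answer above), the same tuples can be generated directly with the "stars and bars" construction: choose len_value - 1 bar positions among sum_value + len_value - 1 slots. The names compositions and grouped_combinations are hypothetical, and the sketch assumes len(arr) is a multiple of n_groups, with group g made of positions g, g + n_groups, g + 2*n_groups, ... and its sum taken from those positions.

```python
from itertools import chain, combinations, product

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints summing to `total`,
    in lexicographic order (the same order findPairs produces)."""
    slots = total + parts - 1
    for bars in combinations(range(slots), parts - 1):
        prev = -1
        out = []
        for b in bars:
            out.append(b - prev - 1)  # stars between the previous bar and this one
            prev = b
        out.append(slots - prev - 1)  # stars after the last bar
        yield tuple(out)

def grouped_combinations(arr, n_groups):
    """Hypothetical helper: redistribute values within each strided group
    arr[g::n_groups] while keeping every group's sum fixed."""
    size = len(arr) // n_groups
    per_group = [list(compositions(sum(arr[g::n_groups]), size)) for g in range(n_groups)]
    for combo in product(*per_group):
        # zip(*combo) yields positions (0, 1, 2), (3, 4, 5), ...; chain flattens them
        yield list(chain.from_iterable(zip(*combo)))

result = list(grouped_combinations([4, 3, 2, 0, 0, 0, 0, 0, 0], 3))
print(len(result))  # 15 * 10 * 6 = 900
print(result[0])    # [0, 0, 0, 0, 0, 0, 4, 3, 2]
```

Since the bar positions map to compositions in an order-preserving way, the output order matches the findPairs-based result above; only the wasted candidates are skipped.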

Python, Numpy. Find values in 2d array and replace neighbors with 1

I have a 10x10 array with zeros and ones.
I would like to:
find the position of each cell with a value of 1, and
replace all of its neighbors with 1, where a neighbor is any cell at distance n=1 (including diagonals).
Example:
array([[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1]])
output:
array([[1, 1, 1, 0, 0],
[1, 1, 1, 1, 0],
[1, 1, 1, 1, 0],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
I tried finding the indexes, but it does not work:
a=np.where(a==1)+1
From another post, I also tried getting the neighbors with this function:
def n_closest(x,n,d=1):
    return x[n[0]-d:n[0]+d+1,n[1]-d:n[1]+d+1]
But this does not work at the edges.
Thanks

If you don't mind using scipy, a 2D convolution will solve the problem quickly:
import numpy as np
from scipy import signal
# Input array
X = np.array([[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1]])
# Apply a 2D convolution with a 3x3 kernel of ones and check which values are greater than 0.
R = (signal.convolve2d(X,np.ones((3,3)),mode='same')>0).astype(int)
# R = array([[1, 1, 1, 0, 0],
# [1, 1, 1, 1, 0],
# [1, 1, 1, 1, 0],
# [1, 1, 1, 1, 1],
# [1, 1, 1, 1, 1]])
# Finally, extract the indices of the nonzero cells
x,y = np.where(R)
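The asker's n_closest function fails at the edges because a negative start index such as n[0] - d = -1 wraps around to the end of the axis, so the slice comes out empty (or wrong). Stop indices past the end are simply truncated by NumPy, so only the start needs fixing. A minimal NumPy-only sketch, assuming the goal is the same 0/1 output as R above, clamps the start at 0 and writes 1 into each neighborhood through the returned view:

```python
import numpy as np

def n_closest(x, n, d=1):
    # Clamp the start indices at 0 so edge cells do not wrap around;
    # stop indices past the end are already truncated by slicing.
    r0 = max(n[0] - d, 0)
    c0 = max(n[1] - d, 0)
    return x[r0:n[0] + d + 1, c0:n[1] + d + 1]

X = np.array([[0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1]])

out = np.zeros_like(X)
for r, c in zip(*np.nonzero(X)):
    # Basic slicing returns a view, so assigning through it writes into out
    n_closest(out, (r, c))[:] = 1
print(out)
```

This loops once per 1-cell in Python; for large arrays the convolution above, or scipy.ndimage.binary_dilation with a 3x3 structure of ones, avoids the loop.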

Setting indicators based on index per row in numpy

I am looking for an efficient way to set indicators (ones) from index zero up to a known index, which differs for each row.
e.g.
a =
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])
and I know the vector of indexes at which each row of a switches from 1 to 0:
b = [3, 1, 6, 2, 8]
Rather than filling all the rows of a using a for-loop, I want to know if there is a fast way to set these indicators.

Use an outer comparison of a ranged array against b -
In [16]: ncols = 9
In [17]: b
Out[17]: [3, 1, 6, 2, 8]
In [19]: np.greater.outer(b,np.arange(ncols)).view('i1')
Out[19]:
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]], dtype=int8)
Other similar ways to express the same -
(np.asarray(b)[:,None] > np.arange(ncols)).view('i1')
(np.asarray(b)[:,None] > np.arange(ncols)).astype(int)
If b is already an array, this simplifies further, since the np.asarray(b) conversion can be skipped.

The simplest way I can think of is:
result = []
for row in a:
    result.append(row.tolist().index(0))
print(result)
[3, 1, 6, 2, 8]
This works because a list has an index method, which returns the position of the first occurrence of a given item. So I iterate over the 2-dimensional array, convert each row to a list, call index(0) on it, and append the value to the result list. (Note that this goes the other way, recovering b from a, and raises a ValueError for a row that contains no 0.)
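The loop above can also be vectorized. A minimal sketch: np.argmax on the boolean array a == 0 returns the index of the first True in each row. For a row with no 0 at all, argmax returns 0 instead of raising, so the sketch checks for that case explicitly; reporting the row length for such rows is an assumption, chosen so that the result round-trips through the outer comparison shown earlier (an index of ncols yields a row of all ones).

```python
import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 1, 1, 0]])

is_zero = a == 0
# argmax on a boolean array gives the position of the first True in each row
b = is_zero.argmax(axis=1)
# Rows with no 0 would also give 0; report the row length for them instead
b = np.where(is_zero.any(axis=1), b, a.shape[1])
print(b.tolist())  # [3, 1, 6, 2, 8]
```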

You can use broadcasting to do an outer comparison:
b = np.asarray([3, 1, 6, 2, 8])
a = (np.arange(b.max() + 1) < b[:, None]).astype(int)
# array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
# [1, 0, 0, 0, 0, 0, 0, 0, 0],
# [1, 1, 1, 1, 1, 1, 0, 0, 0],
# [1, 1, 0, 0, 0, 0, 0, 0, 0],
# [1, 1, 1, 1, 1, 1, 1, 1, 0]])

scipy.ndimage.label: include error margin

After reading an interesting topic on scipy.ndimage.label (Variable area threshold for identifying objects - python), I'd like to include an 'error margin' in the labelling.
In the above linked discussion:
How can the blue dot on top be included, too (let's say it is wrongly disconnected from the orange, biggest, object)?
I found the structure argument, which should be able to include that dot if I change the array from np.ones((3,3,3)) to something larger (I'd like it to stay 3D). Unfortunately, passing a larger array as structure does not seem to work: it either raises a dimension error (RuntimeError: structure and input must have equal rank) or does not change anything.
Thanks!
This is the code:
labels, nshapes = ndimage.label(a, structure=np.ones((3,3,3)))
where a is a 3D array.

Here's a possible approach that uses scipy.ndimage.binary_dilation. It is easier to see what is going on in a 2D example, but I'll show how to generalize to 3D at the end.
In [103]: a
Out[103]:
array([[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0]])
In [104]: from scipy.ndimage import label, binary_dilation
Extend each "shape" by one pixel down and to the right:
In [105]: b = binary_dilation(a, structure=np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]])).astype(int)
In [106]: b
Out[106]:
array([[0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 1, 1, 1],
[1, 1, 1, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1]])
Apply label to the dilated array:
In [107]: labels, numlabels = label(b)
In [108]: numlabels
Out[108]: 2
In [109]: labels
Out[109]:
array([[0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0],
[2, 2, 2, 0, 1, 1, 0],
[2, 2, 2, 0, 1, 1, 1],
[2, 2, 2, 0, 0, 1, 1],
[2, 2, 2, 2, 0, 1, 1]], dtype=int32)
By multiplying a by labels, we get the desired array of labels of a:
In [110]: alab = labels*a
In [111]: alab
Out[111]:
array([[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[2, 2, 0, 0, 1, 0, 0],
[2, 2, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1],
[2, 2, 2, 0, 0, 0, 0]])
(This assumes that the values in a are 0 or 1. If they are not, you can use alab = labels * (a > 0).)
For a 3D input, you have to change the structure argument to binary_dilation:
struct = np.zeros((3, 3, 3), dtype=int)
struct[1:, 1:, 1:] = 1
b = binary_dilation(a, structure=struct).astype(int)
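The 2D and 3D cases above differ only in the shape of the structuring element, so the steps can be wrapped in one function that builds the "down and to the right" element for any number of dimensions. This is a sketch assuming scipy is available; the name label_with_margin is hypothetical, and it uses a > 0 so that non-binary inputs work as mentioned above.

```python
import numpy as np
from scipy.ndimage import binary_dilation, label

def label_with_margin(a):
    """Hypothetical helper: label objects in `a`, treating objects separated
    by a one-pixel gap (in the positive direction of each axis) as connected."""
    mask = a > 0
    # 3x3x... element with ones in the trailing 2x2x... corner: grows each
    # object by one pixel in the positive direction along every axis
    struct = np.zeros((3,) * a.ndim, dtype=int)
    struct[(slice(1, None),) * a.ndim] = 1
    grown = binary_dilation(mask, structure=struct)
    labels, numlabels = label(grown)
    # Keep the labels only where the original array had an object
    return labels * mask, numlabels

a = np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0, 0]])
alab, n = label_with_margin(a)
print(n)  # 2
print(alab)
```

For a 2D input the generated struct is exactly [[0, 0, 0], [0, 1, 1], [0, 1, 1]], and for 3D it matches the struct built above.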

Count how often integer y occurs right after integer x in a numpy array

I have a very large numpy.array of integers, where each integer is in the range [0, 31].
I would like to count, for every pair of integers (a, b) in the range [0, 31] (e.g. [0, 1], [7, 9], [18, 0]) how often b occurs right after a.
This would give me a (32, 32) matrix of counts.
I'm looking for an efficient way to do this with numpy. Raw python loops would be too slow.

Here's one way...
To make the example easier to read, I'll use a maximum value of 9 instead of 31:
In [178]: maxval = 9
Make a random input for the example:
In [179]: np.random.seed(123)
In [180]: x = np.random.randint(0, maxval+1, size=100)
Create the result, initially all 0:
In [181]: counts = np.zeros((maxval+1, maxval+1), dtype=int)
Now add 1 to each coordinate pair, using numpy.add.at to ensure that duplicates are counted properly:
In [182]: np.add.at(counts, (x[:-1], x[1:]), 1)
In [183]: counts
Out[183]:
array([[2, 1, 1, 0, 1, 0, 1, 1, 1, 1],
[2, 1, 1, 3, 0, 2, 1, 1, 1, 1],
[0, 2, 1, 1, 4, 0, 2, 0, 0, 0],
[1, 1, 1, 3, 3, 3, 0, 0, 1, 2],
[1, 1, 0, 1, 1, 0, 2, 2, 2, 0],
[1, 0, 0, 0, 0, 0, 1, 1, 0, 2],
[0, 4, 2, 3, 1, 0, 2, 1, 0, 1],
[0, 1, 1, 1, 0, 0, 2, 0, 0, 3],
[1, 2, 0, 1, 0, 0, 1, 0, 0, 0],
[2, 0, 2, 2, 0, 0, 2, 2, 0, 0]])
For example, the number of times 6 is followed by 1 is
In [188]: counts[6, 1]
Out[188]: 4
We can verify that with the following expression:
In [189]: ((x[:-1] == 6) & (x[1:] == 1)).sum()
Out[189]: 4
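An alternative to np.add.at (a different technique, not the answer's code) is to encode each (x[i], x[i+1]) pair as a single integer x[i] * (maxval + 1) + x[i+1] and count those codes with np.bincount, which is often faster than np.add.at on large arrays. A sketch, assuming all values are non-negative integers in [0, maxval]; the function name transition_counts is hypothetical.

```python
import numpy as np

def transition_counts(x, maxval):
    """Hypothetical helper: counts[i, j] = number of times j directly follows i."""
    n = maxval + 1
    codes = x[:-1] * n + x[1:]  # one integer per (current, next) pair
    # minlength guarantees all n * n cells exist even if some pairs never occur
    return np.bincount(codes, minlength=n * n).reshape(n, n)

np.random.seed(123)
x = np.random.randint(0, 10, size=100)
counts = transition_counts(x, 9)
print(counts[6, 1] == ((x[:-1] == 6) & (x[1:] == 1)).sum())  # True
```

For the original question, transition_counts(arr, 31) gives the (32, 32) matrix directly.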

You can use numpy's built-in diff routine together with boolean arrays.
import numpy as np
test_array = np.array([1, 2, 3, 1, 2, 4, 5, 1, 2, 6, 7])
a, b = (1, 2)
sum(np.bitwise_and(test_array[:-1] == a, np.diff(test_array) == b - a))
# 3
If your array is multi-dimensional, you will need to flatten it first or make some small modifications to the code above.