I'm new to numpy and trying to understand how to search for a 2D array inside another 2D array. I don't need the indices, just True/False.
For example, I have an array of shape 10x10 that is mostly ones and somewhere contains a 2x2 block of zeroes:
ar = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
[1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 0, 1, 1, 0, 1, 1, 1, 2],
[1, 1, 0, 1, 1, 0, 1, 1, 1, 1]]
)
and I have another array I want to find
ar2 = np.zeros((2,2))
I tried functions like isin and where, but they match individual elements, not the whole shape of the sub-array.
Here's what I've come to - iterate over rows and cols, slice 2x2 array and compare it with zeroes array:
for r, c in np.ndindex(ar.shape):
    if r-1>=0 and c-1>=0 and np.array_equal(ar[r - 1:r + 1, c - 1:c + 1], ar2):
        print(f'found it {r}:{c}')
I'm not sure if this is the best solution, but at least it works. Maybe there is some easier and faster way to search for 2x2 zeroes?
I think the scikit-image library offers one of the best ways to do this:
from skimage.util import view_as_windows
view_ = view_as_windows(ar, (2, 2))
res_temp = np.all((view_ == ar2[None, ...]), (-2, -1))
result = np.nonzero(res_temp)
# (array([4], dtype=int64), array([4], dtype=int64))
This returns the indices of the top-left corner of each match. To get the same result as your code, add one to each index.
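If adding scikit-image as a dependency is not an option, a similar windowed comparison can be sketched with NumPy alone. This assumes NumPy 1.20 or newer (for `sliding_window_view`). The helper name `contains_block` and the small `grid` array are made up for illustration; the helper returns just True/False, as the question asks.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def contains_block(haystack, needle):
    """Return True if the 2D array `needle` occurs somewhere inside `haystack`."""
    # windows has shape (H - h + 1, W - w + 1, h, w): one h x w view per position
    windows = sliding_window_view(haystack, needle.shape)
    # A position matches when every element of its window equals the needle
    matches = (windows == needle).all(axis=(-2, -1))
    return bool(matches.any())

# Small made-up example: a 2x2 block of zeros sits at rows 1-2, columns 1-2
grid = np.array([[1, 1, 1, 1],
                 [1, 0, 0, 1],
                 [1, 0, 0, 1],
                 [1, 1, 0, 1]])
print(contains_block(grid, np.zeros((2, 2))))   # True
print(contains_block(grid, np.zeros((3, 3))))   # False
```

The windows are views, so no copies of the large array are made until the comparison itself is evaluated.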
Based on this answer by Brenlla, I made this function which works with 2d arrays:
def find_array_in_array_2d(ar, ar2):
    # Find all matches with first element of ar2.
    # Explicit stop indices are used so that a 1-row or 1-column ar2
    # does not turn the slice into an empty one (ar[:-1+1] == ar[:0]).
    rows = ar.shape[0] - ar2.shape[0] + 1
    cols = ar.shape[1] - ar2.shape[1] + 1
    match_idx = np.nonzero(ar[:rows, :cols] == ar2[0, 0])
    # Check remaining indices of ar2
    for i, j in list(np.ndindex(ar2.shape))[1:]:
        # End if no possible matches left
        if len(match_idx[0]) == 0:
            break
        # Index into ar offset by i, j
        nz2 = (match_idx[0] + i, match_idx[1] + j)
        # Find remaining matches with selected element
        to_keep = np.nonzero(ar[nz2] == ar2[i, j])[0]
        match_idx = match_idx[0][to_keep], match_idx[1][to_keep]
    return match_idx
print(find_array_in_array_2d(ar, ar2))
(array([4]), array([4]))
I think it will be faster than your method if ar is big and ar2 is small, especially when ar contains few values that also appear in ar2.
Let there be a numpy array of shape [M], dtype int32 and (random) values in range [0, N), e.g.:
M = 8
N = 5
a = np.random.randint(0, N, [M]) # a = [1, 1, 2, 4, 0, 1, 1, 3]
From this array I need to create a matrix m of shape [M, N], dtype int32 and values 0 or 1, where m[i,j] = 0 if j < a[i], otherwise 1. Following the example:
m = some_magic(a) # m = [[0, 1, 1, 1, 1],
# [0, 1, 1, 1, 1],
# [0, 0, 1, 1, 1],
# [0, 0, 0, 0, 1],
# [1, 1, 1, 1, 1],
# [0, 1, 1, 1, 1],
# [0, 1, 1, 1, 1],
# [0, 0, 0, 1, 1]]
My dysfunctional version of some_magic starts with initializing the matrix to zeros (using np.zeros), and then proceeding to set the appropriate members to 1.
m = np.zeros([M, N])
The next part, though, I cannot figure out. Accessing single members, for example every second member or a fixed slice, is easy and achievable by
m[np.arange(M), C1:C2]
where C1 and C2 are integer constants. However,
m[np.arange(M), a:]
which, as far as I can tell, should yield the correct result, fails with the error
Only integer scalar arrays can be converted to a scalar index.
Can you please point me in the right direction? Thank you very much.
Here's a solution using broadcasting.
(a[:, None] <= np.arange(N)).view('i1')
# np.less_equal.outer(a, np.arange(N)).view('i1')
array([[0, 1, 1, 1, 1],
[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 0, 1],
[1, 1, 1, 1, 1],
[0, 1, 1, 1, 1],
[0, 1, 1, 1, 1],
[0, 0, 0, 1, 1]], dtype=int8)
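To see why the broadcasting works: `a[:, None]` has shape (M, 1) and `np.arange(N)` has shape (N,), so the comparison is evaluated for every (i, j) pair and yields an (M, N) boolean array. Below is a small sketch using the example values from the question. The intermediate names are made up, and `astype(np.int32)` is used instead of `view('i1')` only because the question asks for int32.

```python
import numpy as np

N = 5
a = np.array([1, 1, 2, 4, 0, 1, 1, 3])

column = a[:, None]          # shape (8, 1)
positions = np.arange(N)     # shape (5,)
mask = column <= positions   # broadcast to shape (8, 5); True where j >= a[i]
m = mask.astype(np.int32)    # convert booleans to 0/1 integers

print(m.shape)   # (8, 5)
print(m[3])      # [0 0 0 0 1]
```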
I'm not sure if slicing like that is possible. I suggest you create indices instead and then operate on those:
M = 8
N = 5
#a = np.random.randint(0, N, [M])
a = np.array([1, 1, 2, 4, 0, 1, 1, 3])
from0toN = np.expand_dims(np.arange(N),0) # [[0,1,2,3,4]]
m = np.repeat(from0toN, M, axis=0)
#array([[0, 1, 2, 3, 4],
# ...,
# [0, 1, 2, 3, 4]])
boolean = m >= np.expand_dims(a,1)
onesAndZeroes = boolean.astype(int)
"""
array([[0, 1, 1, 1, 1],
[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 0, 1],
[1, 1, 1, 1, 1],
[0, 1, 1, 1, 1],
[0, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
"""
I have a matrix with the cell values only 0 or 1.
I want to count how many ones or zeros there are in the same row or column as a given cell.
For example, the value matrix[r][c] is 1, so I want to know how many ones there are in the same row. This code does that:
count_in_row = 0
value = matrix[r][c]
for i in matrix[r]:
    if i == value:
        count_in_row += 1
The for loop iterates through the row and counts all cells with the same value (here, ones).
What if I want to do the same with columns? Do I have to iterate through the whole matrix, or is it possible to go through just one column?
PS: I don't want to use numpy, transpose or zip; a nested loop would be better.
You have not specified what the datatype of your matrix is. If it is a list of lists, then there is no way to "get just one column", but the code still is similar (assuming that r and c are of type int):
I added the functionality to count only the cells adjacent to the cell in question (above, below, left and right; diagonals are NOT considered). This is done by checking that the difference between indexes is not greater than 1, so the cell itself is included in both counts.
count_in_row = 0
count_in_col = 0
value = matrix[r][c]
for j in range(len(matrix[r])):
    if abs(j - c) <= 1: # only if it is adjacent
        if matrix[r][j] == value:
            count_in_row += 1
for i in range(len(matrix)):
    if abs(i - r) <= 1: # only if it is adjacent
        if matrix[i][c] == value:
            count_in_col += 1
Or if following the way you started it (whole rows and columns, not only adjacent ones):
for col_val in matrix[r]:
    if col_val == value:
        count_in_row += 1
for row in matrix:
    if row[c] == value:
        count_in_col += 1
If you will be doing this for a lot of cells, then there are better ways to do it (even without numpy, though numpy is definitely a very good option).
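One such way, sketched without numpy, transpose or zip: count the ones in every row and column once, then answer each query from those totals. The function names here are hypothetical, and the sketch assumes a rectangular list of lists containing only 0 and 1.

```python
def precompute_one_counts(matrix):
    """Count the ones in every row and every column in a single nested loop."""
    row_ones = [0] * len(matrix)
    col_ones = [0] * len(matrix[0])
    for r, row in enumerate(matrix):
        for c, cell in enumerate(row):
            row_ones[r] += cell
            col_ones[c] += cell
    return row_ones, col_ones

def count_same(matrix, row_ones, col_ones, r, c):
    """Return (count_in_row, count_in_col) of cells equal to matrix[r][c]."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if matrix[r][c] == 1:
        return row_ones[r], col_ones[c]
    # For a zero cell, the zeros are whatever is not a one
    return n_cols - row_ones[r], n_rows - col_ones[c]

matrix = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
row_ones, col_ones = precompute_one_counts(matrix)
print(count_same(matrix, row_ones, col_ones, 0, 1))   # (3, 1)
print(count_same(matrix, row_ones, col_ones, 3, 0))   # (4, 4)
```

After the one-time pass over the matrix, each lookup costs constant time regardless of matrix size.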
You can create a list for rows and cols and simply iterate over your matrix once while adding up the correct parts:
Create demodata:
import random
random.seed(42)
matrix = []
for n in range(10):
    matrix.append(random.choices([0,1],k=10))
print(*matrix,sep="\n")
Output:
[1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
[0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
[1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 1, 1, 1, 0, 1, 0, 0]
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
[0, 1, 1, 0, 1, 0, 1, 0, 0, 0]
[1, 0, 1, 1, 0, 0, 1, 1, 0, 0]
[0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
Count things:
rows = []                   # empty list for rows - you can simply sum over each row
cols = [0]*len(matrix[0])   # list of 0 that you can increment while iterating your matrix
for row in matrix:
    rows.append( sum(x for x in row) )   # simply sum over row (once per row)
    for c,col in enumerate(row):         # enumerate gives you the (index,value) tuple
        cols[c] += col                   # adds either 0 or 1 to the col-index
print("rows:",rows)
print("cols:",cols)
Output:
rows: [4, 5, 5, 9, 2, 4, 6, 4, 5, 6] # row 0 == 4, row 1 == 5, ...
cols: [6, 6, 5, 4, 6, 5, 5, 5, 5, 3] # same for cols
Less code but taking 2 full passes over your matrix using zip() to transpose the data:
rows = [sum(r) for r in matrix]
cols = [sum(c) for c in zip(*matrix)]
print("rows:",rows)
print("cols:",cols)
Output: (the same)
rows: [4, 5, 5, 9, 2, 4, 6, 4, 5, 6]
cols: [6, 6, 5, 4, 6, 5, 5, 5, 5, 3]
You would have to time it, but the overhead of two full iterations and the zipping might still be worth it, as the zip() way is inherently more optimized than looping over a list by hand. The tradeoff might only pay off for certain matrix sizes.
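A rough way to time both variants is the standard-library `timeit` module. This is only a sketch: the function names are made up, the matrix size and repetition count are arbitrary, and the numbers will differ between machines and Python versions.

```python
import random
import timeit

random.seed(42)
size = 200  # arbitrary size, chosen only for illustration
matrix = [random.choices([0, 1], k=size) for _ in range(size)]

def nested_loop(matrix):
    """Row sums via sum(), column sums accumulated in an explicit inner loop."""
    rows = []
    cols = [0] * len(matrix[0])
    for row in matrix:
        rows.append(sum(row))
        for c, col in enumerate(row):
            cols[c] += col
    return rows, cols

def with_zip(matrix):
    """Row sums and column sums via two comprehensions, using zip to transpose."""
    return [sum(r) for r in matrix], [sum(c) for c in zip(*matrix)]

# Both variants must agree before their speed is compared
assert nested_loop(matrix) == with_zip(matrix)

print("nested loop:", timeit.timeit(lambda: nested_loop(matrix), number=20))
print("zip:        ", timeit.timeit(lambda: with_zip(matrix), number=20))
```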
I will not solve that for you, but maybe hint in the right direction...
# assuming a list of lists of equal length
# without importing any modules
matrix = [
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 1, 1, 0],
[1, 1, 1, 1],
]
sum_rows = [sum(row) for row in matrix]
print(sum_rows) # [1, 2, 3, 4]
sum_columns = [sum(row[i] for row in matrix) for i in range(len(matrix[0]))]
print(sum_columns) # [4, 3, 2, 1]
This is a solution with just one for loop:
count_in_row = 0
count_in_column = 0
value = matrix[r][c]
for index, row in enumerate(matrix):
    if index == r:
        count_in_row = row.count(value)
    if row[c] == value:
        count_in_column += 1
print(count_in_row, count_in_column)
With numpy it's one command per direction, and much faster:
import numpy as np
A = np.array([[1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 1, 1, 0, 1, 1, 0],
[1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 1, 1, 0, 1, 0, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 1, 0, 1, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 1, 1, 1]])
rowsum = A.sum(axis=1)
colsum = A.sum(axis=0)
print("A ="); print(A);print()
print("rowsum:",rowsum)
print("colsum:",colsum)
rowsum: [4 5 5 9 2 4 6 4 5 6]
colsum: [6 6 5 4 6 5 5 5 5 3]
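The sums above count ones. To count the cells that match a particular cell (which also covers zeros), one option is to compare the row and the column against that cell's value. A short sketch with a made-up 3x4 matrix and arbitrary indices `r`, `c`:

```python
import numpy as np

A = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]])
r, c = 0, 2          # arbitrary cell, chosen for illustration
value = A[r, c]      # 0 in this example

count_in_row = np.count_nonzero(A[r] == value)      # cells in row r equal to value
count_in_col = np.count_nonzero(A[:, c] == value)   # cells in column c equal to value
print(count_in_row, count_in_col)   # 3 3
```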
I have a very large numpy.array of integers, where each integer is in the range [0, 31].
I would like to count, for every pair of integers (a, b) in the range [0, 31] (e.g. [0, 1], [7, 9], [18, 0]) how often b occurs right after a.
This would give me a (32, 32) matrix of counts.
I'm looking for an efficient way to do this with numpy. Raw python loops would be too slow.
Here's one way...
To make the example easier to read, I'll use a maximum value of 9 instead of 31:
In [178]: maxval = 9
Make a random input for the example:
In [179]: np.random.seed(123)
In [180]: x = np.random.randint(0, maxval+1, size=100)
Create the result, initially all 0:
In [181]: counts = np.zeros((maxval+1, maxval+1), dtype=int)
Now add 1 to each coordinate pair, using numpy.add.at to ensure that duplicates are counted properly:
In [182]: np.add.at(counts, (x[:-1], x[1:]), 1)
In [183]: counts
Out[183]:
array([[2, 1, 1, 0, 1, 0, 1, 1, 1, 1],
[2, 1, 1, 3, 0, 2, 1, 1, 1, 1],
[0, 2, 1, 1, 4, 0, 2, 0, 0, 0],
[1, 1, 1, 3, 3, 3, 0, 0, 1, 2],
[1, 1, 0, 1, 1, 0, 2, 2, 2, 0],
[1, 0, 0, 0, 0, 0, 1, 1, 0, 2],
[0, 4, 2, 3, 1, 0, 2, 1, 0, 1],
[0, 1, 1, 1, 0, 0, 2, 0, 0, 3],
[1, 2, 0, 1, 0, 0, 1, 0, 0, 0],
[2, 0, 2, 2, 0, 0, 2, 2, 0, 0]])
For example, the number of times 6 is followed by 1 is
In [188]: counts[6, 1]
Out[188]: 4
We can verify that with the following expression:
In [189]: ((x[:-1] == 6) & (x[1:] == 1)).sum()
Out[189]: 4
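An alternative that is sometimes suggested for large inputs is to encode each pair as a single integer and count the codes with `np.bincount`. The following is a sketch rather than a benchmark: it assumes all values lie in [0, maxval], generates its own random data, and cross-checks the result against the `np.add.at` approach.

```python
import numpy as np

maxval = 9
n = maxval + 1
rng = np.random.RandomState(123)
x = rng.randint(0, n, size=100)

# Encode the pair (a, b) as a * n + b, count each code, then reshape to (n, n)
codes = x[:-1] * n + x[1:]
counts = np.bincount(codes, minlength=n * n).reshape(n, n)

# Cross-check against the np.add.at approach
expected = np.zeros((n, n), dtype=int)
np.add.at(expected, (x[:-1], x[1:]), 1)
print(np.array_equal(counts, expected))   # True
```

For the original problem with values in [0, 31], `maxval = 31` would give the (32, 32) matrix.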
You can use numpy's built-in diff routine together with boolean arrays.
import numpy as np
test_array = np.array([1, 2, 3, 1, 2, 4, 5, 1, 2, 6, 7])
a, b = (1, 2)
sum(np.bitwise_and(test_array[:-1] == a, np.diff(test_array) == b - a))
# 3
If your array is multi-dimensional, you will need to flatten it first or make some small modifications to the code above.
I have an MxN array. In each row, I want to zero out all the values that come after the first element that is zero or less.
For example the 2x12 array
111110011111
112321341411
should turn into
111110000000
112321341411
Thanks!
It may not be the most efficient method, but I've used np.cumsum for these types of things.
>>> import numpy as np
>>> dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
...                 [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
>>> dat[np.cumsum(dat <= 0, 1, dtype='bool')] = 0
>>> dat
array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
@Jaime just pointed out that np.logical_or.accumulate(dat <= 0, axis=1) is probably better than np.cumsum.
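A minimal sketch of the `np.logical_or.accumulate` variant mentioned above, using the same example data (the name `mask` is made up):

```python
import numpy as np

dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
                [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])

# True from the first non-positive element of each row onwards
mask = np.logical_or.accumulate(dat <= 0, axis=1)
dat[mask] = 0
print(dat[0])   # [1 1 1 1 1 0 0 0 0 0 0 0]
```

The second row contains no element that is zero or less, so its mask is all False and the row is left untouched.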
Maybe you or someone else needs an alternative solution that doesn't use numpy.
>>> dat = ['111110011111','112321341411','000000000000', '123456789120']
>>> def zero(dat):
...     result = []
...     for row in dat:
...         pos = row.find('0')
...         if pos > 0:
...             result.append(row[0:pos] + ('0' * (len(row) - pos)))
...         else:
...             result.append(row)
...     return result
>>> res = zero(dat)
>>> res
['111110000000', '112321341411', '000000000000', '123456789120']
>>> dat
['111110011111', '112321341411', '000000000000', '123456789120']
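The string version above only recognises the character '0'. If the rows hold numbers and "zero or less" should include negatives, a plain-Python sketch could look like this (the function name is hypothetical):

```python
def zero_after_nonpositive(rows):
    """Return new rows where everything from the first value <= 0 onwards is 0."""
    result = []
    for row in rows:
        new_row = []
        hit = False          # becomes True once a value <= 0 has been seen
        for value in row:
            if value <= 0:
                hit = True
            new_row.append(0 if hit else value)
        result.append(new_row)
    return result

data = [[1, 2, 0, 3], [1, -1, 5], [1, 2, 3]]
print(zero_after_nonpositive(data))   # [[1, 2, 0, 0], [1, 0, 0], [1, 2, 3]]
```

Like the string version, this builds new rows and leaves the input unchanged.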