numpy 2d boolean array count consecutive True sizes

I'm interested in finding out individual sizes of the 'True' patches in a boolean array. For instance in the boolean matrix:
[[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]]
The output would be:
[[1, 0, 0, 0],
[0, 4, 4, 0],
[0, 4, 0, 0],
[0, 4, 0, 0]]
I know I can do this recursively, but I'm under the impression that element-by-element array operations in Python are costly at large scale. Is there a library function available for this?

Here's a quick and simple complete solution:
import numpy as np
import scipy.ndimage as mnts  # the scipy.ndimage.measurements namespace is deprecated
A = np.array([
[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]
])
# labeled is a version of A with labeled clusters:
#
# [[1 0 0 0]
# [0 2 2 0]
# [0 2 0 0]
# [0 2 0 0]]
#
# clusters holds the number of different clusters: 2
labeled, clusters = mnts.label(A)
# sizes is an array of cluster sizes: [0, 1, 4]
sizes = mnts.sum(A, labeled, index=range(clusters + 1))
# mnts.sum always outputs a float array, so we'll convert sizes to int
sizes = sizes.astype(int)
# get an array with the same shape as labeled and the
# appropriate values from sizes by indexing one array
# with the other. See the `numpy` indexing docs for details
labeledBySize = sizes[labeled]
print(labeledBySize)
output:
[[1 0 0 0]
[0 4 4 0]
[0 4 0 0]
[0 4 0 0]]
The trickiest line above is the "fancy" numpy indexing:
labeledBySize = sizes[labeled]
in which one array is used to index the other. See the numpy indexing docs (section "Index arrays") for details on why this works.
I also wrote a version of the above code as a single compact function that you can try out yourself online. It includes a test case based on a random array.
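As a rough sketch of what such a compact function might look like (this is not the author's linked version; the name `label_by_size` is made up for illustration, and `np.bincount` is swapped in for `mnts.sum` so the sizes come out as integers directly):

```python
import numpy as np
from scipy import ndimage


def label_by_size(mask):
    """Replace every nonzero cell with the size of its connected patch."""
    # With the default structure, cells are connected only through shared
    # edges, not diagonals.
    labeled, n_clusters = ndimage.label(mask)
    # Count how often each label occurs; label 0 is the background.
    sizes = np.bincount(labeled.ravel(), minlength=n_clusters + 1)
    sizes[0] = 0
    # Index the size table with the label image (integer-array indexing).
    return sizes[labeled]


A = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
])
result = label_by_size(A)
print(result)
```

If diagonal neighbours should count as connected, a 3x3 array of ones can be passed as the `structure` argument of `ndimage.label`.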

Related

Python/NumPy: find the first index of zero, then replace all elements with zero after that for each row

I have a numpy array like this:
a = np.array([[1, 0, 1, 1, 1],
[1, 1, 1, 1, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
Question 1:
As the title says, I want to replace all elements after the first zero in each row with zero. The result should look like this:
a = np.array([[1, 0, 0, 0, 0],
[1, 1, 1, 1, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]])
Question 2: how to slice different columns for each row like this example?
I am dealing with a large array, so I would appreciate an efficient way to solve this. Thank you very much.

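One way to accomplish question 1 is to use numpy.cumprod. This works here because the array contains only 0s and 1s, so the running product along each row drops to 0 at the first zero and stays there: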
>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
[1, 1, 1, 1, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]])
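The cumprod trick relies on the entries being 0 or 1. If the array held other nonzero values, a similar idea could be sketched with a running logical AND instead of a running product. The array below is made up for illustration:

```python
import numpy as np

# Hypothetical input with values other than 0 and 1.
a = np.array([[3, 0, 2, 5],
              [1, 2, 3, 4],
              [0, 7, 7, 7]])

# True for every position that comes before the first zero in its row.
keep = np.logical_and.accumulate(a != 0, axis=1)

# Keep the original value where allowed, write 0 everywhere else.
result = np.where(keep, a, 0)
print(result)
```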

Question 1:
You could iterate over the array like so:
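for i in range(a.shape[0]):
    j = 0
    row = a[i]
    # advance to the first zero, stopping at the end of the row if there is none
    while j < row.size and row[j] > 0:
        j += 1
    # row[j] is the first zero (if any); zero everything after it
    row[j+1:] = 0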
This changes the array in place. If you need very high performance, the answers to this question could help you find the first zero faster. np.where scans the entire array and is therefore not optimal for this task.
Actually, the fastest solution depends on the distribution of your array entries. If the array holds arbitrary floats and zeros are rare, the while loop above stops late on average, so only a few zeros need to be written. If, however, there are only two possible entries, as in your sample array, and they occur with similar probability (about 50%), many zeros would have to be written to a, and the following will be faster:
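b = np.zeros_like(a)  # same shape and dtype as a, filled with zeros
for i in range(a.shape[0]):
    j = 0
    a_row = a[i]
    b_row = b[i]
    # copy entries up to (not including) the first zero; the rest of b stays zero
    while j < a_row.size and a_row[j] > 0:
        b_row[j] = a_row[j]
        j += 1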
Question 2:
If you mean to slice each row individually on a similar criterion based on the first occurrence of some kind, you could simply adapt this iteration pattern. If the criterion is more global (finding the maximum of the row, for example), built-in methods like np.where exist that will be more efficient, but which choice is best probably depends on the criterion itself.

Question 1: An efficient way to do this would be the following.
import numpy as np
a = np.array([[1, 0, 1, 1, 1],
[1, 1, 1, 1, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
for row in a:
    zeros = np.where(row == 0)[0]
    if len(zeros):  # check that the row contains a zero
        row[zeros[0]:] = 0
print(a)
Output:
[[1 0 0 0 0]
[1 1 1 1 0]
[1 0 0 0 0]
[1 0 0 0 0]]
Question 2: Using the same array, for each row rowIdx, you can have an array of columns colIdxs that you want to extract from.
rowIdx = 2
colIdxs = [1, 3, 4]
print(a[rowIdx, colIdxs])
Output:
[0 1 1]

I prefer Ayrat's creative answer for the first question, but if you need to slice different columns for different rows in a large array, this could help you:
indexer = tuple(np.s_[i:a.shape[1]] for i in (a==0).argmax(axis=1))
for i,j in enumerate(indexer):
    a[i,j]=0
indexer:
(slice(1, 5, None), slice(4, 5, None), slice(1, 5, None), slice(1, 5, None))
or:
indexer = (a==0).argmax(axis=1)
for i in range(a.shape[0]):
    a[i,indexer[i]:]=0
indexer:
[1 4 1 1]
output:
[[1 0 0 0 0]
[1 1 1 1 0]
[1 0 0 0 0]
[1 0 0 0 0]]
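One caveat with the argmax approach: for a row that contains no zero, (a==0).argmax(axis=1) returns 0, so the whole row would be zeroed. A loop-free sketch that guards against this case could look like the following. The variable names are made up, and the last row of the sample array was changed to contain no zero so that the guard is exercised:

```python
import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 1, 1, 1, 1]])  # last row has no zero at all

is_zero = (a == 0)

# Column of the first zero per row; rows without a zero get a.shape[1],
# which lies past the last column, so nothing is overwritten for them.
first_zero = np.where(is_zero.any(axis=1), is_zero.argmax(axis=1), a.shape[1])

# Broadcast a row of column indices against the per-row cut positions.
mask = np.arange(a.shape[1]) >= first_zero[:, None]

result = np.where(mask, 0, a)
print(result)
```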

Speed up numpy integer-array indexing for depth

Suppose I have an array
[[0 2 1]
[1 0 1]
[2 1 1]]
and I want to convert it into a tensor of the form
[[[1 0 0]
[0 1 0]
[0 0 0]]
[[0 0 1]
[1 0 1]
[0 1 1]]
[[0 1 0]
[0 0 0]
[1 0 0]]]
Where each depth layer (index i) is a binary mask showing where i appears in the input.
I have written code for this which works correctly but is too slow for any use. Can I replace the loop in this function with another vectorized operation?
def im2segmap(im, depth):
    # one binary mask per class, stacked along the first axis
    tensor = np.zeros((depth, im.shape[0], im.shape[1]))
    for c in range(depth):
        rows, cols = np.argwhere(im == c).T
        tensor[c, rows, cols] = 1
    return tensor

Use broadcasting -
(a==np.arange(num_classes)[:,None,None]).astype(int)
Or with builtin outer comparison -
(np.equal.outer(range(num_classes),a)).astype(int)
If you need an integer dtype, use uint8; for a further speedup, skip the conversion altogether and keep the result as a boolean array.
Sample run -
In [42]: a = np.array([[0,2,1],[1,0,1],[2,1,1]])
In [43]: num_classes = 3 # or depth
In [44]: (a==np.arange(num_classes)[:,None,None]).astype(int)
Out[44]:
array([[[1, 0, 0],
[0, 1, 0],
[0, 0, 0]],
[[0, 0, 1],
[1, 0, 1],
[0, 1, 1]],
[[0, 1, 0],
[0, 0, 0],
[1, 0, 0]]])
To have the depth/num_classes as the third dim, extend the input array and then compare against the range array -
(a[...,None]==np.arange(num_classes)).astype(int)
(np.equal.outer(a, range(num_classes))).astype(int)
(np.equal.outer(a, range(num_classes))).astype(np.uint8) # smaller dtype
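To check that the broadcasting version really matches a per-class loop, a small comparison could be sketched as follows. The function names and the random test image are assumptions made for illustration:

```python
import numpy as np


def im2segmap_loop(im, depth):
    # Reference version: fill one binary mask per class in a Python loop.
    tensor = np.zeros((depth,) + im.shape, dtype=np.uint8)
    for c in range(depth):
        tensor[c][im == c] = 1
    return tensor


def im2segmap_broadcast(im, depth):
    # Compare the image against a column of class ids in a single step;
    # the result has shape (depth, rows, cols).
    return (im == np.arange(depth)[:, None, None]).astype(np.uint8)


rng = np.random.default_rng(0)
im = rng.integers(0, 5, size=(64, 48))  # made-up label image with 5 classes

loop_result = im2segmap_loop(im, 5)
broadcast_result = im2segmap_broadcast(im, 5)
print(np.array_equal(loop_result, broadcast_result))

# Depth as the last axis instead of the first.
channels_last = (im[..., None] == np.arange(5)).astype(np.uint8)
print(channels_last.shape)
```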

Python Numpy. Manipulating with 2 matrices

I have 2 CSV files with the same size. Values are 1s and 0s.
I need to loop over 2 files (matrices) and create a new matrix using the following logic:
if matrix A value = 1 and matrix B value = 1, then the result value is 0;
if 1 and 0, then 0;
if 0 and 0, then 0.
A = [
[1, 0, 1],
[1, 1, 1]
]
B = [
[1, 0, 0],
[1, 0, 0]
]
=>
C = [
[0, 0, 1],
[0, 1, 1]
]
I know that NumPy is used to loop over and manipulate matrices and arrays, but I am stuck on how to do this properly.

Here is one way to get your desired output, but I think the logic you described was not quite what you meant. This outputs an array of 1 where your matrices are different from one another, and 0 where they are alike.
import numpy as np
A = np.array([
[1, 0, 1],
[1, 1, 1]
])
B = np.array([
[1, 0, 0],
[1, 0, 0]])
# element-wise comparison: True where A and B differ, converted to 0/1
C = (A != B).astype('int')
print(C)
Output:
[[0 0 1]
[0 1 1]]
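Since the question starts from two CSV files, the whole pipeline might be sketched like this. The in-memory strings stand in for the real files, whose names are not given; with real files, np.loadtxt can be given a path instead:

```python
import io

import numpy as np

# Stand-ins for the two CSV files (assumed comma-separated 0/1 values).
csv_a = io.StringIO("1,0,1\n1,1,1\n")
csv_b = io.StringIO("1,0,0\n1,0,0\n")

A = np.loadtxt(csv_a, delimiter=",", dtype=int)
B = np.loadtxt(csv_b, delimiter=",", dtype=int)

if A.shape != B.shape:
    raise ValueError("both matrices must have the same shape")

# 1 where the two matrices differ, 0 where they agree.
C = (A != B).astype(int)

# For 0/1 integer data, bitwise XOR gives the same result.
C_xor = A ^ B
print(C)
```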

numpy 3D array shape

I created a numpy array of shape (4,3,2) and expected the following code to print an array of shape 4 X 3 or 3 X 4:
import numpy as np
x = np.zeros((4,3,2), np.int32)
print(x[:][:][0])
However, I got
[[0 0]
[0 0]
[0 0]]
That looks like a 3 X 2 array. I am really confused about numpy arrays now. Shouldn't I get
[[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]]
instead? How do I map a 3D image onto a numpy 3D array?
For example, in MATLAB, the shape (m, n, k) means (row, col, slice) in a context of an (2D/3D) image.
Thanks a lot

x[:] slices all elements along the first dimension, so x[:] gives the same result as x and x[:][:][0] is thus equivalent to x[0].
To select index 0 along the last dimension, you can do:
x[..., 0]
#array([[0, 0, 0],
# [0, 0, 0],
# [0, 0, 0],
# [0, 0, 0]], dtype=int32)
or:
x[:,:,0]
#array([[0, 0, 0],
# [0, 0, 0],
# [0, 0, 0],
# [0, 0, 0]], dtype=int32)

You need to specify all slices at the same time in a tuple, like so:
x[:, :, 0]
If you do x[:][:][0], you are actually indexing the first dimension three times: the first two operations each create a view of the entire array, and the third selects index 0 along the first dimension.
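The difference between chained indexing and a single index tuple can be verified with a short sketch. An arange-filled array is used here instead of zeros (an assumption made so that the slices are distinguishable):

```python
import numpy as np

x = np.arange(24).reshape(4, 3, 2)

# Chained indexing: each x[:] is just a view of the whole array,
# so this is the same as x[0].
chained = x[:][:][0]
print(chained.shape)

# One index tuple: all of axis 0, all of axis 1, index 0 on axis 2.
last_axis = x[:, :, 0]
print(last_axis.shape)

# The ellipsis form selects the same data.
same_as_ellipsis = np.array_equal(x[..., 0], last_axis)
print(same_as_ellipsis)
```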

How to convert list of tuple to an array

new = np.zeros((rows_A, cols_B))
for i in range(rows_A):
    for j in range(cols_B):
        new[i][j] += np.sum(A[i] * B[:,j])
If B is given in this form, [[0, 0, 0], [0, 1, 0], [0, 2, 1]], it gives me an error:
TypeError: list indices must be integers, not tuple
But if I use the same array B in place of A, it works fine.
The array I get back looks like this:
[[0, 0, 0], [0, 1, 0], [0, 2, 1]]
so I want to convert it into this form:
[[0 0 0]
[0 1 0]
[0 2 1]]

numpy.asarray will do that.
import numpy as np
B = np.asarray([[0, 0, 0], [0, 1, 0], [0, 2, 1]])
This produces
array([[0, 0, 0],
[0, 1, 0],
[0, 2, 1]])
which can be indexed with [:, j].
Also, it looks like you're trying to do a matrix product. You can do the same thing with just one line of code using np.dot:
new = np.dot(A, B)

It appears that B is a list. You can't index it as B[:,j], which is implicitly passed to __getitem__ as (slice(None,None,None),j), i.e. a tuple.
You could convert B to a numpy array first (B = np.array(B)) and then go from there.
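Putting both answers together, a minimal sketch of the failure and the fix might look like this. The contents of A are made up, since the question does not show them:

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6]]             # hypothetical left operand
B = [[0, 0, 0], [0, 1, 0], [0, 2, 1]]  # nested list from the question

# A plain list cannot be indexed with a tuple such as (slice, int).
try:
    B[:, 0]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# After conversion to arrays, column slicing works.
A_arr = np.asarray(A)
B_arr = np.asarray(B)

rows_A, cols_B = A_arr.shape[0], B_arr.shape[1]
new = np.zeros((rows_A, cols_B), dtype=A_arr.dtype)
for i in range(rows_A):
    for j in range(cols_B):
        new[i, j] = np.sum(A_arr[i] * B_arr[:, j])

# The double loop computes a matrix product, which np.dot does directly.
product = np.dot(A_arr, B_arr)
print(product)
```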
