Minimum bounding rectangle of an array - python

Given a matrix containing only 0s and 1s (example below), I need to identify, and eventually extract, the minimum bounding rectangle around the 1s.
e.g.
0 0 0 0 0 0
0 [0 0 1 0] 0
0 [0 1 1 0] 0
0 [1 0 0 0] 0
0 [0 0 0 1] 0
0 0 0 0 0 0
I am not able to come up with a good solution. Thanks for your help!

m = [
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
]
min_col = max_col = min_row = max_row = None
for i, row in enumerate(m):
    for j, col in enumerate(row):
        if col:
            if min_row is None:
                min_row = i
            if min_col is None or min_col > j:
                min_col = j
            if max_row is None or max_row < i:
                max_row = i
            if max_col is None or max_col < j:
                max_col = j
print('starting row = %s' % min_row)
print('starting column = %s' % min_col)
print('ending row = %s' % max_row)
print('ending column = %s' % max_col)
This outputs:
starting row = 1
starting column = 1
ending row = 4
ending column = 4

Main idea is:
Start with a candidate box being the whole array.
While the first or the last row or the first or the last column of the candidate box only contains zeroes, shrink the box by that row or column.
If you can't shrink any more following that rule, you have the bounding box (maybe 0 by 0 elements, if there was no 1 in the array).
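The shrinking idea above can be sketched roughly as follows. This is only one possible reading of it: the function name bounding_box and the half-open (top, bottom, left, right) return convention are assumptions made for illustration, not part of the original answer.

```python
def bounding_box(grid):
    """Return (top, bottom, left, right); bottom and right are exclusive."""
    top, bottom = 0, len(grid)
    left, right = 0, (len(grid[0]) if grid else 0)

    # Shrink from the top and bottom while the edge row holds only zeroes.
    while top < bottom and not any(grid[top][left:right]):
        top += 1
    while top < bottom and not any(grid[bottom - 1][left:right]):
        bottom -= 1
    # Shrink from the left and right while the edge column holds only zeroes.
    while left < right and not any(row[left] for row in grid[top:bottom]):
        left += 1
    while left < right and not any(row[right - 1] for row in grid[top:bottom]):
        right -= 1
    return top, bottom, left, right


m = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
top, bottom, left, right = bounding_box(m)
sub = [row[left:right] for row in m[top:bottom]]
print((top, bottom, left, right))  # (1, 5, 1, 5)
```

With an all-zero input the box collapses to zero height and width, which matches the "0 by 0 elements" case mentioned above.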

arr = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0]]
# Find all positions where arr value is 1
pos_list = [(i,j) for i,r in enumerate(arr) for j,e in enumerate(r) if e]
# [(1, 3), (2, 2), (2, 3), (3, 1), (4, 4)]
# Take the min of all row/column indices as start_pos and the max as end_pos
x_pos, y_pos = zip(*pos_list)
start_pos, end_pos = (min(x_pos), min(y_pos)), (max(x_pos), max(y_pos))
print(start_pos, end_pos)
# (1, 1) (4, 4)
Eliminate all blank rows, transpose the array (by zipping it), eliminate the blank rows again, and transpose it back to get the smallest sub-array of the original array that contains all the 1s:
sub_arr = list(zip(*[r for r in zip(*[r for r in arr if any(r)]) if any(r)]))
print (sub_arr)
# [(0, 0, 1, 0), (0, 1, 1, 0), (1, 0, 0, 0), (0, 0, 0, 1)]

Related

Set consecutive equal numbers in an array equal to zero

I have an array like
a = np.array( [ 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1] )
and am looking for a way to set consecutive equal elements to zero:
a_desired = np.array( [ 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1] )
I've had a pretty unsuccessful time of it so far, I've tried something as simple as
for i in range(len(a)-1):
    if a[i+1] == a[i]:
        a[i+1] = 0
with output [1 0 1 0 0 0 0 1 0 1 0 0 1], as well as adding more conditions, like
for i in range(len(a)-1):
    if a[i+1] == a[i]:
        a[i+1] = 0
    if a[i+1] != a[i] and a[i] == 0 and a[i+1] != a[i]:
        a[i+1] = 0
which has output [1 0 0 0 0 0 0 0 0 0 0 0 0], but I can't seem to be able to successfully capture all the conditions required to make this work.
Some help would be appreciated!
I would do it following way:
import numpy as np
a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1])
a[1:][a[:-1]==a[1:]] = 0
print(a)
output:
[1 0 0 0 0 0 0 1 0 0 0 0 1]
I compare a without its last element against a without its first element, which is a pairwise comparison between what might be called the previous element and the current element. The result is an array of Trues and Falses that is 1 shorter than a, which I then use as a mask to set 0 wherever it is True. Note that I only modify the part of a after the first element, as the first element will never change.
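To make the intermediate steps visible, here is a small sketch that names each piece of the one-liner. The variable names previous, current, repeats and result are introduced here for illustration only, and the work is done on a copy so the input stays untouched.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1])

previous = a[:-1]              # every element except the last
current = a[1:]                # every element except the first
repeats = previous == current  # True where an element equals its predecessor

result = a.copy()
# result[1:] is a view, so assigning through the mask changes result itself.
result[1:][repeats] = 0

print(repeats.astype(int))  # [1 1 1 0 1 1 0 1 1 1 0 0]
print(result)               # [1 0 0 0 0 0 0 1 0 0 0 0 1]
```

The mask is computed in full before anything is assigned, which is why this avoids the problem of the loop versions, where an earlier write changes a later comparison.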
Try numpy xor
np.insert((np.logical_xor(a[:-1], a[1:]) * a[1:]), 0, a[0])
array([1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
Try:
import numpy as np
a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1])
a_desired = np.zeros(a.shape)
for i in range(len(a)-1, -1, -1):
    if a[i] == a[i-1] and i != 0:
        a_desired[i] = 0
    else:
        a_desired[i] = a[i]
print(a_desired)
Output:
[1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1.]
How about:
value_detected = 0
for i in range(len(a)):
    if value_detected:
        if a[i] == value_detected:
            a[i] = 0
        else:
            value_detected = a[i]
    else:
        if a[i]:
            value_detected = a[i]
print(a)
For original input, the output:
[1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
Further test, if input is:
a = [ 1, 1, 2, 2, 3, 3, 3, 1, 1, 1, 1, 0, 1]
Then output is:
[1, 0, 2, 0, 3, 0, 0, 1, 0, 0, 0, 0, 1]
My approach: first make a copy of the original array, then build the desired array from it like this:
new_a = a.copy()
for i in range(1, len(a)):
    if a[i] == a[i-1]:
        new_a[i] = 0
print(new_a)
Create a list with one element which would be the first element of input list.
Now, just iterate through your list starting from 2nd element and check if it is equal to the previous value.
If yes append 0 else, append the value.
input_arr = [ 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
output_arr = [input_arr[0]]
for i in range(1, len(input_arr)):
    if input_arr[i]==input_arr[i-1]:
        output_arr.append(0)
    else:
        output_arr.append(input_arr[i])
print (output_arr)

Calling a function within function does not work as expected?

I'm designing a maze generator in python and have various functions for different steps of the process. (I know the code can most definitely be improved but I'm just looking for an answer to my problem first before I work on optimizing it)
the first function generates a base maze in the form of a 2D list and works as expected:
def base_maze(dimension):
    num_rows = int((2 * dimension[1]) + 1) #number of rows / columns
    num_columns = int((2 * dimension[0]) + 1) #from tuple input
    zero_row = [] #initialise a row of 0s
    for i in range(num_columns):
        zero_row.append(0)
    norm_row = [] #initialise a row of
    for i in range(num_columns // 2): #alternating 0s and 1s
        norm_row.extend([0,1])
    norm_row.append(0)
    maze = [] #initialise maze
    #(combination of zero rows
    for i in range(num_rows // 2): # and normal rows)
        maze.append(zero_row)
        maze.append(norm_row)
    maze.append(zero_row)
    return maze
Another function gets the neighbors of the selected cell, and also works as expected:
def get_neighbours(cell, dimension):
    y = cell[0] #set x/y values
    max_y = dimension[0] - 1 #for reference
    x = cell[1]
    max_x = dimension[1] - 1
    n = (x, y-1) #calculate adjacent
    e = (x+1, y) #coordinates
    s = (x, y+1)
    w = (x-1, y)
    if y > max_y or y < 0 or x > max_x or x < 0: #check if x/y
        raise IndexError("Cell is out of maze bounds") #in bounds
    neighbours = []
    if y > 0: #add cells to list
        neighbours.append(n) #if they're valid
    if x < max_x: #cells inside maze
        neighbours.append(e)
    if y < max_y:
        neighbours.append(s)
    if x > 0:
        neighbours.append(w)
    return neighbours
the next function removes the wall between two given cells:
def remove_wall(maze, cellA, cellB):
    dimension = []
    x_dim = int(((len(maze[0]) - 1) / 2)) #calc the dimensions
    y_dim = int(((len(maze) - 1) / 2)) #of maze matrix (x,y)
    dimension.append(x_dim)
    dimension.append(y_dim)
    A_loc = maze[2*cellA[1]-1][2*cellA[0]-1]
    B_loc = maze[2*cellB[1]-1][2*cellB[0]-1]
    if cellB in get_neighbours(cellA, dimension): #if cell B is a neighbour
        if cellA[0] == cellB[0] and cellA[1] < cellB[1]: #if the x pos of A is equal
            adj_wall = maze[(2*cellA[0]+1)][2*cellA[1]+1+1] = 1 #to x pos of cell B and the y pos
            #of A is less than B (A is below B)
        elif cellA[0] == cellB[0] and cellA[1] > cellB[1]: #the adjacent wall is set to 1 (removed)
            adj_wall = maze[(2*cellA[0]+1)][2*cellA[1]+1-1] = 1
        #same is done for all other directions
        if cellA[1] == cellB[1] and cellA[0] < cellB[0]:
            adj_wall = maze[(2*cellA[0]+1)+1][(2*cellA[1]+1)] = 1
        elif cellA[1] == cellB[1] and cellA[0] > cellB[0]:
            adj_wall = maze[(2*cellA[0]+1-1)][(2*cellA[1]+1)] = 1
    return maze
yet when I try to put these functions together into one final function to build the maze, they do not work as they work on their own, for example:
def test():
    maze1 = base_maze([3,3])
    maze2 = [[0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0]]
    if maze1 == maze2:
        print("they are exactly the same")
    else:
        print("WHY ARE THEY DIFFERENT???")
    remove_wall(maze1,(0,0),(0,1))
    remove_wall(maze2,(0,0),(0,1))
these will produce different results despite the input being exactly the same?:
test()
they are exactly the same
[[0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0]]
[[0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0]]
The problem is in your base_maze function, where you first create two types of row:
zero_row = [] #initialise a row of 0s
for i in range(num_columns):
    zero_row.append(0)
norm_row = [] #initialise a row of
for i in range(num_columns // 2): #alternating 0s and 1s
    norm_row.extend([0,1])
norm_row.append(0)
This is fine so far and works as expected, however when you build the maze from there
for i in range(num_rows // 2): # and normal rows)
    maze.append(zero_row)
    maze.append(norm_row)
maze.append(zero_row)
You are filling up the maze list with multiple instances of the same list. This means if you modify row 0 of the maze, row 2 & 4 will also be affected. To illustrate:
>>> def print_maze(maze):
... print('\n'.join(' '.join(str(x) for x in row) for row in maze))
...
>>> print_maze(maze)
0 0 0 0 0
0 1 0 1 0
0 0 0 0 0
0 1 0 1 0
0 0 0 0 0
>>> maze[0][0] = 3
>>> print_maze(maze)
3 0 0 0 0
0 1 0 1 0
3 0 0 0 0
0 1 0 1 0
3 0 0 0 0
Note that rows 0, 2, & 4 have all changed. This is because maze[0] is the same zero_row instance as maze[2] and maze[4].
Instead, when you create the maze you want to use a copy of the row lists. This can be done easily in Python using the following slicing notation
for i in range(num_rows // 2):
    maze.append(zero_row[:]) # note the [:] syntax for copying a list
    maze.append(norm_row[:])
maze.append(zero_row[:])
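The difference between appending the same list twice and appending copies can be checked directly with the is operator. The names shared and copied below are made up for this small illustration.

```python
zero_row = [0, 0, 0]

shared = [zero_row, zero_row]        # two references to one list object
copied = [zero_row[:], zero_row[:]]  # two independent shallow copies

shared[0][0] = 3
copied[0][0] = 7

print(shared)                  # [[3, 0, 0], [3, 0, 0]]
print(shared[0] is shared[1])  # True
print(copied)                  # [[7, 0, 0], [0, 0, 0]]
print(copied[0] is copied[1])  # False
```

A slice copy is shallow, which is enough here because the rows only hold integers.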

In 2D binary matrix find the number of islands

I am trying to count the number of islands (a group of connected 1s forms an island) in a 2D binary matrix.
Example:
[
[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]
]
In the above matrix there are 5 islands, which are:
First: (0,0), (0,1), (1,1), (2,0)
Second: (1,4), (2,3), (2,4)
Third: (4,0)
Fourth: (4,2)
Fifth: (4,4)
To count the number of islands in the 2D matrix, I treat the matrix as a graph and use a DFS-style algorithm to count the islands.
I keep track of the number of top-level DFS calls (the DFS is a recursive function), because that equals the number of connected components in the graph.
Below is the code I wrote for this purpose:
# count the 1's in the island
def count_houses(mat, visited, i, j):
    # base case
    if i < 0 or i >= len(mat) or j < 0 or j >= len(mat[0]) or\
            visited[i][j] is True or mat[i][j] == 0:
        return 0
    # marking visited at i, j
    visited[i][j] = True
    # cnt is initialized to 1 coz 1 is found
    cnt = 1
    # now go in all possible directions (i.e. form 8 branches)
    # starting from the left upper corner of i,j and going down to right bottom
    # corner of i,j
    for r in range(i-1, i+2, 1):
        for c in range(j-1, j+2, 1):
            # print('r:', r)
            # print('c:', c)
            # don't call for i, j
            if r != i and c != j:
                cnt += count_houses(mat, visited, r, c)
    return cnt
def island_count(mat):
    houses = list()
    clusters = 0
    row = len(mat)
    col = len(mat[0])
    # initialize the visited matrix
    visited = [[False for i in range(col)] for j in range(row)]
    # run over matrix, search for 1 and then do dfs when found 1
    for i in range(row):
        for j in range(col):
            # see if value at i, j is 1 in mat and val at i, j is False in
            # visited
            if mat[i][j] == 1 and visited[i][j] is False:
                clusters += 1
                h = count_houses(mat, visited, i, j)
                houses.append(h)
    print('clusters:', clusters)
    return houses
if __name__ == '__main__':
    mat = [
        [1, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1]
    ]
    houses = island_count(mat)
    print(houses)
    # print('maximum houses:', max(houses))
I get a wrong output for the matrix I have passed in argument. I get 7 but there are 5 clusters.
I tried debugging the code for any logical errors. But I couldn't find out where is the problem.
Big hammer approach, for reference.
I had to add the structure argument np.ones((3,3)) to get diagonal connectivity.
import numpy as np
from scipy import ndimage
ary = np.array([
[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]
])
labeled_array, num_features = ndimage.label(ary, np.ones((3,3)))
labeled_array, num_features
Out[183]:
(array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 2],
[1, 0, 0, 2, 2],
[0, 0, 0, 0, 0],
[3, 0, 4, 0, 5]]), 5)
Your algorithm is almost correct, except for this condition in count_houses:
if r != i and c != j:
    cnt += count_houses(mat, visited, r, c)
Instead you want to use or, since you want to keep counting as long as at least one of the coordinates differs from the center's.
if r != i or c != j:
    cnt += count_houses(mat, visited, r, c)
An alternate and more intuitive way to write this would be the following
if (r, c) != (i, j):
    cnt += count_houses(mat, visited, r, c)
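For reference, here is a self-contained Python 3 sketch of the same 8-connected search with the corrected neighbour test. It swaps the recursion for an explicit stack, which is a choice made here to avoid deep recursion on large inputs and is not part of the original code; the name island_sizes is also hypothetical.

```python
def island_sizes(mat):
    """Return the size of every 8-connected island of 1s, in scan order."""
    rows, cols = len(mat), len(mat[0])
    visited = [[False] * cols for _ in range(rows)]
    sizes = []
    for i in range(rows):
        for j in range(cols):
            if mat[i][j] != 1 or visited[i][j]:
                continue
            # New island: flood it with an explicit stack (iterative DFS).
            stack = [(i, j)]
            visited[i][j] = True
            size = 0
            while stack:
                r, c = stack.pop()
                size += 1
                for nr in range(r - 1, r + 2):
                    for nc in range(c - 1, c + 2):
                        if (nr, nc) == (r, c):
                            continue  # skip the cell itself
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mat[nr][nc] == 1 and not visited[nr][nc]):
                            visited[nr][nc] = True
                            stack.append((nr, nc))
            sizes.append(size)
    return sizes


mat = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
]
sizes = island_sizes(mat)
print(len(sizes), sizes)  # 5 [4, 3, 1, 1, 1]
```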

Turning an np array into a jagged np array based on boolean conditions

Say I have an array x equal to np.array([0, 0, 0, 0, 0, 0, 0, 0, 1000, 0, 0, 0, 0, 1000, 1000, 1000])
and I want to turn it into a jagged array: array([array([0, 0, 0, 0, 0, 0, 0, 0]), array([1000]), array([0, 0, 0, 0]), array([1000, 1000, 1000])]). How would I do that?
The rule is: each run of consecutive 0s becomes one array inside the outer array, and each run of consecutive 1000s is segmented the same way.
Here's one approach with np.split -
m = x!=0
out = np.split(x,np.flatnonzero(m[1:] != m[:-1])+1)
Sample run -
In [53]: x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1000, 0, 0, 0, 0, 1000, 1000, 1000])
In [54]: m = x!=0
In [55]: np.split(x,np.flatnonzero(m[1:] != m[:-1])+1)
Out[55]:
[array([0, 0, 0, 0, 0, 0, 0, 0]),
array([1000]),
array([0, 0, 0, 0]),
array([1000, 1000, 1000])]
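To have an array of arrays as output, wrap it with np.array(). Because the pieces have different lengths, recent NumPy versions raise an error for such ragged input unless dtype=object is passed explicitly -
In [56]: np.array(np.split(x,np.flatnonzero(m[1:] != m[:-1])+1), dtype=object)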
Out[56]:
array([array([0, 0, 0, 0, 0, 0, 0, 0]), array([1000]), array([0, 0, 0, 0]),
array([1000, 1000, 1000])], dtype=object)
For a list of lists as output, here's one way -
m = x!=0
idx = np.r_[0, np.flatnonzero(m[1:] != m[:-1])+1, x.size]
out = [x.tolist()[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Sample run -
In [94]: m = x!=0
In [95]: idx = np.r_[0, np.flatnonzero(m[1:] != m[:-1])+1, x.size]
In [96]: [x.tolist()[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Out[96]: [[0, 0, 0, 0, 0, 0, 0, 0], [1000], [0, 0, 0, 0], [1000, 1000, 1000]]
Try this in plain Python:
a = [0, 0, 0, 0, 0, 0, 0, 0, 1000, 0, 0, 0, 0, 1000, 1000, 1000]
out = []
flag = 0
tmp = []
for i in a:
    if not flag and i==0:
        tmp.append(i)
    elif not flag and i==1000:
        out.append(tmp)
        tmp=[i]
        flag=1
    elif flag and i==1000:
        tmp.append(i)
    else:
        out.append(tmp)
        tmp=[i]
        flag=0
out.append(tmp)
print(out)
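As a possible shorter alternative to the flag-based loop, the standard library's itertools.groupby can split a list into runs. This is a different technique from the loop above, shown only as a sketch; grouping by bool treats every nonzero value alike, which matches the zero versus 1000 distinction in this input.

```python
from itertools import groupby

a = [0, 0, 0, 0, 0, 0, 0, 0, 1000, 0, 0, 0, 0, 1000, 1000, 1000]

# Consecutive elements with the same truthiness end up in the same group.
runs = [list(group) for _, group in groupby(a, key=bool)]
print(runs)
# [[0, 0, 0, 0, 0, 0, 0, 0], [1000], [0, 0, 0, 0], [1000, 1000, 1000]]
```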

2-D Matrix: Finding and deleting columns that are subsets of other columns

I have a problem where I want to identify and remove columns in a logic matrix that are subsets of other columns. i.e. [1, 0, 1] is a subset of [1, 1, 1]; but neither of [1, 1, 0] and [0, 1, 1] are subsets of each other. I wrote out a quick piece of code that identifies the columns that are subsets, which does (n^2-n)/2 checks using a couple nested for loops.
import numpy as np
A = np.array([[1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 1, 0]])
rows,cols = A.shape
columns = [True]*cols
for i in range(cols):
    for j in range(i+1,cols):
        diff = A[:,i]-A[:,j]
        if all(diff >= 0):
            print("%d is a subset of %d" % (j, i))
            columns[j] = False
        elif all(diff <= 0):
            print("%d is a subset of %d" % (i, j))
            columns[i] = False
B = A[:,columns]
The solution should be
>>> print(B)
[[1 0 0]
[0 1 1]
[1 1 0]
[1 0 1]
[1 0 1]
[1 0 0]
[0 1 1]
[0 1 0]]
For massive matrices though, I'm sure there's a way that I could do this faster. One thought is to eliminate subset columns as I go, so I'm not checking columns already known to be a subset. Another thought is to vectorize this so I don't have O(n^2) Python-level operations. Thank you.
Since the A matrices I'm actually dealing with are 5000x5000 and sparse with about 4% density, I decided to try a sparse matrix approach combined with Python's "set" objects. Overall it's much faster than my original solution, but I feel like my process of going from matrix A to list of sets D is not as fast it could be. Any ideas on how to do this better are appreciated.
Solution
import numpy as np
A = np.array([[1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 1, 0]])
rows,cols = A.shape
drops = np.zeros(cols).astype(bool)
# sparse nonzero elements
C = np.nonzero(A)
# create a list of sets containing the indices of non-zero elements of each column
D = [set() for j in range(cols)]
for i in range(len(C[0])):
    D[C[1][i]].add(C[0][i])
# find subsets, ignoring columns that are known to already be subsets
for i in range(cols):
    if drops[i]==True:
        continue
    col1 = D[i]
    for j in range(i+1,cols):
        col2 = D[j]
        if col2.issubset(col1):
            # I tried `if drops[j]==True: continue` here, but that was slower
            print("%d is a subset of %d" % (j, i))
            drops[j] = True
        elif col1.issubset(col2):
            print("%d is a subset of %d" % (i, j))
            drops[i] = True
            break
B = A[:, ~drops]
print(B)
Here's another approach using NumPy broadcasting -
A[:,~((np.triu(((A[:,:,None] - A[:,None,:])>=0).all(0),1)).any(0))]
A detailed commented explanation is listed below -
# Perform elementwise subtractions between every pair of columns:
# sub[r, i, j] = A[r, i] - A[r, j]
sub = A[:,:,None] - A[:,None,:]
# sub >= 0 in every row means column j is a subset of (or equal to) column i
mask3D = sub>=0
# Reduce along the rows to get a 2D mask relating every column to every
# other column: mask2D[i, j] is True when column j is a subset of column i
mask2D = mask3D.all(0)
# Keep only the pairs above the diagonal (i < j), so that a column is flagged
# when it is a subset of an earlier column, then invert to get the mask of
# columns to keep. Index into the input array with it for the final output.
colmask = ~(np.triu(mask2D,1).any(0))
out = A[:,colmask]
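One caveat, as far as I can tell from tracing the broadcasting version: because np.triu keeps only pairs with i < j, a column is compared only against earlier columns, so a column that is a subset of a later column survives (column 1 versus column 3 in the sample). The sketch below is a hypothetical variant that checks both directions and keeps the first of any identical columns; drop_subset_columns and its internal names are assumptions for illustration.

```python
import numpy as np

def drop_subset_columns(A):
    """Drop columns that are subsets of another column; keep first duplicates."""
    A = np.asarray(A)
    B = A.astype(bool)
    # covers[i, j] is True when column j is a subset of (or equal to) column i:
    # in every row, B[r, j] implies B[r, i].
    covers = (B[:, :, None] | ~B[:, None, :]).all(axis=0)
    strict = covers & ~covers.T                 # j is a proper subset of i
    later_copy = np.triu(covers & covers.T, 1)  # j duplicates an earlier i
    drop = (strict | later_copy).any(axis=0)
    return A[:, ~drop], drop


A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])
B_out, drop = drop_subset_columns(A)
print(drop)  # [False  True False False  True  True]
```

Like the original one-liner, this builds a rows x cols x cols intermediate array, so it is only practical for small matrices; for something like 5000x5000 the set-based or dot-product approaches are far more realistic.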
Define subset as follows: col1 is a subset of col2 if and only if col1.dot(col1) == col1.dot(col2).
Define col1 and col2 as the same (equivalent) if and only if col1 is a subset of col2 and vice versa.
I split the work into two steps: first get rid of all but one of each group of equivalent columns, then remove the subsets.
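A quick check of the dot-product definition on the small vectors from the question may help. The helper name is_subset is introduced here for illustration and assumes the inputs are 0/1 vectors.

```python
import numpy as np

def is_subset(col1, col2):
    """True when every 1 in col1 is also a 1 in col2 (0/1 vectors assumed)."""
    return bool(col1.dot(col1) == col1.dot(col2))

a = np.array([1, 0, 1])
b = np.array([1, 1, 1])
c = np.array([1, 1, 0])
d = np.array([0, 1, 1])

print(is_subset(a, b))                   # True:  a.a == 2 and a.b == 2
print(is_subset(b, a))                   # False: b.b == 3 but b.a == 2
print(is_subset(c, d), is_subset(d, c))  # False False
```

The test works because col1.dot(col1) counts the 1s in col1, while col1.dot(col2) counts the positions where both columns hold a 1; the two counts agree exactly when no 1 of col1 is missing from col2.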
Solution
import numpy as np
def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops
def drop_subsets(A):
    N = A.T.dot(A)
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops
def drop_strict(A):
    A1, d1 = drop_duplicates(A)
    A2, d2 = drop_subsets(A1)
    d1[~d1] = d2
    return A2, d1
A = np.array([[1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 1, 0]])
B, drops = drop_strict(A)
Demonstration
print(B)
print()
print(drops)
[[1 0 0]
[0 1 1]
[1 1 0]
[1 0 1]
[1 0 1]
[1 0 0]
[0 1 1]
[0 1 0]]
[False True False False True True]
Explanation
N = A.T.dot(A) is the matrix of dot products between every pair of columns. Per the definition of subset at the top, this will come in handy.
def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    # (N == D)[i, j] being True identifies A[:, i] as a subset
    # of A[:, j]; (N == D.T)[i, j] reverses the relationship.
    # If A[:, j] is subset of A[:, i] and vice versa, then we have
    # equivalent columns. Taking the lower triangle ensures we
    # leave one.
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops
def drop_subsets(A):
    N = A.T.dot(A)
    # without concern for removing equivalent columns, this
    # removes any column that has an off diagonal equal to the diagonal
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops
