Let's say I have an array like this
grid:
[[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 1]]
I want to isolate the groups of "items", in this case the 1's. The rule is that the 0's separate the groups, like intersections, so this example has 3 groups of 1's.
If you know how to do this with Python: the first thing I'd be asked is what I've tried, as proof that I'm not handing my homework to the community. The idea I had was to iterate down and left, but that would have a high likelihood of missing some numbers, since the search would only form a cross emanating from the top left.
This group is here to learn, so for me and others who have an interest in this data-science-like problem, please be considerate.
If you do not need to know which rows are duplicates, you can use Python's built-in set to find the unique rows in a list. This is a bit tricky because a set cannot hold lists (they are not hashable). However, you can convert each row to a tuple, build a set from those, convert the results back to lists, and take the len of that list to find out how many unique rows there are. Note that this counts distinct rows rather than connected groups of 1's; it only happens to give 3 for this grid.
grid = [[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 1]]
unique = [list(x) for x in set(tuple(x) for x in grid)]
unique_count = len(unique) # this will return 3
A relatively straightforward depth-first-search-based implementation of connected component labeling:
def get_components(grid, indicator=1):
    def label(g, row, col, group):
        if row >= 0 and col >= 0 and row < len(g) and col < len(g[row]) and g[row][col] == -1:
            # only label if currently unlabeled
            g[row][col] = group
            # attempt to label neighbors with same label
            label(g, row + 1, col, group)
            label(g, row, col + 1, group)
            label(g, row - 1, col, group)
            label(g, row, col - 1, group)
            return True
        else:
            return False
    # initialize label grid as -1 for entries that need labeling
    label_grid = [[-1 if gc == indicator else 0 for gc in gr] for gr in grid]
    group_count = 0
    for row, grid_row in enumerate(grid):
        for col in range(len(grid_row)):
            if label(label_grid, row, col, group_count + 1):
                group_count += 1
    return label_grid, group_count
The results of label_grid, group_count = get_components(grid) for your example inputs are
label_grid = [[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 0, 3, 3]]
group_count = 3
And for cases like the following
grid = [[1, 0, 1],
        [1, 1, 1]]
we get group_count = 1.
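The recursive labeling above can hit Python's recursion limit on large grids. As a minimal sketch of the same idea (4-connected component labeling), here is an iterative breadth-first variant; the name label_components is made up for this illustration and is not part of the answer above.

```python
from collections import deque

def label_components(grid, indicator=1):
    """Label 4-connected groups of `indicator` cells; returns (labels, count)."""
    labels = [[0] * len(row) for row in grid]
    count = 0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # skip cells that are not part of a group or are already labeled
            if value != indicator or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                            and grid[nr][nc] == indicator and not labels[nr][nc]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count

grid = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 1]]
labels, count = label_components(grid)
print(count)  # 3
```

Because each cell is labeled before it is queued, every cell is visited at most once, so the cost stays linear in the number of cells.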
I have a two dimensional list like :
data = [[0,0,0,0,0,1,0,0,0,0], [0,1,0,0,0,0,0,0,0,0]]
How can I access the index of the neighbours, where the value equals 1?
Expected output:
[[4, 5, 6], [0, 1, 2]]
For example, the indices of an array data in first row at value 1 is 5, so I need to access its left and right side neighbour indices like 4 and 6. Same way for row 2.
If I understand the description correctly (please clarify if not), you could try the following. You should additionally handle the edge cases where a row contains no 1, or where the 1 has no left or right neighbour.
import numpy as np
a = np.array([
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]])
if __name__ == "__main__":
    indices = np.where(a == 1)[1]
    indices = indices.reshape(-1,1)
    indices = np.concatenate([indices-1,indices,indices+1],-1)
    print(indices)
A straightforward solution is to use nested for loops, which print the row and column of every 1:
for i in range(len(a)):
    for j in range(len(a[i])):
        if a[i][j]==1:
            print(str(i)+' '+str(j))
If using lists, here is one approach which identifies the indexes of the neighbours of 1. As a caveat, if the 1 is the first or last element of a list, the result will contain an index that is out of range (-1 or the length of the list).
Input:
data = [[0,0,0,0,0,1,0,0,0,0], [0,1,0,0,0,0,0,0,0,0]]
Example:
[[i-1, i, i+1] for sub in data for i, j in enumerate(sub) if j == 1]
Output:
[[4, 5, 6], [0, 1, 2]]
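To avoid out-of-range neighbours when the 1 sits at either end of a row, one option is to filter the candidate indices against the row length. This is only a sketch; the function name neighbour_indices and its target parameter are assumptions made for the example.

```python
def neighbour_indices(data, target=1):
    """For each `target` cell, list its own index plus the in-range left/right neighbours."""
    result = []
    for row in data:
        for i, value in enumerate(row):
            if value == target:
                # keep only the candidates that are valid positions in this row
                result.append([k for k in (i - 1, i, i + 1) if 0 <= k < len(row)])
    return result

data = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]]
print(neighbour_indices(data))         # [[4, 5, 6], [0, 1, 2]]
print(neighbour_indices([[1, 0, 0]]))  # [[0, 1]]
```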
I have an numpy array like this:
a = np.array([[1, 0, 1, 1, 1],
[1, 1, 1, 1, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
Question 1:
As shown in the title, I want to replace all elements with zero after the first zero appeared. The result should be like this :
a = np.array([[1, 0, 0, 0, 0],
[1, 1, 1, 1, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]])
Question 2: how to slice different columns for each row like this example?
I am dealing with a large array, so I would appreciate an efficient way to solve this. Thank you very much.
One way to accomplish question 1 is to use numpy.cumprod
>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
[1, 1, 1, 1, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]])
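The cumprod trick relies on the entries being only 0 and 1. If the array holds other nonzero values, a cumulative product would change them, so a boolean "no zero seen yet" mask may be a safer variant. The sketch below uses made-up sample values to show the difference.

```python
import numpy as np

a = np.array([[3, 0, 5, 2, 1],
              [1, 7, 1, 1, 0],
              [2, 0, 0, 4, 1],
              [9, 0, 1, 0, 1]])

# True up to (but not including) the first zero in each row, False afterwards
keep = np.logical_and.accumulate(a != 0, axis=1)
# keep the original values where the mask is True, write 0 elsewhere
result = np.where(keep, a, 0)
print(result)
```

Here the second row stays [1, 7, 1, 1, 0], whereas np.cumprod would turn it into [1, 7, 7, 7, 0].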
Question 1:
You could iterate over the array like so:
for i in range(a.shape[0]):
    j = 0
    row = a[i]
    # stop at the first zero, or at the end of the row if there is none
    while j < len(row) and row[j]>0:
        j += 1
    row[j+1:] = 0
This will change the array in-place. If you are interested in very high performance, the answers to this question could be of use to find the first zero faster. np.where scans the entire array for this and therefore is not optimal for the task.
Actually, the fastest solution will depend a bit on the distribution of your array entries. If there are many floats in there and a zero is rare, the while loops in the code above will stop late on average, so only "a few" zeros need to be written. If, however, there are only two possible entries, as in your sample array, and they occur with similar probability (i.e. ~50%), a lot of zeros would have to be written to a, and copying the leading nonzero entries into a fresh array of zeros will be faster:
b = np.zeros_like(a)  # same shape and dtype as a
for i in range(a.shape[0]):
    j = 0
    a_row = a[i]
    b_row = b[i]
    # copy entries until the first zero (or the end of the row)
    while j < len(a_row) and a_row[j]>0:
        b_row[j] = a_row[j]
        j += 1
Question 2:
If you mean to slice each row individually on a similar criterion based on the first occurrence of some kind, you could simply adapt this iteration pattern. If the criterion is more global (like finding the maximum of the row, for example), built-in methods like np.where exist that will be more efficient, but which choice is best probably depends a bit on the criterion itself.
Question 1: An efficient way to do this would be the following.
import numpy as np
a = np.array([[1, 0, 1, 1, 1],
[1, 1, 1, 1, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
for row in a:
    zeros = np.where(row == 0)[0]
    if len(zeros):  # Check if a zero exists
        row[zeros[0]:] = 0
print(a)
Output:
[[1 0 0 0 0]
[1 1 1 1 0]
[1 0 0 0 0]
[1 0 0 0 0]]
Question 2: Using the same array, for each row rowIdx, you can have a array of columns colIdxs that you want to extract from.
rowIdx = 2
colIdxs = [1, 3, 4]
print(a[rowIdx, colIdxs])
Output:
[0 1 1]
I prefer Ayrat's creative answer for the first question, but if you need to slice different columns for different rows in a large array, this could help you:
indexer = tuple(np.s_[i:a.shape[1]] for i in (a==0).argmax(axis=1))
for i,j in enumerate(indexer):
    a[i,j]=0
indexer:
(slice(1, 5, None), slice(4, 5, None), slice(1, 5, None), slice(1, 5, None))
or:
indexer = (a==0).argmax(axis=1)
for i in range(a.shape[0]):
    a[i,indexer[i]:]=0
indexer:
[1 4 1 1]
output:
[[1 0 0 0 0]
[1 1 1 1 0]
[1 0 0 0 0]
[1 0 0 0 0]]
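One thing to watch with argmax: for a row that contains no zero at all, (a==0).argmax returns 0, so the loops above would blank the whole row. A possible loop-free sketch that handles this case, with a hypothetical all-ones row added to the sample array to show it:

```python
import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 1],   # hypothetical extra row with no zero at all
              [1, 0, 1, 0, 1]])

is_zero = (a == 0)
n_cols = a.shape[1]
# argmax gives 0 for rows without any zero, so fall back to n_cols there
first_zero = np.where(is_zero.any(axis=1), is_zero.argmax(axis=1), n_cols)
# compare each column index with the per-row cut-off via broadcasting
mask = np.arange(n_cols)[None, :] >= first_zero[:, None]
a[mask] = 0
print(first_zero)  # [1 4 5 1]
print(a)
```

The same boolean mask pattern can select a different column range per row, which is one way to read question 2.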
I'm currently working on a Nonogram solver in Python, and I'm having a problem with calculating all possible row combinations using backtracking.
So, I need a function, foo, that receives a row and constraints foo(row, const), and returns the combinations according to the order in the constraints list.
The row contains the elements: 1, 0 or "?".
1 - already colored
0 - not colored
"?" - neutral (can be both colored and not colored)
A few examples of the wanted output:
foo([1, 1, "?", 0], [3]) ---> [[1, 1, 1, 0]]
foo(["?", 0, 1, 0, "?", 0], [1,1]) ---> [[0, 0, 1, 0, 1, 0], [1, 0, 1, 0, 0, 0]]
foo([0, 0, "?", 1, 0] , [3]) ---> []
Here's my failed attempt:
def foo(row, const, options_lst=[]):
    if len([const[j] for i in range(len(row)) if row[i:i + const[0]]
            == const[0]*[1] for j in range(len(const))]) >= 1:
        if row not in options_lst:
            for i in range(len(row)):
                if row[i] == "?":
                    row[i] = 0
            options_lst.append(row)
            print(options_lst)
        return
    for i in range(len(row)):
        if row[i] == "?":
            row[i] = 1
            foo(row, const)
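For comparison, here is one possible backtracking sketch of the behaviour described above. It places the blocks from left to right and never mutates the input row. The name row_options is made up for this example, and the order of the returned options may differ from the order in the examples above.

```python
def row_options(row, blocks):
    """Return every way to fill `row` ("?" cells are free) so the runs of 1s match `blocks`."""
    n = len(row)
    results = []

    def place(start, block_idx, filled):
        # `filled` holds the cells decided so far (positions 0..start-1)
        if block_idx == len(blocks):
            # no blocks left: the rest must be blank, so no fixed 1 may remain
            if all(cell != 1 for cell in row[start:]):
                results.append(filled + [0] * (n - start))
            return
        size = blocks[block_idx]
        for pos in range(start, n - size + 1):
            # cells skipped before the block become 0; stop once a fixed 1 is skipped
            if any(cell == 1 for cell in row[start:pos]):
                break
            end = pos + size
            block_fits = all(cell != 0 for cell in row[pos:end])
            gap_ok = end == n or row[end] != 1
            if block_fits and gap_ok:
                new_filled = filled + [0] * (pos - start) + [1] * size
                if end < n:
                    new_filled.append(0)  # mandatory blank after the block
                    place(end + 1, block_idx + 1, new_filled)
                else:
                    place(end, block_idx + 1, new_filled)

    place(0, 0, [])
    return results

print(row_options([1, 1, "?", 0], [3]))           # [[1, 1, 1, 0]]
print(row_options([0, 0, "?", 1, 0], [3]))        # []
print(row_options(["?", 0, 1, 0, "?", 0], [1, 1]))
```

Building a new list for each candidate avoids the shared-state problems that come from editing row in place and from using a mutable default argument.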
I want to create a grid with a variable number of rows and columns. What I have done to achieve this is this
BaseRow = []
for j in range(0, columns):
    BaseRow.append(0)
Grid = []
for j in range(0, rows):
    Grid.append(BaseRow)
So all seems fine until now. I print the rows in order with this piece of code
for i in range(1, rows+1):
    print(Grid[rows-i])
and a grid that looks something like this
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
is printed. Thing is, afterwards, I want to change a specific element. But when I do,
Grid[0][0] = 1
and print again, instead of just changing the bottom-left 0 to a 1, the whole column changes, so it becomes
[1, 0, 0]
[1, 0, 0]
[1, 0, 0]
I suspect it sees that Grid is taking from BaseRow, so it changes BaseRow, and then the Grid takes the rows from BaseRow and just puts that value everywhere. I suspect .append might not be what I am looking for, but for all the research I have done I have not managed to find something else to use. If I understand correctly, .extend will not add it as a list but as individual numbers. What should I change, or how should I structure this?
Please excuse my limited knowledge, I just started programming in python half a week ago. Thanks for your help!
BaseRow = []
for j in range(0, columns):
    BaseRow.append(0)
Grid = []
for j in range(0, rows):
    Grid.append(BaseRow)
When you do this, the same instance of BaseRow is appended to Grid multiple times. So, if you change even one row in Grid, the change shows up in all rows, because every row is the same list object.
If you want a copy of BaseRow to be appended to Grid, use the code below:
for j in range(0, rows):
    Grid.append(BaseRow[:])
You could also use list comprehension:
Grid = [[0 for j in range(0, columns)] for i in range(0, rows)]
Output for Columns = 3 and rows = 4:
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
Output after setting Grid[0][0] = 1:
[[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
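The difference between shared and independent rows can be checked directly with the is operator. A small sketch follows; the variable names shared and independent are made up for the example.

```python
rows, columns = 4, 3

# Multiplying the outer list repeats a reference to ONE inner list
shared = [[0] * columns] * rows
shared[0][0] = 1
print(shared)        # every row shows the change

# A comprehension evaluates [0] * columns once per row, giving distinct lists
independent = [[0] * columns for _ in range(rows)]
independent[0][0] = 1
print(independent)   # only the first row changes

print(shared[0] is shared[1], independent[0] is independent[1])  # True False
```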
If you ask me, I would use a list comprehension any day because it's so clean and easy:
columns, rows = 3, 3
lst = [[0 for j in range(columns)] for i in range(rows)] # List of lists with 3 columns and 3 rows
lst[0][0] = 1 # modifying a member
print(lst) # Print the result
# Result: [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
I personally prefer a list comprehension, but your code needs just a few small changes and it will serve you well. Append a new list for each row, then append the elements to that list:
matrix = []
for i in range(3):
    matrix.append([])
    for j in range(4):
        matrix[-1].append(0)
print(matrix)
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
matrix[0][0] = 1
print(matrix)
[[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
I have a problem where I want to identify and remove columns in a logic matrix that are subsets of other columns. i.e. [1, 0, 1] is a subset of [1, 1, 1]; but neither of [1, 1, 0] and [0, 1, 1] are subsets of each other. I wrote out a quick piece of code that identifies the columns that are subsets, which does (n^2-n)/2 checks using a couple nested for loops.
import numpy as np
A = np.array([[1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 1, 0]])
rows,cols = A.shape
columns = [True]*cols
for i in range(cols):
    for j in range(i+1,cols):
        diff = A[:,i]-A[:,j]
        if all(diff >= 0):
            print("%d is a subset of %d" % (j, i))
            columns[j] = False
        elif all(diff <= 0):
            print("%d is a subset of %d" % (i, j))
            columns[i] = False
B = A[:,columns]
The solution should be
>>> print(B)
[[1 0 0]
[0 1 1]
[1 1 0]
[1 0 1]
[1 0 1]
[1 0 0]
[0 1 1]
[0 1 0]]
For massive matrices though, I'm sure there's a way that I could do this faster. One thought is to eliminate subset columns as I go so I'm not checking columns already known to be a subset. Another thought is to vectorize this so I don't have O(n^2) operations in Python loops. Thank you.
Since the A matrices I'm actually dealing with are 5000x5000 and sparse with about 4% density, I decided to try a sparse matrix approach combined with Python's "set" objects. Overall it's much faster than my original solution, but I feel like my process of going from matrix A to the list of sets D is not as fast as it could be. Any ideas on how to do this better are appreciated.
Solution
import numpy as np
A = np.array([[1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 1, 0]])
rows,cols = A.shape
drops = np.zeros(cols).astype(bool)
# sparse nonzero elements
C = np.nonzero(A)
# create a list of sets containing the indices of non-zero elements of each column
D = [set() for j in range(cols)]
for i in range(len(C[0])):
    D[C[1][i]].add(C[0][i])
# find subsets, ignoring columns that are known to already be subsets
for i in range(cols):
    if drops[i]==True:
        continue
    col1 = D[i]
    for j in range(i+1,cols):
        col2 = D[j]
        if col2.issubset(col1):
            # I tried `if drops[j]==True: continue` here, but that was slower
            print("%d is a subset of %d" % (j, i))
            drops[j] = True
        elif col1.issubset(col2):
            print("%d is a subset of %d" % (i, j))
            drops[i] = True
            break
B = A[:, ~drops]
print(B)
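On the step of going from matrix A to the list of sets D: one idea worth timing is to let NumPy group the row indices by column, instead of adding them to the sets one at a time in a Python loop. This is only a sketch, the helper name column_sets is made up for the example, and whether it is actually faster on 5000x5000 data would need to be measured.

```python
import numpy as np

def column_sets(A):
    """Return, for each column of A, the set of row indices that hold nonzeros."""
    # nonzero() on the transpose yields indices sorted by column first
    col_idx, row_idx = np.nonzero(A.T)
    counts = np.bincount(col_idx, minlength=A.shape[1])
    # split the row indices at the boundaries between consecutive columns
    chunks = np.split(row_idx, np.cumsum(counts)[:-1])
    return [set(chunk.tolist()) for chunk in chunks]

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])
D = column_sets(A)
print(D[5].issubset(D[0]))  # True
```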
Here's another approach using NumPy broadcasting. As written, it only drops a column that is a subset of (or equal to) a column to its left:
A[:,~((np.triu(((A[:,:,None] - A[:,None,:])>=0).all(0),1)).any(0))]
A detailed, commented explanation is listed below:
# Perform elementwise subtractions between every pair of columns:
# sub[r, i, j] = A[r, i] - A[r, j]
sub = A[:,:,None] - A[:,None,:]
# Non-negative differences mean column i is at least column j in that row
mask3D = sub>=0
# If that holds in every row, column j is a subset of column i. This gives a
# 2D mask describing the relationship between every pair of columns
mask2D = mask3D.all(0)
# Finally, get the valid column mask: using only the entries above the diagonal
# (i < j), flag every column j that is a subset of some earlier column i.
# Index into the input array with the inverted mask for the final output.
colmask = ~(np.triu(mask2D,1).any(0))
out = A[:,colmask]
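Because the upper triangle only compares each column against the columns to its left, a column that is a subset of a later column is kept (column 1 versus column 3 in the sample matrix). A possible sketch that checks both directions and still keeps one copy of duplicate columns is shown below; the function name drop_subset_columns is an assumption made for the example.

```python
import numpy as np

def drop_subset_columns(A):
    """Drop columns that are subsets of another column, keeping one copy of duplicates."""
    A = np.asarray(A)
    B = A.astype(bool)
    # contains[i, j] is True when column j is a subset of column i
    contains = (B[:, :, None] >= B[:, None, :]).all(axis=0)
    n = A.shape[1]
    earlier = np.arange(n)[:, None] < np.arange(n)[None, :]  # True where i < j
    # column j goes if some column i strictly contains it,
    # or if an identical column i appears to its left
    strict = contains & ~contains.T
    duplicate = contains & contains.T & earlier
    drop = (strict | duplicate).any(axis=0)
    return A[:, ~drop], drop

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])
B, drop = drop_subset_columns(A)
print(drop)  # [False  True False False  True  True]
```

Note that the intermediate 3D array has rows x cols x cols entries, so this style suits small and medium matrices rather than 5000x5000 inputs.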
Test for subsets with dot products: for 0/1 columns, col1.dot(col1) == col1.dot(col2) if and only if col1 is a subset of col2.
Treat col1 and col2 as the same if and only if col1 is a subset of col2 and vice versa.
I split the work into two steps. First, get rid of all but one of each group of equivalent columns. Then remove the subsets.
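As a quick illustration of the dot-product test on 0/1 vectors, here is a minimal sketch using the column examples from the question; the helper name is_subset is made up for this example.

```python
import numpy as np

def is_subset(col1, col2):
    """True when every 1 in col1 is also a 1 in col2 (both are 0/1 vectors)."""
    # col1.dot(col1) counts the 1s in col1; col1.dot(col2) counts the shared 1s
    return col1.dot(col1) == col1.dot(col2)

a = np.array([1, 0, 1])
b = np.array([1, 1, 1])
c = np.array([1, 1, 0])
d = np.array([0, 1, 1])
print(is_subset(a, b))                    # True
print(is_subset(c, d), is_subset(d, c))   # False False
```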
Solution
import numpy as np
def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops
def drop_subsets(A):
    N = A.T.dot(A)
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops
def drop_strict(A):
    A1, d1 = drop_duplicates(A)
    A2, d2 = drop_subsets(A1)
    d1[~d1] = d2
    return A2, d1
A = np.array([[1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 1, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 1, 0]])
B, drops = drop_strict(A)
Demonstration
print(B)
print()
print(drops)
[[1 0 0]
[0 1 1]
[1 1 0]
[1 0 1]
[1 0 1]
[1 0 0]
[0 1 1]
[0 1 0]]
[False True False False True True]
Explanation
N = A.T.dot(A) is a matrix of every combination of dot product. Per the definition of subset at the top, this will come in handy.
def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    # (N == D)[i, j] being True identifies A[:, i] as a subset
    # of A[:, j]; (N == D.T)[i, j] is the reverse relationship.
    # If A[:, j] is a subset of A[:, i] and vice versa, then we have
    # equivalent columns. Taking the lower triangle ensures we
    # leave one.
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops
def drop_subsets(A):
    N = A.T.dot(A)
    # without concern for removing equivalent columns, this
    # removes any column that has an off-diagonal entry equal to the diagonal
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops