Numpy trim_zeros in 2D or 3D - python

How to remove leading / trailing zeros from a NumPy array? Trim_zeros works only for 1D.
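For reference, the built-in handles only the 1-D case, trimming from the front and back of a flat array:

import numpy as np

a = np.array([0, 0, 1, 2, 0, 1, 0])
print(np.trim_zeros(a))
# [1 2 0 1]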

Here's some code that will handle 2-D arrays.
import numpy as np

# Arbitrary array
arr = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
])

nz = np.nonzero(arr)  # Indices of all nonzero elements
arr_trimmed = arr[nz[0].min():nz[0].max()+1,
                  nz[1].min():nz[1].max()+1]

assert np.array_equal(arr_trimmed, [
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
])
This can be generalized to N-dimensions as follows:
def trim_zeros(arr):
    """Returns a trimmed view of an n-D array excluding any outer
    regions which contain only zeros.
    """
    slices = tuple(slice(idx.min(), idx.max() + 1) for idx in np.nonzero(arr))
    return arr[slices]

test = np.zeros((5, 5, 5, 5))
test[1:3, 1:3, 1:3, 1:3] = 1
trimmed_array = trim_zeros(test)
assert trimmed_array.shape == (2, 2, 2, 2)
assert trimmed_array.sum() == 2**4
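One caveat worth noting: if arr is all zeros, np.nonzero returns empty index arrays and idx.min() raises a ValueError. A small guard (my variant, under that assumption) can return an empty view instead:

def trim_zeros_safe(arr):
    # Same as trim_zeros above, but tolerates an all-zero input
    if not arr.any():
        return arr[(slice(0, 0),) * arr.ndim]  # empty view, same ndim
    slices = tuple(slice(idx.min(), idx.max() + 1) for idx in np.nonzero(arr))
    return arr[slices]

assert trim_zeros_safe(np.zeros((3, 3))).shape == (0, 0)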

The following function works for any dimension:
def trim_zeros(arr, margin=0):
    '''
    Trim the leading and trailing zeros from an N-D array.

    :param arr: numpy array
    :param margin: how many zeros to leave as a margin
    :returns: trimmed array
    :returns: slice object
    '''
    s = []
    for dim in range(arr.ndim):
        start = 0
        end = -1
        slice_ = [slice(None)] * arr.ndim

        # Walk forward from the start of this axis until a nonzero slice is hit
        go = True
        while go:
            slice_[dim] = start
            go = not np.any(arr[tuple(slice_)])
            start += 1
        start = max(start - 1 - margin, 0)

        # Walk backward from the end of this axis until a nonzero slice is hit
        go = True
        while go:
            slice_[dim] = end
            go = not np.any(arr[tuple(slice_)])
            end -= 1
        end = arr.shape[dim] + min(-1, end + 1 + margin) + 1

        s.append(slice(start, end))
    return arr[tuple(s)], tuple(s)
Which can be tested with:
test = np.zeros((3,4,5,6))
test[1,2,2,5] = 1
trim_zeros(test, margin=1)
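For this input, the returned array should have shape (3, 3, 3, 2): the single nonzero cell plus the one-cell margin on every side, clipped at the array edges. The second return value is the tuple of slices that was applied.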

I would like to extend the previous answers to n dimensions, with the option to ignore some axes:
def array_trim(arr, ignore=(), margin=0):
    nz = np.where(arr != 0)  # indices of nonzero elements, one array per axis
    idx = ()
    for i in range(len(nz)):
        if i in ignore:
            idx += (np.s_[:],)  # leave this axis untouched
        else:
            # clamp the start at 0 so the margin cannot turn into a
            # negative (i.e. from-the-end) slice index
            idx += (np.s_[max(nz[i].min() - margin, 0): nz[i].max() + margin + 1],)
    return arr[idx]
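A quick check (my addition) against the 2-D example from the first answer, assuming arr and array_trim are both in scope:

print(array_trim(arr).shape)               # (5, 4): same block as arr_trimmed
print(array_trim(arr, ignore=(0,)).shape)  # (8, 4): rows left untouched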

Python linear algebra in a finite field

Is there a way to do linear algebra and matrix manipulation in a finite field in Python? I need to be able to find the null space of a non-square matrix in the finite field F2. I currently can't find a way to do this. I have tried the galois package, but it does not support the scipy null space function. It is easy to compute the null space in sympy, however I do not know how to work in a finite field in sympy.
I'm the author of the galois library you mentioned. As noted in the comments, this capability is easy to add, so I added it in galois#259. It is now available in v0.0.24 (released today, 02/12/2022).
Here is the documentation for FieldArray.null_space(), the method you're after.
Here's an example computing the row space and left null space.
In [1]: import galois
In [2]: GF = galois.GF(2)
In [3]: m, n = 7, 3
In [4]: A = GF.Random((m, n)); A
Out[4]:
GF([[1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0]], order=2)

In [5]: R = A.row_space(); R
Out[5]:
GF([[1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]], order=2)

In [6]: LN = A.left_null_space(); LN
Out[6]:
GF([[1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 0]], order=2)

# The left null space annihilates the rows of A
In [7]: LN @ A
Out[7]:
GF([[0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]], order=2)

# The dimensions of the row space and left null space sum to m
In [8]: R.shape[0] + LN.shape[0] == m
Out[8]: True
Here's the column space and null space.
In [9]: C = A.column_space(); C
Out[9]:
GF([[1, 0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 0]], order=2)

In [10]: N = A.null_space(); N
Out[10]: GF([], shape=(0, 3), order=2)

# If N had dimension > 0, then A @ N.T == 0
In [11]: C.shape[0] + N.shape[0] == n
Out[11]: True
That's how I would approach it as well.
The null space for floating-point numbers is usually computed using SVD or some other numerically robust algorithm; for your GF(2) field you can simply use Gaussian elimination, since there is no rounding.
Here is an example:
import numpy as np
import galois

# Initialize GF(2) and a random matrix to serve as an example
M, N = 7, 4
GF2 = galois.GF(2)
A = GF2.Random((M, N))

# B is an augmented matrix [A | I]
B = GF2.Zeros((M, M + N))
B[:, :N] = A
for i in range(M):
    B[i, N + i] = 1

# Run Gaussian elimination
k = 0  # number of pivots found so far
for j in range(N):
    # find a pivot for column j among the remaining rows and swap it up
    for i in range(k, M):
        if B[i, j] != 0:
            if i != k:
                B[[i, k], :] = B[[k, i], :]
            break
    if B[k, j] == 0:
        continue  # no pivot in this column
    # eliminate the entries below the pivot (+= is XOR in GF(2))
    for i in range(k + 1, M):
        if B[i, j]:
            B[i, j:] += B[k, j:]
    k += 1

# The rows of the identity part that correspond to zero rows of the
# reduced A form the left null space of A
C = B[k:, N:]
C @ A  # should be zero
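As a quick sanity check (my addition, assuming galois v0.0.24+ as discussed above), the hand-rolled result can be compared against the built-in method:

LN = A.left_null_space()
assert C.shape[0] == LN.shape[0]  # same left null space dimension
assert not np.any(C @ A)          # every row of C annihilates A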

Python: distance from index to 1s in binary mask

I have a binary mask like this:
X = [[0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
I have a certain index in this array and want to compute the distance from that index to the closest 1 in the mask. If there's already a 1 at that index, the distance should be zero.
Examples (assuming Manhattan distance):
distance(X, idx=(0, 5)) == 0 # already is a 1 -> distance is zero
distance(X, idx=(1, 2)) == 2 # second row, third column
distance(X, idx=(0, 0)) == 5 # upper left corner
Is there existing functionality like this in Python/NumPy/SciPy? Both Euclidean and Manhattan distance would be fine.
I'd prefer to avoid computing distances for the entire matrix (as that is pretty big in my case), and only get the distance for my one index.
Here's one for the Manhattan distance metric for a single entry:
import numpy as np

def bwdist_manhattan_single_entry(X, idx):
    X = np.asarray(X)  # accept a plain list of lists as well
    nz = np.argwhere(X == 1)  # coordinates of all 1s
    return np.abs((idx - nz).sum(1)).min()
Sample run -
In [143]: bwdist_manhattan_single_entry(X, idx=(0,5))
Out[143]: 0
In [144]: bwdist_manhattan_single_entry(X, idx=(1,2))
Out[144]: 2
In [145]: bwdist_manhattan_single_entry(X, idx=(0,0))
Out[145]: 5
You can optimize performance further by extracting only the boundary elements of the blobs of 1s:
from scipy.ndimage.morphology import binary_erosion

def bwdist_manhattan_single_entry_v2(X, idx):
    X = np.asarray(X)
    k = np.ones((3, 3), dtype=int)
    # keep only the 1s that sit on the boundary of a blob
    nz = np.argwhere((X == 1) & (~binary_erosion(X, k, border_value=1)))
    return np.abs((idx - nz).sum(1)).min()
The number of elements in nz is smaller with this method than with the earlier one, which is where the improvement comes from.
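A quick check (my addition) that the two versions agree on the question's examples:

for idx in [(0, 5), (1, 2), (0, 0)]:
    assert bwdist_manhattan_single_entry(X, idx) == bwdist_manhattan_single_entry_v2(X, idx)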
You can use scipy.ndimage.morphology.distance_transform_cdt to compute the "taxicab" (Manhattan) distance transform:
import numpy as np
import scipy.ndimage.morphology
x = np.array([[0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 0, 1, 1, 1]])
d = scipy.ndimage.morphology.distance_transform_cdt(1 - x, 'taxicab')
print(d[0, 5])
# 0
print(d[1, 2])
# 2
print(d[0, 0])
# 5
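The question notes that Euclidean distance would also be fine; the analogous call from the same module is distance_transform_edt:

d_euc = scipy.ndimage.morphology.distance_transform_edt(1 - x)
print(d_euc[0, 0])
# about 3.6056 (sqrt(13)); the nearest 1 is at (2, 3)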
You can do it like this:
def Manhattan_distance(X, idx):
    # smallest |di| + |dj| over all cells that contain a 1
    return min(abs(i - idx[0]) + abs(j - idx[1])
               for i, row in enumerate(X)
               for j, val in enumerate(row)
               if val == 1)
Thanks.

Updating numpy 2-dimensional array according to conditions across different 2-D arrays

In the code that I am writing, I have three 2D numpy arrays with the same dimensions (m x n), each 2D array containing info about a specific trait, and each corresponding cell (a specific row/col position) across all three arrays referring to the same person. As an example, person (0, 0) will have traits 1 and 2, but not 3, if only trait1 and trait2 have a value of 1 at location (0, 0), but trait3 does not.
What would be an efficient method of updating a 2D array at a specific location based on the values of other corresponding 2D arrays of the same dimension at the same location? That is, how can I efficiently update a 2D array at a specific location such that the other 2D arrays at this same location fulfill specific conditions?
I am currently trying to update the values of the 2D array trait1 and trait2 according to the current values of trait1 and trait2 (such that the corresponding trait1 value == 1, and the corresponding trait2 value == 0); I am also trying to update the values of trait3 according to the current values of trait1, and trait2 (under the same conditions as the previous). However, I am having trouble doing this without using nested for loops, which greatly slows down my program.
Below is my current approach, which works, but is much too slow for my purposes:
for i in range(0, m):
    for j in range(0, n):
        if trait1[i][j] == 1:
            if trait2[i][j] == 0:
                trait1[i][j] = 0
                trait2[i][j] = 1
                new_color(i, j, 1)  # updates the color of the specific person on a grid
                trait3[i][j] = 0
        elif trait1[i][j] == 0:
            if trait2[i][j] <= 0:
                trait1[i][j] = 1
                trait2[i][j] = 0
                new_color(i, j, 0)
NumPy arrays are really slow if you loop over them in Python. If you can use matrix operations / NumPy functions for everything, it will go much faster.
In your case, you can first extract the indices you're interested in, and then update your matrices like this:
import numpy as np
np.random.seed(1)

# Generate some sample data
trait1, trait2, trait3 = (np.random.randint(0, 2, [4, 4]) for _ in range(3))

In [4]: trait1
Out[4]:
array([[1, 1, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]])

In [5]: trait2
Out[5]:
array([[0, 1, 0, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 0],
       [1, 0, 0, 0]])

In [6]: trait3
Out[6]:
array([[1, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 1, 0, 1]])
And then:
# Compute both index sets up front, before any updates, so the second
# condition is evaluated against the original values of trait1 and trait2
cond1_idx = np.where((trait1 == 1) & (trait2 == 0))
cond2_idx = np.where((trait1 == 0) & (trait2 <= 0))

trait1[cond1_idx] = 0
trait2[cond1_idx] = 1
trait3[cond1_idx] = 0
[new_color(i, j, 1) for i, j in zip(*cond1_idx)]

trait1[cond2_idx] = 1
trait2[cond2_idx] = 0
[new_color(i, j, 0) for i, j in zip(*cond2_idx)]
Result:
In [2]: trait1
Out[2]:
array([[0, 1, 1, 1],
       [0, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 0, 0, 1]])

In [3]: trait2
Out[3]:
array([[1, 1, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 0]])

In [4]: trait3
Out[4]:
array([[0, 1, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 0],
       [1, 0, 0, 1]])
I cannot really test the new_color calls, though, since I don't have that function.
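For anyone who wants to run the snippet end to end, a throwaway stub can stand in for it (define this before the update block; the real function presumably recolors a cell on the OP's grid):

def new_color(i, j, color):
    # hypothetical stand-in for the OP's grid-drawing function
    print("cell (%d, %d) -> color %d" % (i, j, color))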

List wrapping for finding distance between indices

I have a random generated list that could look like:
[1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
I need to find all of the distances between the 1s, including the distance that wraps around from the last 1 back to the first.
For example, in the list above, the first 1 has a distance of 3 to the next 1. The second 1 has a distance of 1 to the following 1, and so on.
How do I find the distance for the last 1 in the list, using wrap-around to the first 1?
def calc_dist(loc_c):
    first = []
    #lst2 = []
    count = 0
    for i in range(len(loc_c)):
        if loc_c[i] == 0:
            count += 1
            #lst2.append(0)
        elif loc_c[i] == 1:
            first.append(i)
            count += 1
            loc_c[i] = count
            #lst2.append(loc_c[i])
            #if loc_c[i] + count > len(loc_c):
            #    x = loc_c[first[0] + 11 % len(loc_c)]
            #    loc_c[i] = x
            count = 0
    return loc_c
My expected outcome should be [3, 1, 2, 4].
Store the index of the first 1 you encounter. Then, when you reach the last 1, the wrap-around distance is the number of elements after the last 1 plus the index of the first one, i.e. len(inputlist) - lastindex + firstindex.
The other distances are simply the difference between the current index and the index of the preceding 1.
from typing import Any, Generator, Iterable

def distances(it: Iterable[Any]) -> Generator[int, None, None]:
    """Produce distances between true values in an iterable.

    If the iterable is not endless, the final distance is that of the last
    true value to the first as if the sequence of values looped round.
    """
    first = prev = None
    length = 0
    for i, v in enumerate(it):
        length += 1
        if v:
            if first is None:
                first = i
            else:
                yield i - prev
            prev = i
    if first is not None:
        yield length - prev + first
The above generator calculates distances as it loops over the input iterable, yielding them one by one:
>>> for distance in distances([1, 0, 0, 1, 1, 0, 1, 0, 0, 0]):
... print(distance)
...
3
1
2
4
Just call list() on the generator if you must have list output:
>>> list(distances([1, 0, 0, 1, 1, 0, 1, 0, 0, 0]))
[3, 1, 2, 4]
If there are no 1 values, no distances are yielded:
>>> list(distances([0, 0, 0]))
[]
and a single 1 value gives you one distance (the full wrap back around to itself):
>>> list(distances([1, 0, 0]))
[3]
I've made the solution generic enough to be able to handle any iterable, even if infinite; this means you can use another generator to feed it too. If given an infinite iterable that produces at least some non-zero values, it'll just keep producing distances.
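For instance (my example, with the distances generator above in scope), you can feed it an endless source and slice off the first few results:

from itertools import count, islice

# an endless stream with a 1 at every multiple of 3
endless = (1 if n % 3 == 0 else 0 for n in count())
print(list(islice(distances(endless), 5)))
# [3, 3, 3, 3, 3]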
Nice and tidy:
def calc_dist(l):
    idx = [i for i, v in enumerate(l) if v]
    if not idx:
        return []
    idx.append(len(l) + idx[0])
    return [idx[i] - idx[i-1] for i in range(1, len(idx))]
print(calc_dist([1, 0, 0, 1, 1, 0, 1, 0, 0, 0]))
# [3, 1, 2, 4]
print(calc_dist([0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0]))
# [3, 1, 2, 7]
print(calc_dist([0, 0, 0, 0]))
# []
You can use numpy:
import numpy as np

L = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
idx = np.where(L == 1)[0]
# idx = array([0, 3, 4, 6], dtype=int64)
res = [idx[i] - idx[i-1] for i in range(1, len(idx))]
# [3, 1, 2]
# The last, wrap-around distance is still missing:
res.append(len(L) - idx[-1] + idx[0])
# res = [3, 1, 2, 4]
Note that the output above contains the information you asked for, though perhaps not in the format you want; you were not really specific about that.
Edit: how to convert a list to an array, since you generate a random list:
L = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
L = np.asarray(L)
Edit 2: how to check whether there is no 1 in the list:
import numpy as np

L = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
idx = np.where(L == 1)[0]
if len(idx) == 0:
    res = []
else:
    res = [idx[i] - idx[i-1] for i in range(1, len(idx))]
    res.append(len(L) - idx[-1] + idx[0])  # wrap-around distance
OR:
try:
    res = [idx[i] - idx[i-1] for i in range(1, len(idx))]
    res.append(len(L) - idx[-1] + idx[0])  # idx[-1] raises if idx is empty
except IndexError:
    res = []

2-D Matrix: Finding and deleting columns that are subsets of other columns

I have a problem where I want to identify and remove columns in a logic matrix that are subsets of other columns, i.e. [1, 0, 1] is a subset of [1, 1, 1], but neither of [1, 1, 0] and [0, 1, 1] is a subset of the other. I wrote out a quick piece of code that identifies the columns that are subsets, which does (n^2-n)/2 checks using a couple of nested for loops.
import numpy as np

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])

rows, cols = A.shape
columns = [True] * cols
for i in range(cols):
    for j in range(i + 1, cols):
        diff = A[:, i] - A[:, j]
        if all(diff >= 0):
            print("%d is a subset of %d" % (j, i))
            columns[j] = False
        elif all(diff <= 0):
            print("%d is a subset of %d" % (i, j))
            columns[i] = False
B = A[:, columns]
The solution should be
>>> print(B)
[[1 0 0]
 [0 1 1]
 [1 1 0]
 [1 0 1]
 [1 0 1]
 [1 0 0]
 [0 1 1]
 [0 1 0]]
For massive matrices, though, I'm sure there's a way I could do this faster. One thought is to eliminate subset columns as I go so I'm not checking columns already known to be subsets. Another thought is to vectorize this so I don't have O(n^2) operations. Thank you.
Since the A matrices I'm actually dealing with are 5000x5000 and sparse with about 4% density, I decided to try a sparse matrix approach combined with Python's "set" objects. Overall it's much faster than my original solution, but I feel like my process of going from matrix A to the list of sets D is not as fast as it could be. Any ideas on how to do this better are appreciated.
Solution
import numpy as np

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])

rows, cols = A.shape
drops = np.zeros(cols).astype(bool)

# sparse nonzero elements
C = np.nonzero(A)

# create a list of sets containing the indices of the non-zero elements
# of each column
D = [set() for j in range(cols)]
for i in range(len(C[0])):
    D[C[1][i]].add(C[0][i])

# find subsets, ignoring columns that are known to already be subsets
for i in range(cols):
    if drops[i]:
        continue
    col1 = D[i]
    for j in range(i + 1, cols):
        col2 = D[j]
        if col2.issubset(col1):
            # I tried `if drops[j]: continue` here, but that was slower
            print("%d is a subset of %d" % (j, i))
            drops[j] = True
        elif col1.issubset(col2):
            print("%d is a subset of %d" % (i, j))
            drops[i] = True
            break

B = A[:, ~drops]
print(B)
Here's another approach using NumPy broadcasting -
A[:,~((np.triu(((A[:,:,None] - A[:,None,:])>=0).all(0),1)).any(0))]
A detailed commented explanation is listed below -
# Perform elementwise subtractions keeping the alignment along the columns
sub = A[:, :, None] - A[:, None, :]

# Look for >= 0 subtractions as they indicate non-subset criteria
mask3D = sub >= 0

# Check if all elements along each column satisfy that criteria, giving us
# a 2D mask which represents the relationship of every column against every
# other column for the non-subset criteria
mask2D = mask3D.all(0)

# Finally get the valid column mask by checking, for each column in the 2D
# mask, whether it has at least one True element sans the diagonal elements.
# Index into the input array with it for the final output.
colmask = ~(np.triu(mask2D, 1).any(0))
out = A[:, colmask]
Define subset via dot products: col1 is a subset of col2 if and only if col1.dot(col1) == col1.dot(col2).
Define col1 and col2 to be the same if and only if col1 is a subset of col2 and vice versa.
I split the work into two steps. First, get rid of all but one of each group of equivalent columns. Then remove subsets.
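A tiny worked check of that criterion (my example):

import numpy as np

c1 = np.array([1, 0, 1])
c2 = np.array([1, 1, 1])
assert c1.dot(c1) == c1.dot(c2)  # 2 == 2: c1 is a subset of c2
assert c2.dot(c2) != c2.dot(c1)  # 3 != 2: c2 is not a subset of c1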
Solution
import numpy as np

def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops

def drop_subsets(A):
    N = A.T.dot(A)
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops

def drop_strict(A):
    A1, d1 = drop_duplicates(A)
    A2, d2 = drop_subsets(A1)
    d1[~d1] = d2
    return A2, d1

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])

B, drops = drop_strict(A)
B, drops = drop_strict(A)
Demonstration
print(B)
print()
print(drops)

[[1 0 0]
 [0 1 1]
 [1 1 0]
 [1 0 1]
 [1 0 1]
 [1 0 0]
 [0 1 1]
 [0 1 0]]

[False  True False False  True  True]
Explanation
N = A.T.dot(A) is the matrix of every pairwise dot product between columns. Per the definition of subset at the top, this will come in handy.
def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    # (N == D)[i, j] being True identifies A[:, i] as a subset
    # of A[:, j] if i < j. The relationship is reversed if j < i.
    # If A[:, j] is a subset of A[:, i] and vice versa, then we have
    # equivalent columns. Taking the lower triangle ensures we
    # leave one.
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops

def drop_subsets(A):
    N = A.T.dot(A)
    # without concern for removing equivalent columns, this
    # removes any column that has an off-diagonal equal to the diagonal
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops
