How to check if a numpy 2D array is "surrounded" by zeros - python

Is there a neat way to check whether a numpy array is surrounded by zeros?
Example:
[[0,0,0,0],
[0,1,2,0],
[0,0,0,0]]
I know I can iterate over it element-wise to find out, but I wonder whether there is a nicer trick we can use here. The numpy array holds floats and is n x m, of arbitrary size.
Any ideas are welcome.

You can use numpy.any() to test whether there is any non-zero element in a numpy array.
Now, to test if a 2D array is surrounded by zeroes, you can get first and last columns as well as first and last rows and test if any of those contains a non-zero number.
def zero_surrounded(array):
    return not (array[0,:].any() or array[-1,:].any() or array[:,0].any() or array[:,-1].any())
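As a quick sanity check, here is a minimal sketch that runs the function above on the example from the question and on a counterexample. The array names `inside` and `leaky` are made up for illustration and are not part of the original answer.

```python
import numpy as np

def zero_surrounded(array):
    # True when the first/last rows and first/last columns hold only zeros
    return not (array[0, :].any() or array[-1, :].any()
                or array[:, 0].any() or array[:, -1].any())

# Example from the question: the border is all zeros
inside = np.array([[0.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 2.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])

# Hypothetical counterexample: one non-zero value in the last column
leaky = inside.copy()
leaky[1, -1] = 3.0

print(zero_surrounded(inside))  # True
print(zero_surrounded(leaky))   # False
```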

We can check this by constructing two submatrices:
A[[0,-1]] the first and the last row, including the first and last column; and
A[1:-1,[0,-1]] the first and last column, excluding the first and last row.
All the values of these matrices should be equal to zero, so we can use:
if np.all(A[[0,-1]] == 0) and np.all(A[1:-1,[0,-1]] == 0):
    # ...
    pass
This works for an arbitrary 2D array, but not for arrays with more dimensions. We can however use a trick for that as well.
For an array with an arbitrary number of dimensions, we can use:
def surrounded_zero_dim(a):
    n = a.ndim
    sel = ([0,-1],)
    sli = (slice(1,-1),)
    return all(np.all(a[sli*i+sel] == 0) for i in range(n))
Using the slice is strictly speaking not necessary, but it prevents checking certain values twice.
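To see the n-dimensional version in action, here is a small sketch on a 3D array. The test arrays `cube` and `broken` are hypothetical and only serve to exercise the function.

```python
import numpy as np

def surrounded_zero_dim(a):
    n = a.ndim
    sel = ([0, -1],)       # first and last index along one axis
    sli = (slice(1, -1),)  # interior of an axis that was already checked
    return all(np.all(a[sli * i + sel] == 0) for i in range(n))

# Hypothetical 3D test case: a zero cube with a non-zero core
cube = np.zeros((4, 5, 6))
cube[1:-1, 1:-1, 1:-1] = 7.0

# Same cube, but with one non-zero value on a face
broken = cube.copy()
broken[2, 0, 3] = 1.0

print(surrounded_zero_dim(cube))    # True
print(surrounded_zero_dim(broken))  # False
```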

Not the fastest, but perhaps the shortest (and hence a "neat") way of doing it:
surrounded = np.sum(a[1:-1, 1:-1]**2) == np.sum(a**2)
print(surrounded) # True
Here, a is the array.
This compares the sum of all squared elements to the sum of all squared elements except for those on the boundary. If we left out the squaring, cases where positive and negative boundary values add up to zero would produce the wrong answer.
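The point about squaring can be illustrated with a small sketch. The array below is a made-up example in which the boundary values 1 and -1 cancel in a plain sum.

```python
import numpy as np

# Hypothetical array: the boundary values 1 and -1 cancel in a plain sum
a = np.array([[1.0, -1.0, 0.0],
              [0.0,  5.0, 0.0],
              [0.0,  0.0, 0.0]])

inner = a[1:-1, 1:-1]

plain = np.sum(inner) == np.sum(a)          # 5 == 5
squared = np.sum(inner**2) == np.sum(a**2)  # 25 == 27

print(plain)    # True, which is the wrong answer
print(squared)  # False, the boundary is not all zeros
```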

Related

Coding an iterated sum of sums in python

For alpha and k fixed integers with i < k also fixed, I am trying to encode a sum of the form
where all the x and y variables are known beforehand. (this is essentially the alpha coordinate of a big iterated matrix-vector multiplication)
For a normal sum over one index I usually create a 1D array A, set A[i] to the i-th term of the sum, and then use sum(A). In the above instance, however, the entries of the innermost sum depend on the indices of the previous sum, which in turn depend on the indices of the sum before that, all the way back out to the first sum, which prevents me from using this tack in a straightforward manner.
I tried making a 2D array B of appropriate length and width and setting the 0 row to be the entries in the innermost sum, then the 1 row as the entries in the next sum times sum(np.transpose(B),0) and so on, but the value of the first sum (of row 0) needs to vary with each entry in row 1 since that sum still has indices dependent on our position in row 1, so on and so forth all the way up to sum k-i.
A sum that allows for a 'variable' filled in by each position of the array it is summing over would thus do the trick, but I can't find anything along these lines in numpy, and my attempts to hack one together have so far failed. My intuition says there is a solution that involves summing along the axes of a (k-i)-dimensional array, but I haven't been able to make this precise yet. Any assistance is greatly appreciated.
One simple attempt to hard-code something like this would be:
for j0 in range(0,n0):
    for j1 in range(0,n1):
        ....
Edit: (a vectorized version)
You could do something like this: (I didn't test it)
temp = np.ones(n[k-i])
for j in range(0,k-i):
    temp = x[:n[k-i-1-j],:n[k-i-j]].T@(y[:n[k-i-j]]*temp)
result = x[alpha,:n[0]]@(y[:n[0]]*temp)
The basic idea is that you try to press it into a matrix-vector form. (note that this is python3 syntax)
Edit: You should note that you need to change the "k-1" to where the innermost sum is (I just did it for all sums up to index k-i)
This is 95% identical to @sehigle's answer, but includes a generic N vector:
def nested_sum(XX, Y, N, alpha):
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)
Similarly, I have no knowledge of the expression, so I'm not sure how to build appropriate tests. But it runs :\
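One way to build a test without knowing the exact expression is to compare against explicit loops. The sketch below assumes the sum has the form that nested_sum itself implies for three levels, namely a sum over j0, j1, j2 of XX[alpha, j0] Y[j0] XX[j0, j1] Y[j1] XX[j1, j2] Y[j2]. That form is an assumption read off the code, not taken from the question, and brute_force, the random data, N and alpha are all made up for illustration.

```python
import numpy as np

def nested_sum(XX, Y, N, alpha):
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)

def brute_force(XX, Y, N, alpha):
    # Assumed form for three nested sums (not taken from the question)
    n0, n1, n2 = N
    total = 0.0
    for j0 in range(n0):
        for j1 in range(n1):
            for j2 in range(n2):
                total += (XX[alpha, j0] * Y[j0]
                          * XX[j0, j1] * Y[j1]
                          * XX[j1, j2] * Y[j2])
    return total

# Hypothetical inputs, chosen only to exercise both functions
rng = np.random.default_rng(0)
XX = rng.random((5, 5))
Y = rng.random(5)
N = [3, 4, 5]
alpha = 2

fast = nested_sum(XX, Y, N, alpha)
slow = brute_force(XX, Y, N, alpha)
print(np.isclose(fast, slow))
```

If the two values agree for a few random inputs, the vectorized version at least matches the loop form assumed here.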

Finding repeated rows in a numpy array

The following function is designed to find the unique rows of an array:
def unique_rows(a):
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(b, return_index=True)
    unique_a = a[idx]
    return unique_a
For example,
test = np.array([[1,0,1],[1,1,1],[1,0,1]])
unique_rows(test)
[[1,0,1],[1,1,1]]
I believe that this function should work all the time; however, it may not be watertight. In my code I would like to calculate how many unique positions exist for a set of particles. The particles are stored in a 2d array, each row corresponding to the position of a particle. The positions are of type np.float64.
I have also defined the following function
def pos_tag(pos):
    x,y,z = pos[:,0],pos[:,1],pos[:,2]
    return (2**x)*(3**y)*(5**z)
In principle this function should produce a unique value for any (x,y,z) position.
However, when I use these two functions to calculate the number of unique positions in my set of particles, they produce different answers. Is this due to some logical flaw in the first function, or to the second function not producing a unique value for each given position?
EDIT: Usage example
I have some long code that produces a 2d array of particle positions.
partpos.shape = (6039539,3)
I then calculate the number of unique rows as follows
len(unique_rows(partpos))
6034411
And
posids = pos_tag(partpos)
len(np.unique(posids))
5328871
I believe that the discrepancy arises due to a precision error.
Using the code
print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos))))
gives
6034411
6034411
However, with
print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos.astype(np.float32)))))
the output is
6034411
5328871
a = [[1,0,1],[1,1,1],[1,0,1]]
# Convert rows to tuples so they're hashable, creating a generator thereof
b = (tuple(row) for row in a)
# Convert back to list of lists, after coercing to a set to eliminate non-unique rows
unique_rows = list(list(row) for row in set(b))
Edit: Well that's embarrassing. I just realized I didn't really address the question asked. This could still be the answer the OP is looking for, so I'll leave it, but it's not really what was asked. Sorry for that.
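For counting unique rows, NumPy 1.13 and later also accept an axis argument in np.unique, which avoids both the void-view trick and the pos_tag encoding. A minimal sketch, using the small test array from the question converted to floats; count_unique_rows is a made-up helper name.

```python
import numpy as np

def count_unique_rows(a):
    # axis=0 treats each row as one item to deduplicate
    return np.unique(np.asarray(a), axis=0).shape[0]

test = np.array([[1.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0]])

n_unique = count_unique_rows(test)
print(n_unique)  # 2
```

As for pos_tag, the product 2**x * 3**y * 5**z is only guaranteed to be unique for non-negative integer exponents (unique prime factorisation, in exact arithmetic). With floating-point positions, different rows can round to the same value, so a lower count from that approach would not be surprising.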

Replace values in a 2D array with different random numbers

I have a 2D array (image) in which I want to replace array values greater than some threshold with a random number in some range. My attempt was to use numpy.random.uniform, like so:
Z[Z > some_value] = uniform(lower_limit,upper_limit)
However I've found that this replaces all values above the threshold with the same random value. I would like to replace all array values above the threshold with a different random value each.
I think this would require some iteration over the entire array, generating a random value whenever the condition is met. How would I do this?
You are correct that iteration would be the correct way to go. Let's do a list comprehension.
[uniform(lower_limit, upper_limit) if i > some_value else i
 for i in Z]
Let's step through it. Take an individual value. If it is greater than the threshold, use a randomly generated one, otherwise the original value.
uniform(lower_limit, upper_limit) if i > some_value else i
Repeat this for every element in Z
for i in Z
For a 2D array, nest multiple comprehensions. Imagine that the above solution was to hit everything in one row and then repeat it for every row.
[[uniform(lower_limit, upper_limit) if i > some_value else i
  for i in row]
 for row in Z]
Check the third argument to uniform. Using size=N will yield an array of N random values. The size has to be the number of elements above the threshold, not the length of the boolean array. Thus
mask = z > some_value
z[mask] = np.random.uniform(lower, upper, mask.sum())
will do what you want.
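A self-contained sketch of the vectorized approach, drawing one value per element above the threshold. The image, threshold and limits are hypothetical, and np.random.default_rng is used here instead of the module-level np.random.uniform.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical image and limits, chosen only for illustration
Z = np.arange(12, dtype=float).reshape(3, 4)
some_value = 7.0
lower_limit, upper_limit = 100.0, 200.0

mask = Z > some_value
# One independent draw per masked element
Z[mask] = rng.uniform(lower_limit, upper_limit, size=mask.sum())

print(Z)
```

The key point is that size must equal the number of True entries in the mask, so that the assignment receives one random value per replaced element.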

Operation by indexing only last axis

I have an array of 3 dimensional vectors. The dimension of the array is arbitrary: it could be a single (N×3), double (M×N×3), triple (K×M×N×3) etc. I need to operate on two components of the vector while preserving the other dimensions.
For example, if I know it is three-dimensional, I could do the following:
R = numpy.arctan2(A[:,:,:,1], A[:,:,:,0])
which gives me a three dimensional array of scalar values.
Now, I want to be able to do this for an arbitrary number of dimensions, so I need to slice over all dimensions except the last. So far, I'm able to do it with this:
s = [numpy.s_[:]] * (len(A.shape)-1)
R = numpy.arctan2(A[tuple(s+[1])], A[tuple(s+[0])])
which works even for single vectors. Is there a more numpythonic way of achieving the above?
I found an even nicer way. This works for me:
R = numpy.arctan2(A[...,1],A[...,0])
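A short sketch showing that the Ellipsis form handles any number of leading dimensions. The helper name angle_last_axis and the random test arrays are made up for illustration.

```python
import numpy as np

def angle_last_axis(A):
    # "..." stands for all leading axes, however many there are
    return np.arctan2(A[..., 1], A[..., 0])

rng = np.random.default_rng(0)
single = rng.random(3)          # one vector
double = rng.random((4, 3))     # N x 3
triple = rng.random((2, 4, 3))  # M x N x 3

print(angle_last_axis(single).shape)  # ()
print(angle_last_axis(double).shape)  # (4,)
print(angle_last_axis(triple).shape)  # (2, 4)
```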

Comparing two vectors

I have some code where I want to test if the product of a matrix and vector is the zero vector. An example of my attempt is:
import itertools
import numpy as np
from scipy.linalg import toeplitz
n = 2
zerovector = np.asarray([0]*n)
for column in itertools.product([0,1], repeat = n):
    for row in itertools.product([0,1], repeat = n-1):
        M = toeplitz(column, [column[0]]+list(row))
        for v in itertools.product([-1,0,1], repeat = n):
            vector = np.asarray(v)
            if (np.dot(M,v) == zerovector):
                print(M, "No good!")
                break
But the line if (np.dot(M,v) == zerovector): gives the error ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). What is the right way to do this?
The problem is that == between two arrays is an element-wise comparison—you get back an array of boolean values. An array of boolean values isn't a boolean value itself, so you can't use it in an if. This is what the error is trying to tell you.
You could solve this by using the all method, to check whether all of the elements in the boolean array are true. But you're making this way more complicated than you need to. Nonzero values are truthy, zero values are falsey, so you can just use any without a comparison:
if not np.dot(M, v).any():
If you want to make the comparison to zero explicit, just compare to a scalar, don't build a zero vector; it'll get broadcast the same way. And, if you ever do want to build a zero vector, just use the zeros function; don't build a list of zeros in a complicated way and pass it to asarray.
You could also use the count_nonzero function here as a different alternative. If it returns anything truthy (that is, any non-zero number), the array had at least one non-zero.
In general, you're making almost everything harder than necessary, and working through a brief NumPy tutorial and then scanning the main docs pages for useful functions would really help you.
Also, if your values aren't integers, you probably don't actually want to compare == 0 in the first place. Floating-point numbers accumulate rounding errors. To handle that, use the allclose function instead.
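The options mentioned above can be put side by side in a small sketch. The matrix M and vector v here are hypothetical and chosen so that their product is the zero vector.

```python
import numpy as np

# Hypothetical matrix and vector whose product is the zero vector
M = np.array([[1, 1],
              [1, 1]])
v = np.array([1, -1])
product = np.dot(M, v)

no_nonzero = not product.any()               # truthiness of the entries
all_equal_zero = bool(np.all(product == 0))  # explicit scalar comparison
count_is_zero = np.count_nonzero(product) == 0

# For floating-point data, compare with a tolerance instead of ==
close_to_zero = np.allclose(np.dot(M.astype(float), v), 0.0)

print(no_nonzero, all_equal_zero, count_is_zero, close_to_zero)
```

All four checks reduce the array to a single boolean, which is what the if statement needs.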
As the error says, you need to use all:
if all(np.dot(M,v) == zerovector):
or np.all. np.dot(M,v) == zerovector gives you a boolean vector which is the element-wise comparison of the two vectors.
