Use of np.where()[0] - python

My code detects all the points under a threshold, then locates the start and end points.
below = np.where(self.data < self.threshold)[0]
startandend = np.diff(below)
startpoints = np.insert(startandend, 0, 2)
endpoints = np.append(startandend, 2)  # append at the end; np.insert(startandend, -1, 2) would insert before the last element
startpoints = np.where(startpoints>1)[0]
endpoints = np.where(endpoints>1)[0]
startpoints = below[startpoints]
endpoints = below[endpoints]
I don't really understand the use of [0] after the np.where() call here.

below = np.where(self.data < self.threshold)[0]
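means:
take the first element of the tuple of index arrays returned by np.where() and
assign it to below. Assuming self.data is 1-D (as the later use of np.diff suggests), that tuple has exactly one element: the array of indices where self.data < self.threshold.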
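A short sketch of what np.where returns when given only a condition (the arrays data and grid are made-up examples): a tuple with one index array per dimension, which is why [0] is needed even for 1-D input.

```python
import numpy as np

data = np.array([3, 8, 1, 9])
result = np.where(data > 5)          # a tuple, not an array
print(type(result), len(result))     # <class 'tuple'> 1
below = result[0]                    # the index array for the only axis
print(below)                         # [1 3]

grid = np.array([[0, 7],
                 [6, 0]])
rows, cols = np.where(grid > 5)      # 2-D input: one index array per axis
print(rows, cols)                    # [0 1] [1 0]
```

With only a condition, np.where is documented as a shorthand for np.asarray(condition).nonzero(), so for 1-D data np.flatnonzero(data > 5) gives the same index array without the [0].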

np.where is tricky. When called with only a condition, it returns a tuple of arrays, one index array per dimension of the input, and it returns that tuple even if the condition is never satisfied (each array is then empty). In the case of np.where(my_numpy_array==some_value)[0] on a 1-D array specifically, the tuple has exactly one element, so [0] picks out the array of indices of the condition-meeting cells.
Quite a mouthful. In simple terms, for a 1-D array, np.where(array==x)[0] returns an array of the indices where the condition has been met. I'm guessing the tuple is a result of designing NumPy for extremely broad applications: it has to work for arrays with any number of dimensions.
Keep in mind that no matches still returns an empty array rather than raising an error; errors further down such as "only size-1 arrays can be converted to Python scalars" may be attributed to that.
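The question's start/end logic can be collected into one function. This is a minimal sketch: runs_below is a hypothetical name, data is assumed to be 1-D, the no-match case is guarded explicitly, and the end sentinel is appended after the last gap (np.insert(..., -1, 2) would place it before the last element instead).

```python
import numpy as np

def runs_below(data, threshold):
    """Return (starts, ends): the first and last index (inclusive) of each
    run of consecutive samples where data < threshold."""
    # For 1-D data, np.where(...) returns a 1-tuple; [0] takes the index array.
    below = np.where(np.asarray(data) < threshold)[0]
    if below.size == 0:
        # No matches: np.where still returns an (empty) array, so handle it here.
        return below, below
    # A jump larger than 1 between consecutive indices marks a run boundary.
    gaps = np.diff(below)
    # Sentinel 2 in front: the first index always starts a run.
    starts = below[np.where(np.concatenate(([2], gaps)) > 1)[0]]
    # Sentinel 2 at the end: the last index always ends a run.
    ends = below[np.where(np.concatenate((gaps, [2])) > 1)[0]]
    return starts, ends

starts, ends = runs_below([5, 1, 1, 1, 5, 5, 5, 0, 0], 3)
print(starts, ends)   # [1 7] [3 8]
```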

Related

Finding repeated rows in a numpy array

The following function is designed to find the unique rows of an array:
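import numpy as np

def unique_rows(a):
    # View each row as a single opaque np.void item so np.unique compares whole rows
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(b, return_index=True)
    unique_a = a[idx]
    return unique_a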
For example,
test = np.array([[1,0,1],[1,1,1],[1,0,1]])
unique_rows(test)
[[1,0,1],[1,1,1]]
I believe that this function should work all the time, however it may not be watertight. In my code I would like to calculate how many unique positions exist for a set of particles. The particles are stored in a 2d array, each row corresponding to the position of a particle. The positions are of type np.float64.
I have also defined the following function
def pos_tag(pos):
    x, y, z = pos[:,0], pos[:,1], pos[:,2]
    return (2**x)*(3**y)*(5**z)
In principle this function should produce a unique value for any (x,y,z) position.
However, when I use these two functions to calculate the number of unique positions in my set of particles, they produce different answers. Is this due to some logical flaw in the first function, or to the second function not producing a unique value for each given position?
EDIT: Usage example
I have some long code that produces a 2d array of particle positions.
partpos.shape = (6039539,3)
I then calculate the number of unique rows as follows
len(unique_rows(partpos))
6034411
And
posids = pos_tag(partpos)
len(np.unique(posids))
5328871
I believe that the discrepancy arises due to a precision error.
Using the code
print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos))))
6034411
6034411
However with
print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos.astype(np.float32)))))
6034411
5328871
a = [[1,0,1],[1,1,1],[1,0,1]]
# Convert rows to tuples so they're hashable, creating a generator thereof
b = (tuple(row) for row in a)
# Convert back to list of lists, after coercing to a set to eliminate non-unique rows
unique_rows = list(list(row) for row in set(b))
Edit: Well that's embarrassing. I just realized I didn't really address the question asked. This could still be the answer the OP is looking for, so I'll leave it, but it's not really what was asked. Sorry for that.
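On the question actually asked: the prime-power tag is only guaranteed unique for non-negative integer coordinates (unique prime factorisation). For float coordinates, distinct positions can round to the same float64 tag. The sketch below uses two hypothetical positions chosen to show this; it does not claim this is the exact cause in the OP's data. It also compares against np.unique(..., axis=0) (available since NumPy 1.13), which compares whole rows directly.

```python
import numpy as np

def pos_tag(pos):
    # The tag from the question: unique for non-negative integer coordinates,
    # but not for arbitrary float coordinates.
    x, y, z = pos[:, 0], pos[:, 1], pos[:, 2]
    return (2 ** x) * (3 ** y) * (5 ** z)

# Two distinct positions (hypothetical data).
pos = np.array([[0.0, 0.0, 0.0],
                [1e-17, 0.0, 0.0]])

# 2**1e-17 rounds to exactly 1.0 in float64, so both rows get the same tag.
tags = pos_tag(pos)
print(len(np.unique(tags)))          # 1

# Comparing whole rows keeps the two positions apart.
print(len(np.unique(pos, axis=0)))   # 2
```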

How to check for real equality (of numpy arrays) in python?

I have some function in python returning a numpy.array:
import numpy as np
matrix = np.array([[0.,0.,0.,0.,0.,0.,1.,1.,1.,0.],
                   [0.,0.,0.,1.,1.,0.,0.,1.,0.,0.]])
def some_function():
    rows1, cols1 = np.nonzero(matrix)
    cols2 = np.array([6,7,8,3,4,7])
    rows2 = np.array([0,0,0,1,1,1])
    print(np.array_equal(rows1, rows2))  # prints True
    print(np.array_equal(cols1, cols2))  # prints True
    return (rows1, cols1)  # or (rows2, cols2)
It should normally extract the indices of nonzero entries of a matrix (rows1, cols1). However, I can also extract the indices manually (rows2, cols2). The problem is that the program returns different results depending on whether the function returns (rows1, cols1) or (rows2, cols2), although the arrays should be equal.
I should probably add that this code is used in the context of pyipopt, which calls the C++ software package IPOPT. The problem then occurs within this package.
Can it be that the arrays are not "completely" equal? I would say that they somehow must be because I am not modifying anything but returning one instead of the other.
Any idea on how to debug this problem?
You could check where the arrays are not equal:
print(np.where(rows1 != rows2))
But what you are doing is unclear. First, there is no nonzeros function in NumPy, only nonzero, which returns a tuple of coordinate arrays. Are you only using the one corresponding to the rows?
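One way arrays can be "equal" without being completely identical: np.array_equal compares only shape and values, not dtype or memory layout, and C/C++ code reached through a wrapper may be sensitive to both. The sketch below shows how to inspect those properties. The int32 dtype for rows2 and for the normalised copy is an assumption for illustration; what pyipopt actually expects is not stated in the question.

```python
import numpy as np

def describe(name, a):
    # np.array_equal looks only at shape and values; code on the C/C++ side
    # may also depend on dtype, strides and contiguity.
    print(name, a.dtype, a.strides, a.flags['C_CONTIGUOUS'])

matrix = np.array([[0., 0., 0., 0., 0., 0., 1., 1., 1., 0.],
                   [0., 0., 0., 1., 1., 0., 0., 1., 0., 0.]])
rows1, cols1 = np.nonzero(matrix)
rows2 = np.array([0, 0, 0, 1, 1, 1], dtype=np.int32)  # deliberately a different dtype

print(np.array_equal(rows1, rows2))  # True: the values match
describe("rows1", rows1)
describe("rows2", rows2)

# Assumption: the extension wants C-contiguous int32 indices. Normalising both
# candidates the same way removes dtype/layout as a source of difference.
rows_safe = np.ascontiguousarray(rows1, dtype=np.int32)
print(rows_safe.dtype, rows_safe.flags['C_CONTIGUOUS'])  # int32 True
```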

Comparing two vectors

I have some code where I want to test if the product of a matrix and vector is the zero vector. An example of my attempt is:
import itertools
import numpy as np
from scipy.linalg import toeplitz
n = 2
zerovector = np.asarray([0]*n)
for column in itertools.product([0,1], repeat = n):
    for row in itertools.product([0,1], repeat = n-1):
        M = toeplitz(column, [column[0]]+list(row))
        for v in itertools.product([-1,0,1], repeat = n):
            vector = np.asarray(v)
            if (np.dot(M,v) == zerovector):
                print(M, "No good!")
                break
But the line if (np.dot(M,v) == zerovector): gives the error ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). What is the right way to do this?
The problem is that == between two arrays is an element-wise comparison—you get back an array of boolean values. An array of boolean values isn't a boolean value itself, so you can't use it in an if. This is what the error is trying to tell you.
You could solve this by using the all method, to check whether all of the elements in the boolean array are true. But you're making this way more complicated than you need to. Nonzero values are truthy, zero values are falsey, so you can just use any without a comparison:
if not np.dot(M, v).any():
If you want to make the comparison to zero explicit, just compare to a scalar, don't build a zero vector; it'll get broadcast the same way. And, if you ever do want to build a zero vector, just use the zeros function; don't build a list of zeros in a complicated way and pass it to asarray.
You could also use the count_nonzero function here as a different alternative. If it returns anything truthy (that is, any non-zero number), the array had at least one non-zero.
In general, you're making almost everything harder than necessary, and working through a brief NumPy tutorial and then scanning the main docs pages for useful functions would really help you.
Also, if your values aren't integers, you probably don't actually want to compare == 0 in the first place. Floating-point numbers accumulate rounding errors. To handle that, use the allclose function instead.
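A small sketch of the alternatives described above. It uses a hand-picked M and v (hypothetical values chosen so the product is zero) instead of the toeplitz loop, so it runs without SciPy.

```python
import numpy as np

M = np.array([[1, 1],
              [1, 1]])
v = np.array([1, -1])
product = M.dot(v)                           # array([0, 0])

# Each expression below is a single boolean, so it is safe inside an `if`.
is_zero_any = not product.any()              # no element is non-zero
is_zero_all = (product == 0).all()           # scalar 0 is broadcast
is_zero_count = np.count_nonzero(product) == 0
print(is_zero_any, is_zero_all, is_zero_count)   # True True True

# With floats, exact == 0 can fail because of rounding; use a tolerance.
s = 0.1 + 0.2 - 0.3                          # about 5.55e-17, not exactly 0.0
print(s == 0, np.allclose(s, 0))             # False True
```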
As the error says, you need to use all:
if all(np.dot(M,v) == zerovector):
or np.all. np.dot(M,v) == zerovector gives you a vector holding the element-wise comparison of the two vectors.

Numpy nonzero/flatnonzero index order; order of returned elements in boolean indexing

I'm wondering about the order of indices returned by numpy.nonzero / numpy.flatnonzero.
I couldn't find anything in the docs about it. It just says:
A[nonzero(flag)] == A[flag]
While in most cases this is enough, there are some where you need a sorted list of indices. Is it guaranteed that the returned indices are sorted in the 1-D case, or do I need to sort them explicitly? (A similar question applies to the order of elements returned simply by selecting with a boolean array (A[flag]), which must be the same according to the docs.)
Example: finding the "gaps" between True elements in flag:
flag=np.array([True,False,False,True],dtype=bool)
iflag = np.flatnonzero(flag)
gaps= iflag[1:] - iflag[:-1]
Thanks.
Given the specification for advanced (or "fancy") indexing with integers, the guarantee that A[nonzero(flag)] == A[flag] is also a guarantee that the values are sorted low-to-high in the 1-d case. However, in higher dimensions, the result (while "sorted") has a different structure than you might expect.
In short, given a 1-dimensional array of integers ind and a 1-dimensional array x to be indexed, we have the following for all valid i defined for ind:
result[i] = x[ind[i]]
result takes the shape of ind, and contains the values of x at the indices indicated by ind. This means that we can deduce that if x[flag] maintains the original order of x, and if x[nonzero(flag)] is the same as x[flag], then nonzero(flag) must always produce indices in sorted order.
The only catch is that for multidimensional arrays, the indices are stored as distinct arrays for each dimension being indexed. So in other words,
x[array([0, 1, 2]), array([0, 0, 0])]
is equal to
array([x[0, 0], x[1, 0], x[2, 0]])
The values are still sorted, but each dimension is broken out into its own array. (You can do interesting things with broadcasting as a result; but that's beyond the scope of this answer.)
The only problem with this line of reasoning is that -- to my great surprise -- I can't find an explicit statement guaranteeing that boolean indexing preserves the original order of the array. Nonetheless, I'm quite certain from experience that it does. More generally, it would be unbelievably perverse to have x[[True, True, True]] return a reversed version of x.
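For what it's worth, recent versions of the NumPy documentation for np.nonzero do state that the values are tested and returned in row-major, C-style order. The sketch below (with made-up arrays) shows the 1-D gap computation from the question, an optional runtime check for anyone who prefers not to rely on the ordering, and the row-major order of the per-axis arrays in 2-D.

```python
import numpy as np

flag = np.array([True, False, False, True, True, False, True])
iflag = np.flatnonzero(flag)          # ascending indices
gaps = np.diff(iflag)                 # same as iflag[1:] - iflag[:-1]
print(iflag, gaps)                    # [0 3 4 6] [3 1 2]

# Optional cheap check that the indices really are strictly increasing.
assert np.all(np.diff(iflag) > 0)

# In 2-D the per-axis arrays follow row-major (C) order, matching A[flag].
grid = np.array([[0, 5, 6],
                 [7, 0, 8]])
rows, cols = np.nonzero(grid)
print(rows, cols)                     # [0 0 1 1] [1 2 0 2]
print(grid[rows, cols], grid[grid != 0])  # [5 6 7 8] [5 6 7 8]
```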

how to deal in a pythonic way with list of arrays or single array

I have this issue:
In my software I am dealing with either a single array or a list of 3 arrays (they are 1 or 3 components of a pixelized sky map).
If the single array were a list of 1 array, it would be very easy to iterate over it transparently, regardless of the number of elements.
Now, let's say I want to square these maps:
import numpy as np
my_map = np.ones(100) # case of single component
# my_map = [np.ones(100) for c in [0, 1, 2]] # case of 3 components
if isinstance(my_map, list): # this is ugly
    my_map_2 = [m**2 for m in my_map]
else:
    my_map_2 = my_map ** 2
Would you have any suggestions on how to improve this?
Why wouldn't you directly create a 2D array?
my_array = np.ones((100,3), dtype=float)
That way, you could directly square your 'three' arrays at once. You could still access individual elements as:
(x, y, z) = my_array.T
where .T is a shortcut for the .transpose method.
Using this approach would be far more efficient than looping on a list, especially if you apply the same function to the three arrays. Even if you want, say, to square the first array, double the second and take the square root of the third, you could:
my_array[:,0] **=2
my_array[:,1] *=2
my_array[:,2] **=0.5
You can convert your return value to a list if it is a single value, by appending it to an empty list:
my_map = []
temp = np.ones(100) # case of single component
# Append your temp value (a single array) to the empty list; if temp is already a list of arrays, use my_map.extend(temp) instead
my_map.append(temp)
my_map_2 = [m**2 for m in my_map]
I assume that whatever produces your map (np.ones(100) here) may return either a single array or a list of arrays.
Have you tried numpy.asarray? Then the if-else would just be
my_map = numpy.asarray(my_map)**2
Also check out numpy.asanyarray if you want to handle subclasses of ndarrays as well.
I often put a numpy.asarray call at the beginning of my functions, so they work on both lists and arrays.
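A variant of the asarray idea, sketched under the assumption that all components have the same length: np.atleast_2d turns a single map into a (1, npix) array and a list of 3 maps into a (3, npix) array, so the same code path and the same loop work in both cases. square_maps is a hypothetical helper name.

```python
import numpy as np

def square_maps(my_map):
    # A single map of shape (npix,) becomes (1, npix); a list of 3
    # equal-length maps becomes (3, npix).
    maps = np.atleast_2d(np.asarray(my_map, dtype=float))
    return maps ** 2

single = square_maps(np.ones(100))
triple = square_maps([np.ones(100) * c for c in (1, 2, 3)])
print(single.shape, triple.shape)   # (1, 100) (3, 100)
for component in triple:            # iterate the same way in both cases
    print(component[0])             # prints 1.0, then 4.0, then 9.0
```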
