How to check for real equality (of numpy arrays) in python?

I have a function in Python that returns a numpy.array:
import numpy
matrix = numpy.array([[0.,0.,0.,0.,0.,0.,1.,1.,1.,0.],
                      [0.,0.,0.,1.,1.,0.,0.,1.,0.,0.]])
def some_function():
    rows1, cols1 = numpy.nonzero(matrix)
    cols2 = numpy.array([6,7,8,3,4,7])
    rows2 = numpy.array([0,0,0,1,1,1])
    print(numpy.array_equal(rows1, rows2)) # prints True
    print(numpy.array_equal(cols1, cols2)) # prints True
    return (rows1, cols1) # or (rows2, cols2)
It should normally extract the indices of nonzero entries of a matrix (rows1, cols1). However, I can also extract the indices manually (rows2, cols2). The problem is that the program returns different results depending on whether the function returns (rows1, cols1) or (rows2, cols2), although the arrays should be equal.
I should probably add that this code is used in the context of pyipopt, which calls a c++ software package IPOPT. The problem then occurs within this package.
Can it be that the arrays are not "completely" equal? I would say that they somehow must be because I am not modifying anything but returning one instead of the other.
Any idea on how to debug this problem?

You could check where the arrays are not equal:
print(numpy.where(rows1 != rows2))
But it is unclear what you are doing. First, there is no nonzeros function in numpy, only nonzero, which returns a tuple of coordinate arrays. Are you only using the one corresponding to the rows?
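
One more thing that might be worth checking: numpy.array_equal compares only shape and values, so two arrays can compare equal and still differ in dtype or memory layout, which can matter to C or C++ code reading the underlying buffer. The sketch below is illustrative only. describe is a hypothetical helper, the sample values are made up, and whether such a difference is what actually affects pyipopt here is an assumption that would need to be verified.

```python
import numpy as np

def describe(a):
    """Return properties that array_equal ignores (hypothetical helper)."""
    return {
        "dtype": a.dtype,
        "strides": a.strides,
        "c_contiguous": a.flags["C_CONTIGUOUS"],
    }

# Same values, different dtype.
a = np.array([6, 7, 8], dtype=np.int32)
b = np.array([6, 7, 8], dtype=np.int64)
same_values = np.array_equal(a, b)
same_dtype = a.dtype == b.dtype

# Same values, different memory layout: a strided view vs. a fresh array.
view = np.arange(6)[::2]
fresh = np.array([0, 2, 4])
layout_differs = describe(view)["c_contiguous"] != describe(fresh)["c_contiguous"]

# np.ascontiguousarray gives a contiguous array with an explicit dtype.
normalized = np.ascontiguousarray(view, dtype=np.int64)
```

Printing describe(rows1) next to describe(rows2) in the original function would show whether the two return values differ in any of these respects.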

Related

Use of np.where()[0]

My code detects all the points under a threshold, then locates the start and end points.
below = np.where(self.data < self.threshold)[0]
startandend = np.diff(below)
startpoints = np.insert(startandend, 0, 2)
endpoints = np.insert(startandend, -1, 2)
startpoints = np.where(startpoints>1)[0]
endpoints = np.where(endpoints>1)[0]
startpoints = below[startpoints]
endpoints = below[endpoints]
I don't really understand the use of [0] after the np.where() call here.

below = np.where(self.data < self.threshold)[0]
means:
take the first element from the tuple of ndarrays returned by np.where() and
assign it to below.

np.where is tricky. Called with only a condition, it returns a tuple of index arrays, one per dimension of the input, even if the condition is never satisfied. In the case of np.where(my_numpy_array==some_value)[0] specifically, the [0] selects the first array in that tuple, which holds the indices along the first axis of the cells that meet the condition.
Quite a mouthful. In simple terms, for a one-dimensional array, np.where(array==x)[0] returns an array of the indices where the condition is met. I'm guessing this is a result of designing numpy for extensively broad applications.
Keep in mind that no matches still returns an empty array; errors like only size-1 arrays can be converted to python (some type) may be attributed to that.
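
A small sketch of the behaviour described above. The array contents and the names data, threshold and grid are made up for illustration.

```python
import numpy as np

data = np.array([5, 1, 7, 2, 9])
threshold = 3

result = np.where(data < threshold)   # a tuple holding one index array
below = result[0]                     # the index array itself

# A condition that never holds still gives a tuple with one empty array.
none_below = np.where(data < 0)[0]

# For a 2-D input the tuple has one array per axis.
grid = np.array([[0, 4], [5, 0]])
rows, cols = np.where(grid > 0)
```

So the [0] is only unpacking the one-element tuple that np.where returns for a 1-D input.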

Finding repeated rows in a numpy array

The following function is designed to find the unique rows of an array:
def unique_rows(a):
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(b, return_index=True)
    unique_a = a[idx]
    return unique_a
For example,
test = np.array([[1,0,1],[1,1,1],[1,0,1]])
unique_rows(test)
[[1,0,1],[1,1,1]]
I believe that this function should work all the time, however it may not be watertight. In my code I would like to calculate how many unique positions exist for a set of particles. The particles are stored in a 2d array, each row corresponding to the position of a particle. The positions are of type np.float64.
I have also defined the following function
def pos_tag(pos):
    x,y,z = pos[:,0],pos[:,1],pos[:,2]
    return (2**x)*(3**y)*(5**z)
In principle this function should produce a unique value for any (x,y,z) position.
However, when I use these two functions to calculate the number of unique positions in my set of particles, they produce different answers. Is this due to some possible logical flaw in the first function, or is the second function not producing a unique value for each given position?
EDIT: Usage example
I have some long code that produces a 2d array of particle positions.
partpos.shape = (6039539,3)
I then calculate the number of unique rows as follows
len(unique_rows(partpos))
6034411
And
posids = pos_tag(partpos)
len(np.unique(posids))
5328871

I believe that the discrepancy arises due to a precision error.
Using the following code, both counts agree:
print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos))))
6034411
6034411
However, when the positions passed to pos_tag are also cast to np.float32, the counts differ:
print(len(unique_rows(partpos.astype(np.float32))))
print(len(np.unique(pos_tag(partpos.astype(np.float32)))))
6034411
5328871
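
To illustrate how precision alone can merge tags, here is a minimal sketch with two made-up positions (not the asker's data) that differ only slightly in x. It also counts unique rows with np.unique(..., axis=0), which is a different technique from the void-view trick in the question and needs a NumPy version that supports the axis argument (1.13 or later, as far as I know).

```python
import numpy as np

def pos_tag(pos):
    # Same formula as in the question.
    x, y, z = pos[:, 0], pos[:, 1], pos[:, 2]
    return (2**x) * (3**y) * (5**z)

# Two made-up positions that differ only slightly in x.
pos64 = np.array([[0.0, 0.0, 0.0],
                  [1e-9, 0.0, 0.0]], dtype=np.float64)
pos32 = pos64.astype(np.float32)

# Count distinct rows directly.
rows64 = len(np.unique(pos64, axis=0))
rows32 = len(np.unique(pos32, axis=0))

# Count distinct tags; in float32 the tag 2**1e-9 rounds to 1.0.
tags64 = len(np.unique(pos_tag(pos64)))
tags32 = len(np.unique(pos_tag(pos32)))
```

Both rows stay distinct in either precision, but in float32 the two tags should collapse into one, because the tag values sit closer together than float32 can resolve. Note also that the prime-power tag is only guaranteed unique for integer coordinates; for real-valued x, y, z different positions can produce the same product even in exact arithmetic.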

a = [[1,0,1],[1,1,1],[1,0,1]]
# Convert rows to tuples so they're hashable, creating a generator thereof
b = (tuple(row) for row in a)
# Convert back to list of lists, after coercing to a set to eliminate non-unique rows
unique_rows = list(list(row) for row in set(b))
Edit: Well that's embarrassing. I just realized I didn't really address the question asked. This could still be the answer the OP is looking for, so I'll leave it, but it's not really what was asked. Sorry for that.

TypeError: return arrays must be of ArrayType

epsData is a two-dimensional array consisting of Dates and StockID.
I took out some of the code in order to make it simple.
The code calls the functions Generate and neweps, epsData is passed by the engine. I am not sure why it gives an error when I try to pass the array epsss to the SUE() function.
I tried to remove the extra bracket in array (if any) by using flatten function but that does not help.
SUE() is supposed to loop through the array and find the 4th last different value and then store these in an array.
I get this error:
TypeError: return arrays must be of ArrayType
with the three lines marked below:
def lastdifferentvalue(vals,datas,i):
    sizes=len(datas)
    j=sizes-1
    values=0
    while (i>0) and (j>=0):
        if logical_and((vals-datas[j]!=0),(datas[j]!=0),(datas[j-1]!=0)): # !! HERE !!
            i=i-1
            values=datas[j-1]
        j=j-1
    return j, values
def SUE(datas):
    sizes=len(datas)
    j=sizes-1
    values=0
    sues=zeros(8)
    eps1=datas[j]
    i=7
    while (j>0) and (i>=0) :
        counts, eps2=lastdifferentvalue(eps1,array(datas[0:j]),4)
        if eps2!=0:
            sues[i]=eps1-eps2
            i=i-1
        j,eps1=lastdifferentvalue(eps1,datas[0:j],1) # !! HERE !!
    stddev=std(SUE)
    sue7=SUE[7]
    return stddev,sue7
def Generate(di,alpha):
    #the code below loops through the data. neweps is a two dimensional array of floats [dates, stockid]
    for ii in range(0,len(alpha)):
        if (epss[2,ii]-epss[1,ii]!=0) and (epss[2,ii]!=0) and (epss[1,ii]!=0):
            predata=0
            epsss= neweps[di-delay-250:di-delay+1,ii]
            stddevs,suedata= SUE(array(epsss.flatten())) # !! HERE !!

Presumably, you're using numpy.logical_and, in the form of
np.logical_and(a, b, c)
with the meaning that you'd like to take the logical and of the three. If you check the documentation, though, that's not what it does. It's interpreting c as the array where you intend to store the results.
You probably mean here something like
np.logical_and(a, np.logical_and(b, c))
or
from functools import reduce
reduce(np.logical_and, [a, b, c])

The line:
if logical_and((vals-datas[j]!=0),(datas[j]!=0),(datas[j-1]!=0))
has two errors:
Presumably you want to perform a logical_and over (vals-datas[j] != 0), (datas[j] != 0) and (datas[j-1] != 0). However, numpy.logical_and takes only two input arrays; a third positional argument, if passed, is treated as the output array. So to have numpy.logical_and operate over three arrays, nest the calls:
logical_and(logical_and((vals-datas[j] != 0), (datas[j] != 0)), (datas[j-1] != 0))
In any case, using a logical_and in an if statement makes no sense. It returns an array and an array does not have a truth value. That is, the result of a logical_and is an array of booleans, some of which are true and some false. Are you wishing to check if they are all true? Or if at least some are true?
If the former, then you should test it as:
if numpy.all(logical_and(...)):
...
And if the latter then test it as:
if numpy.any(logical_and(...)):
...
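
A short sketch pulling the suggestions above together. The arrays a, b and c are made-up boolean inputs standing in for the three comparisons.

```python
import numpy as np

a = np.array([True, True, False])
b = np.array([True, False, False])
c = np.array([True, True, True])

# Three equivalent ways to AND three boolean arrays element-wise.
nested = np.logical_and(a, np.logical_and(b, c))
reduced = np.logical_and.reduce([a, b, c])
with_operator = a & b & c

# Collapse the boolean array to a single truth value before using it in an if.
all_true = bool(np.all(reduced))
any_true = bool(np.any(reduced))
```

If vals, datas[j] and datas[j-1] are plain scalars rather than arrays, the ordinary Python and operator between the three comparisons would also work.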

Manipulating copied numpy array without changing the original

I am trying to manipulate a numpy array that contains data copied from another array. So far, when I change a value in my array, both of the arrays get their values changed:
import numpy as np
from astropy.io import fits
image = fits.getdata("randomImage.fits")
fft = np.fft.fft2(image)
fftMod = np.copy(fft)
fftMod = fftMod*2
if fftMod.all()== fft.all():
    print("shit same same same ")
--> shit same same same
Why is that?

You misunderstood the usage of the .all() method.
It yields True if all elements of an array are nonzero. That is the case either for both of your arrays or for neither of them.
Since one is double the other, they definitely give the same result from the .all() method (both True or both False).
Edit, as requested in the comments:
To compare the contents of both arrays, use an element-wise comparison first and then check that all elements are True with .all():
(fftMod == fft).all()
Or maybe better for floats including a certain tolerance:
np.allclose(fftMod, fft)
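
A minimal sketch of the difference, using a small made-up complex array in place of the FFT of the FITS image:

```python
import numpy as np

# Made-up stand-in for np.fft.fft2(image).
fft = np.array([1 + 2j, 3 + 0j, 0.5 - 1j])
fftMod = np.copy(fft) * 2

# Compares two single booleans ("are all elements nonzero?"), not the contents.
misleading = fftMod.all() == fft.all()

# Compares the contents element by element.
elementwise = (fftMod == fft).all()
close = np.allclose(fftMod, fft)

# The original array is unchanged by the operation on the copy.
original_untouched = np.array_equal(fft, np.array([1 + 2j, 3 + 0j, 0.5 - 1j]))
```

Here misleading comes out True even though the arrays differ, while the element-wise checks report the difference and the original stays as it was.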

Comparing two vectors

I have some code where I want to test if the product of a matrix and vector is the zero vector. An example of my attempt is:
n = 2
zerovector = np.asarray([0]*n)
for column in itertools.product([0,1], repeat = n):
    for row in itertools.product([0,1], repeat = n-1):
        M = toeplitz(column, [column[0]]+list(row))
        for v in itertools.product([-1,0,1], repeat = n):
            vector = np.asarray(v)
            if (np.dot(M,v) == zerovector):
                print(M, "No good!")
                break
But the line if (np.dot(M,v) == zerovector): gives the error ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). What is the right way to do this?

The problem is that == between two arrays is an element-wise comparison—you get back an array of boolean values. An array of boolean values isn't a boolean value itself, so you can't use it in an if. This is what the error is trying to tell you.
You could solve this by using the all method, to check whether all of the elements in the boolean array are true. But you're making this way more complicated than you need to. Nonzero values are truthy, zero values are falsey, so you can just use any without a comparison:
if not np.dot(M, v).any():
If you want to make the comparison to zero explicit, just compare to a scalar, don't build a zero vector; it'll get broadcast the same way. And, if you ever do want to build a zero vector, just use the zeros function; don't build a list of zeros in a complicated way and pass it to asarray.
You could also use the count_nonzero function here as a different alternative. If it returns anything truthy (that is, any non-zero number), the array had at least one non-zero.
In general, you're making almost everything harder than necessary, and working through a brief NumPy tutorial and then scanning the main docs pages for useful functions would really help you.
Also, if your values aren't integers, you probably don't actually want to compare == 0 in the first place. Floating-point numbers accumulate rounding errors. To handle that, use the allclose function instead.
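
To make those options concrete, here is a small sketch. The matrix M, the vector v and float_product are made-up examples, not the Toeplitz matrices from the question.

```python
import numpy as np

M = np.array([[1, 1],
              [1, 1]])
v = np.array([1, -1])
product = np.dot(M, v)

# Three ways to test an integer result for "all zeros".
is_zero_any = not product.any()
is_zero_scalar = bool((product == 0).all())
is_zero_count = np.count_nonzero(product) == 0

# With floats, rounding error can leave a tiny nonzero residue.
float_product = np.array([0.1 + 0.2 - 0.3, 0.0])
exact = bool((float_product == 0).all())
approx = bool(np.allclose(float_product, 0))
```

For integer data the three checks agree. For the floating-point example the exact comparison fails while allclose, with its default tolerances, treats the residue as zero.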

As the error says, you need to use all:
if all(np.dot(M,v) == zerovector):
or np.all. np.dot(M,v) == zerovector gives you a boolean vector that is the element-wise comparison of the two vectors.
