How to test for closeness in angular quantities - python

I'm trying to write a unit test where the result should be an array of arrays of zero degrees. Using np.testing.assert_allclose results in the following failure:
E       AssertionError:
E       Not equal to tolerance rtol=1e-07, atol=0.000277778
E
E       (mismatch 100.0%)
E        x: array([[ 3.600000e+02],
E               [ 3.155310e-10]])
E        y: array([[0],
E               [0]])
What's clearly happening is that the code is working ([[360], [3e-10]] is close enough to [[0], [0]] for angular quantities, as far as I'm concerned), but assert_allclose doesn't realize that 0 ≅ 360.
Is there a way to use numpy's testing framework for comparisons where I don't care if the values are off by multiples of 360?
In this particular case, printing the first element of the array with np.set_printoptions(precision=30) gives me 359.999999999823955931788077577949, so this isn't a case that can just be normalized to be between 0 and 360.
This is not a package I maintain, so I'd like to not include other dependencies besides astropy and numpy.

(edited answer, previous version was wrong)
Use e.g. this to reduce your values to the required range:
>>> import numpy as np
>>> def _h(x, a):
...     xx = np.mod(x, a)
...     return np.minimum(xx, np.abs(a - xx))
Then
>>> xx = np.asarray([1, -1, 359, 361, 360*3+1, -8*360 + 2])
>>> _h(xx, 360)
array([1, 1, 1, 1, 1, 2])
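In your unit test you can then compare the reduced values to zero. A minimal sketch (the result array and atol value here just mirror the failure output from the question):
>>> result = np.asarray([[360.0], [3.155310e-10]])
>>> np.testing.assert_allclose(_h(result, 360), 0, atol=0.000277778)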

Given that all the numbers you want to test for closeness on a circle are in an ndarray named a, then
np.allclose(np.fmod(a+180, 360)-180, 0, atol=mytol)
or, even simpler,
np.allclose(np.fmod(a+180, 360), 180, atol=mytol)
is all you need (note that 180 is quite arbitrary; you just have to move the comparison away from 0, a.k.a. 360).
Edit
I had deleted my answer because of a flaw that was shown to me in a comment by ev-br, but later I changed my mind because (thank you ev-br) I saw the light.
One wants to test if a point on a circle, identified by an angle in degrees, is close to the point identified by the angle 0. First, the distance on the circumference D(0, theta) is equal to D(0, -theta), hence we can compare the absolute values of the angles.
The test I proposed above is valid, or at least I think so, for any positive value of theta.
If I use the above test on the absolute values of the angles to be tested, everything should be OK, shouldn't it? Here follows a bit of testing:
In [1]: import numpy as np
In [2]: a = np.array([0, 1e-5,-1e-7,360.1,-360.1,359.9,-359.9,3600.1,-3600.1,3599.9,-3599.9])
In [3]: np.allclose(np.mod(np.abs(a)+180, 360), 180, atol=0.2)
Out[3]: True

Related

Unclear array of Array notation in numpy

I am on my way to understand a vectorized approach to calculating (and plotting) Julia sets. On the web, I found the following code (annotations are mainly mine, based on my growing understanding of the ideas behind the code):
import numpy as np
import matplotlib.pyplot as plt
c = -0.74543+0.11301j # Example value for this picture (Julia set)
n = 512 # Maximum number of iterations
x = np.linspace(-1.5, 1.5, 2000).reshape((1, 2000)) # 1 row, 2000 columns
y = np.linspace(-1.2, 1.2, 1600).reshape((1600, 1)) # 1600 rows, 1 column
z = x + 1j*y # z is an array with 1600 * 2000 complex entries
c = np.full(z.shape, c) # c is a complex number matrix to be added for the iteration
diverge = np.zeros(z.shape) # 1600 * 2000 zeroes (0s), contains divergent iteration counts
m = np.full(z.shape, True) # 1600 * 2000 True, used as a kind of mask (convergent values)
for i in range(0,n):            # Do at most n iterations
    z[m] = z[m]**2 + c[m]       # Matrix op: Complex iteration for fixed c (Julia set perspective)
    m[np.abs(z) > 2] = False    # threshold for convergence of absolute(z) is 2
    diverge[m] = i
plt.imshow(diverge, cmap='magma') # Color map "magma" applied to the iterations for each point
plt.show() # Display image plotted
I don't understand the mechanics of the line
diverge[m] = i
I gather that m is a 1600*2000 element array of Booleans. It seems that m is used as a kind of mask to let stand only those values in diverge[] for which the corresponding element in m is True. Yet I would like to understand this concept in greater detail. The syntax diverge[m] = i seems to imply that an array is used as some sort of generalized "index" to another array (diverge), and I could use some help understanding this concept. (The code runs as expected, I just have problems understanding the working of it.)
Thank you.
Yes, you can use an array to index another, in many, many ways. That's a complex matter, and even if I flatter myself to understand numpy quite a bit by now, I still sometimes encounter array indexing that makes me scratch my head a little before I understand it.
But this case is not a very complex one:
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
msk = np.array([[True, False, True],
                [True, True, True],
                [False, True, False]])
M[msk]
returns array([1, 3, 4, 5, 6, 8]). You can, I am sure, easily understand the logic.
But more importantly, an indexing expression is an l-value. That means that M[msk] can appear on the left side of the =, and then the values of M are affected.
So, that means that
M[msk] = 0
M
shows
array([[0, 2, 0],
       [0, 0, 0],
       [7, 0, 9]])
Likewise
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
A = np.array([[2, 2, 4],
              [4, 6, 6],
              [8, 8, 8]])
msk = np.array([[True, False, True],
                [True, True, True],
                [False, True, False]])
M[msk] = M[msk] + A[msk]
M
Result is
array([[ 3,  2,  7],
       [ 8, 11, 12],
       [ 7, 16,  9]])
So, back to your case,
z[m] = z[m]**2 + c[m] # Matrix op: Complex iteration for fixed c (Julia set perspective)
is essentially just an optimisation. You could also have written just z = z**2 + c. But what would be the point of computing that even where overflow has already occurred? So it computes z = z**2 + c only where there has been no overflow yet.
m[np.abs(z) > 2] = False # threshold for convergence of absolute(z) is 2
np.abs(z) > 2 is a 2-D array of True/False values. m is set to False for every "pixel" for which |z| > 2. Other values of m remain unchanged, so they stay False if they were already False. Note that this one is slightly overcomplicated: because of the previous line, z doesn't change once it becomes > 2, so in reality there are no pixels where np.abs(z) <= 2 and yet m is already False. So
m = np.abs(z) <= 2
would have worked as well. And it would not have been slower, since the original version computes np.abs(z) > 2 anyway. In fact, it would be faster, since we spare the indexing/assignment operation. On my computer my version runs 1.3 seconds faster than the original (on a 12-second computation time, so roughly 10%).
But the original version has the merit of making the next line easier to understand, because it makes one point clear: m starts with all True values, some values turn False as the algorithm runs, and none ever becomes True again.
diverge[m] = i
m being the mask of pixels that have not yet diverged (it starts with all True, and as we iterate, more and more values of m turn False).
So this line updates diverge to i everywhere no divergence has occurred yet (the name of the variable is not the most pertinent).
So a pixel whose z value becomes > 2 at iteration 50, that is, whose m value became False at iteration 50, would have been updated to 0, then 1, then 2, ..., then 48, then 49 by this line, but not to 50, 51, ...
So at the end, what stays in diverge is the last i for which m was still True, that is, the last i for which the algorithm was still converging, or, shifted by one unit, the first one for which it diverges.
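For concreteness, here is the inner loop with that simplification applied (a sketch of mine; per the reasoning above about z freezing once |z| > 2, the behaviour should be identical to the original):
for i in range(n):
    z[m] = z[m]**2 + c[m]  # iterate only the not-yet-diverged pixels
    m = np.abs(z) <= 2     # recompute the mask directly; frozen pixels keep |z| > 2
    diverge[m] = i         # record the last iteration at which each pixel was still bounded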

Solve linear equation with 2 unknowns and 3 equations in numpy with np.linalg.solve

Three equations with two unknowns have three possible outcomes: one solution, infinitely many solutions, or no solution.
How would you write this in numpy to get the solutions?
I tried it the way you would do it with 3 unknowns:
import numpy as np
a = np.array([-9, -8, 14])
A = np.array([[ 1,  2, -2],
              [-3, -1,  4]])
x = np.linalg.solve(A, a)
print(x)
But this gives an error, as A is not square. Sadly, if I remove the last entry of a and the last column of A, I do get an answer, but the system might still have no solution, as it might not satisfy the third equation.
You can do all this using the lstsq method. For example,
a = np.array([-9, -8, 14])
A = np.array([[ 1,  2, -2],
              [-3, -1,  4]])
x, err, rk = np.linalg.lstsq(A.T, a)[:3]
print(x)
print(err)
print(rk)
yields the output
[-3. 2.]
[9.98402083e-31]
2
From the fact that the error is zero (up to numerical precision), you know that this solution is exact, which is to say that A.T @ x exactly equals a. So the system has at least one solution.
From the fact that the rank is 2 (which matches the number of columns in A.T), we deduce that A.T has a trivial nullspace, which means that any solution is unique.
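For contrast, here is a sketch of my own (not from the answer above) of an inconsistent system, where a nonzero residual from lstsq tells you that no exact solution exists:
import numpy as np

# Three inconsistent equations in two unknowns:
# x1 = 0, x2 = 0, x1 + x2 = 1
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])

x, err, rk = np.linalg.lstsq(A, b, rcond=None)[:3]
print(x)   # least-squares solution, roughly [1/3, 1/3]
print(err) # nonzero residual (about 1/3): the system has no exact solution
print(rk)  # rank 2: if a solution existed, it would be unique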

Why don't scipy.stats.mstats.pearsonr results agree with scipy.stats.pearsonr?

I expected that the results for scipy.stats.mstats.pearsonr for masked array inputs would give the same results for scipy.stats.pearsonr for the unmasked values of the input data, but it doesn't:
from pylab import randn,rand
from numpy import ma
import scipy.stats
# Normally distributed data with noise
x=ma.masked_array(randn(10000),mask=False)
y=x+randn(10000)*0.6
# Randomly mask one tenth of each of x and y
x[rand(10000)<0.1]=ma.masked
y[rand(10000)<0.1]=ma.masked
# Identify indices for which both data are unmasked
bothok=((~x.mask)*(~y.mask))
# print results of both functions, passing only the data where
# both x and y are good to scipy.stats
print "scipy.stats.mstats.pearsonr:", scipy.stats.mstats.pearsonr(x,y)[0]
print "scipy.stats.pearsonr:", scipy.stats.pearsonr(x[bothok].data,y[bothok].data)[0]
The answer will vary a little bit each time you do this, but the values differed by about 0.1 for me, and the bigger the masked fraction, the bigger the disagreement.
I noticed that if the same mask was used for both x and y, the results are the same for both functions, i.e.:
mask=rand(10000)<0.1
x[mask]=ma.masked
y[mask]=ma.masked
...
Is this a bug, or am I expected to precondition the input data to make sure the masks in both x and y are identical (surely not)?
I'm using numpy version '1.8.0' and scipy version '0.11.0b1'
This looks like a bug in scipy.stats.mstats.pearsonr. It appears that the values in x and y are expected to be paired by index, so if one is masked, the other should be ignored. That is, if x and y look like (using -- for a masked value):
x = [1, --, 3, 4, 5]
y = [9, 8, --, 6, 5]
then both (--, 8) and (3, --) are to be ignored, and the result should be the same as scipy.stats.pearsonr([1, 4, 5], [9, 6, 5]).
The bug in the mstats version is that the code to compute the means of x and y does not use the common mask.
I created an issue for this on the scipy github site: https://github.com/scipy/scipy/issues/3645
We have (at least) two options for missing value handling, complete case deletion and pairwise deletion.
In your use of scipy.stats.pearsonr you completely drop cases where there is a missing value in any of the variables.
numpy.ma.corrcoef gives the same results.
Checking the source of scipy.stats.mstats.pearsonr, it doesn't do complete case deletion when calculating the variance or the mean.
>>> xm = x - x.mean(0)
>>> ym = y - y.mean(0)
>>> np.ma.dot(xm, ym) / np.sqrt(np.ma.dot(xm, xm) * np.ma.dot(ym, ym))
0.7731167378113557
>>> scipy.stats.mstats.pearsonr(x,y)[0]
0.77311673781135637
However, the difference between complete and pairwise case deletion on mean and standard deviations is small.
The main discrepancy seems to come from the missing correction for the different numbers of non-missing elements. Ignoring degrees-of-freedom corrections, I get
>>> np.ma.dot(xm, ym) / bothok.sum() / \
...     np.sqrt(np.ma.dot(xm, xm) / (~xm.mask).sum() * np.ma.dot(ym, ym) / (~ym.mask).sum())
0.85855728319303393
which is close to the complete case deletion case.
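If you need the two functions to agree in the meantime, one workaround sketch (my own, assuming the older scipy behaviour described above) is to impose the combined mask on both arrays before calling the mstats version, so that complete case deletion and pairwise deletion coincide:
import numpy as np
from numpy import ma
import scipy.stats

rng = np.random.default_rng(0)
x = ma.masked_array(rng.standard_normal(10000), mask=False)
y = x + rng.standard_normal(10000) * 0.6
x[rng.random(10000) < 0.1] = ma.masked
y[rng.random(10000) < 0.1] = ma.masked

# Share a single mask between both arrays (complete case deletion),
# so mstats.pearsonr sees exactly the same pairs as stats.pearsonr.
common = ma.getmaskarray(x) | ma.getmaskarray(y)
xc = ma.masked_array(x.data, mask=common)
yc = ma.masked_array(y.data, mask=common)
print(scipy.stats.mstats.pearsonr(xc, yc)[0])
print(scipy.stats.pearsonr(x.data[~common], y.data[~common])[0])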

Numpy to check if a solution exists such that each row is < 0?

Consider the following code
X = np.matrix([[ 1, -1, 1],
               [-1,  0, 1]])
print X.T
'''
[[ 1 -1]
 [-1  0]
 [ 1  1]]
'''
I want to check whether a y exists such that X.T y < 0 holds row-wise. For example, this would mean checking if the following has a solution:
 1*y1 + -1*y2 < 0
-1*y1 +  0*y2 < 0
 1*y1 +  1*y2 < 0
Tried reading http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve but apparently no such luck
It seems that your question is equivalent to asking if the plane that contains the origin and also vectors U=r_[1,-1,1] and V=r_[-1, 0, 1] extends into the octant of 3-d space where all coords are negative.
The cross product UxV (or np.cross(U, V)) is normal to this plane. If this cross product has three nonzero components, all of the same sign, then no vector lying in the plane can be in the dreaded octant. For the case of your numbers, I get all three components negative, so there is no solution.
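A quick check of that computation in the REPL:
>>> import numpy as np
>>> np.cross(np.r_[1, -1, 1], np.r_[-1, 0, 1])
array([-1, -2, -1])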
[UPDATE]
In general, the tricky things happen when the normal contains zeros (a code sketch covering all the cases follows this list):
Three zeros: your original vectors are parallel, or one of them is zero. Pick one that is not zero; if all its components have the same sign, then you have a solution.
Two zeros: your plane is one of X=0, Y=0, Z=0, so one coordinate is always zero. There are no solutions.
One zero: your plane includes the X, Y or Z axis. There is a solution if and only if the two remaining components of the normal have differing signs.
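Here is that recipe as a sketch (my own translation; the function name is hypothetical, and it assumes exactly two 3-vectors U and V):
import numpy as np

def negative_octant_reachable(U, V):
    # True if some combination y1*U + y2*V has all components < 0.
    n = np.cross(U, V)
    zero = np.isclose(n, 0)
    nz = n[~zero]
    k = zero.sum()
    if k == 0:
        # Nonzero normal: mixed signs mean the plane dips into the octant.
        return not (np.all(nz > 0) or np.all(nz < 0))
    if k == 1:
        # Plane contains one coordinate axis: need differing signs.
        return nz[0] * nz[1] < 0
    if k == 2:
        # Plane is a coordinate plane: one coordinate is pinned to zero.
        return False
    # Zero normal: U and V are parallel (or zero); the span is a line.
    W = U if np.any(U) else V
    return bool(np.all(W > 0) or np.all(W < 0))

print(negative_octant_reachable(np.r_[1, -1, 1], np.r_[-1, 0, 1]))  # False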
here is the documentation you need:
numpy apply along axis
import numpy as np
def func(b, y1, y2):
    a = b.T
    if a[0]*y1 + a[1]*y2 < 0:
        return True
    else:
        return False
np.apply_along_axis(func, 0, X, y1, y2)
so now let's say you want y1 as -1 and y2 as 3:
>>> np.apply_along_axis(func,0,X,-1,3)
array([ True, False, False], dtype=bool)
so this means that the first row of the transpose (which would be the first column of the original matrix) satisfies your inequality; the second and third do not!
this is a function for an arbitrary number of Ys, i.e. for as large a matrix as you want:
def func(b, *args):
    a = b.T
    total = [a[i]*args[i] for i in range(len(args))]
    if sum(total) < 0:
        return True
    else:
        return False
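For instance, applied to a hypothetical 3x2 matrix with three y values (numbers of my own choosing):
>>> X3 = np.array([[1, -1], [0, 2], [3, 1]])
>>> np.apply_along_axis(func, 0, X3, -1, 2, 1)
array([False, False])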

What is the best way of getting random numbers in NumPy?

I want to generate random numbers in the range -1, 1 and want each one to have equal probability of being generated. I.e. I don't want the extremes to be less likely to come up. What is the best way of doing this?
So far, I have used:
2 * numpy.random.rand() - 1
and also:
2 * numpy.random.random_sample() - 1
Your approach is fine. An alternative is to use the function numpy.random.uniform():
>>> numpy.random.uniform(-1, 1, size=10)
array([-0.92592953, -0.6045348 , -0.52860837,  0.00321798,  0.16050848,
       -0.50421058,  0.06754615,  0.46329675, -0.40952318,  0.49804386])
Regarding the probability of the extremes: if these were idealised, continuous random numbers, the probability of getting one of the extremes would be 0. Since floating-point numbers are a discretisation of the continuous real numbers, in reality there is some positive probability of getting some of the extremes. This is a form of discretisation error, and it is almost certain that this error will be dwarfed by other errors in your simulation. Stop worrying!
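As an aside (not part of the original answers): newer NumPy versions recommend the Generator API, which offers the same uniform sampling, for example:
>>> import numpy as np
>>> rng = np.random.default_rng()
>>> samples = rng.uniform(-1, 1, size=10)  # uniform over [-1, 1)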
Note that numpy.random.rand allows you to generate multiple samples from a uniform distribution in one call:
>>> np.random.rand(5)
array([ 0.69093485,  0.24590705,  0.02013208,  0.06921124,  0.73329277])
It also allows you to generate samples of a given shape:
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],
       [ 0.37601032,  0.25528411],
       [ 0.49313049,  0.94909878]])
As you said, uniformly distributed random numbers on [-1, 1) can be generated with:
>>> 2 * np.random.rand(5) - 1
array([ 0.86704088, -0.65406928, -0.02814943,  0.74080741, -0.14416581])
From the documentation for numpy.random.random_sample:
Results are from the “continuous uniform” distribution over the stated interval. To sample Unif[a, b), b > a, multiply the output of random_sample by (b-a) and add a:
(b - a) * random_sample() + a
Per Sven Marnach's answer, the documentation probably needs updating to reference numpy.random.uniform.
To ensure that the extremes of the range [-1, 1] are included, I randomly generate a numpy array of integers in the range [0, 200000001). The value of the latter integer depends on the final numpy data type that is desired; here I take numpy float64, which is the default type used for numpy arrays. Then I divide the numpy array by 100000000 to generate floats and subtract one. Code for this is:
>>> import numpy as np
>>> number = ((np.random.randint(low=0, high=200000001, size=5)) / 100000000) - 1
>>> print(number)
[-0.65960772 0.30378946 -0.05171788 -0.40737182 0.12998227]
Make sure not to convert these numpy floats to Python floats, to avoid rounding errors.
