np.linalg.norm: "invalid value encountered in sqrt" - python

I'm working with some position vectors. I combine each position with every other position, and I use matrices to do it as efficiently as I can. My most recent version gives me this warning:
RuntimeWarning: invalid value encountered in sqrt
return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
An example of some code that gives me this warning is below.
This warning is caused by np.linalg.norm. It only happens when I specify a data type for the array, and in the example code below it only happens when I have more than 90 vectors.
Is this a NumPy bug, a known limitation in NumPy, or am I doing something wrong?
x = np.full((100, 3), 1) # Create an array of vectors, in this case all [1, 1, 1]
ps, qs = np.broadcast_arrays(x, np.expand_dims(x, 1)) # Created so that I can operate each vector on each other vector.
z = np.subtract(ps, qs, dtype=np.float32) # Get the difference between them.
np.linalg.norm(z, axis=2) # Get the magnitude of the difference.

You should make sure that z doesn't contain any negative values!
To test whether you have negative values:
print(np.count_nonzero(z < 0))
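If you want to dig further, you can mirror the reduction that norm performs, based on the sqrt(add.reduce(...)) line in the traceback. This is only a diagnostic sketch using the question's example arrays, not a fix:
import numpy as np

x = np.full((100, 3), 1)
ps, qs = np.broadcast_arrays(x, np.expand_dims(x, 1))
z = np.subtract(ps, qs, dtype=np.float32)

# norm(z, axis=2) essentially computes sqrt(add.reduce(z*z, axis=2))
s = np.add.reduce(z * z, axis=2)
print((s < 0).any())                # any negative sums of squares?
print(np.isnan(np.sqrt(s)).any())   # did the sqrt produce NaNs?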

Related

Can't find values in my array with numpy.where

I have a numpy array of dimensions (30435615, 3) containing coordinates, expressed for example as (0.0, 0.0, 0.0, 1), and I'm looking for a method to set to True the indexes that have coordinates contained in another array. I tried the numpy.where method but I'm having some problems.
If I print the 50th element of my array I got:
>>> print(coordsRAS[50,:])
[-165.31173706 7.91322422 -271.87799072]
But if I search this point:
>>> import numpy as np
>>> print(np.where((coordsRAS[:,0]==-165.31173706) & (coordsRAS[:,1] == 7.91322422) & (coordsRAS[:,2] == -256.87799072)))
(array([], dtype=int64),)
I can't figure out why it can't find the point.
EDIT 1:
Sorry, I copied the wrong value above: -256.87799072 instead of -271.87799072. The real problem, however, was the rounding in the printed output: the actual value has more significant digits than print shows, which is why the comparison could not find the point. This way it works:
np.where((np.round(coordsRAS[:,0],8)==-165.31173706) & (np.round(coordsRAS[:,1],8) == 7.91322422) & (np.round(coordsRAS[:,2],8) == -271.87799072))
But now I have another problem. The other array I want to compare coordsRAS to is smaller, so when I try an elementwise == comparison it gives me an error.
>>> coordsRAS = np.where(coordsRAS[:,:]==points[:,:3],True,False)
C:/Users/silvi/AppData/Local/Temp/xpython_8292/987583353.py:11: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
coordsRAS = np.where(coordsRAS [:,:]==points[:,:3],True,False)
How can I set to True the coordsRAS entries that are also present in points?
When you are working with floats, it is not a good idea to use equality to find numbers, because you are always dealing with numerical inaccuracies. The answer given by Majid will fail if you multiply your coordsRAS by pi and then divide by pi again. Theoretically this should give you the same result, but it fails:
import numpy as np
coordsRAS = np.random.random((5, 3))
point = [-165.31173706, 7.91322422, -256.87799072]
coordsRAS[4, :] = point
coordsRAS *= np.pi
coordsRAS /= np.pi
result1 = np.where((coordsRAS[:, 0] == -165.31173706) & (coordsRAS[:, 1] == 7.91322422) & (coordsRAS[:, 2] == -256.87799072))
print(coordsRAS[result1])
We have multiplied and divided by the same number, but now we cannot find the point anymore due to numerical round-off error. The result in this case is:
[]
So the result is empty, because your float has slightly changed due to numerical round off errors.
The solution is to calculate the difference between your array and the required point, and search for the locations where the distance falls below a certain accuracy. So you should do:
distance = np.linalg.norm(coordsRAS - point, axis=-1)
row = np.where(distance < 1e-10)
result2 = coordsRAS[row]
Now the correct point can still be found:
print(result2)
[[-165.31173706 7.91322422 -256.87799072]]
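As an aside, the same tolerance-based match can be written with np.isclose (setting rtol=0 so only the absolute tolerance applies, mirroring the distance check above); a sketch:
import numpy as np

# Match rows where every coordinate agrees with the point within atol.
row = np.where(np.isclose(coordsRAS, point, rtol=0, atol=1e-10).all(axis=1))
print(coordsRAS[row])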
EDIT 1:
In case you want to find the locations of all the points stored in another, smaller array, you have to iterate over the points. E.g. you have the following two arrays:
coordsRAS = np.random.random((10, 3))
points = np.random.random((3, 3))
coordsRAS[4:7, :] = points
where the rows of points are stored in the coordsRAS array as well. You can then find the locations of those points in the coordsRAS array as:
mask_total = None
for point in points:
    distance = np.linalg.norm(coordsRAS - point, axis=-1)
    mask = distance < 1e-10
    if mask_total is None:
        mask_total = mask
    else:
        mask_total = mask_total | mask
result = coordsRAS[mask_total]
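If points is small, the loop can also be replaced by a single broadcasted distance computation. This trades memory for speed, since it allocates a distance matrix of shape (len(coordsRAS), len(points)); a sketch:
import numpy as np

# Pairwise distances between every row of coordsRAS and every row of points.
dists = np.linalg.norm(coordsRAS[:, None, :] - points[None, :, :], axis=-1)
mask_total = (dists < 1e-10).any(axis=1)   # rows matching any point
result = coordsRAS[mask_total]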

Invalid index to scalar variable: copying entries from one np array to another np array element-wise

For context, I am writing code to compute the gray-level co-occurrence matrix in python for my data mining assignment.
When I have
c = np.array([[1,1,2,1,3],
              [2,1,2,3,3],
              [1,2,1,1,3],
              [1,3,1,2,1],
              [3,3,2,1,1]])
and call glcm(c, 3, 2, 1), it matches the example we went over in class and no errors occur. I have the function call within the same cell as the function definition in a Jupyter notebook. When I call the function on an image matrix (an np array), I get an error saying invalid index to scalar variable on the line marked in the code below.
I find this weird because it works with the example (np-array named c) but does not work on other np arrays that represent grayscale images.
Am I populating C_shift incorrectly?
def glcm(C, k, mu, nu):
    C_rows = c.shape[0]  # note: lowercase c, the global array, not the parameter C
    C_cols = c.shape[1]
    C_shift = np.zeros((C_rows, C_cols))
    C_shift[:] = np.nan  # initialize everything to NaN
    # calculate C_shift
    for i in range(C_rows - mu):
        for j in range(C_cols - nu):
            C_shift[i][j] = C[i+mu][j+nu]  # ERROR: invalid index to scalar variable.
    # set the values of g
    g = np.zeros((k, k))
    for i in range(k):
        Ii = mat_map(C, i+1)
        for j in range(k):
            Ij = mat_map(C_shift, j+1)
            g[i][j] = np.multiply(Ii, Ij).sum()
    return g, g.sum()
EDIT:
I resolved it. It turns out that I was not actually inputting a 2D array; I was inputting a flattened array (i.e. a vector).
The array named c above was what I thought I was inputting, but I was actually inputting its flattened form. That is why calling C[i+mu][j+nu] raised invalid index to scalar variable: on a 1-D array, C[i+mu] just returns the (i+mu)-th number, which is a scalar, and you cannot take the (j+nu)-th element of a scalar, which is why the error occurred.
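A minimal reproduction of that error, for reference:
import numpy as np

a = np.array([[1, 2], [3, 4]]).flatten()  # the 2-D array arrives flattened
b = a[1]   # indexing a 1-D array once returns a scalar (2)
b[0]       # IndexError: invalid index to scalar variable.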

Python Numpy error: setting an array element with a sequence

I'm quite new to Python and Numpy, so I apologize if I'm missing something obvious here.
I have a function that solves a system of 2 differential equations :
import numpy as np
import numpy.linalg as la
def solve_ode(x0, a0, beta, t):
    At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
    # get eigenvalues and eigenvectors
    evals, V = la.eig(At)
    Vi = la.inv(V)
    # get e^At coeff
    eAt = V @ np.exp(evals) @ Vi
    xt = eAt*x0
    return xt
However, running it with this code :
import matplotlib.pyplot as plt
# initial values
x0 = 10**6
a0 = 2.5
beta = 0.05
t = np.linspace(0, 3600, 360)
plt.semilogy(t, solve_ode(x0, a0, beta, t))
... throws this error :
ValueError: setting an array element with a sequence.
At this line :
At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
Note that t and beta are supposed to be floats. I think Python might not be able to infer this, but I don't know how to tell it...
Thx in advance for your help.
You are supplying t as a numpy array of shape (360,) from linspace, not simply a float. The At numpy array you are trying to create is then ill-formed, since all entries must have the same length. In Python there is an important difference between lists and numpy arrays. For example, you could do what you have here as a list of lists, e.g.
At = [[0.23*t, (-10**5)*t], [0, -beta*t]]
whose entries have lengths [[360, 360], [1, 360]].
Alternatively, if all elements of At have the same length as t, the array works:
At = np.array([[0.23*t, (-10**5)*t], [t, -beta*t]], dtype=np.float32)
with shape (2, 2, 360).
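To make the difference concrete, here is a quick check of both variants (a small sketch; t and beta as in the question):
import numpy as np

t = np.linspace(0, 3600, 360)
beta = 0.05

# A plain list of lists is allowed even though it is not rectangular:
At_list = [[0.23*t, (-10**5)*t], [0, -beta*t]]

# With every entry a length-360 array, numpy builds a proper float array:
At_arr = np.array([[0.23*t, (-10**5)*t], [t, -beta*t]], dtype=np.float32)
print(At_arr.shape)  # (2, 2, 360)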
When you give a list, or a list of lists, or in this case a list of lists of arrays, all entries at the same depth should have the same length, so that numpy can automatically infer the dimensions (shape) of the resulting array.
In your example everything is correctly laid out except the entry where you put a plain 0; since your expected output is a (2, 2, 360) cube, that entry should be an array of length 360 as well.
You can fix it by giving the correct number of zeros as below:
At = np.array([[0.23*t, (-10**5)*t], [np.zeros(len(t)), -beta*t]], dtype=np.float32)
But check the .shape of the resulting array, and make sure it's what you want.
As others note, the problem is the 0 in the inner list. It doesn't match the length-360 arrays generated by the other expressions. np.array can make a (2, 2) object-dtype array from that, but it can't make a float one.
At = np.array([[0.23*t, (-10**5)*t], [0*t, -beta*t]])
produces a (2, 2, 360) array. But I suspect the rest of that function is built around the assumption that At is (2, 2) - a 2D square array passed to eig, inv, etc.
What is the returned xt supposed to be?
Does this work?
S = np.array([solve_ode(x0, a0, beta, i) for i in t])
giving a 1d array with the same number of values as in t?
I'm not suggesting this is the fastest way of solving the problem, but it's the simplest, especially if you are only generating 360 values.

Why don't scipy.stats.mstats.pearsonr results agree with scipy.stats.pearsonr?

I expected scipy.stats.mstats.pearsonr on masked-array inputs to give the same result as scipy.stats.pearsonr on the unmasked values of the input data, but it doesn't:
from pylab import randn,rand
from numpy import ma
import scipy.stats
# Normally distributed data with noise
x=ma.masked_array(randn(10000),mask=False)
y=x+randn(10000)*0.6
# Randomly mask one tenth of each of x and y
x[rand(10000)<0.1]=ma.masked
y[rand(10000)<0.1]=ma.masked
# Identify indices for which both data are unmasked
bothok=((~x.mask)*(~y.mask))
# print results of both functions, passing only the data where
# both x and y are good to scipy.stats
print "scipy.stats.mstats.pearsonr:", scipy.stats.mstats.pearsonr(x,y)[0]
print "scipy.stats.pearsonr:", scipy.stats.pearsonr(x[bothok].data,y[bothok].data)[0]
The answer will vary a little bit each time you do this, but the values differed by about 0.1 for me, and the bigger the masked fraction, the bigger the disagreement.
I noticed that if the same mask was used for both x and y, the results are the same for both functions, i.e.:
mask=rand(10000)<0.1
x[mask]=ma.masked
y[mask]=ma.masked
...
Is this a bug, or am I expected to precondition the input data to make sure the masks in both x and y are identical (surely not)?
I'm using numpy version '1.8.0' and scipy version '0.11.0b1'
This looks like a bug in scipy.stats.mstats.pearsonr. It appears that the values in x and y are expected to be paired by index, so if one is masked, the other should be ignored. That is, if x and y look like (using -- for a masked value):
x = [1, --, 3, 4, 5]
y = [9, 8, --, 6, 5]
then both (--, 8) and (3, --) are to be ignored, and the result should be the same as scipy.stats.pearsonr([1, 4, 5], [9, 6, 5]).
The bug in the mstats version is that the code to compute the means of x and y does not use the common mask.
I created an issue for this on the scipy github site: https://github.com/scipy/scipy/issues/3645
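Until that is fixed, a workaround consistent with the observation in the question (identical masks give identical results) is to impose the common mask on both inputs first; a sketch reusing the x and y defined in the question:
import numpy as np
from numpy import ma
import scipy.stats

# Mask an element if it is masked in either array; with one shared mask,
# mstats.pearsonr matches stats.pearsonr on the unmasked pairs.
common = ma.getmaskarray(x) | ma.getmaskarray(y)
x2 = ma.masked_array(x.data, mask=common)
y2 = ma.masked_array(y.data, mask=common)
print(scipy.stats.mstats.pearsonr(x2, y2)[0])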
We have (at least) two options for missing value handling, complete case deletion and pairwise deletion.
In your use of scipy.stats.pearsonr you completely drop cases where there is a missing value in any of the variables.
numpy.ma.corrcoef gives the same results.
Checking the source of scipy.stats.mstats.pearsonr, it doesn't do complete case deletion when calculating the variance or the mean.
>>> xm = x - x.mean(0)
>>> ym = y - y.mean(0)
>>> np.ma.dot(xm, ym) / np.sqrt(np.ma.dot(xm, xm) * np.ma.dot(ym, ym))
0.7731167378113557
>>> scipy.stats.mstats.pearsonr(x,y)[0]
0.77311673781135637
However, the difference between complete and pairwise case deletion on mean and standard deviations is small.
The main discrepancy seems to come from the missing correction for the different numbers of non-missing elements. Ignoring degrees-of-freedom corrections, I get
>>> np.ma.dot(xm, ym) / bothok.sum() / \
...     np.sqrt(np.ma.dot(xm, xm) / (~xm.mask).sum() * np.ma.dot(ym, ym) / (~ym.mask).sum())
0.85855728319303393
which is close to the complete case deletion case.

My numpy array always ends in zero?

I think I missed something somewhere. I filled a numpy array using two for loops (x and y) and a function based on the x, y position. The only problem is that the array always ends in zeros, regardless of its size.
thetamap = numpy.zeros(36, dtype=float)
thetamap.shape = (6, 6)
for y in range(0, 5):
    for x in range(0, 5):
        thetamap[x][y] = x + y
print(thetamap)
range(0, 5) produces 0, 1, 2, 3, 4. The endpoint is always omitted. You want simply range(6).
Better yet, use the awesome power of NumPy to make the array in one line:
thetamap = np.arange(6) + np.arange(6)[:,None]
This makes a row vector and a column vector, then adds them together using NumPy broadcasting to make a matrix.
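For instance, for the 6x6 case:
import numpy as np

thetamap = np.arange(6) + np.arange(6)[:, None]
print(thetamap)
# [[ 0  1  2  3  4  5]
#  [ 1  2  3  4  5  6]
#  [ 2  3  4  5  6  7]
#  [ 3  4  5  6  7  8]
#  [ 4  5  6  7  8  9]
#  [ 5  6  7  8  9 10]]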
