np.cov() matrix returns unexpected values - python

I am trying to find the covariance matrix of all possible (flattened) images whose pixels each take a value in {0, 1}.
I have written the following code using numpy:
import numpy as np

# enumerate all 2^5 binary images (one row per image), then transpose so
# each row of `a` is one pixel (variable) and each column one image
a = np.array(np.meshgrid([1, 0], [1, 0], [1, 0], [1, 0], [0, 1])).T.reshape(-1, 5)
a = np.transpose(a)
covariance = np.cov(a)
print(covariance)
I get 0.25806452 on the diagonal, but I think the diagonal should be exactly 0.25.
Can anyone explain why it isn't?

By default np.cov normalises by 1/(N-1), not 1/N; with N = 32 columns (observations) that gives 0.25 * 32/31 ≈ 0.25806452. Set the ddof parameter to change this behaviour.
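For example, a minimal check reusing the a built in the question:
import numpy as np

a = np.array(np.meshgrid([1, 0], [1, 0], [1, 0], [1, 0], [0, 1])).T.reshape(-1, 5)
a = np.transpose(a)

# ddof=0 normalises by 1/N instead of the default 1/(N-1)
covariance = np.cov(a, ddof=0)
print(covariance[0, 0])  # 0.25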

Related

How can I get a 'broadcasted' (m,n) array from a (m,)-valued Sympy Matrix which should take a (n,) numpy array as an argument?

I have the following case:
I defined a Sympy Matrix (vector) whose entries are functions of a parameter in some, but not all, elements. So e.g. take
from sympy import *
a = Symbol('a')
M = Matrix([a,0])
Now I want this to be a function that takes numpy arrays as input, so I used lambdify for this. Actually I want M to be a row vector, so I did the following, which I found here.
funcM = lambdify([a], M.T.tolist()[0], 'numpy')
Passing a list or an array, e.g. [0,1] to this new function gives me:
In [596]: funcM([0,1])
Out[596]: [[0, 1], 0]
Actually I want the function funcM to work in a way that the output is
[[0,1],[0,0]]
i.e. one column vector per input value in the list: the column (0, 0) for the input 0 and the column (1, 0) for the input 1.
Thanks for helping me!
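A sketch of one possible workaround (not from the original thread): since lambdify leaves the constant entries as plain scalars, you can broadcast every component against the input's shape yourself. The wrapper name broadcast_funcM is made up for illustration:
import numpy as np
from sympy import Symbol, Matrix, lambdify

a = Symbol('a')
M = Matrix([a, 0])
funcM = lambdify([a], M.T.tolist()[0], 'numpy')

def broadcast_funcM(x):
    # broadcast scalar components (like the constant 0) to the input's shape
    x = np.asarray(x)
    return np.array([np.broadcast_to(comp, x.shape) for comp in funcM(x)])

print(broadcast_funcM([0, 1]))
# [[0 1]
#  [0 0]]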

LinearNDInterpolatorExtrapolate returns error with trivial example

I'm trying to use scipy's LinearNDInterpolatorExtrapolate.
The following minimal code should be as trivial as possible, yet it returns an error
from scipy.interpolate import NearestNDInterpolator
points = [[0,0,0], [1,0,0], [1,1,0],[0,1,0],[.5,.5,1]]
values = [1,2,3,4,5]
interpolator = NearestNDInterpolator(points,values)
interpolator([.5,.5,.8])
returns
TypeError: only integer scalar arrays can be converted to a scalar index
The error seems to come from line 81 of scipy.interpolate.ndgriddata. Unfortunately I could not chase the error any further, as I don't understand what tree.query returns.
Is this a bug, or am I doing something wrong?
In your case it looks like a value-type problem: because the first entries of points and values are Python integers, the rest are interpreted as integers too.
The following fixes your code and returns the correct answer, which is [5]:
import numpy as np
from scipy.interpolate import NearestNDInterpolator
points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],[0, 1, 0],[.5, .5, 1]])
values = np.array([1, 2, 3, 4, 5])
interpolator = NearestNDInterpolator(points, values)
interpolator(np.array([[.5, .5, .8]]))
>>> array([5])
Notice two things:
I imported numpy and used np.array. This is the preferred way to work with scipy, because np.array, albeit fixed in size, is much faster than Python's list and provides a wide range of mathematical operations.
When calling the interpolator, I used [[...]] instead of your [...]. Why? It highlights the fact that NearestNDInterpolator can interpolate values at multiple points at once.
Pass your input as arrays:
interpolator = NearestNDInterpolator(np.array(points), np.array(values))
You can even pass many points:
interpolator([np.array([.5,.5,.8]),np.array([1,1,2])])
>>> array([5, 5])
Alternatively, just pass the query point without a list, as a tuple of x-values:
from scipy.interpolate import NearestNDInterpolator
points = [[0,0,0], [1,0,0], [1,1,0],[0,1,0],[.5,.5,1]]
values = [1,2,3,4,5]
interpolator = NearestNDInterpolator(points,values)
interpolator((.5,.5,.8))
# 5
If you want to stick to passing lists, you can unpack the list contents using *:
interpolator(*[.5,.5,.8])
To interpolate at more than one point, you can map the interpolator over your list of points (tuples):
answer = list(map(interpolator, [(.5,.5,.8), (.05, 1.6, 2.9)]))
# [5, 5]

How to use numpy to calculate mean and standard deviation of an irregular shaped array

I have a numpy array that holds many samples of varying lengths:
Samples = np.array([[1001, 1002, 1003],
                    ...,
                    [1001, 1002]])
I want to (elementwise) subtract the mean of the array then divide by the standard deviation of the array. Something like:
newSamples = (Samples-np.mean(Samples))/np.std(Samples)
Except that doesn't work for irregularly shaped arrays:
np.mean(Samples) causes
unsupported operand type(s) for /: 'list' and 'int'
presumably because numpy expects a fixed size for each axis, and when it encounters differently sized samples it stores them as objects it can't reduce numerically. What is an approach to solve this using numpy?
example input:
Sample = np.array([[1, 2, 3],
                   [1, 2]])
After subtracting the mean and then dividing by the standard deviation:
Sample = array([[-1.06904497, 0.26726124, 1.60356745],
                [-1.06904497, 0.26726124]])
Don't make ragged arrays. Just don't. Numpy can't do much with them, and any code you might make for them will always be unreliable and slow because numpy doesn't work that way. It turns them into object dtypes:
Sample
array([[1, 2, 3], [1, 2]], dtype=object)
Which almost no numpy functions work on. In this case those objects are list objects, which makes your code even more confusing, as you either have to switch between list and ndarray methods or stick to list-safe numpy methods. This is a recipe for disaster, as anyone noodling around with the code later (even yourself, if you forget) will be dancing in a minefield.
There are two things you can do with your data to make things work better:
The first method is to index and flatten:
i = np.cumsum(np.array([len(x) for x in Sample]))
flat_sample = np.hstack(Sample)
This preserves the index of the end of each sample in i, while keeping the data as a flat 1D array.
The other method is to pad one dimension with np.nan and use nan-safe functions:
m = np.array([len(x) for x in Sample]).max()
nan_sample = np.array([x + [np.nan] * (m - len(x)) for x in Sample])
So to do your calculations, you can use flat_sample and proceed similarly to above:
new_flat_sample = (flat_sample - np.mean(flat_sample)) / np.std(flat_sample)
and use i to recreate your original array (or list of arrays, which I recommend; see np.split):
new_list_sample = np.split(new_flat_sample, i[:-1])
[array([-1.06904497, 0.26726124, 1.60356745]),
array([-1.06904497, 0.26726124])]
Or use nan_sample, but you will need to replace np.mean and np.std with np.nanmean and np.nanstd
new_nan_sample = (nan_sample - np.nanmean(nan_sample)) / np.nanstd(nan_sample)
array([[-1.06904497, 0.26726124, 1.60356745],
[-1.06904497, 0.26726124, nan]])
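If you need the ragged structure back from the padded result, a small sketch (reusing Sample and new_nan_sample from above, and assuming you keep the original lengths) is to slice the padding off again:
lengths = [len(x) for x in Sample]
new_list_sample = [row[:n] for row, n in zip(new_nan_sample, lengths)]
# [array([-1.06904497, 0.26726124, 1.60356745]), array([-1.06904497, 0.26726124])]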
@MichaelHackman (following the comment remark):
That's weird, because when I compute the overall std and mean and then apply them, I obtain a different result (see the code below).
import numpy as np
Samples = np.array([[1, 2, 3],
                    [1, 2]])
c = np.hstack(Samples)  # gives [1, 2, 3, 1, 2]
mean, std = np.mean(c), np.std(c)
newSamples = np.asarray([(np.array(xi) - mean) / std for xi in Samples])
print(newSamples)
# [array([-1.06904497, 0.26726124, 1.60356745]), array([-1.06904497, 0.26726124])]
edit: added np.asarray() and moved the mean/std computation outside the loop, following Imanol Luengo's excellent comments (thanks!)

Python Numpy error : setting an array element with a sequence

I'm quite new to Python and Numpy, so I apologize if I'm missing something obvious here.
I have a function that solves a system of 2 differential equations :
import numpy as np
import numpy.linalg as la
def solve_ode(x0, a0, beta, t):
    At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
    # get eigenvalues and eigenvectors
    evals, V = la.eig(At)
    Vi = la.inv(V)
    # get e^At coeff
    eAt = V @ np.exp(evals) @ Vi
    xt = eAt*x0
    return xt
However, running it with this code:
import matplotlib.pyplot as plt
# initial values
x0 = 10**6
a0 = 2.5
beta = 0.05
t = np.linspace(0, 3600, 360)
plt.semilogy(t, solve_ode(x0, a0, beta, t))
... throws this error:
ValueError: setting an array element with a sequence.
At this line :
At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
Note that t and beta are supposed to be floats. I think Python might not be able to infer this, but I don't know how to make that explicit...
Thanks in advance for your help.
You are supplying t as a numpy array of shape (360,) from linspace, not simply a float. The resulting At array you are trying to create is then ill-formed, as all its entries must have the same length. In Python there is an important difference between lists and numpy arrays. For example, you could do what you have here as a list of lists, e.g.
At = [[0.23*t, (-10**5)*t], [0, -beta*t]]
where the first inner list holds two length-360 arrays while the second holds a scalar and a length-360 array.
Alternatively, if all elements of At have the length of t, the array works:
At = np.array([[0.23*t, (-10**5)*t], [t, -beta*t]], dtype=np.float32)
with shape (2, 2, 360).
When you give a list of lists, or in this case a list of lists of lists, all of them should have the same length, so that numpy can automatically infer the dimensions (shape) of the resulting array.
In your example everything is correctly laid out except the part where you put a bare 0 as one entry; I'm not sure what to call it, since your expected output is a cube, I suppose.
You can fix it by giving the correct number of zeros, as below:
At = np.array([[0.23*t, (-10**5)*t], [np.zeros(len(t)), -beta*t]], dtype=np.float32)
But check the .shape of the resulting array, and make sure it's what you want.
As others note, the problem is the 0 in the inner list: it doesn't match the length-360 arrays generated by the other expressions. np.array can make a (2, 2) object-dtype array from that, but it can't make a float one.
At = np.array([[0.23*t, (-10**5)*t], [0*t, -beta*t]])
produces a (2, 2, 360) array. But I suspect the rest of that function is built around the assumption that At is (2, 2), a 2d square array used with eig, inv, etc.
What is the return xt supposed to be?
Does this work?
S = np.array([solve_ode(x0, a0, beta, i) for i in t])
giving a 1d array with the same number of values as in t?
I'm not suggesting this is the fastest way of solving the problem, but it's the simplest, especially if you are only generating 360 values.
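A sketch of that per-scalar approach (not from the original thread; it assumes the intended computation is x(t) = e^(At) x(0) with initial state (x0, a0), and forms the matrix exponential from the eigendecomposition):
import numpy as np
import numpy.linalg as la

def solve_ode_scalar(x0, a0, beta, t):
    # t is a scalar here, so At is a plain (2, 2) matrix
    At = np.array([[0.23 * t, (-10**5) * t],
                   [0.0, -beta * t]])
    evals, V = la.eig(At)
    Vi = la.inv(V)
    # e^At via the eigendecomposition: V @ diag(exp(evals)) @ Vi
    eAt = (V * np.exp(evals)) @ Vi
    return eAt @ np.array([x0, a0])  # assumed initial state x(0) = (x0, a0)

x0, a0, beta = 10**6, 2.5, 0.05
t = np.linspace(0, 3600, 360)
S = np.array([solve_ode_scalar(x0, a0, beta, ti) for ti in t])
print(S.shape)  # (360, 2): one state vector per time point
# note: exp(0.23*t) overflows float64 well before t = 3600, so expect infs
# for large t; the sketch only illustrates the per-scalar call pattern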

Python - sparse vectors/distance calculation

I'm looking for dynamically growing vectors in Python, since I don't know their length in advance. In addition, I would like to calculate distances between these sparse vectors, preferably using the distance functions in scipy.spatial.distance (although any other suggestions are welcome). Any ideas how to do this? (Initially, it doesn't need to be efficient.)
Thanks a lot in advance!
You can use regular python lists (which are dynamic) as vectors. Trivial example follows.
from scipy.spatial.distance import sqeuclidean
a = [1,2,3]
b = [0,0,0]
print(sqeuclidean(a, b))  # 14
As per aganders3's suggestion, do note that you can also use numpy arrays if needed:
import numpy
a = numpy.array([1,2,3])
If the sparse part of your question is crucial, I'd use scipy for that - it has support for sparse matrices. You can define a 1xn matrix and use it as a vector. This works (the parameter is the size of the matrix, filled with zeroes by default):
import scipy.sparse
sqeuclidean(scipy.sparse.coo_matrix((1, 3)), scipy.sparse.coo_matrix((1, 3)))  # 0
There are many kinds of sparse matrices, some dictionary-based (see the comments). You can define a row sparse matrix from a list like this:
scipy.sparse.csr_matrix([1, 2, 3])
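A small sketch (not from the original answer): if your scipy version's distance functions refuse sparse inputs directly, you can densify the 1xn rows first:
import scipy.sparse
from scipy.spatial.distance import sqeuclidean

u = scipy.sparse.csr_matrix([1, 2, 3])
v = scipy.sparse.csr_matrix([0, 0, 0])

# densify the 1xn sparse rows to flat arrays before measuring the distance
d = sqeuclidean(u.toarray().ravel(), v.toarray().ravel())
print(d)  # 14.0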
Here is how you can do it in numpy:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([0, 0, 0])
c = np.sum((a - b) ** 2)  # 14
