In mathematics, a "generating function" is defined from a sequence of numbers c0, c1, c2, ..., cn by c0+c1*x+c2*x^2 + ... + cn*x^n. These come as "moment generating functions", "probability generating functions" and various other types, depending on the source of the coefficients.
I have an array of the coefficients and I'd like a quick way to create the corresponding generating function.
I could do
import numpy as np
myArray = np.array([1,2,3,4])
x=0.2
sum(c*x**k for k, c in enumerate(myArray))
or I could have an array having c[k] in the kth entry. It seems there should be a fast numpy way to do this.
Unfortunately, attempts to look this up are complicated by the fact that "generate" and "function" are common words in programming, as is the combination "generating function", so I haven't had any luck with search engines.
x = .2
coeffs = np.array([1,2,3,4])
Make an array of the degree of each term
degrees = np.arange(len(coeffs))
Raise x to each degree
terms = np.power(x, degrees)
Multiply the coefficients and sum
result = np.sum(coeffs*terms)
>>> coeffs
array([1, 2, 3, 4])
>>> degrees
array([0, 1, 2, 3])
>>> terms
array([ 1. , 0.2 , 0.04 , 0.008])
>>> result
1.552
>>>
As a function:
def f(coeffs, x):
    degrees = np.arange(len(coeffs))
    terms = np.power(x, degrees)
    return np.sum(coeffs*terms)
Or simply use the NumPy Polynomial package:
from numpy.polynomial import Polynomial as P
p = P(coeffs)
result = p(x)
If you are looking for performance, np.einsum could be suggested too:
np.einsum('i,i->',myArray,x**np.arange(myArray.size))
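If you need to evaluate the same coefficients at many points at once, the same idea vectorises; a minimal sketch (the xs values are just illustrative, and np.polynomial.polynomial.polyval is an equivalent built-in):
import numpy as np

myArray = np.array([1, 2, 3, 4])
xs = np.array([0.1, 0.2, 0.5])            # several evaluation points

# Power table: entry [i, k] holds xs[i]**k
powers = xs[:, None] ** np.arange(myArray.size)

# Sum coefficient * power over k for every x in one einsum call
values = np.einsum('k,ik->i', myArray, powers)

# Built-in equivalent: np.polynomial.polynomial.polyval(xs, myArray)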
>>> coeffs = np.random.random(5)
>>> coeffs
array([ 0.70632473, 0.75266724, 0.70575037, 0.49293719, 0.66905641])
>>> x = np.random.random()
>>> x
0.7252944971757169
>>> powers = np.arange(0, coeffs.shape[0], 1)
>>> powers
array([0, 1, 2, 3, 4])
>>> result = coeffs * x ** powers
>>> result
array([ 0.70632473, 0.54590541, 0.37126147, 0.18807659, 0.18514853])
>>> np.sum(result)
1.9967167252487628
Using NumPy's Polynomial class is probably the easiest way.
from numpy.polynomial import Polynomial
coefficients = [1,2,3,4]
f = Polynomial( coefficients )
You can then use the object like any other function.
import numpy as np
import matplotlib.pyplot as plt
print(f(0.2))
x = np.linspace( -5, 5, 51 )
plt.plot( x , f(x) )
So I have the following code below.
L = np.array([1,2,3])
M = np.array([1,2,3])
Q = np.random.uniform(0,10,size=(3,3))
S = Q.T*Q
print(sp.stats.multivariate_normal.pdf(L,M,S))
Clearly S is a symmetric positive semidefinite matrix. I can prove it using linear algebra theory. However, scipy complains that it isn't when running the above code. What can I do to solve this problem?
Like the comment by Mechanic Pig says, replace * (element-wise multiplication on NumPy arrays) with @ (matrix multiplication).
import scipy as sp
import scipy.stats   # make sure the stats submodule is loaded
import numpy as np
L = np.array([1, 2, 3])
M = np.array([1, 2, 3])
Q = np.random.uniform(0, 10, size=(3, 3))
S = Q.T @ Q
print(sp.stats.multivariate_normal.pdf(L, M, S))
prints, in my case, 0.0003568248543110567.
You can verify your "covariance" is positive-definite by just comparing
np.linalg.eig(Q.T * Q)[0], using element-wise multiplication (will likely have negative values), vs
np.linalg.eig(Q.T @ Q)[0] (will have no negative values).
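A minimal sketch of that check (np.linalg.eigvals returns just the eigenvalues):
import numpy as np

Q = np.random.uniform(0, 10, size=(3, 3))

# Element-wise product: symmetric, but generally not positive semidefinite
print(np.linalg.eigvals(Q.T * Q))   # will likely contain negative values

# Matrix product: symmetric positive semidefinite by construction
print(np.linalg.eigvals(Q.T @ Q))   # all eigenvalues >= 0 (up to rounding)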
For a set of points, I want to get the straight line that is the closest approximation of the points using a least squares fit.
I can find a lot of overly complex solutions here on SO and elsewhere but I have not been able to find something simple. And this should be very simple.
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
slope, intercept = leastSquares(x, y)
Is there some library function that implements the above leastSquares()?
numpy.linalg.lstsq can compute such a fit for you. There is an example in the documentation that does exactly what you need.
https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html#numpy-linalg-lstsq
To summarize it here …
>>> x = np.array([0, 1, 2, 3])
>>> y = np.array([-1, 0.2, 0.9, 2.1])
>>> A = np.stack([x, np.ones(len(x))]).T
>>> m, c = np.linalg.lstsq(A, y, rcond=None)[0]
>>> m, c
(1.0, -0.95) # may vary
Well, for one, for an ordinary least squares fit of a single straight line you can derive a closed-form solution for the coefficients, if I'm not utterly mistaken, though there are some pitfalls with numerical stability.
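For the record, that closed-form solution is just the textbook formula slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x); a minimal sketch with the arrays from the question:
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])

# Textbook ordinary-least-squares formulas for a single straight line
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()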
If you look for least squares in general, you'll find more general and thus more complex solutions, because least squares can be done for many more models than just the linear one.
But maybe the sklearn package with its LinearRegression model can easily do what you want? https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
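A minimal sketch of the sklearn route (assuming scikit-learn is installed; note it expects a 2-D feature matrix):
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4]).reshape(-1, 1)   # one feature column
y = np.array([23, 31, 42, 43])

model = LinearRegression().fit(x, y)
slope, intercept = model.coef_[0], model.intercept_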
Or, for more detailed control, there is the scipy package: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html
import numpy as np
from scipy.linalg import lstsq

# Turn x into a 2d array; raise it to powers 0 (for the y-axis intercept)
# and 1 (for the slope part)
M = x[:, np.newaxis] ** [0, 1]
p, res, rnk, s = lstsq(M, y)
intercept, slope = p[0], p[1]
Here's one way to implement the least squares regression:
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])

def leastSquares(x, y):
    A = np.vstack([x, np.ones(len(x))]).T
    y = y[:, np.newaxis]
    # Normal-equation solution: (A^T A)^-1 A^T y
    slope, intercept = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), y)
    return slope, intercept
slope, intercept = leastSquares(x, y)
You can try with Moore-Penrose pseudo-inverse:
import numpy as np
from scipy import linalg

x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
x = np.array([x, np.ones(len(x))])
B = linalg.pinv(x)
sol = np.reshape(y, (1, len(y))) @ B
slope, intercept = sol[0, 0], sol[0, 1]
I am learning numpy/scipy, coming from a MATLAB background. The xcorr function in Matlab has an optional argument "maxlag" that limits the lag range from –maxlag to maxlag. This is very useful if you are looking at the cross-correlation between two very long time series but are only interested in the correlation within a certain time range. The performance increases are enormous considering that cross-correlation is incredibly expensive to compute.
In numpy/scipy it seems there are several options for computing cross-correlation. numpy.correlate, numpy.convolve, scipy.signal.fftconvolve. If someone wishes to explain the difference between these, I'd be happy to hear, but mainly what is troubling me is that none of them have a maxlag feature. This means that even if I only want to see correlations between two time series with lags between -100 and +100 ms, for example, it will still calculate the correlation for every lag between -20000 and +20000 ms (which is the length of the time series). This gives a 200x performance hit! Do I have to recode the cross-correlation function by hand to include this feature?
Here are a couple functions to compute auto- and cross-correlation with limited lags. The order of multiplication (and conjugation, in the complex case) was chosen to match the corresponding behavior of numpy.correlate.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def _check_arg(x, xname):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError('%s must be one-dimensional.' % xname)
    return x
def autocorrelation(x, maxlag):
    """
    Autocorrelation with a maximum number of lags.

    `x` must be a one-dimensional numpy array.

    This computes the same result as
        numpy.correlate(x, x, mode='full')[len(x)-1:len(x)+maxlag]

    The return value has length maxlag + 1.
    """
    x = _check_arg(x, 'x')
    p = np.pad(x.conj(), maxlag, mode='constant')
    T = as_strided(p[maxlag:], shape=(maxlag+1, len(x) + maxlag),
                   strides=(-p.strides[0], p.strides[0]))
    return T.dot(p[maxlag:].conj())
def crosscorrelation(x, y, maxlag):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]

    The return value has length 2*maxlag + 1.
    """
    x = _check_arg(x, 'x')
    y = _check_arg(y, 'y')
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    return T.dot(px)
For example,
In [367]: x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
In [368]: autocorrelation(x, 3)
Out[368]: array([ 20.5, 5. , -3.5, -1. ])
In [369]: np.correlate(x, x, mode='full')[7:11]
Out[369]: array([ 20.5, 5. , -3.5, -1. ])
In [370]: y = np.arange(8)
In [371]: crosscorrelation(x, y, 3)
Out[371]: array([ 5. , 23.5, 32. , 21. , 16. , 12.5, 9. ])
In [372]: np.correlate(x, y, mode='full')[4:11]
Out[372]: array([ 5. , 23.5, 32. , 21. , 16. , 12.5, 9. ])
(It will be nice to have such a feature in numpy itself.)
Until numpy implements the maxlag argument, you can use the function ucorrelate from the pycorrelate package. ucorrelate operates on numpy arrays and has a maxlag keyword. It implements the correlation with a for loop and optimizes the execution speed with numba.
Example - autocorrelation with 3 time lags:
import numpy as np
import pycorrelate as pyc
x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = pyc.ucorrelate(x, x, maxlag=3)
c
Result:
Out[1]: array([20, 5, -3])
The pycorrelate documentation contains a notebook showing a perfect match between pycorrelate.ucorrelate and numpy.correlate.
matplotlib.pyplot provides MATLAB-like syntax for computing and plotting cross-correlation, auto-correlation, etc.
You can use xcorr, which allows you to define the maxlags parameter.
import matplotlib.pyplot as plt
import numpy as np
data = np.arange(0,2*np.pi,0.01)
y1 = np.sin(data)
y2 = np.cos(data)
coeff = plt.xcorr(y1,y2,maxlags=10)
print(*coeff)
[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10]
[-9.81991753e-02 -8.85505028e-02 -7.88613080e-02 -6.91325329e-02
 -5.93651264e-02 -4.95600447e-02 -3.97182508e-02 -2.98407146e-02
 -1.99284126e-02 -9.98232812e-03 -3.45104289e-06  9.98555430e-03
  1.99417667e-02  2.98641953e-02  3.97518558e-02  4.96037706e-02
  5.94189688e-02  6.91964864e-02  7.89353663e-02  8.86346584e-02
  9.82934198e-02]
<matplotlib.collections.LineCollection object at 0x00000000074A9E80>
Line2D(_line0)
Warren Weckesser's answer is the best, as it leverages numpy to get performance savings (and doesn't just call corr for each lag). Nonetheless, it returns the cross-product (i.e. the dot product between the inputs at various lags). To get the actual cross-correlation, I modified his answer with an optional mode argument, which, if set to 'corr', returns the cross-correlation as such:
def crosscorrelation(x, y, maxlag, mode='corr'):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]

    The return value has length 2*maxlag + 1.
    """
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    if mode == 'dot':       # get lagged dot product
        return T.dot(px)
    elif mode == 'corr':    # get Pearson correlation at each lag
        return (T.dot(px)/px.size - (T.mean(axis=1)*px.mean())) / \
               (np.std(T, axis=1) * np.std(px))
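A usage sketch, reusing the arrays from the earlier example (this assumes numpy and as_strided are imported as in that answer):
x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
y = np.arange(8)

print(crosscorrelation(x, y, 3, mode='dot'))    # lagged dot products, as before
print(crosscorrelation(x, y, 3, mode='corr'))   # normalised, Pearson-style values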
I encountered the same problem some time ago and paid more attention to the efficiency of the calculation. Referring to the source code of MATLAB's xcorr.m, I made a simple version of my own.
import numpy as np
from scipy import signal, fftpack
import math
import time
def nextpow2(x):
    if x == 0:
        y = 0
    else:
        y = math.ceil(math.log2(x))
    return y
def xcorr(x, y, maxlag):
    m = max(len(x), len(y))
    mx1 = min(maxlag, m - 1)
    ceilLog2 = nextpow2(2 * m - 1)
    m2 = 2 ** ceilLog2
    X = fftpack.fft(x, m2)
    Y = fftpack.fft(y, m2)
    c1 = np.real(fftpack.ifft(X * np.conj(Y)))
    index1 = np.arange(1, mx1+1, 1) + (m2 - mx1 - 1)
    index2 = np.arange(1, mx1+2, 1) - 1
    c = np.hstack((c1[index1], c1[index2]))
    return c
if __name__ == "__main__":
    s = time.perf_counter()   # time.clock() was removed in Python 3.8
    a = [1, 2, 3, 4, 5]
    b = [6, 7, 8, 9, 10]
    c = xcorr(a, b, 3)
    e = time.perf_counter()
    print(c)
    print(e - s)
Take the results of a certain run as an example:
[ 29. 56. 90. 130. 110. 86. 59.]
0.0001745000000001884
comparing with MATLAB code:
clear;close all;clc
tic
a = [1, 2, 3, 4, 5];
b = [6, 7, 8, 9, 10];
c = xcorr(a, b, 3)
toc
29.0000 56.0000 90.0000 130.0000 110.0000 86.0000 59.0000
Elapsed time is 0.000279 seconds.
If anyone can give a strict mathematical derivation of this, that would be very helpful.
I think I have found a solution, as I was facing the same problem:
If you have two vectors x and y of any length N, and want a cross-correlation with a window of fixed length, you can do:
x = <some_data>
y = <some_data>
window = <maximum lag you care about>

# Trim your variables
x_short = x[window:]
y_short = y[window:]

# do two cross-correlations, lagging x and y respectively
left_xcorr = np.correlate(x, y_short)    # defaults to 'valid'
right_xcorr = np.correlate(x_short, y)   # defaults to 'valid'

# combine the cross-correlations
# note the first value of right_xcorr is the same as the last of left_xcorr
xcorr = np.concatenate((left_xcorr, right_xcorr[1:]))
Remember you might need to normalise the variables if you want a bounded correlation.
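One common convention (just a sketch, not the only choice) is to z-score both series and divide by the length, which keeps the values roughly within [-1, 1]:
import numpy as np

def normalise(a):
    # zero-mean, unit-variance copy of the input
    return (a - a.mean()) / a.std()

x = np.random.randn(1000)
y = np.random.randn(1000)

xcorr_norm = np.correlate(normalise(x), normalise(y), mode='full') / len(x)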
Here is another approach, sourced from here; it seems marginally faster than np.correlate and has the benefit of returning a normalised correlation:
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def xcorr(x, y):
    N = len(x)
    M = len(y)
    meany = np.mean(y)
    stdy = np.std(np.asarray(y))
    tmp = rolling_window(np.asarray(x), M)
    c = np.sum((y - meany) * (tmp - np.reshape(np.mean(tmp, -1), (N - M + 1, 1))), -1) / (M * np.std(tmp, -1) * stdy)
    return c
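A usage sketch; the result has one value per valid alignment, i.e. length len(x) - len(y) + 1:
x = np.random.randn(100)
y = np.random.randn(20)

c = xcorr(x, y)
print(c.shape)   # (81,)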
As I answered here, https://stackoverflow.com/a/47897581/5122657
matplotlib.xcorr has the maxlags param. It is actually a wrapper of numpy.correlate, so there is no performance saving. Nevertheless it gives exactly the same result as Matlab's cross-correlation function. Below I edited the code from matplotlib so that it returns only the correlation. The reason is that if we use matplotlib.xcorr as it is, it will return the plot as well. The problem is that if we pass a complex data type as the argument, we will get a "casting complex to real datatype" warning when matplotlib tries to draw the plot.
import numpy as np

def xcorr(x, y, maxlags=10):
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')
    c = np.correlate(x, y, mode=2)   # mode=2 is 'full'
    if maxlags is None:
        maxlags = Nx - 1
    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)
    c = c[Nx - 1 - maxlags:Nx + maxlags]
    return c
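A usage sketch with complex input, which is the case that motivated stripping out the plotting; the signals here are arbitrary examples, and the function and numpy import above are assumed to be in scope:
t = np.linspace(0, 4 * np.pi, 200)
x = np.exp(1j * t)
y = np.exp(1j * (t + 0.3))

c = xcorr(x, y, maxlags=10)
print(c.shape)   # (21,) -- lags from -10 to +10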
How should I compute the pseudo-inverse of a matrix using sympy (not using numpy, because the matrix has symbolic constants and I want the inverse also in symbolic form)? The normal inv() does not work for a non-square matrix in sympy. For example if M = Matrix(2,3, [1,2,3,4,5,6]), pinv(M) should give
-0.9444 0.4444
-0.1111 0.1111
0.7222 -0.2222
I think since this is all symbolic it should be OK to use the text-book formulas taught in a linear algebra class (e.g. see the list of special cases in the Wikipedia article on the Moore–Penrose pseudoinverse). For numerical evaluation pinv uses the singular value decomposition (svd) instead.
You have linearly independent rows (full row rank), so you can use the formula for a 'right' inverse:
>>> import sympy as sy
>>> M = sy.Matrix(2,3, [1,2,3,4,5,6])
>>> N = M.H * (M * M.H) ** -1
>>> N.evalf(4)
[-0.9444, 0.4444]
[-0.1111, 0.1111]
[ 0.7222, -0.2222]
>>> M * N
[1, 0]
[0, 1]
For full column rank, replace M with M.H, transpose the result, and simplify to get the following formula for the 'left' inverse:
>>> M = sy.Matrix(3, 2, [1,2,3,4,5,6])
>>> N = (M.H * M) ** -1 * M.H
>>> N.evalf(4)
[-1.333, -0.3333, 0.6667]
[ 1.083, 0.3333, -0.4167]
>>> N * M
[1, 0]
[0, 1]
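For what it's worth, more recent SymPy versions also expose a pinv method on Matrix; if your version has it, it handles either shape directly (shown here for the original 2x3 example):
>>> import sympy as sy
>>> M = sy.Matrix(2, 3, [1, 2, 3, 4, 5, 6])
>>> M.pinv().evalf(4)   # same 3x2 matrix as the 'right' inverse above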
I have a Numpy matrix, for example, numpy.matrix([[-1, 2],[1, -2]], dtype='int'). I want to get its integer-valued eigenvectors, if any; for example, numpy.array([[-1], [1]]) for the above matrix. What Numpy returns are eigenvectors as floating-point numbers, scaled to have unit length.
One can do this in Sage, where one can specify the field (i.e., data type) of the matrix and operations done on the matrix will respect the field one specifies.
Any idea of how to do this nicely in Python? Many thanks in advance.
I am personally content with the following solution: I called sage in Python and let sage compute what I want. sage, being math-oriented, is rather versatile in computations involving fields other than reals.
Below is my script compute_intarrs.py and it requires sage be installed. Be aware it is a little slow.
import subprocess
import re
import numpy as np
# construct a numpy matrix
mat = np.matrix([[1,-1],[-1,1]])
# convert the matrix into a string recognizable by sage
matstr = re.sub('\s|[a-z]|\(|\)', '', mat.__repr__())
# write a (sage) python script "mat.py";
# for more info of the sage commands:
# www.sagemath.org/doc/faq/faq-usage.html#how-do-i-import-sage-into-a-python-script
# www.sagemath.org/doc/tutorial/tour_linalg.html
f = open('mat.py', 'w')
f.write('from sage.all import *\n\n')
f.write('A = matrix(ZZ, %s)\n\n' % matstr)
f.write('print A.kernel()') # this returns the left nullspace vectors
f.close()
# call sage and run mat.py
p = subprocess.Popen(['sage', '-python', 'mat.py'], stdout=subprocess.PIPE)
# process the output from sage
arrstrs = p.communicate()[0].split('\n')[2:-1]
arrs = [np.array(eval(re.sub('(?<=\d)\s*(?=\d|-)', ',', arrstr)))
for arrstr in arrstrs]
print arrs
Result:
In [1]: %run compute_intarrs.py
[array([1, 1])]
You can do some pretty cool things with dtype = object and the fractions.Fraction class, e.g.
>>> A = np.array([fractions.Fraction(1, j) for j in xrange(1, 13)]).reshape(3, 4)
>>> A
array([[1, 1/2, 1/3, 1/4],
[1/5, 1/6, 1/7, 1/8],
[1/9, 1/10, 1/11, 1/12]], dtype=object)
>>> B = np.array([fractions.Fraction(1, j) for j in xrange(1, 13)]).reshape(4, 3)
>>> B
array([[1, 1/2, 1/3],
[1/4, 1/5, 1/6],
[1/7, 1/8, 1/9],
[1/10, 1/11, 1/12]], dtype=object)
>>> np.dot(A, B)
array([[503/420, 877/1320, 205/432],
[3229/11760, 751/4620, 1217/10080],
[1091/6930, 1871/19800, 1681/23760]], dtype=object)
Unfortunately the np.linalg module converts everything to float before doing anything, so you can't expect to get solutions directly as integers or rationals. But you can always do the following after your computations:
def scale_to_int(x):
    fracs = [fractions.Fraction(j) for j in x.ravel()]
    denominators = [j.denominator for j in fracs]
    lcm = reduce(lambda a, b: max(a, b) / fractions.gcd(a, b) * min(a, b),
                 denominators)
    fracs = map(lambda x: lcm * x, fracs)
    gcd = reduce(lambda a, b: fractions.gcd(a, b), fracs)
    fracs = map(lambda x: x / gcd, fracs)
    return np.array(fracs).reshape(x.shape)
It will be slow, and very sensitive to round-off errors:
>>> scale_to_int(np.linspace(0, 1, 5)) # [0, 0.25, 0.5, 0.75, 1]
array([0, 1, 2, 3, 4], dtype=object)
>>> scale_to_int(np.linspace(0, 1, 4)) # [0, 0.33333333, 0.66666667, 1]
array([0, 6004799503160661, 12009599006321322, 18014398509481984], dtype=object)
You could mitigate some of that using the limit_denominator method of Fraction, but it probably will not be all that robust.
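For example, limit_denominator recovers the intended thirds in the linspace case above (a small sketch; the cap of 1000 is arbitrary):
>>> [fractions.Fraction(v).limit_denominator(1000) for v in np.linspace(0, 1, 4)]
[Fraction(0, 1), Fraction(1, 3), Fraction(2, 3), Fraction(1, 1)]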