Convolution computations in Numpy/Scipy - python

Profiling some computational work I'm doing showed me that one bottleneck in my program was a function that basically did this (np is numpy, sp is scipy):
def mix1(signal1, signal2):
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1*spec2, axis=1)
Both signals have shape (C, N) where C is the number of sets of data (usually less than 20) and N is the number of samples in each set (around 5000). The computation for each set (row) is completely independent of any other set.
I figured that this was just a simple convolution, so I tried to replace it with:
def mix2(signal1, signal2):
    outputs = np.empty_like(signal1)
    for idx, row in enumerate(outputs):
        outputs[idx] = sp.signal.convolve(signal1[idx], signal2[idx], mode='same')
    return outputs
...just to see if I got the same results. But I didn't, and my questions are:
Why not?
Is there a better way to compute the equivalent of mix1()?
(I realise that mix2 probably wouldn't have been faster as-is, but it might have been a good starting point for parallelisation.)
Here's the full script I used to quickly check this:
import numpy as np
import scipy as sp
import scipy.signal

N = 4680
C = 6

def mix1(signal1, signal2):
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1*spec2, axis=1)

def mix2(signal1, signal2):
    outputs = np.empty_like(signal1)
    for idx, row in enumerate(outputs):
        outputs[idx] = sp.signal.convolve(signal1[idx], signal2[idx], mode='same')
    return outputs

def test(num, chans):
    sig1 = np.random.randn(chans, num)
    sig2 = np.random.randn(chans, num)
    res1 = mix1(sig1, sig2)
    res2 = mix2(sig1, sig2)
    np.testing.assert_almost_equal(res1, res2)

if __name__ == "__main__":
    np.random.seed(0x1234ABCD)
    test(N, C)

So I tested this out and can now confirm a few things:
1) numpy.convolve is not circular, but a circular convolution is what the FFT code gives you.
2) FFT does not internally pad to a power of 2. Compare the vastly different speeds of the following operations:
x1 = np.random.uniform(size=2**17-1)
x2 = np.random.uniform(size=2**17)
np.fft.fft(x1)
np.fft.fft(x2)
3) Normalization is not the difference: if you do a naive circular convolution by adding up a(k)*b((i-k) mod N), you will get the result of the FFT code.
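Point 3 can be checked numerically. The minimal sketch below compares the direct sum a[k]*b[(i-k) mod N] with the FFT product that mix1() uses on each row. For real inputs it also tries `np.fft.rfft`/`np.fft.irfft`, which give the same real-valued result while typically doing less work. The function names are hypothetical.

```python
import numpy as np

def circular_naive(a, b):
    """Direct circular convolution: c[i] = sum_k a[k] * b[(i - k) mod N]."""
    n = len(a)
    c = np.zeros(n)
    for i in range(n):
        for k in range(n):
            c[i] += a[k] * b[(i - k) % n]
    return c

def circular_fft(a, b):
    """What mix1() computes for a single row: the inverse FFT of the spectra product."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def circular_rfft(a, b):
    """Real-input variant; n=len(a) keeps odd lengths from being truncated."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(65), rng.standard_normal(65)
print(np.allclose(circular_naive(a, b), circular_fft(a, b).real))  # True
print(np.allclose(circular_naive(a, b), circular_rfft(a, b)))      # True
```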
The catch is that padding to a power of 2 is going to change the answer. I've heard tales that there are ways to deal with this by cleverly using the prime factors of the length (mentioned but not coded in Numerical Recipes), but I've never seen anyone actually do that.
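Padding changes the answer in a predictable way. Suppose both signals are zero-padded to at least len(a) + len(b) - 1 samples before the FFT. Then the wrap-around terms vanish, and the result is the ordinary linear convolution that `np.convolve` computes. Any padded length at or above that bound works, so you are free to round it up to a fast FFT size. The minimal sketch below assumes 1-D real inputs; `linear_via_fft` is a hypothetical name.

```python
import numpy as np

def linear_via_fft(a, b):
    """Linear (non-circular) convolution computed with FFTs.

    Zero-padding both inputs to at least len(a) + len(b) - 1 samples leaves
    room for the full result, so nothing wraps around.
    """
    full_len = len(a) + len(b) - 1
    nfft = 1 << (full_len - 1).bit_length()  # next power of two >= full_len
    spec = np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft)
    return np.fft.irfft(spec, nfft)[:full_len]

rng = np.random.default_rng(1)
a, b = rng.standard_normal(100), rng.standard_normal(100)
print(np.allclose(linear_via_fft(a, b), np.convolve(a, b)))  # True
```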

scipy.signal.fftconvolve does convolution by FFT, and it's written in Python. You can study the source code and correct your mix1 function.
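The question's mix2 can also be replaced with a single call. Recent SciPy versions give `scipy.signal.fftconvolve` an `axes` argument, which convolves every row at once with the same linear (not circular) semantics as `scipy.signal.convolve`. The sketch below assumes signals shaped (C, N) as in the question; `mix2_vectorized` is a hypothetical name.

```python
import numpy as np
from scipy import signal

def mix2_vectorized(signal1, signal2):
    """Row-wise linear convolution, trimmed to 'same' length, in one call."""
    # axes=1 convolves along the samples axis only; axis 0 (the C data sets)
    # is kept as independent rows because both inputs have the same size there.
    return signal.fftconvolve(signal1, signal2, mode='same', axes=1)

rng = np.random.default_rng(2)
s1, s2 = rng.standard_normal((6, 500)), rng.standard_normal((6, 500))
loop = np.array([signal.convolve(r1, r2, mode='same') for r1, r2 in zip(s1, s2)])
print(np.allclose(mix2_vectorized(s1, s2), loop))  # True
```

This reproduces mix2, not mix1. mix1 is circular, so for it the FFT approach is already the right tool.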

As mentioned before, the scipy.signal.convolve function does not perform a circular convolution. If you want a circular convolution performed in real space (as opposed to using FFTs), I suggest using the scipy.ndimage.convolve function. It has a mode parameter which can be set to 'wrap', making it a circular convolution. Note that ndimage centers the kernel, so with a kernel as long as the signal, the output is the FFT result circularly shifted by N // 2 samples.
for idx, row in enumerate(outputs):
    outputs[idx] = sp.ndimage.convolve(signal1[idx], signal2[idx], mode='wrap')
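Before swapping this in, it is worth checking how the two results line up. The small check below covers both even and odd lengths of 1-D real rows. The helper `wrap_matches_fft` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

def wrap_matches_fft(n, seed=0):
    """Compare ndimage.convolve(mode='wrap') with the FFT circular convolution."""
    rng = np.random.default_rng(seed)
    a, b = rng.standard_normal(n), rng.standard_normal(n)
    fft_result = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
    wrap_result = ndimage.convolve(a, b, mode='wrap')
    # ndimage places the kernel centre (index n // 2) on each output sample,
    # so its output is the circular result rotated left by n // 2.
    return np.allclose(wrap_result, np.roll(fft_result, -(n // 2)))

print(wrap_matches_fft(64), wrap_matches_fft(65))  # True True
```

If you need the result aligned with mix1, apply `np.roll(..., n // 2)` to the ndimage output.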

Related

Working with very large matrices in numpy

I have a transition matrix for which I want to calculate a steady state vector. The code I'm using is adapted from this question, and it works well for matrices of normal size:
def steady_state(matrix):
    dim = matrix.shape[0]
    q = (matrix - np.eye(dim))
    ones = np.ones(dim)
    q = np.c_[q, ones]
    qtq = np.dot(q, q.T)
    bqt = np.ones(dim)
    return np.linalg.solve(qtq, bqt)
However, the matrix I'm working with has about 1.5 million rows and columns. It isn't a sparse matrix either; most entries are small but non-zero. Of course, just trying to build that matrix throws a memory error.
How can I modify the above code to work with huge matrices? I've heard of solutions like PyTables, but I'm not sure how to apply them, and I don't know if they would work for tasks like np.linalg.solve.
Being very new to numpy and very inexperienced with linear algebra, I'd very much appreciate an example of what to do in my case. I'm open to using something other than numpy, and even something other than Python if needed.
Here are some ideas to start with:
We can use the fact that any initial probability vector will converge to the steady state under time evolution (assuming the chain is ergodic, aperiodic, regular, etc.).
For small matrices we could use
def steady_state(matrix):
    dim = matrix.shape[0]
    prob = np.ones(dim) / dim
    other = np.zeros(dim)
    while np.linalg.norm(prob - other) > 1e-3:
        other = prob.copy()
        prob = other @ matrix
    return prob
(I think the convention assumed by the function in the question is that distributions are row vectors.)
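A quick way to convince yourself that the two formulations agree is to run both on a small random row-stochastic matrix. The sketch below uses a tighter tolerance than the 1e-3 above, and the function names are hypothetical. It assumes the chain is irreducible and aperiodic, which a matrix with all-positive entries guarantees.

```python
import numpy as np

def steady_state_solve(matrix):
    """Least-squares formulation from the question: pi (P - I) = 0, sum(pi) = 1."""
    dim = matrix.shape[0]
    q = np.c_[matrix - np.eye(dim), np.ones(dim)]
    return np.linalg.solve(q @ q.T, np.ones(dim))

def steady_state_power(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration with distributions as row vectors: pi <- pi @ P."""
    dim = matrix.shape[0]
    prob = np.ones(dim) / dim
    for _ in range(max_iter):
        new = prob @ matrix
        if np.linalg.norm(new - prob) < tol:
            return new
        prob = new
    return prob

rng = np.random.default_rng(4)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)  # make every row sum to 1
print(np.allclose(steady_state_solve(P), steady_state_power(P)))  # True
```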
Now we can use the fact that matrix multiplication and norm can be done chunk by chunk:
def steady_state_chunk(matrix, block_in=100, block_out=10):
    dim = matrix.shape[0]
    prob = np.ones(dim) / dim
    error = 1.
    while error > 1e-3:
        error = 0.
        other = prob.copy()
        for i in range(0, dim, block_out):
            outs = np.s_[i:i+block_out]
            vec_out = np.zeros(block_out)
            for j in range(0, dim, block_in):
                ins = np.s_[j:j+block_in]
                vec_out += other[ins] @ matrix[ins, outs]
            error += np.linalg.norm(vec_out - prob[outs])**2
            prob[outs] = vec_out
        error = np.sqrt(error)
    return prob
This should use less memory for temporaries, though you could do better by using the out parameter of np.matmul.
I should add something to deal with the last slice in each loop, in case dim isn't divisible by block_*, but I hope you get the idea.
For arrays that don't fit in memory to start with, you can apply the tools from the links in the comments above.
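The sketch below is a variant of the blocked iteration that also handles the last, shorter slice. Unlike the code above, it reads whole blocks of rows, because rows are contiguous in a row-major file. It assumes the matrix is stored as raw row-major float64 and opened with `np.memmap`, so only the block being multiplied is read into RAM. The file name `transition.dat` and the function name `steady_state_memmap` are hypothetical.

```python
import os
import tempfile
import numpy as np

def steady_state_memmap(matrix, block_rows=1000, tol=1e-10, max_iter=1000):
    """Power iteration pi <- pi @ P, reading P one block of rows at a time.

    `matrix` can be any 2-D array-like that supports row slicing, e.g. an
    np.memmap, so the full matrix never needs to be in RAM at once.
    """
    dim = matrix.shape[0]
    prob = np.full(dim, 1.0 / dim)
    for _ in range(max_iter):
        new = np.zeros(dim)
        for start in range(0, dim, block_rows):
            rows = slice(start, min(start + block_rows, dim))  # last block may be shorter
            # contribution of the states in this block: pi[rows] @ P[rows, :]
            new += prob[rows] @ np.asarray(matrix[rows])
        if np.linalg.norm(new - prob) < tol:
            return new
        prob = new
    return prob

# Hypothetical usage: write a small matrix to disk, then iterate over the memmap.
rng = np.random.default_rng(5)
P = rng.random((7, 7))
P /= P.sum(axis=1, keepdims=True)
path = os.path.join(tempfile.mkdtemp(), 'transition.dat')
P.tofile(path)  # raw row-major float64
P_disk = np.memmap(path, dtype=np.float64, mode='r', shape=(7, 7))
print(steady_state_memmap(P_disk, block_rows=3))  # 3 does not divide 7
```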

Eigen vectors in python giving seemingly random element-wise signs

I'm running the following code:
import numpy as np
import matplotlib
matplotlib.use("TkAgg")
import matplotlib.pyplot as plt
N = 100
t = 1
a1 = np.full((N-1,), -t)
a2 = np.full((N,), 2*t)
Hamiltonian = np.diag(a1, -1) + np.diag(a2) + np.diag(a1, 1)
eval, evec = np.linalg.eig(Hamiltonian)
idx = eval.argsort()[::-1]
eval, evec = eval[idx], evec[:,idx]
wave2 = evec[2] / np.sum(abs(evec[2]))
prob2 = evec[2]**2 / np.sum(evec[2]**2)
_ = plt.plot(wave2)
_ = plt.plot(prob2)
plt.show()
And the plot that comes out is this:
But I'd expect the blue line to be a sinusoid as well. This has me confused, and I can't find what's causing the sudden sign changes. Plotting the absolute value shows that the magnitudes at each x are fine, but the signs are wrong.
Any ideas on what might cause this or how to solve it?
Here's a modified version of your script that does what you expected. The changes are:
Corrected the indexing for the eigenvectors; they are the columns of evec.
Use np.linalg.eigh instead of np.linalg.eig. This isn't strictly necessary, but you might as well use the more efficient code.
Don't reverse the order of the sorted eigenvalues. I keep the eigenvalues sorted from lowest to highest. Because eigh returns the eigenvalues in ascending order, I just commented out the code that sorts the eigenvalues.
(Only the first change is a required correction.)
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
N = 100
t = 1
a1 = np.full((N-1,), -t)
a2 = np.full((N,), 2*t)
Hamiltonian = np.diag(a1, -1) + np.diag(a2) + np.diag(a1, 1)
eval, evec = np.linalg.eigh(Hamiltonian)
#idx = eval.argsort()[::-1]
#eval, evec = eval[idx], evec[:,idx]
k = 2
wave2 = evec[:, k] / np.sum(abs(evec[:, k]))
prob2 = evec[:, k]**2 / np.sum(evec[:, k]**2)
_ = plt.plot(wave2)
_ = plt.plot(prob2)
plt.show()
The plot:
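The first change is easy to verify directly. An eigenvector v with eigenvalue λ must satisfy H v = λ v, and with NumPy's output that holds for the columns `evec[:, k]`. The check below rebuilds the same tridiagonal Hamiltonian so it runs on its own. It uses `evals`/`evecs` to avoid shadowing the built-in `eval`. It also shows why the question's plot had the right magnitudes but scrambled signs.

```python
import numpy as np

N, t = 100, 1
H = (np.diag(np.full(N - 1, -t), -1) + np.diag(np.full(N, 2 * t))
     + np.diag(np.full(N - 1, -t), 1))
evals, evecs = np.linalg.eigh(H)

k = 2
column = evecs[:, k]  # eigenvector k is column k of evecs
row = evecs[k]        # row k holds the k-th component of every eigenvector
print(np.allclose(H @ column, evals[k] * column))  # True: H v = lambda v
# For this Hamiltonian |evecs| is symmetric, so the row has the right
# magnitudes; only the arbitrary per-column signs scramble it.
print(np.allclose(np.abs(row), np.abs(column)))    # True
```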
I may be wrong, but aren't they all valid eigenvectors/eigenvalues? The sign shouldn't matter, as the definition of an eigenvector is:
In linear algebra, an eigenvector or characteristic vector of a linear transformation is a non-zero vector that only changes by an overall scale when that linear transformation is applied to it.
Just because the scale is negative doesn't mean it isn't valid.
See this post about Matlab's eig that has a similar problem
One way to fix this is to simply pick a sign for the start, and multiply everything that doesn't fit that sign by -1 (or take the abs of every element and multiply by your expected sign). For your results this should work (nothing crosses 0).
Neither MATLAB nor NumPy cares about what you are trying to solve; simple mathematics dictates that both signs of an eigenvector are valid. Your values are sinusoidal, it's just that there are two sets of eigenvectors that work (negative and positive).
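The minimal sketch below is a variant of that sign-fixing idea, vectorized over all eigenvectors at once. It keys on each column's largest-magnitude entry rather than the first one, which could be zero, and flips the column so that entry is positive. That gives a reproducible sign convention without breaking any eigenvalue/eigenvector pair. The helper name `fix_signs` is hypothetical.

```python
import numpy as np

def fix_signs(evecs):
    """Flip each eigenvector (column) so its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(evecs), axis=0)  # row of the biggest entry in each column
    signs = np.sign(evecs[idx, np.arange(evecs.shape[1])])
    return evecs * signs  # broadcasts the per-column signs over the rows

H = np.array([[2.0, -1.0], [-1.0, 2.0]])
evals, evecs = np.linalg.eigh(H)
fixed = fix_signs(evecs)
print(np.allclose(H @ fixed, fixed * evals))  # still eigenvectors: True
```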

Least Squares method in practice

Very simple regression task. I have three variables x1, x2, x3 with some random noise, and I know the target equation: y = q1*x1 + q2*x2 + q3*x3. Now I want to find the target coefficients q1, q2, q3 and evaluate the performance of the prediction method using the mean Relative Squared Error (RSE), (Prediction/Real - 1)^2.
In my research I found that this is an ordinary least squares problem, but I can't work out from the examples on the internet how to solve this particular problem in Python. Let's say I have this data:
import numpy as np
sourceData = np.random.rand(1000, 3)
koefs = np.array([1, 2, 3])
target = np.dot(sourceData, koefs)
(In real life the data are noisy, and the noise is not normally distributed.) How can I find these koefs with a least squares approach in Python? Any library is fine.
@ayhan made a valuable comment.
And there is a problem with your code: Actually there is no noise in the data you collect. The input data is noisy, but after the multiplication, you don't add any additional noise.
I've added some noise to your measurements and used the least squares formula to fit the parameters, here's my code:
data = np.random.rand(1000,3)
true_theta = np.array([1,2,3])
true_measurements = np.dot(data, true_theta)
noise = np.random.rand(1000) * 1
noisy_measurements = true_measurements + noise
estimated_theta = np.linalg.inv(data.T @ data) @ data.T @ noisy_measurements
The estimated_theta will be close to true_theta. If you don't add noise to the measurements, they will be equal.
I've used the Python 3 matrix multiplication operator, @.
You could use np.dot instead of @, but that makes the code longer, so I've split the formula:
MTM_inv = np.linalg.inv(np.dot(data.T, data))
MTy = np.dot(data.T, noisy_measurements)
estimated_theta = np.dot(MTM_inv, MTy)
You can read up on least squares here: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#The_general_problem
UPDATE:
Or you could just use the builtin least squares function:
np.linalg.lstsq(data, noisy_measurements, rcond=None)[0]
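Explicitly inverting `data.T @ data` works, but solving the normal equations with `np.linalg.solve`, or letting `np.linalg.lstsq` handle the problem directly, is generally the more numerically stable choice. The sketch below compares the three on synthetic data. It uses zero-mean Gaussian noise as an assumption, because `np.random.rand` noise has mean 0.5 and, with no intercept column, pulls the estimates away from the true values. Note that `lstsq` returns a tuple whose first element is the solution, and `rcond=None` selects the newer default cutoff and silences the FutureWarning in NumPy versions that emit it.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.random((1000, 3))
true_theta = np.array([1, 2, 3])
noisy_measurements = data @ true_theta + 0.1 * rng.standard_normal(1000)

theta_inv = np.linalg.inv(data.T @ data) @ data.T @ noisy_measurements     # explicit inverse
theta_solve = np.linalg.solve(data.T @ data, data.T @ noisy_measurements)  # normal equations, no inverse
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(data, noisy_measurements, rcond=None)

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_solve, theta_lstsq))  # True True
print(theta_lstsq)  # should be near [1, 2, 3]
```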
In addition to @lhk's answer, I found the excellent scipy least_squares function, which makes it easy to get the requested behavior.
It lets us provide a custom function that returns the residuals, so we can form the relative squared error instead of the absolute squared difference:
import numpy as np
from scipy.optimize import least_squares
data = np.random.rand(1000,3)
true_theta = np.array([1,2,3])
true_measurements = np.dot(data, true_theta)
noise = np.random.rand(1000) * 1
noisy_measurements = true_measurements + noise
# noisy_measurements[-1] = data[-1] @ (1000 * true_theta)  # uncomment this outlier to see how much better the relative squared error estimator works than the default absolute difference in this case
def my_func(params, x, y):
    res = (x @ params) / y - 1  # changing this line to (x @ params) - y gives the same result as np.linalg.lstsq
    return res
x0 = np.ones(3)  # initial guess for the parameters
res = least_squares(my_func, x0, args=(data, noisy_measurements))
estimated_theta = res.x
We can also pass a custom loss through the loss argument; it processes the residuals to form the final cost.
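For example, `loss='soft_l1'` is one of the built-in options, alongside `'huber'`, `'cauchy'` and `'arctan'`, and it down-weights large residuals. The sketch below fits the relative residuals with and without it and reports the mean relative squared error from the question. The data, the single outlier, `f_scale=0.1` and the starting point `x0` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)
data = rng.random((1000, 3)) + 0.1  # keep targets away from zero
true_theta = np.array([1.0, 2.0, 3.0])
y = data @ true_theta * (1 + 0.05 * rng.standard_normal(1000))  # multiplicative noise
y[-1] *= 1000  # one gross outlier

def relative_residuals(params, x, y):
    return (x @ params) / y - 1

def mean_rse(params, x, y):
    """Mean relative squared error, mean((prediction / real - 1) ** 2)."""
    return np.mean(relative_residuals(params, x, y) ** 2)

x0 = np.ones(3)
fit_linear = least_squares(relative_residuals, x0, args=(data, y))
fit_robust = least_squares(relative_residuals, x0, args=(data, y),
                           loss='soft_l1', f_scale=0.1)
print(fit_linear.x, mean_rse(fit_linear.x, data, y))
print(fit_robust.x, mean_rse(fit_robust.x, data, y))
```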

How to do convolution matrix operation in numpy?

Is there a way to do convolution matrix operation using numpy?
The numpy.convolve only operates on 1D arrays, so this is not the solution.
I would rather avoid using scipy, since it appears to be more difficult to install on Windows.
You have scipy's ndimage which allows you to perform N-dimensional convolution with convolve:
from scipy.ndimage import convolve
convolve(data, kernel)
I know that you said you want to avoid scipy... but I would advise against that. Scipy is great in so many ways. If you want to install it on Windows, try the Anaconda Distribution, which comes with scipy already installed.
Anaconda is a multiplatform Python distribution that comes with all the essential libraries (including a lot of scientific computing libraries) preinstalled, plus tools like pip and conda to install new ones. And no, they don't pay me to advertise it :/ but it makes your multiplatform life much easier.
I would highly recommend using OpenCV for this purpose. However, in principle you can almost directly use the pseudocode in the Wikipedia article on kernel convolution to create your own function:
import numpy as np
## assumes matrix and kernel are already defined (2-D lists or arrays)
kl = len(kernel)     ## kernels are usually square with an odd number of rows/columns
ks = (kl - 1) // 2   ## kernel half-width
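imx = len(matrix)
imy = len(matrix[0])
result = np.zeros((imx, imy))  ## write to a new array so the input isn't modified mid-loop
for i in range(imx):
    for j in range(imy):
        acc = 0
        for ki in range(kl):  ## kernel is the matrix to be used
            for kj in range(kl):
                x, y = i - ks + ki, j - ks + kj
                if 0 <= x < imx and 0 <= y < imy:  ## make sure you don't get an out-of-bounds error
                    ## flipped kernel indices make this a true convolution
                    acc = acc + matrix[x][y] * kernel[kl - 1 - ki][kl - 1 - kj]
        result[i][j] = acc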
this should in principle do the trick (but I have not yet tested it...)
I hope this is helpful.
I used the example in the Wikipedia article and extrapolated it to every element in the matrix:
import numpy as np

def image_convolution(matrix, kernel):
    # assuming kernel is symmetric and odd
    k_size = len(kernel)
    m_height, m_width = matrix.shape
    # pad by half the kernel size so each window is centered on its pixel
    padded = np.pad(matrix, (k_size // 2, k_size // 2))
    # iterates through matrix, applies kernel, and sums
    output = []
    for i in range(m_height):
        for j in range(m_width):
            output.append(np.sum(padded[i:k_size+i, j:k_size+j]*kernel))
    output = np.array(output).reshape((m_height, m_width))
    return output
padded[i:k_size+i, j:k_size+j] is a slice of the array the same size as the kernel.
Hope this is clear and helps.
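On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view` lets you drop the Python loops entirely. It returns a read-only view of every kernel-sized window, and `np.einsum` does the multiply-and-sum for all of them at once. The sketch below assumes a 2-D array, an odd-sized kernel and zero padding at the edges. The kernel is flipped so that this is a true convolution, which only matters for non-symmetric kernels. `conv2d_same` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(image, kernel):
    """2-D convolution with zero padding; output has the same shape as image."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = sliding_window_view(padded, (kh, kw))  # shape (H, W, kh, kw), no copy
    return np.einsum('ijkl,kl->ij', windows, kernel[::-1, ::-1])

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)  # Laplacian
print(conv2d_same(image, kernel))
```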
An alternative NumPy approach adds whole shifted matrices instead of individual cells, which reduces the looping to one pass per kernel element:
import numpy as np

def zConv(m, K):
    # input assumed to be numpy arrays, Kr <= mrow, Kc <= mcol, kernel odd
    # edges wrap Top/Bottom, Left/Right
    # Zero pad m by kr, kc if no wrap desired
    mc = m*0
    Kr, Kc = K.shape
    kr = Kr//2  # kernel center
    kc = Kc//2
    for dr in range(-kr, kr+1):
        mr = np.roll(m, dr, axis=0)
        for dc in range(-kc, kc+1):
            mrc = np.roll(mr, dc, axis=1)
            mc = mc + K[dr+kr, dc+kc]*mrc
    return mc
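Following the comment in that code, the sketch below shows the zero-padded (non-wrapping) variant. It pads by the kernel half-widths, accumulates the shifted copies the same way, then crops the padding off again. It repeats the roll loop so it runs on its own and assumes an odd-sized kernel. `conv_zero_padded` is a hypothetical name.

```python
import numpy as np

def conv_zero_padded(m, K):
    """Roll-based convolution like zConv, but with zero padding instead of wrap-around."""
    kr, kc = K.shape[0] // 2, K.shape[1] // 2
    p = np.pad(m, ((kr, kr), (kc, kc)))  # zeros absorb the shifts that would wrap
    acc = np.zeros(p.shape)
    for dr in range(-kr, kr + 1):
        for dc in range(-kc, kc + 1):
            acc += K[dr + kr, dc + kc] * np.roll(p, (dr, dc), axis=(0, 1))
    return acc[kr:kr + m.shape[0], kc:kc + m.shape[1]]  # drop the padding

m = np.arange(16, dtype=float).reshape(4, 4)
K = np.ones((3, 3)) / 9  # 3x3 box blur
print(conv_zero_padded(m, K))
```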
If your kernel is not symmetric (adjusted from the other answers):
import numpy as np

def image_convolution(matrix, kernel):
    # kernel can be non-square, but each dimension must be odd
    k_height, k_width = kernel.shape
    m_height, m_width = matrix.shape
    # zero-pad each axis by half the kernel size along that axis
    padded = np.pad(matrix, ((k_height // 2, k_height // 2), (k_width // 2, k_width // 2)))
    # flip the kernel so this is a true convolution rather than a correlation
    flipped = kernel[::-1, ::-1]
    # iterates through matrix, applies kernel, and sums
    output = np.empty((m_height, m_width))
    for i in range(m_height):
        for j in range(m_width):
            output[i, j] = np.sum(padded[i:k_height+i, j:k_width+j] * flipped)
    return output

Speeding up Evaluation of Sympy Symbolic Expressions

A Python program I am currently working on (Gaussian process classification) is bottlenecking on evaluation of Sympy symbolic matrices, and I can't figure out what I can, if anything, do to speed it up. Other parts of the program I've already ensured are typed properly (in terms of numpy arrays) so calculations between them are properly vectorised, etc.
I looked into Sympy's codegen functions a bit (autowrap and binary_function in particular), but because the elements of my ImmutableMatrix are themselves partial derivatives over elements of a symbolic matrix, a long list of 'unhashable' things prevents me from using the codegen functionality.
Another possibility I looked into was Theano, but after some initial benchmarks I found that while it built the initial partial derivative symbolic matrices much more quickly, it seemed to be a few orders of magnitude slower at evaluation, the opposite of what I was seeking.
Below is a working, extracted snippet of the code I am currently working on.
import theano
import sympy
from sympy.utilities.autowrap import autowrap
from sympy.utilities.autowrap import binary_function
import numpy as np
import math
from datetime import datetime

# 'Vectorized' cdist that can handle symbols/arbitrary types - preliminary benchmarking put it at ~15 times faster than python list comprehension, but still notably slower (forgot at the moment) than cdist, of course
def sqeucl_dist(x, xs):
    m = np.sum(np.power(
        np.repeat(x[:,None,:], len(xs), axis=1) -
        np.resize(xs, (len(x), xs.shape[0], xs.shape[1])),
        2), axis=2)
    return m

def build_symbolic_derivatives(X):
    # Pre-calculate derivatives of inverted matrix to substitute values in the Squared Exponential NLL gradient
    f_err_sym, n_err_sym = sympy.symbols("f_err, n_err")

    # (1,n) shape 'matrix' (vector) of length scales for each dimension
    l_scale_sym = sympy.MatrixSymbol('l', 1, X.shape[1])

    # K matrix
    print("Building sympy matrix...")
    eucl_dist_m = sqeucl_dist(X/l_scale_sym, X/l_scale_sym)
    m = sympy.Matrix(f_err_sym**2 * math.e**(-0.5 * eucl_dist_m)
                     + n_err_sym**2 * np.identity(len(X)))

    # Element-wise derivative of K matrix over each of the hyperparameters
    print("Getting partial derivatives over all hyperparameters...")
    pd_t1 = datetime.now()
    dK_df = m.diff(f_err_sym)
    dK_dls = [m.diff(l_scale_sym) for l_scale_sym in l_scale_sym]
    dK_dn = m.diff(n_err_sym)
    print("Took: {}".format(datetime.now() - pd_t1))

    # Lambdify each of the dK/dts to speed up substitutions per optimization iteration
    print("Lambdifying ")
    l_t1 = datetime.now()
    dK_dthetas = [dK_df] + dK_dls + [dK_dn]
    dK_dthetas = sympy.lambdify((f_err_sym, l_scale_sym, n_err_sym), dK_dthetas, 'numpy')
    print("Took: {}".format(datetime.now() - l_t1))
    return dK_dthetas

# Evaluates each dK_dtheta pre-calculated symbolic lambda with current iteration's hyperparameters
def eval_dK_dthetas(dK_dthetas_raw, f_err, l_scales, n_err):
    l_scales = sympy.Matrix(l_scales.reshape(1, len(l_scales)))
    return np.array(dK_dthetas_raw(f_err, l_scales, n_err), dtype=np.float64)

dimensions = 3
X = np.random.rand(50, dimensions)
dK_dthetas_raw = build_symbolic_derivatives(X)

f_err = np.random.rand()
l_scales = np.random.rand(3)
n_err = np.random.rand()

t1 = datetime.now()
dK_dthetas = eval_dK_dthetas(dK_dthetas_raw, f_err, l_scales, n_err) # ~99.7%
print(datetime.now() - t1)
In this example, 5 50x50 symbolic matrices are evaluated, i.e. only 12,500 elements, taking 7 seconds. I've spent quite some time looking for resources on speeding operations like this up, and trying to translate it into Theano (at least until I found its evaluation slower in my case) and having no luck there either.
Any help greatly appreciated!
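One possible way around the symbolic evaluation is to write the partial derivatives in closed form and evaluate them with NumPy broadcasting. This sketch assumes the kernel is exactly the one built above, K = f_err^2 * exp(-0.5 * sum_k (x_ik - x_jk)^2 / l_k^2) + n_err^2 * I, and writes E for the exponential term. Then dK/df_err = 2 * f_err * E, dK/dl_k = f_err^2 * E * (x_ik - x_jk)^2 / l_k^3, and dK/dn_err = 2 * n_err * I. The function name `dK_dthetas_numpy` is hypothetical, and the output is stacked in the same order as `dK_dthetas` above.

```python
import numpy as np

def dK_dthetas_numpy(X, f_err, l_scales, n_err):
    """Closed-form derivatives of the squared-exponential K matrix.

    Returns an array of shape (2 + D, n, n):
    [dK/df_err, dK/dl_1, ..., dK/dl_D, dK/dn_err].
    """
    diff_sq = (X[:, None, :] - X[None, :, :]) ** 2               # (n, n, D) squared differences
    E = np.exp(-0.5 * np.sum(diff_sq / l_scales ** 2, axis=2))  # exponential term, (n, n)
    dK_df = 2 * f_err * E
    # d/dl_k of exp(-0.5 * d_k / l_k^2) is exp(...) * d_k / l_k^3
    dK_dls = f_err ** 2 * E[None, :, :] * np.moveaxis(diff_sq / l_scales ** 3, 2, 0)
    dK_dn = 2 * n_err * np.eye(len(X))
    return np.concatenate([dK_df[None], dK_dls, dK_dn[None]])

rng = np.random.default_rng(8)
X = rng.random((50, 3))
grads = dK_dthetas_numpy(X, f_err=rng.random(), l_scales=rng.random(3), n_err=rng.random())
print(grads.shape)  # (5, 50, 50)
```

Whether this fits your real model depends on the kernel staying in this form. If it does, there is no symbolic step left to evaluate per iteration.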
