Scipy optimize unable to find the correct results - python

I am trying to use scipy.optimize.minimize to fit the parameters of a multivariate function. However, no matter how many noise-free data points I provide, the optimizer does not converge to the correct answer, or even to one close to it.
I suspect there is a mistake in the way I am using the optimizer, but I have not been able to find it. I would appreciate any advice or guesses, thanks!
import numpy as np
from scipy.optimize import minimize
import math

def get_transform(ai,aj,ak,x,y,z):
    # Build a 4x4 homogeneous transform from three Euler angles and a translation
    i,j,k = 0, 1, 2
    si, sj, sk = math.sin(ai), math.sin(aj), math.sin(ak)
    ci, cj, ck = math.cos(ai), math.cos(aj), math.cos(ak)
    cc, cs = ci*ck, ci*sk
    sc, ss = si*ck, si*sk
    M = np.identity(4)
    M[i, i] = cj*ck
    M[i, j] = sj*sc-cs
    M[i, k] = sj*cc+ss
    M[j, i] = cj*sk
    M[j, j] = sj*ss+cc
    M[j, k] = sj*cs-sc
    M[k, i] = -sj
    M[k, j] = cj*si
    M[k, k] = cj*ci
    M[0, 3] = x
    M[1, 3] = y
    M[2, 3] = z
    return M

def camera_intrinsic(fx, ppx, fy, ppy):
    K = np.zeros((3, 3), dtype='float64')
    K[0, 0], K[0, 2] = fx, ppx
    K[1, 1], K[1, 2] = fy, ppy
    K[2, 2] = 1
    return K

def apply_transform(p, matrix):
    rotation = matrix[0:3,0:3]
    T = np.array([matrix[0][3],matrix[1][3],matrix[2][3]])
    # Rotate the points, then translate them
    transformed = (np.dot(rotation, p.T).T)+T
    return transformed

def project(points_3D,internal_calibration):
    points_3D = points_3D.T
    projections_2d = np.zeros((2, points_3D.shape[1]), dtype='float32')
    camera_projection = (internal_calibration).dot(points_3D)
    projections_2d[0, :] = camera_projection[0, :]/camera_projection[2, :]
    projections_2d[1, :] = camera_projection[1, :]/camera_projection[2, :]
    return projections_2d.T

def error(x):
    global points,pixels
    transform = get_transform(x[0],x[1],x[2],x[3],x[4],x[5])
    points_transfered = apply_transform(points, transform)
    internal_calibration = camera_intrinsic(x[6],x[7],x[8],x[9])
    projected = project(points_transfered,internal_calibration)
    # print(((projected-pixels)**2).mean())
    return ((projected-pixels)**2).mean()

def generate(points, x):
    transform = get_transform(x[0],x[1],x[2],x[3],x[4],x[5])
    points_transfered = apply_transform(points, transform)
    internal_calibration = camera_intrinsic(x[6],x[7],x[8],x[9])
    projected = project(points_transfered,internal_calibration)
    return projected

points = np.random.rand(100,3)
x_initial = np.random.rand(10)
pixels = generate(points,x_initial)
x_guess = np.random.rand(10)
results = minimize(error,x_guess, method='nelder-mead', tol = 1e-15)
x = results.x

You are solving a least-squares problem, but you are optimizing it with a solver designed to minimize a general scalar function. Such a solver can possibly solve the problem, but it does so very inefficiently: it can require many more iterations, or fail to converge at all.
A better way is to use least_squares instead of minimize.
For it to work properly, modify the error function so that it returns a 1-D numpy array of residuals instead of a scalar:
def error(x):
    # ... compute `projected` exactly as in the original function ...
    return (projected-pixels).flatten()
Then call least_squares:
from scipy.optimize import least_squares
results = least_squares(error, x_guess)
x = results.x
print('error:', np.linalg.norm(error(x)))
Also, error(x) currently returns an array of float32, because project allocates its output as float32. It should be float64 instead; otherwise the minimization fails to converge, because most of the gradients become zero at 32-bit precision.
def project(points_3D,internal_calibration):
    points_3D = points_3D.T
    projections_2d = np.zeros((2, points_3D.shape[1]), dtype='float64')
    # ... the rest of the function is unchanged ...
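The float32 effect on finite-difference gradients can be seen in isolation. The sketch below is not the projection code from the question: it uses a hypothetical one-parameter function (`square64` / `square32`) and a forward-difference step of the square root of the float64 machine epsilon. That step size is an assumption about the order of magnitude a solver typically uses, not a quote of SciPy's internals.

```python
import numpy as np

# Forward-difference step of the order typically used for float64 problems.
h = np.sqrt(np.finfo(np.float64).eps)  # about 1.5e-8

def square64(x):
    # Hypothetical objective evaluated entirely in float64.
    return float(np.float64(x) ** 2)

def square32(x):
    # Same objective, but the input is stored as float32 first,
    # which discards a perturbation as small as h.
    return float(np.float32(x) ** 2)

grad64 = (square64(1.0 + h) - square64(1.0)) / h
grad32 = (square32(1.0 + h) - square32(1.0)) / h

print(grad64)  # close to the true derivative, 2
print(grad32)  # 0.0, because 1 + h rounds to 1 in float32
```

The spacing between adjacent float32 values near 1 is about 1.2e-7, so a perturbation of 1.5e-8 is rounded away and the estimated gradient is exactly zero.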
With these modifications the solver converges to the solution most of the time, but it can still occasionally fail. This happens because you generate the problem randomly, so in some cases the problem may be degenerate or make no physical sense. Such cases should be investigated on their own.
It can also help to use a robust loss, such as 'arctan', instead of the default linear loss:
results = least_squares(error, x_guess, loss='arctan')
[0.68589904 0.68782115 0.83299068 0.02360941 0.19367124 0.54715374
0.37609235 0.62190714 0.98824796 0.88385802]
[0.68589904 0.68782115 0.83299068 0.02360941 0.19367124 0.54715374
0.37609235 0.62190714 0.98824796 0.88385802]
error: 1.2269443642313758e-12
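For reference, here is a minimal self-contained sketch of the same pattern (return a residual vector, let least_squares do the squaring and summing). It does not use the camera model from the question; the exponential `model`, the `true_params` values and the starting point are all hypothetical and chosen only to keep the example short.

```python
import numpy as np
from scipy.optimize import least_squares

def model(params, t):
    # Hypothetical two-parameter model: a * exp(b * t)
    a, b = params
    return a * np.exp(b * t)

def residuals(params, t, y):
    # One residual per data point, returned as a 1-D float64 array
    return model(params, t) - y

t = np.linspace(0.0, 1.0, 50)
true_params = np.array([2.0, -1.5])
y = model(true_params, t)            # noise-free synthetic data

fit = least_squares(residuals, x0=np.array([1.0, 0.0]), args=(t, y))
print(fit.x)                         # fitted parameters
print(np.linalg.norm(fit.fun))       # norm of the final residual vector
```

Because the solver sees every residual individually, it can build a Jacobian and take Gauss-Newton-like steps, which is what makes it so much more effective than a derivative-free scalar minimizer on this kind of problem.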


Speeding up a pytorch tensor operation

I am trying to speed up the operation below by expressing it as some sort of matrix/vector multiplication. Can anyone see a nice, quick solution?
It should also work for the special case where a tensor has shape torch.Size([]) (a 0-dimensional tensor), but I am not able to initialize such a tensor.
See the image below for the type of tensor I am referring to:
[Image: tensor to add to test]
import torch

def adstock_geometric(x: torch.Tensor, theta: float):
    # Geometric adstock: each value carries over theta times the previous decayed value
    x_decayed = torch.zeros_like(x)
    x_decayed[0] = x[0]
    for xi in range(1, len(x_decayed)):
        x_decayed[xi] = x[xi] + theta * x_decayed[xi - 1]
    return x_decayed

def adstock_multiple_samples(x: torch.Tensor, theta: torch.Tensor):
    listtheta = theta.tolist()
    # A 0-dimensional theta converts to a plain float rather than a list
    if isinstance(listtheta, float):
        return adstock_geometric(x=x, theta=listtheta)
    x_decayed = torch.zeros((100, 112, 1))
    for idx, theta_ in enumerate(listtheta):
        x_decayed_one_entry = adstock_geometric(x=x, theta=theta_)
        x_decayed[idx] = x_decayed_one_entry
    return x_decayed

if __name__ == '__main__':
    ones = torch.tensor([1])
    hundreds = torch.tensor([idx for idx in range(100)])
    x = torch.tensor([[idx] for idx in range(112)])
    ones = adstock_multiple_samples(x=x, theta=ones)
    hundreds = adstock_multiple_samples(x=x, theta=hundreds)
I came up with the following, which is 40 times faster on your example:
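import torch

def adstock_multiple_samples(x: torch.Tensor, theta: torch.Tensor):
    arange = torch.arange(len(x))
    # powers[i, j] = i - j for j <= i (clipped to 0 elsewhere)
    powers = (arange[:, None] - arange).clip(0)
    # Keep only the lower triangle, so that output[t, i] = sum_{j <= i} theta[t]**(i - j) * x[j]
    return ((theta[:, None, None] ** powers[None, :, :]).tril() * x).sum(-1)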
It behaves as expected:
>>> x = torch.arange(112)
>>> theta = torch.arange(100)
>>> adstock_multiple_samples(x, theta)
... # the same output
Note that I assumed x is a 1-D tensor, since the second dimension is not needed in your example.
It also works with theta = torch.empty((0,)), and it returns an empty tensor.
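On the 0-dimensional case raised in the question: `torch.tensor(0.5)` (a bare Python number, no brackets) creates a tensor of shape torch.Size([]). The sketch below is one possible way to let the vectorised function accept it, by promoting theta with `torch.atleast_1d` first. The names `adstock_loop` and `adstock_vectorised`, the float64 inputs and the sample values are assumptions made for this illustration.

```python
import torch

def adstock_loop(x, theta):
    # Reference implementation: the plain recurrence for a single theta.
    out = torch.zeros_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = x[i] + theta * out[i - 1]
    return out

def adstock_vectorised(x, theta):
    # Promote a 0-dim theta to shape (1,), so indexing with [:, None, None] works.
    theta = torch.atleast_1d(theta)
    arange = torch.arange(len(x))
    powers = (arange[:, None] - arange).clip(0)
    weights = (theta[:, None, None] ** powers[None, :, :]).tril()
    return (weights * x).sum(-1)

x = torch.arange(8, dtype=torch.float64)
scalar_theta = torch.tensor(0.5, dtype=torch.float64)            # shape torch.Size([])
batch_theta = torch.tensor([0.1, 0.5, 0.9], dtype=torch.float64)

scalar_result = adstock_vectorised(x, scalar_theta)   # shape (1, 8)
batch_result = adstock_vectorised(x, batch_theta)     # shape (3, 8)
print(scalar_theta.shape, scalar_result.shape, batch_result.shape)
```

With this variant a scalar theta yields a leading dimension of size 1; index it with `[0]` if a 1-D result is wanted.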

python kmedoids - calculating new medoid centers more efficiently

I'm following an excellent Medium article to implement kmedoids from scratch. There is a place in the code where each pixel's distance to the medoid centers is calculated, and it is VERY slow because it calls numpy.linalg.norm inside a loop. Is there a way to do the same thing faster, with a single numpy.linalg.norm call, numpy broadcasting, or scipy.spatial.distance.cdist and np.argmin?
###helper function here###
def compute_d_p(X, medoids, p):
    m = len(X)
    medoids_shape = medoids.shape
    # If a 1-D array is provided,
    # it will be reshaped to a single row 2-D array
    if len(medoids_shape) == 1:
        medoids = medoids.reshape((1,len(medoids)))
    k = len(medoids)
    S = np.empty((m, k))
    for i in range(m):
        d_i = np.linalg.norm(X[i, :] - medoids, ord=p, axis=1)
        S[i, :] = d_i**p
    return S
This is where the slowdown occurs:
for datap in cluster_points:
    new_medoid = datap
    new_dissimilarity= np.sum(compute_d_p(X, datap, p))
    if new_dissimilarity < avg_dissimilarity :
        avg_dissimilarity = new_dissimilarity
        out_medoids[i] = datap
Full code below. All credits to the article author.
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
# Dataset
iris = datasets.load_iris()
data = pd.DataFrame(iris.data, columns = iris.feature_names)
target = iris.target_names
labels = iris.target
# Scaling
scaler = MinMaxScaler()
data = pd.DataFrame(scaler.fit_transform(data), columns=data.columns)
#PCA Transformation
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(data)
PCAdf = pd.DataFrame(data = principalComponents , columns = ['principal component 1', 'principal component 2','principal component 3'])
datapoints = PCAdf.values
m, f = datapoints.shape
k = 3
def init_medoids(X, k):
    from numpy.random import choice
    from numpy.random import seed
    samples = choice(len(X), size=k, replace=False)
    return X[samples, :]
medoids_initial = init_medoids(datapoints, 3)
def compute_d_p(X, medoids, p):
    m = len(X)
    medoids_shape = medoids.shape
    # If a 1-D array is provided,
    # it will be reshaped to a single row 2-D array
    if len(medoids_shape) == 1:
        medoids = medoids.reshape((1,len(medoids)))
    k = len(medoids)
    S = np.empty((m, k))
    for i in range(m):
        d_i = np.linalg.norm(X[i, :] - medoids, ord=p, axis=1)
        S[i, :] = d_i**p
    return S
S = compute_d_p(datapoints, medoids_initial, 2)
def assign_labels(S):
    return np.argmin(S, axis=1)
labels = assign_labels(S)
def update_medoids(X, medoids, p):
    # Use the X argument throughout; the global `points` is not defined in this script
    S = compute_d_p(X, medoids, p)
    labels = assign_labels(S)
    out_medoids = medoids
    for i in set(labels):
        avg_dissimilarity = np.sum(compute_d_p(X, medoids[i], p))
        cluster_points = X[labels == i]
        for datap in cluster_points:
            new_medoid = datap
            new_dissimilarity= np.sum(compute_d_p(X, datap, p))
            if new_dissimilarity < avg_dissimilarity :
                avg_dissimilarity = new_dissimilarity
                out_medoids[i] = datap
    return out_medoids
def has_converged(old_medoids, medoids):
    return set([tuple(x) for x in old_medoids]) == set([tuple(x) for x in medoids])
#Full algorithm
def kmedoids(X, k, p, starting_medoids=None, max_steps=np.inf):
    if starting_medoids is None:
        medoids = init_medoids(X, k)
    else:
        medoids = starting_medoids
    converged = False
    labels = np.zeros(len(X))
    i = 1
    while (not converged) and (i <= max_steps):
        old_medoids = medoids.copy()
        S = compute_d_p(X, medoids, p)
        labels = assign_labels(S)
        medoids = update_medoids(X, medoids, p)
        converged = has_converged(old_medoids, medoids)
        i += 1
    return (medoids,labels)
results = kmedoids(datapoints, 3, 2)
final_medoids = results[0]
data['clusters'] = results[1]
There's a good chance numpy's broadcasting capabilities will help. Getting broadcasting to work in 3+ dimensions is a bit tricky, and I usually have to resort to a bit of trial and error to get the details right.
The use of linalg.norm here compounds things further, because my version of the code won't give identical results to linalg.norm for all inputs. But I believe it will give identical results for all relevant inputs in this case.
I've added some comments to the code to explain the thinking behind certain details.
import numpy as np

def compute_d_p_broadcasted(X, medoids, p):
    # If a 1-D array is provided,
    # it will be reshaped to a single row 2-D array
    if len(medoids.shape) == 1:
        medoids = medoids.reshape((1,len(medoids)))
    # To broadcast every row of one 2-D array against every row of
    # another, give the first array a trailing singleton dimension and
    # the second array a leading singleton dimension. We can
    # quickly accomplish that by slicing with `None` in the appropriate
    # places. (`np.newaxis` is a slightly more self-documenting way
    # of spelling `None`, but I rarely bother.)
    # In this case, the shapes of the other two dimensions also
    # have to align in the same way you'd expect for a dot product.
    # So we pass `medoids.T`.
    diff = np.abs(X[:, :, None] - medoids.T[None, :, :])
    # The last tricky bit is to figure out which axis to sum. Right
    # now, the array is a 3-dimensional array, with the first
    # dimension corresponding to the rows of `X` and the last
    # dimension corresponding to the columns of `medoids.T`.
    # The middle dimension corresponds to the underlying dimensionality
    # of the space; that's what we want to sum for a sum of squares.
    # (Or sum of cubes for L3 norm, etc.)
    return (diff ** p).sum(axis=1)
def compute_d_p(X, medoids, p):
    m = len(X)
    medoids_shape = medoids.shape
    # If a 1-D array is provided,
    # it will be reshaped to a single row 2-D array
    if len(medoids_shape) == 1:
        medoids = medoids.reshape((1,len(medoids)))
    k = len(medoids)
    S = np.empty((m, k))
    for i in range(m):
        d_i = np.linalg.norm(X[i, :] - medoids, ord=p, axis=1)
        S[i, :] = d_i**p
    return S
# A couple of simple tests:
X = np.array([[ 1.0,  2,  3],
              [ 4,    5,  6],
              [ 7,    8,  9],
              [10,   11, 12]])
medoids = X[[0, 2], :]
np.allclose(compute_d_p(X, medoids, 2),
            compute_d_p_broadcasted(X, medoids, 2))
# Returns True
np.allclose(compute_d_p(X, medoids, 3),
            compute_d_p_broadcasted(X, medoids, 3))
# Returns True
Of course, these tests don't tell whether this actually gives a significant speedup. You'll have to check that yourself for the relevant use-case. But I suspect it will at least help.
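Since the question also mentions scipy.spatial.distance.cdist, here is a minimal sketch of that route for comparison. It relies on cdist's 'minkowski' metric with its `p` argument; the name `compute_d_p_cdist` and the small test array are assumptions made for this illustration, and whether it beats the broadcasting version would need to be timed on the real data.

```python
import numpy as np
from scipy.spatial.distance import cdist

def compute_d_p_cdist(X, medoids, p):
    # Accept a single medoid given as a 1-D array.
    medoids = np.atleast_2d(medoids)
    # cdist returns the Minkowski p-norm distance; raise to p to match compute_d_p.
    return cdist(X, medoids, metric="minkowski", p=p) ** p

def compute_d_p_broadcasted(X, medoids, p):
    # Broadcasting version, repeated here so the snippet is self-contained.
    medoids = np.atleast_2d(medoids)
    diff = np.abs(X[:, :, None] - medoids.T[None, :, :])
    return (diff ** p).sum(axis=1)

X = np.array([[1.0, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
medoids = X[[0, 2], :]

S = compute_d_p_cdist(X, medoids, 2)
labels = np.argmin(S, axis=1)   # index of the nearest medoid for each row of X
print(np.allclose(S, compute_d_p_broadcasted(X, medoids, 2)))
```

Unlike the broadcasting version, cdist does not materialise the intermediate (m, f, k) array, which may matter for memory when X is large.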

How to use scipy.optimize with array function?

I created a custom exponential smoothing function. I want to optimize its parameters with scipy.optimize.basinhopping.
If I only optimize the function for one time series, it works.
import numpy as np
from scipy.optimize import basinhopping
d = [1,2,3,4]
cols = len(d)
def simple_exp_smooth(inputs):
    ini,alpha = inputs
    f = np.full(cols,np.nan)
    f[0] = ini
    for t in range(1,cols):
        f[t] = alpha*d[t-1]+(1-alpha)*f[t-1]
    error = sum(abs(f[1:] - d[1:]))
    return error
func = simple_exp_smooth
bounds = np.array([(0,4),(0.0, 1.0)])
x0 = (1,0.1)
res = basinhopping(func, x0, minimizer_kwargs={'bounds': bounds},stepsize=0.1,niter=45)
One issue with this approach is that it is slow if you have 1000 time series to optimize. So I have created an array version that performs the exponential smoothing of multiple time series at once.
def simple(inputs):
    a0,alpha = inputs
    a = np.full([rows,cols],np.nan)
    a[:,0] = a0
    for t in range(1,cols):
        a[:,t] = alpha*d[:,t]+(1-alpha)*a[:,t-1]
    MAE = abs(d - a).mean(axis=1)/d.mean(axis=1)
    return sum(MAE)
d = np.array([[1,2,3,4],
rows, cols = d.shape
a0_bound = np.vstack((d.min(axis=1),d.max(axis=1))).T
a0_ini = d.mean(axis=1)
bounds = ([a0_bound,(0.0, 1.0)])
x0 = (a0_ini,0.2)
res = basinhopping(simple, x0, minimizer_kwargs={'bounds': bounds},stepsize=0.1)
But now, the basinhopping gives me this error:
bounds = [(None if l == -np.inf else l, None if u == np.inf else u) for l, u in bounds]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Is there any way for me to use basinhopping to optimize my whole array at once instead of each line one by one?
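The ValueError arises because each entry of bounds is expected to be a (low, high) pair of scalars, one per parameter, while here the first entry is a whole array. One possible way around it, sketched below, is to flatten everything into a single 1-D parameter vector (one initial level per series, followed by the shared alpha) and to build one bounds pair per element. The data array `d` in this sketch is hypothetical, since the original array is not shown in full, and the recurrence follows the single-series version (`d[:, t - 1]`), which is an assumption about the intended model.

```python
import numpy as np
from scipy.optimize import basinhopping

# Hypothetical data: two short series, used only for illustration.
d = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 2.0, 3.0, 5.0]])
rows, cols = d.shape

def simple(params):
    # params is 1-D: the first `rows` entries are the initial levels,
    # the last entry is the smoothing factor shared by all series.
    a0, alpha = params[:rows], params[rows]
    a = np.empty((rows, cols))
    a[:, 0] = a0
    for t in range(1, cols):
        a[:, t] = alpha * d[:, t - 1] + (1 - alpha) * a[:, t - 1]
    mae = np.abs(d - a).mean(axis=1) / d.mean(axis=1)
    return mae.sum()

# One (low, high) pair per parameter, in the same order as the vector.
bounds = [(lo, hi) for lo, hi in zip(d.min(axis=1), d.max(axis=1))] + [(0.0, 1.0)]
x0 = np.append(d.mean(axis=1), 0.2)

res = basinhopping(simple, x0, niter=20, stepsize=0.1,
                   minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds})
print(res.x[:rows], res.x[rows], res.fun)
```

Note that this optimizes a single alpha for all series jointly. If each series needs its own alpha, the parameter vector would need `rows` alpha entries instead of one, with a matching number of bounds pairs.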

Efficient way to implement simple filter with varying coefficients in Python/Numpy

I am looking for an efficient way to implement a simple filter with one coefficient that is time-varying and specified by a vector with the same length as the input signal.
The following is a simple implementation of the desired behavior:
import numpy as np

def myfilter(signal, weights):
    output = np.empty_like(weights)
    val = signal[0]
    for i in range(len(signal)):
        val += weights[i]*(signal[i] - val)
        output[i] = val
    return output

weights = np.random.uniform(0, 0.1, (100,))
signal = np.linspace(1, 3, 100)
output = myfilter(signal, weights)
Is there a way to do this more efficiently with numpy or scipy?
You can trade in the overhead of the loop for a couple of additional ops:
import numpy as np

def myfilter(signal, weights):
    output = np.empty_like(weights)
    val = signal[0]
    for i in range(len(signal)):
        val += weights[i]*(signal[i] - val)
        output[i] = val
    return output

def vectorised(signal, weights):
    # Cumulative product of (1 - w): the decay applied to earlier terms
    wp = np.r_[1, np.multiply.accumulate(1 - weights[1:])]
    sw = weights * signal
    sw[0] = signal[0]
    sws = np.add.accumulate(sw / wp)
    return wp * sws

weights = np.random.uniform(0, 0.1, (100,))
signal = np.linspace(1, 3, 100)
print(np.allclose(myfilter(signal, weights), vectorised(signal, weights)))
On my machine the vectorised version is several times faster. It uses a "closed form" solution of your recurrence equation.
Edit: For very long signal and weight vectors (100,000 samples, say) this method doesn't work because of overflow. In that regime you can still save a bit (more than 50% on my machine) using the following trick, which has the added bonus that you needn't solve the recurrence formula, only invert it.
from scipy import linalg

def solver(signal, weights):
    rw = 1 / weights[1:]
    # Row 0 holds the main diagonal, row 1 the subdiagonal (padded with a trailing 0)
    v = np.r_[1, rw, 1-rw, 0]
    v.shape = 2, -1
    return linalg.solve_banded((1, 0), v, signal)
This trick uses the fact that your recurrence is formally similar to a Gauss elimination on a matrix with only one nonvanishing subdiagonal. It piggybacks on a library function that specialises in doing precisely that.
Actually, quite proud of this one.
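The two regimes described above can be checked with a short self-contained script. The three functions are copied from this answer; the seed, the signal length and the lower bound of 0.001 on the weights (chosen so that 1 / weights stays moderate) are assumptions made for this sketch.

```python
import numpy as np
from scipy import linalg

def myfilter(signal, weights):
    output = np.empty_like(weights)
    val = signal[0]
    for i in range(len(signal)):
        val += weights[i] * (signal[i] - val)
        output[i] = val
    return output

def vectorised(signal, weights):
    wp = np.r_[1, np.multiply.accumulate(1 - weights[1:])]
    sw = weights * signal
    sw[0] = signal[0]
    sws = np.add.accumulate(sw / wp)
    return wp * sws

def solver(signal, weights):
    rw = 1 / weights[1:]
    v = np.r_[1, rw, 1 - rw, 0]
    v.shape = 2, -1
    return linalg.solve_banded((1, 0), v, signal)

rng = np.random.default_rng(0)
n = 100_000
weights = rng.uniform(0.001, 0.1, n)
signal = np.linspace(1, 3, n)

reference = myfilter(signal, weights)
with np.errstate(all="ignore"):          # silence the divide/overflow warnings
    closed_form = vectorised(signal, weights)
banded = solver(signal, weights)

closed_form_ok = bool(np.isfinite(closed_form).all())
banded_ok = bool(np.allclose(reference, banded))
print(closed_form_ok, banded_ok)
```

At this length the cumulative product `wp` shrinks to zero, so the closed-form version produces non-finite values, while the banded solve still agrees with the reference loop.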

Swapping rows of a Theano symbolic matrix

I am implementing parallel tempering Gibbs sampling using Theano. I am trying to create a Theano function that takes a matrix X and swaps some of its rows. I have a symbolic binary vector named swaps that denotes which rows should be swapped (i.e., if swaps[i] == 1, then X[i] and X[i+1] should be swapped). The order of swapping is not important for me.
I was trying to write a theano.scan that goes through the swaps vector and performs swapping of X row-by-row. The problem is that Theano doesn't allow doing something like X[pos], X[pos + 1] = X[pos + 1], X[pos] with symbolic variables. Here is a simple code snippet of what I am trying to do.
import numpy as np
import theano
import theano.tensor as T

def swap(swp, pos, idx):
    if swp: idx[pos], idx[pos + 1] = idx[pos + 1], idx[pos]
    return idx

max_length = 10
swaps = T.ivector('swaps')
idx = T.ivector('idx')
pos = T.iscalar('pos')
new_idx, updates = theano.scan(swap,
                               sequences=[swaps, T.arange(max_length)],
                               outputs_info=idx)
do_swaps = theano.function([swaps, idx], new_idx[-1], updates=updates)
idx_swapped = do_swaps(np.array([1, 1, 0, 1]), np.arange(5))
print(idx_swapped)
Any ideas on how I can do this the right way?
Okay, here is a really simple solution I found.
import numpy as np
import theano
import theano.tensor as T

def swap(swp, pos, X):
    # swp == 1 emits X[pos+1] before X[pos]; swp == 0 keeps the original order
    return T.concatenate([X[:pos],X[[pos+swp]],X[[pos+1-swp]],X[pos+2:]])

max_length = 10
swaps = T.ivector('swaps')
pos = T.iscalar('pos')
X = T.vector('X')
new_X, _ = theano.scan(swap,
                       sequences=[swaps, T.arange(max_length)],
                       outputs_info=X)
do_swaps = theano.function([swaps, X], new_X[-1])
X_swapped = do_swaps(np.array([1, 1, 0, 1], dtype='int32'), np.arange(5))
print(X_swapped)
However, I am not sure whether it is optimal for execution on a GPU.
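To see what the concatenate trick computes without needing Theano, here is a plain NumPy analogue. It is only an illustration of the indexing logic, not a replacement for the symbolic version: the Python loop in `do_swaps` stands in for theano.scan, and running it on a 2-D array is an assumption about how the same trick would apply to the rows of a matrix.

```python
import numpy as np

def swap(swp, pos, X):
    # swp == 1 emits X[pos + 1] then X[pos]; swp == 0 keeps the original order.
    return np.concatenate([X[:pos], X[[pos + swp]], X[[pos + 1 - swp]], X[pos + 2:]])

def do_swaps(swaps, X):
    # Plays the role of the scan loop: one step per entry of `swaps`.
    for pos, swp in enumerate(swaps):
        X = swap(swp, pos, X)
    return X

X_swapped = do_swaps(np.array([1, 1, 0, 1]), np.arange(5))
print(X_swapped)  # [1 2 0 4 3]

# The same function swaps whole rows when X is 2-D.
rows_swapped = do_swaps(np.array([1, 0]), np.arange(6).reshape(3, 2))
print(rows_swapped)
```

Because the swaps are applied sequentially, an element can travel several positions when consecutive entries of `swaps` are 1, as the first element does in the example.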

