lmfit minimize fails with ValueError: array is too big


I am trying to use the "brute" method to minimize a function of 20 variables. It is failing with a mysterious error. Here is the complete code:
import random
import numpy as np
import lmfit

def progress_update(params, iter, resid, *args, **kws):
    pass
    #print(resid)

def score(params, data = None):
    parvals = params.valuesdict()
    M = data
    X_params = []
    Y_params = []
    for i in range(M.shape[0]):
        X_params.append(parvals['x'+str(i)])
    for j in range(M.shape[1]):
        Y_params.append(parvals['y'+str(j)])
    return diff(M, X_params, Y_params)

def diff(M, X_params, Y_params):
    total = 0
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            total += abs(M[i,j] - (X_params[i] - Y_params[j])**2)
    return total

dim = 10
random.seed(0)
M = np.empty((dim, dim))
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        M[i,j] = i*random.random()+j**2

params = lmfit.Parameters()
for i in range(M.shape[0]):
    params.add('x'+str(i), value=random.random()*10, min=0, max=10)
for j in range(M.shape[1]):
    params.add('y'+str(j), value=random.random()*10, min=0, max=10)

result = lmfit.minimize(score, params, method='brute', kws={'data': M}, iter_cb=progress_update)
However, this fails with:
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
What is causing this problem?

"What is causing this problem"
Math
You can't brute force a high dimensional problem because brute force methods require exponential work (time, and memory if implemented naively).
More directly, lmfit uses numpy (*) under the hood, which limits how large an array it can allocate. Your initial data structure isn't too big (10x10); it's the combinatorial table required for a brute-force search that's causing problems.
If you're willing to hack the implementation, you could switch to a sparse memory structure. But this doesn't solve the math problem.
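To put a rough number on the exponential growth: a brute-force search evaluates the objective on a full grid, so the number of grid points is the number of points per variable raised to the number of variables. The sketch below assumes 20 grid points per variable (I believe that is the default `Ns` of the brute method, but treat it as an assumption) and 8-byte floats.

```python
# Rough size of a full brute-force grid.
# Assumptions: 20 grid points per variable, 8-byte floats, and one stored
# coordinate per variable for every grid point.
n_vars = 20
points_per_var = 20
bytes_per_float = 8

grid_points = points_per_var ** n_vars
grid_bytes = grid_points * n_vars * bytes_per_float

# Largest size a signed 64-bit index can express, for comparison.
max_index = 2 ** 63 - 1

print(f"grid points: {grid_points:.3e}")
print(f"bytes for the coordinate grid: {grid_bytes:.3e}")
print("fits in a 64-bit index:", grid_bytes <= max_index)
```

Under these assumptions the grid is many orders of magnitude beyond anything addressable, which is consistent with the overflow check that raises the error.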
On High Dimensional Optimization
Try a different minimizer, but be warned: it's very difficult to minimize globally in a high-dimensional space. "Local minima" methods like fixed point / gradient descent might be more productive.
I hate to be pessimistic, but high-dimensional optimization is very hard in general, and I'm afraid it is beyond the scope of an SO question. Here is a survey.
Practical Alternatives
Gradient descent is supported a little in sklearn, but more for machine learning than for general optimization. scipy actually has pretty good optimization coverage and great documentation, so I'd start there. It's possible to do gradient descent there too, but it isn't necessary.
From scipy's docs on unconstrained minimization, you have many options:
Method Nelder-Mead uses the Simplex algorithm. This algorithm is robust in many applications. However, if numerical computation of derivative can be trusted, other algorithms using the first and/or second derivatives information might be preferred for their better performance in general.
Method Powell is a modification of Powell’s method which is a conjugate direction method. It performs sequential one-dimensional minimizations along each vector of the directions set (direc field in options and info), which is updated at each iteration of the main minimization loop. The function need not be differentiable, and no derivatives are taken.
and many more derivative-based methods are available. (In general, you do better when you have derivative information available.)
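As a minimal sketch of what switching to scipy could look like, the block below minimizes the same kind of objective as in the question with the derivative-free Powell method. The matrix here is stand-in data with the same shape, the objective is a vectorized rewrite of the question's `diff`, and the 0 to 10 bounds from the question are left out for simplicity; all of these are assumptions for illustration. The result is a local minimum, not a guaranteed global one.

```python
import numpy as np
from scipy.optimize import minimize

def objective(v, M):
    """Sum of |M[i, j] - (x[i] - y[j])**2| where v = [x..., y...]."""
    n_rows = M.shape[0]
    x, y = v[:n_rows], v[n_rows:]
    return np.abs(M - (x[:, None] - y[None, :]) ** 2).sum()

rng = np.random.default_rng(0)
dim = 10
# Stand-in data shaped like the question's matrix (not the same numbers).
M = np.arange(dim)[:, None] * rng.random((dim, dim)) + np.arange(dim)[None, :] ** 2
x0 = rng.random(2 * dim) * 10

# Powell needs no derivatives, which suits the non-smooth abs() terms.
result = minimize(objective, x0, args=(M,), method="Powell")
print("objective at start:", objective(x0, M))
print("objective at end:  ", result.fun)
```

Running it from several random starting points and keeping the best result is a cheap way to reduce the risk of a poor local minimum.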
Footnotes/Looking at the Source Code
(*) the actual error is thrown here, based on your numpy implementation. Quoted:
    if (npy_mul_with_overflow_intp(&nbytes, nbytes, dim)) {
        PyErr_SetString(PyExc_ValueError,
            "array is too big; `arr.size * arr.dtype.itemsize` "
            "is larger than the maximum possible size.");
        Py_DECREF(descr);
        return NULL;

Related

Optimization Problem with fast matrix-vector multiplication in Python / cvxpy

I want to solve the following (convex) minimization problem:
min ||x||_1 subject to the constraints sgn(A[x,R]) = y and ||x||_2 = 1
where A is an m x (N+1) matrix, x in R^N is a vector, and [x,R] is the vector created by appending a given number R to x. The objective is to find the optimal value for x.
A is a Fourier matrix and there are fast matrix-vector, inversion, etc. algorithms available. Since this matrix is really big, I need to use an optimization algorithm that utilizes this.
Currently, I use the following implementation in cvxpy, which is way too slow:
import cvxpy as cvx
# rewrite the problem in the form x = x^- + x^+
n = A.shape[1]-1
vx = cvx.Variable(2*n)
objective = cvx.Minimize(cvx.pnorm(vx, 1)) # min ||x||_1
constraints = [vx >= 0,
               cvx.multiply(A[:,:n] @ vx[:n] - A[:,:n] @ vx[n:] + A[:,n]*R, y) >= 0,
               cvx.norm(vx, 2) <= R] # sgn(A[x,1]) = y, ||x||_2 <= R
x, solve_time = solve(vx, objective, constraints)
solution = x[:n] - x[n:]
Is there a way to use fast matrix computations in cvxpy? Or is there a better library? I found a few implementations that can do this for one special algorithm but not in the general case, so I was not able to implement my problem.

No. The solver will not call your matrix multiplication code. Solvers do their own linear algebra, which is very different in many ways. In a sense, your matrix multiplication is just notation for the problem statement.
Regarding performance, it depends heavily on where the bottleneck is. Is it in generating the model (in cvxpy itself) or in the solver? What solver are you using? Consider using a different solver. Obviously, we don't have enough information (and no reproducible example) to answer this question.

Curve_fit for a function that returns a numpy array

I know scipy's curve_fit function and its power for fitting curves. I have read many examples here and in the documentation, but I cannot solve my problem.
For example, I have 10 files (chemical structures, but it does not matter) and ten experimental energy values. I have a function inside a class that calculates the theoretical energy of each structure for some parameters and returns a numpy array with the theoretical energy values.
I want to find the parameters that bring the theoretical values closest to the experimental ones. Here is a minimal example of my code.
This is the class function that reads the experimental energy files, extracts the correct substring and returns the values as a numpy array. self.path is just the directory and self.nPoints = 10. It is not so important, but I provide it for the sake of completeness.
def experimentalValues(self):
    os.chdir(self.path)
    energy = np.zeros(self.nPoints)
    # start at 0 so that p_1.xyz is read too (the original loop started at 1)
    for i in range(self.nPoints):
        with open("p_" + str(i + 1) + ".xyz", "r") as f:
            energy[i] = float(f.readlines()[1].split()[1])
    os.chdir('..')
    return energy
I calculate the theoretical value with this class function, which takes two numpy arrays as arguments, let's say
sigma = np.full(nSubstrate, 2.)
epsilon = np.full(nSubstrate, 0.15)
where nSubstrate = 9
Here is the class function. It reads the files and uses two nested loops to calculate the theoretical value for each file, returning the values in a numpy array.
def theoreticalEnergy(self, epsilon, sigma):
    os.chdir(self.path)
    cE = np.zeros(self.nPoints)
    for n in range(0, self.nPoints):
        filenameXYZ = "p_" + str(n + 1) + "_extended.xyz"
        allCoordinates = np.loadtxt(filenameXYZ, skiprows = 0, usecols = (1, 2, 3))
        substrate = allCoordinates[0:self.nSubstrate]
        surface = allCoordinates[self.nSubstrate:]
        for i in range(0, substrate.shape[0]):
            positionAtomI = np.array(substrate[i][:])
            for j in range(0, surface.shape[0]):
                positionAtomJ = np.array(surface[j][:])
                distanceIJ = self.distance(positionAtomI, positionAtomJ)
                cE[n] += self.LennardJones(distanceIJ, epsilon[i], sigma[i])
    os.chdir('..')
    return cE
Again, for the sake of completeness the Lennard Jones class function is defined as
def LennardJones(self, distance, epsilon, sigma):
    repulsive = (sigma/distance) ** 12.
    attractive = (sigma/distance) ** 6.
    potential = 4. * epsilon* (repulsive - attractive)
    return potential
where in this case all the arguments, as well as the return value, are scalars.
To conclude the problem presentation I have 3 ingredients:
a numpy array with the experimental data
two numpy arrays with a guess for the parameters sigma and epsilon
a function that takes the last parameters and returns a numpy vector with the values to be fitted.
How can I solve this problem like the approach described in the documentation https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html?

Curve fitting
curve_fit fits a function f(w, x[i]) to points y[i] by finding the w that minimizes sum((f(w, x[i]) - y[i])**2 for i in range(n)). As you will read in the first line after the function definition:
[It uses] non-linear least squares to fit a function, f, to data.
It refers to least_squares, where it states:
Given the residuals f(x) (an m-D real function of n real variables) and the loss function rho(s) (a scalar function), least_squares finds a local minimum of the cost function F(x):
minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
subject to lb <= x <= ub
Curve fitting is a kind of convex-cost multi-objective optimization. Since each individual cost is convex, you can add all of them and the sum will still be a convex function. Notice that the decision variables (the parameters to be optimized) are the same at every point.
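As a small illustration of that least-squares objective, here is a sketch that fits the two Lennard-Jones parameters to synthetic energies with curve_fit. The distances, the "true" parameter values and the starting guess are all made up for the example; the model function must take the independent variable first and the parameters after it.

```python
import numpy as np
from scipy.optimize import curve_fit

def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones potential, vectorized over the distances r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

# Synthetic, noise-free data from hypothetical parameter values.
r = np.linspace(3.0, 8.0, 40)
energies = lennard_jones(r, 0.15, 3.4)

def cost(params):
    """Sum of squared residuals, the quantity curve_fit minimizes."""
    return np.sum((lennard_jones(r, *params) - energies) ** 2)

p0 = (0.1, 3.0)  # starting guess for (epsilon, sigma)
popt, pcov = curve_fit(lennard_jones, r, energies, p0=p0)

print("fitted (epsilon, sigma):", popt)
print("cost at guess:", cost(p0), "cost at fit:", cost(popt))
```

The fit only finds a local minimum near the starting guess, so a reasonable `p0` matters.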
Your problem
In my understanding, for each energy level you have a different set of parameters. If you write it as a curve fitting problem, the objective function could be expressed as sum((f(w[i], x[i]) - y[i])**2 ...), where y[i] is determined by the energy level. Since each of the terms in the sum is independent of the other terms, this is equivalent to finding each group of parameters w[i] separately by minimizing (f(w[i], x[i]) - y[i])**2.
Convexity
Convexity is a very convenient property for optimization because it ensures that you will have only one minimum in the parameter space. I am not doing a detailed analysis but have reasonable doubts about the convexity of your energy function.
The Lennard-Jones function is the difference of a repulsive and an attractive term, both with a negative even exponent on the distance; this alone is very unlikely to be convex.
The sum of multiple local functions centered at different positions has no defined convexity.
Molecular energy, or crystal energy, or protein folding are well known to be non-convex.
A few days ago (on a bike ride) I was thinking about this, how the molecules will be configured in a global minimum energy, and I was wondering if it finds that configuration so rapidly because of quantum tunneling effects.
Non-convex optimization
Non-convex (global) optimization differs from (non-linear) least squares in that, when a local minimum is found, the process doesn't return immediately; it starts making new attempts in different regions of the search space. If the function is smooth you can still take advantage of a gradient-based local optimization method, but the problem is still NP-hard in general.
A classic global optimization method is simulated annealing; if you have a chemistry background I think you will gain some insight from reading about it. Once upon a time, simulated annealing was provided in scipy.optimize.
You will find a few global optimization methods in scipy.optimize. I would encourage you to try basin hopping, since it has been successfully applied to similar problems, as you can read in the references.
I hope this puts you on the right track to your solution. But be aware that you will probably need to spend some time learning how to use the function, and you will need to make some decisions. You will need to find a balance between accuracy, simplicity and efficiency.
If you want a better solution, take the time to derive the gradient of the cost function (you can return two values, f and df, where df is the gradient of f with respect to the decision variables).
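A minimal sketch of that combination, basin hopping with a cost function that returns both f and df, might look like the following. It uses synthetic data and a single (epsilon, sigma) pair instead of the per-atom arrays from the question, so the data, parameter values and starting point are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import basinhopping

# Synthetic distances and energies from hypothetical parameters.
r = np.linspace(3.0, 8.0, 40)

def lj_terms(sigma):
    """Return (sigma/r)**6 and (sigma/r)**12 for all distances."""
    sr6 = sigma ** 6 / r ** 6
    return sr6, sr6 ** 2

sr6_true, sr12_true = lj_terms(3.4)
energies = 4.0 * 0.15 * (sr12_true - sr6_true)

def cost_and_grad(params):
    """Sum of squared residuals and its gradient w.r.t. (epsilon, sigma)."""
    epsilon, sigma = params
    sr6, sr12 = lj_terms(sigma)
    residual = 4.0 * epsilon * (sr12 - sr6) - energies
    d_eps = 4.0 * (sr12 - sr6)
    d_sigma = 4.0 * epsilon * (12.0 * sigma ** 11 / r ** 12
                               - 6.0 * sigma ** 5 / r ** 6)
    f = np.sum(residual ** 2)
    df = np.array([2.0 * np.sum(residual * d_eps),
                   2.0 * np.sum(residual * d_sigma)])
    return f, df

x0 = np.array([0.1, 3.0])
# jac=True tells the local minimizer that the function returns (f, df).
result = basinhopping(cost_and_grad, x0, niter=25, stepsize=0.2,
                      minimizer_kwargs={"method": "L-BFGS-B", "jac": True})
print("best (epsilon, sigma):", result.x, "cost:", result.fun)
```

The number of hops and the step size are tuning knobs; larger values explore more of the search space at a higher cost.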

Is there a way to optimise for m out of n variables?

I have a (portfolio optimization) Python program that uses scipy to optimize n variables subject to constraints. However, I was wondering whether it is possible to tell the program to choose the m out of these n variables that are best for maximising the objective?
This is my current code:
def obj(x):
    return -np.sum(array_returns.T * x)

def con_vol(x):
    return np.sqrt(np.dot(x.T, np.dot(cov_matrix, x)))

where array_returns is the list of returns for all the stocks, imported from Bloomberg, and cov_matrix is the covariance matrix constructed from the returns data.
cons = [{'type':'eq','fun':lambda x: np.sum(x)-1}, {'type':'ineq','fun':con_vol}]
bnds = tuple((0.02, 0.1) for x in range(20))
opts = sco.minimize(obj, list_final_weights, bounds = bnds, method = 'SLSQP', constraints = cons)
In this program weights are allocated to all the assets. I want a way for it to choose a subset to allocate to (say, the best 10 out of 20).

One option would be to solve this problem iteratively (running the scipy solver many times):
1st iteration: run the scipy solver, take the solution, and discard all of the n variables whose coefficients are below a certain (initially rather small) threshold t.
Next iteration: run the scipy solver again, now no longer searching within the space of the previously discarded variables. With a (slightly) increased value of t, discard further variables.
Repeat this until only m variables are left.
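The iterative scheme above can be sketched roughly as follows. To keep it short, this version drops the single smallest weight in each round (which amounts to raising t just enough to discard one variable at a time), and it uses a made-up mean-variance objective with random returns and covariances; none of these inputs or names come from the question.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_assets, m_keep = 8, 3

# Hypothetical inputs standing in for real market data.
expected_returns = rng.uniform(0.02, 0.12, n_assets)
A = rng.normal(size=(n_assets, n_assets))
cov_matrix = A @ A.T / n_assets + 0.01 * np.eye(n_assets)
risk_aversion = 1.0

def solve(active):
    """Optimize weights over the assets whose indices are in `active`."""
    mu = expected_returns[active]
    cov = cov_matrix[np.ix_(active, active)]

    def objective(w):
        # Negative return plus a risk penalty (minimized by SLSQP).
        return -mu @ w + risk_aversion * (w @ cov @ w)

    k = len(active)
    res = minimize(objective, np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: np.sum(w) - 1.0}])
    return res.x

active = list(range(n_assets))
weights = solve(active)
while len(active) > m_keep:
    # Discard the asset with the smallest weight, then re-optimize.
    del active[int(np.argmin(weights))]
    weights = solve(active)

print("selected assets:", active)
print("weights:", np.round(weights, 4))
```

This is a greedy heuristic: it does not guarantee the best possible subset of m assets, only a plausible one.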
There are also more sophisticated approaches involving e.g. genetic programming techniques to identify relevant parameters or even functional forms (so-called sparse symbolic regression in the latter case; see e.g. a Python implementation here: https://github.com/snagcliffs/PDE-FIND).

FFT polynomial multiplication in Python using inbuilt Numpy.fft

I want to multiply two polynomials fast in Python. My polynomials are rather large (> 100000 elements) and I have to multiply lots of them. Below, you will find my approach:
from numpy.random import seed, randint
from numpy import polymul, pad
from numpy.fft import fft, ifft
from timeit import default_timer as timer

length=100

def test_mul(arr_a,arr_b): #inbuilt python multiplication
    c=polymul(arr_a,arr_b)
    return c

def sb_mul(arr_a,arr_b): #my schoolbook multiplication
    c=[0]*(len(arr_a) + len(arr_b) - 1 )
    for i in range( len(arr_a) ):
        for j in range( len(arr_b) ):
            k=i+j
            c[k]=c[k]+arr_a[i]*arr_b[j]
    return c

def fft_test(arr_a,arr_b): #fft based polynomial multiplication
    arr_a1=pad(arr_a,(0,length),'constant')
    arr_b1=pad(arr_b,(0,length),'constant')
    a_f=fft(arr_a1)
    b_f=fft(arr_b1)
    c_f=[0]*(2*length)
    for i in range( len(a_f) ):
        c_f[i]=a_f[i]*b_f[i]
    return c_f

if __name__ == '__main__':
    seed(int(timer()))
    random=1
    if(random==1):
        x=randint(1,1000,length)
        y=randint(1,1000,length)
    else:
        x=[1]*length
        y=[1]*length
    start=timer()
    res=test_mul(x,y)
    end=timer()
    print("time for built in pol_mul", end-start)
    start=timer()
    res1=sb_mul(x,y)
    end=timer()
    print("time for schoolbook mult", end-start)
    res2=fft_test(x,y)
    print(res2)
    #########check############
    if( len(res)!=len(res1) ):
        print("ERROR")
    for i in range( len(res) ):
        if( res[i]!=res1[i] ):
            print("ERROR at pos ",i,"res[i]:",res[i],"res1[i]:",res1[i])
Now, here is my approach in detail:
1. First, I tried a naive implementation of schoolbook multiplication with complexity O(n^2). But as you may expect, it turned out to be very slow.
2. Second, I came across polymul in the Numpy library. This function is a lot faster than the previous one, but I realized that it also has O(n^2) complexity. You can see that if you increase the length by a factor of k, the time increases by a factor of k^2.
3. My third approach is to try an FFT based multiplication using the inbuilt FFT functions. I followed the well known approach also described here, but I am not able to get it to work.
Now my questions are:
1. Where am I going wrong in my FFT based approach? Can you please tell me how I can fix it?
2. Is my observation that the polymul function has O(n^2) complexity correct?
Please let me know if you have any questions.
Thanks in advance.

Where am I going wrong in my FFT based approach? Can you please tell me how can I fix it?
The main problem is that in the FFT based approach, you should be taking the inverse transform after the multiplication, but that step is missing from your code. With this missing step your code should look like the following:
def fft_test(arr_a,arr_b): #fft based polynomial multiplication
    arr_a1=pad(arr_a,(0,length),'constant')
    arr_b1=pad(arr_b,(0,length),'constant')
    a_f=fft(arr_a1)
    b_f=fft(arr_b1)
    c_f=[0]*(2*length)
    for i in range( len(a_f) ):
        c_f[i]=a_f[i]*b_f[i]
    return ifft(c_f)
Note that there may also be a few opportunities for improvement:
The zero padding can be handled directly by passing the required FFT length as the second argument (e.g. a_f = fft(arr_a, 2*length))
The coefficient multiplication in your for loop can be handled directly by numpy.multiply.
If the polynomial coefficients are real-valued, then you can use numpy.fft.rfft and numpy.fft.irfft (instead of numpy.fft.fft and numpy.fft.ifft) for an extra performance boost.
So an implementation for real-valued inputs may look like:
from numpy.fft import rfft, irfft
def fftrealpolymul(arr_a, arr_b): #fft based real-valued polynomial multiplication
    L = len(arr_a) + len(arr_b)
    a_f = rfft(arr_a, L)
    b_f = rfft(arr_b, L)
    # pass L explicitly so the output length is also right when L is odd
    return irfft(a_f * b_f, L)
Is my observation that polymul function has O(n^2) complexity correct?
That also seems to be the performance I am observing, and it matches the available code in my numpy installation (version 1.15.4, and there doesn't seem to be any change in that part in the more recent 1.16.1 version).
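For a quick sanity check of the FFT route, here is a small self-contained sketch that multiplies two random integer polynomials with rfft/irfft and compares the rounded result against numpy.convolve. The function name and the test sizes are made up for the example; rounding is needed because the FFT works in floating point.

```python
import numpy as np

def fft_polymul(a, b):
    """Multiply two real coefficient arrays via the real FFT."""
    n = len(a) + len(b) - 1          # number of coefficients in the product
    fa = np.fft.rfft(a, n)           # rfft zero-pads the input to length n
    fb = np.fft.rfft(b, n)
    return np.fft.irfft(fa * fb, n)  # pass n so odd lengths work too

rng = np.random.default_rng(0)
a = rng.integers(1, 1000, size=100)
b = rng.integers(1, 1000, size=100)

# Round back to integers, since the inputs are integer polynomials.
product = np.rint(fft_polymul(a, b)).astype(np.int64)
reference = np.convolve(a, b)        # direct O(n^2) convolution

print("results agree:", np.array_equal(product, reference))
```

For very long polynomials with large coefficients, floating-point error grows, so the rounding step can eventually fail; checking against a direct method on a sample is a reasonable precaution.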

resampling, interpolating matrix

I'm trying to interpolate some data for the purpose of plotting. For instance, given N data points, I'd like to be able to generate a "smooth" plot, made up of 10*N or so interpolated data points.
My approach is to generate an N-by-10*N matrix and compute the inner product of the original vector and the matrix I generated, yielding a 1-by-10*N vector. I've already worked out the math I'd like to use for the interpolation, but my code is pretty slow. I'm pretty new to Python, so I'm hopeful that some of the experts here can give me some ideas of ways I can try to speed up my code.
I think part of the problem is that generating the matrix requires 10*N^2 calls to the following function:
def sinc(x):
    import math
    try:
        return math.sin(math.pi * x) / (math.pi * x)
    except ZeroDivisionError:
        return 1.0
(This comes from sampling theory. Essentially, I'm attempting to recreate a signal from its samples, and upsample it to a higher frequency.)
The matrix is generated by the following:
def resampleMatrix(Tso, Tsf, o, f):
    from numpy import array as npar
    retval = []
    for i in range(f):
        retval.append([sinc((Tsf*i - Tso*j)/Tso) for j in range(o)])
    return npar(retval)
I'm considering breaking up the task into smaller pieces because I don't like the idea of an N^2 matrix sitting in memory. I could probably make 'resampleMatrix' into a generator function and do the inner product row-by-row, but I don't think that will speed up my code much until I start paging stuff in and out of memory.
Thanks in advance for your suggestions!

This is upsampling. See Help with resampling/upsampling for some example solutions.
A fast way to do this (for offline data, like your plotting application) is to use FFTs. This is what SciPy's native resample() function does. It assumes a periodic signal, though, so it's not exactly the same. See this reference:
Here’s the second issue regarding time-domain real signal interpolation, and it’s a big deal indeed. This exact interpolation algorithm provides correct results only if the original x(n) sequence is periodic within its full time interval.
Your function assumes the signal's samples are all 0 outside of the defined range, so the two methods will diverge away from the center point. If you pad the signal with lots of zeros first, it will produce a very close result. There are several more zeros past the edge of the plot not shown here:
Cubic interpolation won't be correct for resampling purposes. This example is an extreme case (near the sampling frequency), but as you can see, cubic interpolation isn't even close. For lower frequencies it should be pretty accurate.
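To try the FFT-based route, a minimal sketch with scipy.signal.resample could look like this. The test signal is a made-up sine with a whole number of cycles in the window, which is exactly the periodic case where this method is expected to be accurate.

```python
import numpy as np
from scipy.signal import resample

n_samples, factor = 32, 10
n = np.arange(n_samples)
# Three full cycles in the window, so the signal is periodic in it.
samples = np.sin(2 * np.pi * 3 * n / n_samples)

# FFT-based upsampling to factor * n_samples points.
upsampled = resample(samples, factor * n_samples)

# Exact values of the same sine on the finer grid, for comparison.
t = np.arange(factor * n_samples) / factor
exact = np.sin(2 * np.pi * 3 * t / n_samples)

print("max abs error:", np.max(np.abs(upsampled - exact)))
```

For a signal that is not periodic in its window, expect larger errors near the edges, as described above.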

If you want to interpolate data in a quite general and fast way, splines or polynomials are very useful. Scipy has the scipy.interpolate module, which is very useful. You can find many examples in the official pages.

Your question isn't entirely clear; you're trying to optimize the code you posted, right?
Re-writing sinc like this should speed it up considerably. This implementation avoids checking that the math module is imported on every call, doesn't do attribute access three times, and replaces exception handling with a conditional expression:
from math import sin, pi
def sinc(x):
    return (sin(pi * x) / (pi * x)) if x != 0 else 1.0
You could also try avoiding creating the matrix twice (and holding it twice in parallel in memory) by creating a numpy.array directly (not from a list of lists):
def resampleMatrix(Tso, Tsf, o, f):
    retval = numpy.zeros((f, o))
    for i in xrange(f):
        for j in xrange(o):
            retval[i][j] = sinc((Tsf*i - Tso*j)/Tso)
    return retval
(replace xrange with range on Python 3.0 and above)
Finally, you can create rows with numpy.arange as well as calling numpy.sinc on each row or even on the entire matrix:
def resampleMatrix(Tso, Tsf, o, f):
    retval = numpy.zeros((f, o))
    for i in xrange(f):
        # row i holds Tsf*i/Tso - j for j = 0 .. o-1 (always exactly o values)
        retval[i] = Tsf*i / Tso - numpy.arange(o)
    return numpy.sinc(retval)
This should be significantly faster than your original implementation. Try different combinations of these ideas and test their performance, see which works out the best!

I'm not quite sure what you're trying to do, but there are some speedups you can do to create the matrix. Braincore's suggestion to use numpy.sinc is a first step, but the second is to realize that numpy functions want to work on numpy arrays, where they can do loops at C speed, much faster than working on individual elements.
def resampleMatrix(Tso, Tsf, o, f):
    retval = numpy.sinc((Tsf*numpy.arange(f)[:,numpy.newaxis]
                         -Tso*numpy.arange(o)[numpy.newaxis,:])/Tso)
    return retval
The trick is that by indexing the aranges with numpy.newaxis, numpy converts the array with shape f to one with shape f x 1, and the array with shape o to one with shape 1 x o. At the subtraction step, numpy will "broadcast" each input to act as an f x o shaped array and then do the subtraction. ("Broadcast" is numpy's term, reflecting the fact that no additional copy is made to stretch the f x 1 to f x o.)
Now numpy.sinc can iterate over all the elements in compiled code, much quicker than any for-loop you could write.
(There's an additional speed-up available if you do the division before the subtraction, especially since in the latter the division cancels the multiplication.)
The only drawback is that you now pay for an extra Nx10*N array to hold the difference. This might be a dealbreaker if N is large and memory is an issue.
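As a quick check of the broadcasting approach, the sketch below builds the sinc matrix in one vectorized expression and applies it to a small made-up signal. The function name, sizes and test signal are illustrative assumptions; the check relies on the fact that sinc interpolation reproduces the original samples at the original sample instants.

```python
import numpy as np

def resample_matrix(Tso, Tsf, o, f):
    """f x o matrix of sinc weights, built by broadcasting."""
    rows = np.arange(f)[:, np.newaxis]   # shape (f, 1): output sample index
    cols = np.arange(o)[np.newaxis, :]   # shape (1, o): input sample index
    return np.sinc((Tsf * rows - Tso * cols) / Tso)

n_samples, factor = 16, 10
Tso = 1.0               # original sampling period
Tsf = Tso / factor      # finer sampling period
samples = np.cos(0.4 * np.arange(n_samples))

W = resample_matrix(Tso, Tsf, n_samples, factor * n_samples)
upsampled = W @ samples

# Every `factor`-th output coincides with an original sample instant.
print("samples reproduced:", np.allclose(upsampled[::factor], samples))
```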
Otherwise, you should be able to write this using numpy.convolve. From what little I just learned about sinc-interpolation, I'd say you want something like numpy.convolve(orig,numpy.sinc(numpy.arange(j)),mode="same"). But I'm probably wrong about the specifics.

If your only interest is to 'generate a "smooth" plot' I would just go with a simple polynomial spline curve fit:
For any two adjacent data points, the coefficients of a third-degree polynomial function can be computed from the coordinates of those data points and the two additional points to their left and right (disregarding boundary points). This will generate points on a nice smooth curve with a continuous first derivative. There's a straightforward formula for converting 4 coordinates to 4 polynomial coefficients, but I don't want to deprive you of the fun of looking it up ;o).

Here's a minimal example of 1d interpolation with scipy -- not as much fun as reinventing, but.
The plot looks like sinc, which is no coincidence:
try google spline resample "approximate sinc".
(Presumably less local / more taps ⇒ better approximation,
but I have no idea how local UnivariateSplines are.)
""" interpolate with scipy.interpolate.UnivariateSpline """
from __future__ import division
import numpy as np
from scipy.interpolate import UnivariateSpline
import pylab as pl
N = 10
H = 8
x = np.arange(N+1)
xup = np.arange( 0, N, 1/H )
y = np.zeros(N+1); y[N//2] = 100
interpolator = UnivariateSpline( x, y, k=3, s=0 ) # s=0 interpolates
yup = interpolator( xup )
np.set_printoptions( 1, threshold=100, suppress=True ) # .1f
print("yup:", yup)
pl.plot( x, y, "green", xup, yup, "blue" )
pl.show()
Added feb 2010: see also basic-spline-interpolation-in-a-few-lines-of-numpy

Small improvement. Use the built-in numpy.sinc(x) function which runs in compiled C code.
Possible larger improvement: Can you do the interpolation on the fly (as the plotting occurs)? Or are you tied to a plotting library that only accepts a matrix?

I recommend that you check your algorithm, as it is a non-trivial problem. Specifically, I suggest you gain access to the article "Function Plotting Using Conic Splines" (IEEE Computer Graphics and Applications) by Hu and Pavlidis (1991). Their algorithm implementation allows for adaptive sampling of the function, such that the rendering time is smaller than with regularly spaced approaches.
The abstract follows:
A method is presented whereby, given a mathematical description of a function, a conic spline approximating the plot of the function is produced. Conic arcs were selected as the primitive curves because there are simple incremental plotting algorithms for conics already included in some device drivers, and there are simple algorithms for local approximations by conics. A split-and-merge algorithm for choosing the knots adaptively, according to shape analysis of the original function based on its first-order derivatives, is introduced.
