Implement early stopping in emcee given a convergence criterion - python

I am using emcee.EnsembleSampler to sample some likelihoods. I am doing this on millions of different data sets, and so performance is important.
It would be great to be able to get a chain to stop sampling once it has satisfied some convergence criterion on the autocorrelation time.
I can not find any way of doing this in the documentation, or through my searches, so was hoping someone has a recipe for it.

While I too couldn't find anything in the current emcee docs, the latest versions seems to have added a tutorial which happens to show the exact thing you are trying to do: link
In case the link breaks or things change, here is the example:
import emcee
import numpy as np
np.random.seed(42)
# The definition of the log probability function
# We'll also use the "blobs" feature to track the "log prior" for each step
def log_prob(theta):
log_prior = -0.5 * np.sum((theta-1.0)**2 / 100.0)
log_prob = -0.5 * np.sum(theta**2) + log_prior
return log_prob, log_prior
# Initialize the walkers
coords = np.random.randn(32, 5)
nwalkers, ndim = coords.shape
# Set up the backend
# Don't forget to clear it in case the file already exists
filename = "tutorial.h5"
backend = emcee.backends.HDFBackend(filename)
backend.reset(nwalkers, ndim)
# Initialize the sampler
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, backend=backend)
max_n = 100000
# We'll track how the average autocorrelation time estimate changes
index = 0
autocorr = np.empty(max_n)
# This will be useful to testing convergence
old_tau = np.inf
# Now we'll sample for up to max_n steps
for sample in sampler.sample(coords, iterations=max_n, progress=True):
# Only check convergence every 100 steps
if sampler.iteration % 100:
continue
# Compute the autocorrelation time so far
# Using tol=0 means that we'll always get an estimate even
# if it isn't trustworthy
tau = sampler.get_autocorr_time(tol=0)
autocorr[index] = np.mean(tau)
index += 1
# Check convergence
converged = np.all(tau * 100 < sampler.iteration)
converged &= np.all(np.abs(old_tau - tau) / tau < 0.01)
if converged:
break
old_tau = tau

Related

Error when observing on uniform with sampled parameters in PyMC

I’m new to PyMC and am trying to model a situation where you are rolling marbles at a wall and trying to find the block. The data is only for the values where the marble hits the block.
I’m first sampling an x location and size and then calculating a point from those with a uniform, but I’m getting an error.
import pymc3 as pm
import theano.tensor as tt
basic_model = pm.Model()
with basic_model:
# We are assuming independence of these.
x = pm.Uniform("x", lower=1, upper=30)
l = pm.Uniform("l", lower=1, upper=30)
lower = pm.Deterministic('lower', x-0.5*l)
upper = pm.Deterministic('upper', x+0.5*l)
point_x = pm.Uniform('point_x', lower=lower, upper=upper, observed=x_vals)
pm.sample()
With the error:
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'x_interval__': array(0.), 'l_interval__': array(0.)}
Initial evaluation results:
x_interval__ -1.39
l_interval__ -1.39
point_x -inf
Name: Log-probability of test_point, dtype: float64
Clearly the issue is with point_x. I’m guessing the error has to do with the fact that the observed data may potentially fall outside the lower-upper range depending on the value of x and l sampled. But how might I fix this?
The sampler doesn't know how to handle starting off in an invalid region of the parameter space. A quick and dirty fix is to provide testval arguments that ensure the sampling begins in a logically valid solution. For example, we know the minimum block must have:
l_0 = np.max(x_vals) - np.min(x_vals)
x_0 = np.min(x_vals) + 0.5*l_0
and could use those:
x = pm.Uniform("x", lower=1, upper=30, testval=x_0)
l = pm.Uniform("l", lower=1, upper=30, testval=l_0)
Also, the nature of this model leads to many rejections due to impossibility, so you may want to use Metropolis for sampling, which almost always needs more steps and tuning
pm.sample(tune=10000, draws=10000, step=pm.Metropolis())
Alternative Models
Otherwise, consider reparameterizing so that only valid solutions are in the parameter space. One approach would be to sample l and then use that to constrain x. Something like:
other_model = pm.Model()
x_min = np.min(x_vals)
x_max = np.max(x_vals)
l_0 = x_max - x_min
with other_model:
# these have logical constraints from the data
l = pm.Uniform("l", lower=l_0, upper=30)
x = pm.Uniform("x", lower=x_max - 0.5*l, upper=x_min + 0.5*l)
lower = pm.Deterministic('lower', x - 0.5*l)
upper = pm.Deterministic('upper', x + 0.5*l)
point_x = pm.Uniform('point_x', lower=lower, upper=upper, observed=x_vals)
res = pm.sample(step=pm.NUTS(), return_inferencedata=True)
Another approach would be to sample lower and upper directly, and compute the x and l as deterministic variables from those.

Is there a way where I can optimize the output restricting the parameters in fmin from scipy.optimize

What I am doing: I modified the code from the zombie invasion system to demonstrate how it should be written and tried to optimize the least square error (defined as score function) with the fmin function.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from scipy import integrate
from scipy.optimize import fmin
#=====================================================
#Notice we must import the Model Definition
from zombiewithdata import eq
#=====================================================
#1.Get Data
#====================================================
Td=np.array([0.5,1,1.5,2,2.2,3,3.5,4,4.5,5])#time
Zd=np.array([0,2,2,5,2,10,15,50,250,400])#zombie pop
#====================================================
#2.Set up Info for Model System
#===================================================
# model parameters
#----------------------------------------------------
P = 0 # birth rate
d = 0.0001 # natural death percent (per day)
B = 0.0095 # transmission percent (per day)
G = 0.0001 # resurect percent (per day)
A = 0.0001 # destroy perecent (per day)
rates=(P,d,B,G,A)
# model initial conditions
#---------------------------------------------------
S0 = 500. # initial population
Z0 = 0 # initial zombie population
R0 = 0 # initial death population
y0 = [S0, Z0, R0] # initial condition vector
# model steps
#---------------------------------------------------
start_time=0.0
end_time=5.0
intervals=1000
mt=np.linspace(start_time,end_time,intervals)
# model index to compare to data
#----------------------------------------------------
findindex=lambda x:np.where(mt>=x)[0][0]
mindex=map(findindex,Td)
#=======================================================
#3.Score Fit of System
#=========================================================
def score(parms):
#a.Get Solution to system
F0,F1,F2,T=eq(parms,y0,start_time,end_time,intervals)
#b.Pick of Model Points to Compare
Zm=F1[mindex]
#c.Score Difference between model and data points
ss=lambda data,model:((data-model)**2).sum()
return ss(Zd,Zm)
#========================================================
#4.Optimize Fit
#=======================================================
fit_score=score(rates)
answ=fmin(score,(rates),full_output=1,maxiter=1000000)
bestrates=answ[0]
bestscore=answ[1]
P,d,B,G,A=answ[0]
newrates=(P,d,B,G,A)
#=======================================================
#5.Generate Solution to System
#=======================================================
F0,F1,F2,T=eq(newrates,y0,start_time,end_time,intervals)
Zm=F1[mindex]
Tm=T[mindex]
#======================================================
Now in the #optimize fit section, is there any way I can get best possible values of bestrates when I restrict the values of "rates" like lb <= P, d, B, G, A <= ub where lb=lower bound and ub=upper bound and manage to get minimum of score in that restricted region? It need not be the most optimized value. fmin uses Nelder-Mead (simplex) algorithm.
I am quite new to this, so any help in the right direction would be awesome. Feel free to ask any doubts regarding the code and I will answer to best of my knowledge. . Thank You.
I'm not sure why the original author of Adventures in Python : Fitting a Differential Equation System to Data jumps through hoops to get at the samples corresponding to the given data points, the procedure can be greatly simplified by passing to eq a time array instead of its construction parameters
#=======================================================
def eq(par,initial_cond,t):
#differential-eq-system----------------------
def funct(y,t):
Si, Zi, Ri=y
P,d,B,G,A=par
# the model equations (see Munz et al. 2009)
f0 = P - B*Si*Zi - d*Si
f1 = B*Si*Zi + G*Ri - A*Si*Zi
f2 = d*Si + A*Si*Zi - G*Ri
return [f0, f1, f2]
#integrate------------------------------------
ds = odeint(funct,initial_cond,t)
return ds.T
#=======================================================
This can then be called as
T = np.linspace(0, 5.0, 1000+1)
S,Z,R=eq(rates,y0,T)
but also in a way to produce only the values needed in the score function
Tm=np.append([0],Td)
Sm,Zm,Rm=eq(rates,y0,Tm)
This then simplifies the score function to
def score(parms):
#a.Get Solution to system
Sm,Zm,Rm=eq(parms,y0,Tm)
#c.Score Difference between model and data points
ss=lambda data,model:((data-model)**2).sum()
return ss(Zd,Zm[1:])
Now if you want for example strongly reject negative parameters, then you could change the return value to
return ss(Zd,Zm[1:]) + 1e6*sum(max(0,-x)**2 for x in parms)
which indeed renders all parameters positive (where previously in my notebook there was a negative first parameter).

Are these functions equivalent?

I am building a neural network that makes use of T-distribution noise. I am using functions defined in the numpy library np.random.standard_t and the one defined in tensorflow tf.distributions.StudentT. The link to the documentation of the first function is here and that to the second function is here. I am using the said functions like below:
a = np.random.standard_t(df=3, size=10000) # numpy's function
t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
sess = tf.Session()
b = sess.run(t_dist.sample(10000))
In the documentation provided for the Tensorflow implementation, there's a parameter called scale whose description reads
The scaling factor(s) for the distribution(s). Note that scale is not technically the standard deviation of this distribution but has semantics more similar to standard deviation than variance.
I have set scale to be 1.0 but I have no way of knowing for sure if these refer to the same distribution.
Can someone help me verify this? Thanks
I would say they are, as their sampling is defined in almost the exact same way in both cases. This is how the sampling of tf.distributions.StudentT is defined:
def _sample_n(self, n, seed=None):
# The sampling method comes from the fact that if:
# X ~ Normal(0, 1)
# Z ~ Chi2(df)
# Y = X / sqrt(Z / df)
# then:
# Y ~ StudentT(df).
seed = seed_stream.SeedStream(seed, "student_t")
shape = tf.concat([[n], self.batch_shape_tensor()], 0)
normal_sample = tf.random.normal(shape, dtype=self.dtype, seed=seed())
df = self.df * tf.ones(self.batch_shape_tensor(), dtype=self.dtype)
gamma_sample = tf.random.gamma([n],
0.5 * df,
beta=0.5,
dtype=self.dtype,
seed=seed())
samples = normal_sample * tf.math.rsqrt(gamma_sample / df)
return samples * self.scale + self.loc # Abs(scale) not wanted.
So it is a standard normal sample divided by the square root of a chi-square sample with parameter df divided by df. The chi-square sample is taken as a gamma sample with parameter 0.5 * df and rate 0.5, which is equivalent (chi-square is a special case of gamma). The scale value, like the loc, only comes into play in the last line, as a way to "relocate" the distribution sample at some point and scale. When scale is one and loc is zero, they do nothing.
Here is the implementation for np.random.standard_t:
double legacy_standard_t(aug_bitgen_t *aug_state, double df) {
double num, denom;
num = legacy_gauss(aug_state);
denom = legacy_standard_gamma(aug_state, df / 2);
return sqrt(df / 2) * num / sqrt(denom);
})
So essentially the same thing, slightly rephrased. Here we have also have a gamma with shape df / 2 but it is standard (rate one). However, the missing 0.5 is now by the numerator as / 2 within the sqrt. So it's just moving the numbers around. Here there is no scale or loc, though.
In truth, the difference is that in the case of TensorFlow the distribution really is a noncentral t-distribution. A simple empirical proof that they are the same for loc=0.0 and scale=1.0 is to plot histograms for both distributions and see how close they look.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.random.seed(0)
t_np = np.random.standard_t(df=3, size=10000)
with tf.Graph().as_default(), tf.Session() as sess:
tf.random.set_random_seed(0)
t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
t_tf = sess.run(t_dist.sample(10000))
plt.hist((t_np, t_tf), np.linspace(-10, 10, 20), label=['NumPy', 'TensorFlow'])
plt.legend()
plt.tight_layout()
plt.show()
Output:
That looks pretty close. Obviously, from the point of view of statistical samples, this is not any kind of proof. If you were not still convinced, there are some statistical tools for testing whether a sample comes from a certain distribution or two samples come from the same distribution.

Stepsize control of dopri5 integrator

I am trying to solve a simple example with the dopri5 integrator in scipy.integrate.ode. As the documentation states
This is an explicit runge-kutta method of order (4)5 due to Dormand & Prince (with stepsize control and dense output).
this should work. So here is my example:
import numpy as np
from scipy.integrate import ode
import matplotlib.pyplot as plt
def MassSpring_with_force(t, state):
""" Simple 1DOF dynamics model: m ddx(t) + k x(t) = f(t)"""
# unpack the state vector
x = state[0]
xd = state[1]
# these are our constants
k = 2.5 # Newtons per metre
m = 1.5 # Kilograms
# force
f = force(t)
# compute acceleration xdd
xdd = ( ( -k*x + f) / m )
# return the two state derivatives
return [xd, xdd]
def force(t):
""" Excitation force """
f0 = 1 # force amplitude [N]
freq = 20 # frequency[Hz]
omega = 2 * np.pi *freq # angular frequency [rad/s]
return f0 * np.sin(omega*t)
# Time range
t_start = 0
t_final = 1
# Main program
state_ode_f = ode(MassSpring_with_force)
state_ode_f.set_integrator('dopri5', rtol=1e-6, nsteps=500,
first_step=1e-6, max_step=1e-3)
state2 = [0.0, 0.0] # initial conditions
state_ode_f.set_initial_value(state2, 0)
sol = np.array([[t_start, state2[0], state2[1]]], dtype=float)
print("Time\t\t Timestep\t dx\t\t ddx\t\t state_ode_f.successful()")
while state_ode_f.t < (t_final):
state_ode_f.integrate(t_final, step=True)
sol = np.append(sol, [[state_ode_f.t, state_ode_f.y[0], state_ode_f.y[1]]], axis=0)
print("{0:0.8f}\t {1:0.4e} \t{2:10.3e}\t {3:0.3e}\t {4}".format(
state_ode_f.t, sol[-1, 0]- sol[-2, 0], state_ode_f.y[0], state_ode_f.y[1], state_ode_f.successful()))
The result I get is:
Time Timestep dx ddx state_ode_f.successful()
0.49763822 4.9764e-01 2.475e-03 -8.258e-04 False
0.99863822 5.0100e-01 3.955e-03 -3.754e-03 False
1.00000000 1.3618e-03 3.950e-03 -3.840e-03 False
with a warning:
c:\python34\lib\site-packages\scipy\integrate_ode.py:1018: UserWarning: dopri5: larger nmax is needed
self.messages.get(idid, 'Unexpected idid=%s' % idid))
The result is incorect. If I run the same code with vode integrator, I get the expected result.
Edit
A similar issue is described here:
Using adaptive step sizes with scipy.integrate.ode
The suggested solution recommends setting nsteps=1, which solves the ODE correctly and with step-size control. However the integrator returns state_ode_f.successful() as False.
No, there is nothing wrong. You are telling the integrator to perform an integration step to t_final and it performs that step. Internal steps of the integrator are not reported.
The sensible thing to do is to give the desired sampling points as input of the algorithm, set for example dt=0.1 and use
state_ode_f.integrate( min(state_ode_f.t+dt, t_final) )
There is no single-step method in dopri5, only vode has it defined, see the source code https://github.com/scipy/scipy/blob/v0.14.0/scipy/integrate/_ode.py#L376, this could account for the observed differences.
As you found in Using adaptive step sizes with scipy.integrate.ode, one can force single-step behavior by setting the iteration bound nsteps=1. This will produce a warning every time, so one has to suppress these specific warnings to see a sensible result.
You should not use a parameter (which is a constant for the integration interval) for a time-dependent force. Use inside MassSpring_with_force the evaluation f=force(t). Possibly you could pass the function handle of force as parameter.

How to perform discrete optimization of functions over matrices?

I would like to optimize over all 30 by 30 matrices with entries that are 0 or 1. My objective function is the determinant. One way to do this would be some sort of stochastic gradient descent or simulated annealing.
I looked at scipy.optimize but it doesn't seem to support this sort of optimization as far as I can tell. scipy.optimize.basinhopping looked very tempting but it seems to require continuous variables.
Are there any tools in Python for this sort of general discrete optimization?
I think a genetic algorithm might work quite well in this case. Here's a quick example thrown together using deap, based loosely on their example here:
import numpy as np
import deap
from deap import algorithms, base, tools
import imp
class GeneticDetMinimizer(object):
def __init__(self, N=30, popsize=500):
# we want the creator module to be local to this instance, since
# creator.create() directly adds new classes to the module's globals()
# (yuck!)
cr = imp.load_module('cr', *imp.find_module('creator', deap.__path__))
self._cr = cr
self._cr.create("FitnessMin", base.Fitness, weights=(-1.0,))
self._cr.create("Individual", np.ndarray, fitness=self._cr.FitnessMin)
self._tb = base.Toolbox()
# an 'individual' consists of an (N^2,) flat numpy array of 0s and 1s
self.N = N
self.indiv_size = N * N
self._tb.register("attr_bool", np.random.random_integers, 0, 1)
self._tb.register("individual", tools.initRepeat, self._cr.Individual,
self._tb.attr_bool, n=self.indiv_size)
# the 'population' consists of a list of such individuals
self._tb.register("population", tools.initRepeat, list,
self._tb.individual)
self._tb.register("evaluate", self.fitness)
self._tb.register("mate", self.crossover)
self._tb.register("mutate", tools.mutFlipBit, indpb=0.025)
self._tb.register("select", tools.selTournament, tournsize=3)
# create an initial population, and initialize a hall-of-fame to store
# the best individual
self.pop = self._tb.population(n=popsize)
self.hof = tools.HallOfFame(1, similar=np.array_equal)
# print summary statistics for the population on each iteration
self.stats = tools.Statistics(lambda ind: ind.fitness.values)
self.stats.register("avg", np.mean)
self.stats.register("std", np.std)
self.stats.register("min", np.min)
self.stats.register("max", np.max)
def fitness(self, individual):
"""
assigns a fitness value to each individual, based on the determinant
"""
return np.linalg.det(individual.reshape(self.N, self.N)),
def crossover(self, ind1, ind2):
"""
randomly swaps a subset of array values between two individuals
"""
size = self.indiv_size
cx1 = np.random.random_integers(0, size - 2)
cx2 = np.random.random_integers(cx1, size - 1)
ind1[cx1:cx2], ind2[cx1:cx2] = (
ind2[cx1:cx2].copy(), ind1[cx1:cx2].copy())
return ind1, ind2
def run(self, ngen=int(1E6), mutation_rate=0.3, crossover_rate=0.7):
np.random.seed(seed)
pop, log = algorithms.eaSimple(self.pop, self._tb,
cxpb=crossover_rate,
mutpb=mutation_rate,
ngen=ngen,
stats=self.stats,
halloffame=self.hof)
self.log = log
return self.hof[0].reshape(self.N, self.N), log
if __name__ == "__main__":
np.random.seed(0)
gd = GeneticDetMinimizer()
best, log = gd.run()
It takes about 40 seconds to run 1000 generations on my laptop, which gets me from a minimum determinant value of about -5.7845x108 to -6.41504x1011. I haven't really played around much with the meta-parameters (population size, mutation rate, crossover rate etc.), so I'm sure it's possible to do a lot better.
Here's a greatly improved version that implements a much smarter crossover function that swaps blocks of rows or columns across individuals, and uses a cachetools.LRUCache to guarantee that each mutation step produces a novel configuration, and to skip evaluation of the determinant for configurations that have already been tried:
import numpy as np
import deap
from deap import algorithms, base, tools
import imp
from cachetools import LRUCache
# used to control the size of the cache so that it doesn't exceed system memory
MAX_MEM_BYTES = 11E9
class GeneticDetMinimizer(object):
def __init__(self, N=30, popsize=500, cachesize=None, seed=0):
# an 'individual' consists of an (N^2,) flat numpy array of 0s and 1s
self.N = N
self.indiv_size = N * N
if cachesize is None:
cachesize = int(np.ceil(8 * MAX_MEM_BYTES / self.indiv_size))
self._gen = np.random.RandomState(seed)
# we want the creator module to be local to this instance, since
# creator.create() directly adds new classes to the module's globals()
# (yuck!)
cr = imp.load_module('cr', *imp.find_module('creator', deap.__path__))
self._cr = cr
self._cr.create("FitnessMin", base.Fitness, weights=(-1.0,))
self._cr.create("Individual", np.ndarray, fitness=self._cr.FitnessMin)
self._tb = base.Toolbox()
self._tb.register("attr_bool", self.random_bool)
self._tb.register("individual", tools.initRepeat, self._cr.Individual,
self._tb.attr_bool, n=self.indiv_size)
# the 'population' consists of a list of such individuals
self._tb.register("population", tools.initRepeat, list,
self._tb.individual)
self._tb.register("evaluate", self.fitness)
self._tb.register("mate", self.crossover)
self._tb.register("mutate", self.mutate, rate=0.002)
self._tb.register("select", tools.selTournament, tournsize=3)
# create an initial population, and initialize a hall-of-fame to store
# the best individual
self.pop = self._tb.population(n=popsize)
self.hof = tools.HallOfFame(1, similar=np.array_equal)
# print summary statistics for the population on each iteration
self.stats = tools.Statistics(lambda ind: ind.fitness.values)
self.stats.register("avg", np.mean)
self.stats.register("std", np.std)
self.stats.register("min", np.min)
self.stats.register("max", np.max)
# keep track of configurations that have already been visited
self.tabu = LRUCache(cachesize)
def random_bool(self, *args):
return self._gen.rand(*args) < 0.5
def mutate(self, ind, rate=1E-3):
"""
mutate an individual by bit-flipping one or more randomly chosen
elements
"""
# ensure that each mutation always introduces a novel configuration
while np.packbits(ind.astype(np.uint8)).tostring() in self.tabu:
n_flip = self._gen.binomial(self.indiv_size, rate)
if not n_flip:
continue
idx = self._gen.random_integers(0, self.indiv_size - 1, n_flip)
ind[idx] = ~ind[idx]
return ind,
def fitness(self, individual):
"""
assigns a fitness value to each individual, based on the determinant
"""
h = np.packbits(individual.astype(np.uint8)).tostring()
# look up the fitness for this configuration if it has already been
# encountered
if h not in self.tabu:
fitness = np.linalg.det(individual.reshape(self.N, self.N))
self.tabu.update({h: fitness})
else:
fitness = self.tabu[h]
return fitness,
def crossover(self, ind1, ind2):
"""
randomly swaps a block of rows or columns between two individuals
"""
cx1 = self._gen.random_integers(0, self.N - 2)
cx2 = self._gen.random_integers(cx1, self.N - 1)
ind1.shape = ind2.shape = self.N, self.N
if self._gen.rand() < 0.5:
# row swap
ind1[cx1:cx2, :], ind2[cx1:cx2, :] = (
ind2[cx1:cx2, :].copy(), ind1[cx1:cx2, :].copy())
else:
# column swap
ind1[:, cx1:cx2], ind2[:, cx1:cx2] = (
ind2[:, cx1:cx2].copy(), ind1[:, cx1:cx2].copy())
ind1.shape = ind2.shape = self.indiv_size,
return ind1, ind2
def run(self, ngen=int(1E6), mutation_rate=0.3, crossover_rate=0.7):
pop, log = algorithms.eaSimple(self.pop, self._tb,
cxpb=crossover_rate,
mutpb=mutation_rate,
ngen=ngen,
stats=self.stats,
halloffame=self.hof)
self.log = log
return self.hof[0].reshape(self.N, self.N), log
if __name__ == "__main__":
np.random.seed(0)
gd = GeneticDetMinimizer(0)
best, log = gd.run()
My best score thus far is about -3.23718x1013 -3.92366x1013 after 10000 1000 generations, which takes about 45 seconds on my machine.
Based on the solution cthonicdaemon linked to in the comments, the maximum determinant for a 31x31 Hadamard matrix must be at least 75960984159088×230 ~= 8.1562x1022 (it's not yet proven whether that solution is optimal). The maximum determinant for an (n-1 x n-1) binary matrix is 21-n times the value for an (n x n) Hadamard matrix, i.e. 8.1562x1022 x 2-30 ~= 7.5961x1013, so the genetic algorithm gets within an order of magnitude of the current best known solution.
However, the fitness function seems to plateau around here, and I'm having a hard time breaking -4x1013. Since it's a heuristic search there is no guarantee that it will eventually find the global optimum.
I don't know of any straight-forward method for discrete optimization in scipy. One alternative is using the simanneal package from pip or github, which allows you to introduce your own move function, such that you can restrict it to moves within your domain:
import random
import numpy as np
import simanneal
class BinaryAnnealer(simanneal.Annealer):
def move(self):
# choose a random entry in the matrix
i = random.randrange(self.state.size)
# flip the entry 0 <=> 1
self.state.flat[i] = 1 - self.state.flat[i]
def energy(self):
# evaluate the function to minimize
return -np.linalg.det(self.state)
matrix = np.zeros((5, 5))
opt = BinaryAnnealer(matrix)
print(opt.anneal())
I have looked into this a bit.
A couple of things first off: 1) 56 million is the max value when the size of the matrix is 21x21, not 30x30:https://en.wikipedia.org/wiki/Hadamard%27s_maximal_determinant_problem#Connection_of_the_maximal_determinant_problems_for_.7B1.2C.C2.A0.E2.88.921.7D_and_.7B0.2C.C2.A01.7D_matrices.
But that is also an upper bound on -1, 1 matrices, not 1,0.
EDIT: Reading more carefully from that link:
The maximal determinants of {1, −1} matrices up to size n = 21 are given in the following table. Size 22 is the smallest open case. In the table, D(n) represents the maximal determinant divided by 2n−1. Equivalently, D(n) represents the maximal determinant of a {0, 1} matrix of size n−1.
So that table can be used for upper bounds, but remember they're divided by 2n−1. Also note that 22 is the smallest open case, so trying to find the maximum of a 30x30 matrix has not been done, and is not even close to being done just yet.
2) The reason David Zwicker's code gives an answer of 30 million is probably due to the fact that he's minimising. Not maximising.
return -np.linalg.det(self.state)
See how he's got the minus sign there?
3) Also, the solution space for this problem is very big. I calculate the number of different matrices to be 2^(30*30) i.e. in the order of 10^270. So looking at each matrix is simply impossible, and even look at most of them is too.
I have a bit of code here (adapted from David Zwicker's code) that runs, but I have no idea how close it is to the actual maximum. It takes around 45 mins to do 10 million iterations on my PC, or only about 2 mins for 1 mill iterations. I get a max value of around 3.4 billion. But again, I have no idea how close this is to the theoretical maximum.
import numpy as np
import random
import time
MATRIX_SIZE = 30
def Main():
startTime = time.time()
mat = np.zeros((MATRIX_SIZE, MATRIX_SIZE), dtype = int)
for i in range(MATRIX_SIZE):
for j in range(MATRIX_SIZE):
mat[i,j] = random.randrange(2)
print("Starting matrix:\n", mat)
maxDeterminant = 0
for i in range(1000000):
# choose a random entry in the matrix
x = random.randrange(MATRIX_SIZE)
y = random.randrange(MATRIX_SIZE)
mat[x,y] = 1 - mat[x,y]
#print(mat)
detValue = np.linalg.det(mat)
if detValue > maxDeterminant:
maxDeterminant = detValue
timeTakenStr = "\nTotal time to complete: " + str(round(time.time() - startTime, 4)) + " seconds"
print(timeTakenStr )
print(maxDeterminant)
Main()
Does this help?

Categories

Resources