SciPy constrained optimization bug
Constrained optimization tasks are fundamental and occur commonly in
real-world scenarios. A common example is a physical mixture, whose components
must sum to 100% and be non-negative; in SciPy these conditions can be written
directly as a linear constraint plus bounds, as in the short sketch below.
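As a minimal sketch, assuming a made-up three-component objective (the quadratic
mix_obj below is purely hypothetical), the mixture conditions translate into a
sum-to-one LinearConstraint and non-negativity Bounds:
import numpy as np
import scipy.optimize as opt
# Hypothetical objective over three mixture fractions (placeholder for a real model)
mix_obj = lambda x: (x[0] - 0.5)**2 + (x[1] - 0.3)**2 + (x[2] - 0.2)**2
# The fractions must sum to 100% (i.e., 1.0)...
sum_to_one = opt.LinearConstraint(np.ones((1, 3)), 1, 1)
# ...and each fraction must lie between 0 and 1
frac_bnds = opt.Bounds(lb = np.zeros(3), ub = np.ones(3))
res = opt.minimize(fun = mix_obj, x0 = np.full(3, 1/3), method = 'trust-constr',
                   constraints = [sum_to_one], bounds = frac_bnds)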
Constraints can also involve more complex, nonlinear relationships between
variables. Surprisingly, there can still be some difficulty finding good
software for the task. Historically,
a lot of the best software was commercial. This is still the case for certain
classes of optimization problems, such as mixed integer nonlinear programming
problems. In recent memory, Matlab’s fmincon function was the routine most
often recommended for purely continuous inputs, as it supports arbitrary
constraints. SciPy, which I have used for many years, has an analogous routine,
minimize(method='trust-constr'). Early in the year I was running a simulation
that made thousands of calls to minimize(method='trust-constr') with various
constraints. I was surprised to
find that the simulation errored out relatively often, saying that the bound
constraints were violated. I reported the issue
here. I implemented a quick fix
to suit my needs here, and
eventually implemented a more straightforward solution
here. It seems like this fix will
be merged soon. Without going into details (you can read the comments on the
PRs and look at the diffs, if you like), there are workarounds that can be used
if you experience this issue and don’t want to modify the SciPy source code.
This issue only occurs when using the built-in numerical methods to calculate
the derivative functions (Jacobian and Hessian) of your optimization function
and constraint functions. Analytical derivative functions are always preferred
when possible, since estimating the derivatives numerically requires many
additional function evaluations, but analytical forms aren’t always practical
or possible. If
you look at the PRs, you will see the issue is in the classes in
_differentiable_functions.py in the SciPy optimization module, since
the methods used for calculating gradients, etc., will not always keep the
solutions in the feasible (bound-constrained) domain. The simplest
workaround, short of implementing one of the fixes into your local SciPy, is to
supply your own general functions for computing the Jacobian and Hessian. In
spite of the apparent complexity of the classes in _differentiable_functions.py,
this can be done very simply. For instance, look at the example case I gave
showing the bug:
import numpy as np
import scipy.optimize as opt
of = lambda x: np.exp(x[0])*(4*x[0]**2 + 2*x[1]**2 + 4*x[0]*x[1] + 2*x[1] + 1)
nce = lambda x: x[0]**2 + x[1]
nci = lambda x: x[0]*x[1]
x0 = np.array((0.99, -0.99))
# This causes the bug - using a better starting position, [-1, 1], does not
nlcs = [opt.NonlinearConstraint(nci, -10, np.inf),
        opt.NonlinearConstraint(nce, 1, 1)]
bnds = opt.Bounds(lb = np.array((-1, -1)), ub = np.array((1, 1)),
                  keep_feasible=True)
tst = opt.minimize(fun = of, x0 = x0, method = 'trust-constr', bounds = bnds,
                   constraints = nlcs)
This will error out. However, we can define some general functions for numerically computing Jacobians and Hessians, and specify these functions as the derivative functions to be used. In this case I will use complex-step differentiation, based on the implementations by Yi Cao here and here. The complex-step approach perturbs the input along the imaginary axis and takes the imaginary part of the result, so that f'(x) ≈ Im(f(x + ih))/h, avoiding the subtractive cancellation that limits ordinary finite differences.
def hesscs(x, fun):
    # Hessian of a scalar function via a complex-step / central-difference scheme
    n = x.flatten().shape[0]
    H = np.zeros((n, n))
    h = np.sqrt(np.finfo(float).eps)
    h2 = h*h
    for k in range(n):
        # perturb the k-th component along the imaginary axis
        x1 = np.array(x, dtype=complex)
        x1[k] = x1[k] + h*1j
        for l in range(k, n):
            # central difference in the real direction for the l-th component
            x2 = x1.copy()
            x2[l] = x2[l] + h
            u1 = fun(x2)
            x2[l] = x1[l] - h
            u2 = fun(x2)
            H[k, l] = np.imag(u1 - u2)/h2/2
            H[l, k] = H[k, l]
    return H
def gradcs(x, fun):
    # gradient of a scalar function via complex-step differentiation
    n = x.flatten().shape[0]
    g = np.zeros(n)
    h = np.sqrt(np.finfo(float).eps)
    for k in range(n):
        # perturb only the k-th component along the imaginary axis
        x1 = np.array(x, dtype=complex)
        x1[k] = x1[k] + h*1j
        v = fun(x1)
        g[k] = np.imag(v)/h
    return g
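As a quick sanity check, these helpers can be compared against a small test function whose gradient and Hessian are known analytically; the cubic below is just a hypothetical example, not part of the original bug report.
test_fun = lambda x: x[0]**2*x[1] + x[1]**3
xt = np.array((1.5, -0.5))
# analytical gradient and Hessian of test_fun at xt
g_true = np.array((2*xt[0]*xt[1], xt[0]**2 + 3*xt[1]**2))
H_true = np.array([[2*xt[1], 2*xt[0]],
                   [2*xt[0], 6*xt[1]]])
print(np.allclose(gradcs(xt, test_fun), g_true),
      np.allclose(hesscs(xt, test_fun), H_true))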
With these two functions, we can specify that these methods should be used to compute the Jacobian and Hessian of our optimization function and the Jacobians of our constraints.
jac_fun = lambda x: gradcs(x, of)
hess_fun = lambda x: hesscs(x, of)
nlcs_wj = [opt.NonlinearConstraint(nci, -10, np.inf,
                                   jac = lambda x: gradcs(x, nci)),
           opt.NonlinearConstraint(nce, 1, 1, jac = lambda x: gradcs(x, nce))]
tst = opt.minimize(fun = of, x0 = x0, jac = jac_fun, hess = hess_fun,
                   method = 'trust-constr', bounds = bnds,
                   constraints = nlcs_wj)
This will run successfully, in the sense that it will not throw an error. It will not find the “correct” answer, since the supplied starting point is quite poor, but even so, throwing an error should not be necessary in such a case.
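As an aside, the same complex-step helpers can also be used for the constraint Hessians if you want to avoid SciPy’s default quasi-Newton approximation for them. This is only a sketch of that option, not part of the original fix; note that NonlinearConstraint expects a Hessian callable with signature hess(x, v), returning the Hessian of dot(fun, v), so the scalar constraints here are scaled by their single multiplier v[0].
nlcs_wjh = [opt.NonlinearConstraint(nci, -10, np.inf,
                                    jac = lambda x: gradcs(x, nci),
                                    hess = lambda x, v: v[0]*hesscs(x, nci)),
            opt.NonlinearConstraint(nce, 1, 1,
                                    jac = lambda x: gradcs(x, nce),
                                    hess = lambda x, v: v[0]*hesscs(x, nce))]
tst2 = opt.minimize(fun = of, x0 = x0, jac = jac_fun, hess = hess_fun,
                    method = 'trust-constr', bounds = bnds,
                    constraints = nlcs_wjh)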