The Ellipsoid Method

post-date: 07/25/2017

The Ellipsoid method is an approach proposed by Shor [1] for solving convex optimization problems. It was further developed by Yudin and Nemirovskii [2], and Leonid Khachiyan [3, 4] later used it to derive the first polynomial-time algorithm for linear programming. Khachiyan's ellipsoid algorithm has a polynomial worst-case bound and is therefore theoretically superior to the Simplex algorithm [5], but in practice the Simplex algorithm is usually far more efficient. Despite this, Khachiyan's ellipsoid method is of fundamental importance to the analysis of linear programs and, more generally, to combinatorial optimization theory. In this post I summarize an approach [6-10] based on Khachiyan's algorithm, and I also wrote a small C++ library that implements it. In part II of this post, I discuss the better-known Karmarkar's algorithm [11], which was the first polynomial-time algorithm for linear programming that is also efficient in practice.


Brief Derivation

To demonstrate the core idea of the Ellipsoid method, we can consider the minimum volume enclosing ellipsoid, or MVEE, problem. The goal is to find an \(\epsilon\)-approximate minimum-volume enclosing ellipsoid over the finite point set \( X \in \mathbb{R}^{d \times n} \). To start, we recall that the general form of an ellipsoid can be expressed as a manifold in \(\mathbb{R}^{d}\): \[ \varepsilon(E, \overline{x}) := \{x\in \mathbb{R}^{d} \ : \ (x-\overline{x})^{T}E(x-\overline{x}) \leq 1 \} \tag{Eq. 1}\label{Eq. 1} \] where \( \overline{x} \) is the centroid of the ellipsoid and \( E \) determines its shape and orientation. \( E \) is symmetric (i.e., \( E = E^{T} \)) positive definite (i.e., \( v^{T}Ev > 0 \) for all \( v \neq 0 \)). The volume of \( \varepsilon \) is: \[ Vol(\varepsilon) = v_{0}\big(\text{det}(E)\big)^{-1/2} \tag{Eq. 2}\label{Eq. 2} \] where \( v_{0} \) is the volume of the unit hypersphere. The MVEE optimization problem therefore aims to find the centroid \( \overline{x} \in \mathbb{R}^d \) and transformation \( E \in \mathbb{R}^{d \times d} \) that minimize (2) while keeping every point inside the ellipsoid. The primal form is: \[ \begin{split} \text{min}_{E,\overline{x}} & \quad \text{det}(E^{-1}) \\ s.t. & \quad (x_{i}-\overline{x})^{T}E(x_{i}-\overline{x}) \leq 1, \quad i=1,2,\dots,n, \\ & \quad E > 0 \end{split} \tag{Eq. 3}\label{Eq. 3} \] which is not convex. Redefining (1) as \( \varepsilon(A, b) = \{ x\in \mathbb{R}^{d} \ : \ \|Ax-b\|_{2} \leq 1 \} \), where \( A = E^{1/2} \) and \( b=E^{1/2}\overline{x} \), we can rewrite this as: \[ \begin{split} \text{min}_{A,b} & \quad \text{log det}(A^{-1}) \\ s.t. & \quad \|Ax_{i}-b\|_{2} \leq 1, \quad i=1,2,\dots,n, \\ & \quad A > 0 \end{split} \tag{Eq. 4}\label{Eq. 4} \] which is convex in the variables \( A \) and \( b \).

One way to simplify (4) further is to require the point set to be centrally symmetric, so that \( b = 0 \), but this restriction is of limited practical use. Khachiyan instead used the lifting \( x \in \mathbb{R}^{d} \rightarrow q = [x^{T}, 1]^{T} \in \mathbb{R}^{d+1} \) and solved the dual problem in \( \mathbb{R}^{d+1} \). We first rewrite (4) as: \[ \begin{split} \text{min}_{M} & \quad \text{log det}(M^{-1}) \\ s.t. & \quad q_{i}^{T}Mq_{i} \leq 1, \quad i=1,2,\dots,n, \\ & \quad M > 0 \end{split} \tag{Eq. 5}\label{Eq. 5} \] where \( q_{1},\dots,q_{n} \) are the columns of \( Q \in \mathbb{R}^{(d+1) \times n} \), \( Q^{T} = \big[X^{T}, \textbf{1}\big] \), and \( M \in \mathbb{R}^{(d+1) \times (d+1)} \) is symmetric positive definite. Note that we still have a matrix inverse inside the log objective. We therefore consider the Lagrangian of (5): \[ L(M,\lambda) = \text{-log det}(M) + \sum_{i=1}^{n}\lambda_{i}\big(q_{i}^{T}Mq_{i}-1\big) \tag{Eq. 6}\label{Eq. 6} \] where \( \lambda \in \mathbb{R}^{n} \), \( \lambda \geq 0 \), is the vector of Lagrange multipliers. Applying the Karush-Kuhn-Tucker stationarity condition we get: \[ \begin{split} \frac{\partial L}{\partial M} &= -M^{-1} + Q \Lambda Q^{T} = 0 \\ \therefore M &= \big(Q \Lambda Q^{T}\big)^{-1} \end{split} \tag{Eq. 7}\label{Eq. 7} \] where \( \Lambda = \text{Diag}(\lambda) \). Substituting (7) back into (6) eliminates \( M \), and since the multipliers sum to \( d+1 \) at the optimum (take the trace of \( M Q \Lambda Q^{T} = I_{d+1} \) and apply complementary slackness), the variable substitution \( u = \lambda/(d+1) \) yields a tidy dual representation of (5): \[ \begin{split} \text{max}_{u} \quad & \text{log det}(QUQ^{T}) \\ s.t. \quad & \textbf{1}^{T}u = 1 \\ & u \geq 0 \end{split} \tag{Eq. 8}\label{Eq. 8} \] where \( U = \text{Diag}(u) \in \mathbb{R}^{n \times n} \). The objective in (8) is concave and the feasible set is the unit simplex, so we can use a simple coordinate ascent method to solve it.
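As a quick sanity check on the notation, the short snippet below lifts a small 2-D point set into \( Q \) and evaluates the dual objective \( \text{log det}(QUQ^{T}) \) of (8) at the uniform weights \( u = \textbf{1}/n \). This is only an illustrative sketch using Eigen; the helper name dualObjective is mine and not part of any library.

#include <iostream>
#include <Eigen/Dense>

// Evaluate the dual objective of Eq. 8, log det(Q U Q^T), for weights u on
// the unit simplex. Q is the (d+1) x n matrix of lifted points [x_i^T, 1]^T.
double dualObjective(const Eigen::MatrixXd& Q, const Eigen::VectorXd& u)
{
    Eigen::MatrixXd G = Q * u.asDiagonal() * Q.transpose();
    // G is symmetric positive definite, so log det(G) = 2 * sum(log diag(L)),
    // where G = L L^T is its Cholesky factorization.
    Eigen::LLT<Eigen::MatrixXd> llt(G);
    Eigen::MatrixXd L = llt.matrixL();
    return 2.0 * L.diagonal().array().log().sum();
}

int main()
{
    // Four points in R^2 (columns of X), lifted to Q in R^3.
    Eigen::MatrixXd X(2, 4);
    X << 0., 1., 0., 1.,
         0., 0., 1., 1.;
    Eigen::MatrixXd Q(3, 4);
    Q.topRows(2) = X;
    Q.row(2).setOnes();
    Eigen::VectorXd u = Eigen::VectorXd::Constant(4, 0.25);   // uniform weights
    std::cout << "log det(Q U Q^T) = " << dualObjective(Q, u) << "\n";
    return 0;
}

Starting from this uniform \( u \) is exactly what the implementation further below does before iterating.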
To build the update rule, we take the gradient of the objective in (8) with respect to \( u \): \[ \begin{split} \Omega(u) &= \nabla_{u} \ \text{log det} (QUQ^{T}) \\ &= \text{diag}\big(Q^{T}\big(QUQ^{T}\big)^{-1}Q\big) \end{split} \tag{Eq. 9}\label{Eq. 9} \] that is, \( \Omega_{i}(u) = q_{i}^{T}\big(QUQ^{T}\big)^{-1}q_{i} \). Let \( \overline{u} \) be the current value of \( u \); at each step we compute \( \Omega(\overline{u}) \) and solve the linear optimization problem: \[ \begin{split} \text{max}_{u} & \quad \Omega(\overline{u})^{T}u \\ s.t. & \quad \textbf{1}^{T}u = 1 \\ & \quad u \geq 0 \end{split} \tag{Eq. 10}\label{Eq. 10} \] which is the linearization of (8) around \( \overline{u} \). The solution is the simplex vertex \( e_{j} \in \mathbb{R}^{n} \), where \( j = \text{arg max}_{i} \ \Omega_{i}(\overline{u}) \). At each step, we therefore push \( u \) in the direction \( \triangle u = e_{j} - \overline{u} \). Khachiyan showed that the step size \( \alpha = \text{arg max}_{\alpha \in [0,1]} \ \text{log det}\big(Q\,\text{Diag}\big(\overline{u} + \alpha (e_{j} - \overline{u})\big)\,Q^{T}\big) \) has the closed-form solution: \[ \alpha = \frac{\Omega_{j}(\overline{u})-(d+1)}{(d+1)\big(\Omega_{j}(\overline{u})-1\big)}. \tag{Eq. 11}\label{Eq. 11} \]

Given an optimal solution \( \hat{u} \), we seek the corresponding primal solution \( \hat{M} \), from which we can extract the desired ellipsoid \( (\hat{E}, \hat{\overline{x}}) \). To recap, our ellipsoid exists in three forms: \[ \begin{split} MVEE &= \{x\in \mathbb{R}^{d} : (x-\hat{\overline{x}})^{T}\hat{E}(x-\hat{\overline{x}}) \leq 1 \} \\ &= \{x\in \mathbb{R}^{d} : q^{T}\hat{M}q \leq 1, \ q = [x^{T}, 1]^{T} \} \\ &= \{x\in \mathbb{R}^{d} : q^{T}\big(Q\hat{U}Q^{T}\big)^{-1} q \leq 1, \ q = [x^{T}, 1]^{T} \} \end{split} \tag{Eq. 12}\label{Eq. 12} \] It can be shown that the quadratic form \( q^{T}\big(Q\hat{U}Q^{T}\big)^{-1}q \) reduces to a quadratic form in \( x - X\hat{u} \) (I'm omitting the proof for brevity); this allows us to extract \( (\hat{E},\hat{\overline{x}}) \) from the dual optimization problem defined in (8): \[ \hat{E} = \frac{1}{d}\bigg(X\hat{U}X^{T} - X\hat{u}\big(X\hat{u}\big)^{T} \bigg)^{-1}, \qquad \hat{\overline{x}} = X\hat{u} \tag{Eq. 13}\label{Eq. 13} \] The complexity of this algorithm is: \[ \text{Runtime} = O\bigg(nd^{2} \big((1 + \epsilon )^{\frac{2}{d+1}} -1\big)^{-1} + \text{ln} \ d \bigg) \tag{Eq. 14}\label{Eq. 14} \]
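Before moving to the implementation, it is worth collecting Eqs. (9)-(11) into the single iteration that the code below performs (this is only a restatement of the update derived above, writing \( g \) for the gradient \( \Omega(\overline{u}) \)): \[ \begin{split} g_{i} &= q_{i}^{T}\big(Q\,\text{Diag}(\overline{u})\,Q^{T}\big)^{-1}q_{i}, \quad i = 1,\dots,n \\ j &= \text{arg max}_{i} \ g_{i}, \qquad \alpha = \frac{g_{j}-(d+1)}{(d+1)(g_{j}-1)} \\ u &\leftarrow (1-\alpha)\,\overline{u} + \alpha\, e_{j} \end{split} \] The iteration repeats until the change \( \|u - \overline{u}\|_{2} \) falls below the tolerance \( \epsilon \), which is the stopping rule used in the implementation below.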


MVEE Implementation

Below is a C++ implementation of the algorithm. If you are interested in using this code, install this library.

#include <cmath>
#include <chrono>
#include <vector>
#include <Eigen/Dense>

using namespace Eigen;

// Results cached by khachiyan(). In the library these are members of the
// solver class; they are declared at file scope here so the snippet is
// self-contained.
static int iters = 0;
static double elapsed = 0.;                         // seconds to convergence (the "time" member)
static std::vector<double> _centroid;               // ellipsoid centroid
static std::vector<double> _radii;                  // semi-axis lengths
static std::vector<std::vector<double>> _pose;      // axis orientations (columns of V)

// Copy an Eigen vector into a std::vector, applying f element-wise.
std::vector<double> toStdVec(const VectorXd& fromX, double (*f)(double))
{
    std::vector<double> toX(fromX.size(), 0.);
    for (int i = 0; i < fromX.size(); i++)
    {
        toX[i] = (*f)(fromX(i));
    }
    return toX;
}

// Copy an Eigen matrix into a nested std::vector, applying f element-wise.
std::vector<std::vector<double>> toStdMat(const MatrixXd& fromX, double (*f)(double))
{
    std::vector<std::vector<double>>
        toX(fromX.rows(), std::vector<double>(fromX.cols(), 0.));
    for (int i = 0; i < fromX.rows(); i++)
    {
        for (int j = 0; j < fromX.cols(); j++)
        {
            toX[i][j] = (*f)(fromX(i, j));
        }
    }
    return toX;
}

// Thin SVD, A = U diag(s) V^T.
void decompose(const MatrixXd& A, VectorXd& s, MatrixXd& U, MatrixXd& V)
{
    JacobiSVD<MatrixXd> SVD(A, ComputeThinU | ComputeThinV);
    s = SVD.singularValues();
    V = SVD.matrixV();
    U = SVD.matrixU();
}

// Index of the first entry of v equal to val (small helper used by khachiyan()).
int findIdx(const VectorXd& v, double val)
{
    for (int i = 0; i < v.size(); i++)
    {
        if (v(i) == val) return i;
    }
    return 0;
}

// Khachiyan's coordinate-ascent algorithm for the MVEE dual (Eq. 8).
// data is n x d (one point per row), eps is the convergence tolerance on
// ||u_new - u||, and lmcoeff is a small ridge that keeps Q U Q^T invertible.
void khachiyan(const MatrixXd& data, double eps, double lmcoeff)
{
    auto t0 = std::chrono::system_clock::now();
    iters = 0;
    long double err = INFINITY;
    double alpha;
    int n = data.rows();
    int d = data.cols();
    MatrixXd X = data.transpose();                       // d x n
    MatrixXd Q(d + 1, n);                                // lifted points q_i = [x_i^T, 1]^T
    Q.topRows(d) = X;
    Q.row(d).setOnes();
    VectorXd u = (1 / (double) n) * VectorXd::Ones(n);   // start at the simplex centre
    VectorXd uhat = u;
    MatrixXd G(d + 1, d + 1);
    MatrixXd noiseye = MatrixXd::Identity(G.rows(), G.cols()) * lmcoeff;
    VectorXd g(n);
    double m; int i;
    while (err > eps)
    {
        G = Q * u.asDiagonal() * Q.transpose() + noiseye;    // Q U Q^T (Eq. 8)
        g = (Q.transpose() * G.inverse() * Q).diagonal();    // gradient Omega(u) (Eq. 9)
        m = g.maxCoeff();
        i = findIdx(g, m);                                   // ascent coordinate j (Eq. 10)
        alpha = (m - d - 1) / ((d + 1) * (m - 1));           // step size (Eq. 11)
        uhat = (1 - alpha) * u;
        uhat(i) += alpha;
        err = (uhat - u).norm();
        u = uhat;
        iters++;
    }
    elapsed = std::chrono::duration<double>(
        std::chrono::system_clock::now() - t0
    ).count();
    // Recover the primal ellipsoid (Eq. 13) and decompose E = U diag(s) V^T;
    // E is symmetric positive definite, so this is its eigendecomposition.
    MatrixXd E = (1 / (double) d) * (
        X * u.asDiagonal() * X.transpose()
        - (X * u) * (X * u).transpose()
    ).inverse();
    VectorXd x = X * u;
    VectorXd s(d);
    MatrixXd V(d, d);
    MatrixXd U(d, d);
    decompose(E, s, U, V);
    // Cache the result: centroid, semi-axis lengths 1/sqrt(s_i), and pose.
    _centroid = toStdVec(x, [](double z) { return z; });
    _radii = toStdVec(s, [](double z) { return 1 / std::sqrt(z); });
    _pose = toStdMat(V, [](double z) { return z; });
}
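To show how the pieces fit together, here is a minimal driver that can be appended to the snippet above. The point count, tolerance, and ridge value are arbitrary choices for illustration, and it simply reads back the file-scope variables that the snippet caches (in the library itself you would go through the class interface instead).

#include <iostream>

int main()
{
    // 500 random points in R^2, one point per row, as khachiyan() expects.
    Eigen::MatrixXd data = Eigen::MatrixXd::Random(500, 2);
    khachiyan(data, 1e-3, 1e-8);   // tolerance and ridge chosen for illustration
    std::cout << "iterations: " << iters << ", seconds: " << elapsed << "\n";
    std::cout << "centroid:   " << _centroid[0] << ", " << _centroid[1] << "\n";
    std::cout << "radii:      " << _radii[0] << ", " << _radii[1] << "\n";
    return 0;
}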

Results

Below I show some plots that visually demonstrate what this algorithm does. I randomly generated toy data (blue) and ran the program to produce the bounding manifolds (red). As expected, the algorithm produces a tight bounding hull over the test set in \( \mathbb{R}^{2} \), even for highly irregular data sampled from the locus of an ellipse. The second figure summarizes the runtime behavior of the algorithm: the runtime grows linearly with the number of points \( n \), consistent with expression (14). As I alluded to earlier, Khachiyan's algorithm isn't very efficient in practice, which becomes apparent on higher-dimensional data: for a randomly generated multivariate Gaussian \( \in \mathbb{R}^{5000 \times 500} \), I observed a runtime of \( 40.6 \) seconds (Intel Core i7-6850K). In the next post, I'll discuss Karmarkar's algorithm [11] and its variants, which are significantly more efficient.
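If you want to reproduce this kind of timing experiment, a standalone sketch along the following lines works with the snippet above (the seed, tolerance, and ridge are arbitrary, and the absolute numbers will depend on your hardware and on how Eigen is compiled):

#include <random>
#include <iostream>

int main()
{
    // 5000 samples in R^500, drawn i.i.d. from a standard normal distribution.
    std::mt19937 gen(42);
    std::normal_distribution<double> gauss(0.0, 1.0);
    Eigen::MatrixXd data = Eigen::MatrixXd::NullaryExpr(
        5000, 500, [&]() { return gauss(gen); });
    khachiyan(data, 1e-2, 1e-8);
    std::cout << "iterations: " << iters << ", seconds: " << elapsed << "\n";
    return 0;
}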

(a) MVEE fit over a multimodal bivariate Gaussian, (b) MVEE fit over an imbalanced multimodal bivariate Gaussian, (c) MVEE fit for two non-overlapping ellipses, and (d) MVEE fit for two overlapping ellipses.


(a) Number of iterations to convergence vs. N (tol=0.01), (b) time to convergence vs. N (2013 Intel Core i5 CPU, tol=0.01), (c) number of iterations to convergence vs. tolerance (N=10), (d) time to convergence vs. tolerance (2013 Intel Core i5 CPU, N=10).

References
  1. N. Z. Shor, Cut-off Method with Space Extension in Convex Programming Problems, Kibernetika, 13(1):94–95, 1977.
  2. D. Yudin, A. Nemirovskii, Evaluation of the Informational Complexity of Mathematical Programming Problems, Ekonomika i Matematicheskie Metody, 12:128–142, 1976.
  3. L. Khachiyan, A Polynomial Algorithm in Linear Programming, Soviet Math. Doklady, 20(1):191–194, 1979.
  4. L. Khachiyan, Polynomial Algorithms for Linear Programming, USSR Comp. Math. and Math. Phys., 20(2):51–68, 1980.
  5. G. Dantzig, A. Orden, P. Wolfe, The Generalized Simplex Method for Minimizing a Linear Form under Linear Inequality Restraints, Pacific Journal of Mathematics, 1953.
  6. Kumar et al., Minimum Volume Enclosing Ellipsoids and Core Sets, 2005.
  7. Moshtagh, Minimum Volume Enclosing Ellipsoid, Convex Optimization, 2005.
  8. Peng et al., Computation of Minimum Volume Covering Ellipsoids, Operations Research, 2004.
  9. P. Gacs, L. Lovasz, Khachiyan's Algorithm for Linear Programming, Mathematical Programming Study, 14:61–68, 1981.
  10. R. G. Bland, D. Goldfarb, M. Todd, The Ellipsoid Method: A Survey, Operations Research, 29(6):1039–1091, 1981.
  11. N. Karmarkar, A New Polynomial-Time Algorithm for Linear Programming, Combinatorica, 4(4):373–395, 1984.