![]()
|
I am interested in designing algorithms to solve linear inverse problems, especially those that arise from compressed sensing.
From a generic point-of-view, these problems assume that we have data collected via (noisy) linear measurements: [; b = Ax_0 + z ;]. To recover the unknown signal [; x_0 ;], we look at the set of signals that agree with the measurements: [;C = \{ x : \|Ax - b \|_2 \le \epsilon \} ;] for a reasonable value of [; \epsilon ;].
To select the "best" signal from this class, there are many options. A common choice is choosing the signal with least norm, for some choice of norm. In particular, the [; \ell_1 ;] norm is often used because it induces sparsity. So the problem to solve is {; \min_x \|x\|_1 \quad\text{subject to}\quad x \in C. ;}
There are quite a few algorithms that solve similar problems, such as {; \min_x \|x\|_1 + \lambda/2 \|Ax-b\|_2^2 ;} which are equivalent for some choice of [; \lambda ;] and [; \epsilon ;], but this relation is not usually known. Our algorithm, TFOCS, is one of the few first-order algorithms that can efficiently solve the constrained version of this problem.
TFOCS can also deal with variations. Consider the following problem: {; \min_x \|Wx\|_1 + \lambda \|x\|_{TV} \quad\text{subject to}\quad x\in C;} where [; W ;] is a wavelet transform (in the wavelet domain, images are generally compressible) and [; \|\cdot\|_{TV} ;] is the total-variation norm, which penalizes changes in an image. Below are some results on the classic "cameraman" test image, where we have applied various formulations that we solved with TFOCS. This is a denoising problem, so the linear operator [; A ;] is the identity, and we add noise [; z ;].
![]() The original image without noise |
![]() The noisy version of the image (this is what we want to clean-up) |
![]() The cleaned-up image if we use wavelet thresholding (using an oracle to pick the optimal threshold). There are some artifacts. |
![]() The cleaned-up image using the wavelet and TV composite function (see equation above). The TV norm helps reduce the artifacts. |
To solve the problems efficiently, it is preferable to minimize a smooth function rather than a non-smooth function, because gradient descent methods have better convergence rates than sub-gradient descent methods.
The approach of TFOCS is to add a strongly convex term to the primal problem. This doesn't make the primal smooth, but rather it makes the dual smooth, and so we solve the dual problem via gradient descent (using an optimal version of gradient descent, such as Neterov's algorithm). The picture below shows the effect of the strongly convex term added to the [; \ell_1 ;] norm (the red line, labeled "dual smoothed"). As you can see, it retains the shark kink at 0, which is important since this is what promotes sparsity. Also plotted is the Huber function (the green line, labeled "primal smoothed"), which is a smooth approximation of the [; \ell_1 ;] norm that we used in NESTA.
The smoothing is controlled by a parameter [; \mu ;], where the extra term added to the objective is [; \mu/2 \|x-y\|_2 ;] for some [; y ;]. We want [; \mu ;] to be small so that we closely approximate the original problem, but we also want it large because then the smoothed problem is easier to solve (i.e. faster convergence). The standard solution is to use continuation: we solve for a large value of [; \mu ;] and then use this solution to "seed" the algorithm using a small value of [; \mu ;]. Continuation can also be done by keeping [; \my ;] fixed and updating [; y ;] with a better guess.
The TFOCS paper describes an "accelerated continuation" scheme that improves on this. We analyze the fixed-[;\mu;] version of continuation in terms of the proximal point method, and show that continuation is just the gradient step of a proximal point method. Thus we can apply Nesterov's accelerated gradient algorithm. The plot below shows this: we are plotting the error between the smoothed solution and non-smoothed solution. Each iteration is an update of [; y ;] in either the simple fashion ("regular continuation") or the accelerated fashion.
TFOCS also discusses some cases where the strongly convex parameter doesn't affect the solution, in so-called "exact regularization". See also the paper by Michael Friedlander and Paul Tseng.
Many compressed sensing problems enjoy a local strong convexity, and it is possible to exploit this using a restarted Nesterov algorithm. Note that, unlike gradient descent, accelerated algorithms do not automatically have linear convergence for strongly convex objectives.
The 1990s saw the explosion of interior-point methods (IPM), and by the end of the decade, IPM solvers were mature and robust in both theory and practice. However, they do have scaling issues: for a primal variable of dimension [; n ;], each step requires the solving of a linear system on the order of [; 2n \times 2n ;], so this is automatically an order [; n^3 ;] algorithm (even worse, the number of steps scales slightly in [; \sqrt{n} ;] ). The KKT system can sometimes be solved with an iterative solver like conjugate gradient, but this introduces other issues.
Below are some plots of a very respected IPM solver (SeDuMi, called via cvx), as well as l1-Magic (which is custom for the basis pursuit problem and uses conjugate gradient to solve the KKT system). You can see that the TFOCS first-order method is way faster at large problem sizes, and (not shown) requires a small memory footprint. SeDuMi required more than 4 GB of RAM on the largest problem and consequently memory calls thrashed.
The plot shows two curves for each algorithm: one curve used a loose tolerance, and one used a higher tolerance. LIPSOL is a Mehrotra-based primal-dual IPM built into MATLAB's linprog, and Simplex is also using MATLAB's implementation in linprog.
But are first-order methods like TFOCS as accurate as IPM methods? For this problem (a noiseless basis pursuit problem, with a sparse signal), the answer is yes. In fact, TFOCS returns an answer exact to machine precision, and exactly identifies the support, whereas IPM never exactly find zero entries (since, by definition, their solution is always on the interior).
Also, the Simplex method failed on most of the problem instances, and SeDuMi failed on some of the larger problems (it returned NaN, so those points are not plotted).
Thanks to Tex the world for script to generate latex images on this site.
Page last modified Mar 21, 2011