
Classical convergence analysis

assumptions

f strongly convex on S with constant m

∇²f is Lipschitz continuous on S, with constant L > 0:

\[
\|\nabla^2 f(x) - \nabla^2 f(y)\|_2 \le L\,\|x - y\|_2
\]

(L measures how well f can be approximated by a quadratic function)

outline: there exist constants η ∈ (0, m²/L), γ > 0 such that

if ‖∇f(x)‖₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ

if ‖∇f(x)‖₂ < η, then

\[
\frac{L}{2m^2}\,\|\nabla f(x^{(k+1)})\|_2 \le \left(\frac{L}{2m^2}\,\|\nabla f(x^{(k)})\|_2\right)^2
\]


damped Newton phase (‖∇f(x)‖₂ ≥ η)

most iterations require backtracking steps

function value decreases by at least γ

if p⋆ > −∞, this phase ends after at most (f(x^(0)) − p⋆)/γ iterations

quadratically convergent phase (‖∇f(x)‖₂ < η)

all iterations use step size t = 1

‖∇f(x)‖₂ converges to zero quadratically: if ‖∇f(x^(k))‖₂ < η, then

\[
\frac{L}{2m^2}\,\|\nabla f(x^{(l)})\|_2 \le \left(\frac{L}{2m^2}\,\|\nabla f(x^{(k)})\|_2\right)^{2^{l-k}} \le \left(\frac{1}{2}\right)^{2^{l-k}}, \qquad l \ge k
\]
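The method being analyzed is damped Newton with backtracking line search. A minimal sketch in Python, assuming gradient and Hessian oracles are supplied as callables (the function names and the dense solve are illustrative, not from the slides):

```python
import numpy as np

def newton(f, grad, hess, x0, alpha=0.1, beta=0.7, eps=1e-10, max_iter=100):
    """Damped Newton method with backtracking line search (sketch).

    f should return np.inf outside its domain so that backtracking
    rejects infeasible trial points (relevant for barrier-type f).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = np.linalg.solve(H, -g)        # Newton step: solve H dx = -g
        lam2 = float(-g @ dx)              # squared Newton decrement
        if lam2 / 2 <= eps:                # stopping criterion
            break
        t = 1.0                            # backtracking: shrink t until the
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta                      # sufficient-decrease condition holds
        x = x + t * dx
    return x
```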


conclusion: number of iterations until f(x) − p⋆ ≤ ε is bounded above by

\[
\frac{f(x^{(0)}) - p^\star}{\gamma} + \log_2 \log_2(\epsilon_0/\epsilon)
\]

• γ, ε₀ are constants that depend on m, L, x^(0)

• second term is small (of the order of 6) and almost constant for practical purposes

• in practice, constants m, L (hence γ, ε₀) are usually unknown

• provides qualitative insight into convergence properties (i.e., explains the two algorithm phases)
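For scale, with an assumed ratio ε₀/ε = 10¹⁰ the second term is

\[
\log_2 \log_2 10^{10} = \log_2\!\bigl(10\,\log_2 10\bigr) \approx \log_2 33.2 \approx 5.1,
\]

consistent with "of the order of 6".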


Examples

example in R^2 (page 10–9)

[figure: contour plot with iterates x^(0), x^(1); semilog plot of f(x^(k)) − p⋆ versus k = 0, …, 5, dropping from about 10^5 to below 10^−15]

backtracking parameters α = 0.1, β = 0.7

converges in only 5 steps

quadratic local convergence

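The objective from page 10–9 is not reproduced in this excerpt; as an assumed stand-in, the `newton` sketch above can be run on a smooth, strictly convex function of two variables:

```python
import numpy as np

# Assumed stand-in objective on R^2 (not necessarily the function from page 10-9)
def f(x):
    return np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1) + np.exp(-x[0] - 0.1)

def grad(x):
    e1 = np.exp(x[0] + 3*x[1] - 0.1)
    e2 = np.exp(x[0] - 3*x[1] - 0.1)
    e3 = np.exp(-x[0] - 0.1)
    return np.array([e1 + e2 - e3, 3*e1 - 3*e2])

def hess(x):
    e1 = np.exp(x[0] + 3*x[1] - 0.1)
    e2 = np.exp(x[0] - 3*x[1] - 0.1)
    e3 = np.exp(-x[0] - 0.1)
    return np.array([[e1 + e2 + e3, 3*e1 - 3*e2],
                     [3*e1 - 3*e2,  9*e1 + 9*e2]])

x_star = newton(f, grad, hess, np.array([-1.0, 1.0]), alpha=0.1, beta=0.7)
```

Recording f at each iterate and plotting the gap to the final value reproduces the characteristic steep quadratic tail.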

example in R^100 (page 10–10)

[figure: left, f(x^(k)) − p⋆ versus k (about 10^5 down to 10^−15) for exact line search and backtracking; right, step size t^(k) versus k (between 0 and 2) for both methods]

backtracking parameters α = 0.01, β = 0.5

backtracking line search almost as fast as exact l.s. (and much simpler)

clearly shows two phases in algorithm
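For reference, an exact line search can be approximated with a scalar minimizer; a sketch (scipy.optimize.minimize_scalar and the interval cap are assumptions, not prescribed by the slides):

```python
from scipy.optimize import minimize_scalar

def exact_line_search(f, x, dx, t_max=10.0):
    """Approximately minimize f(x + t*dx) over t in [0, t_max] (sketch)."""
    res = minimize_scalar(lambda t: f(x + t * dx),
                          bounds=(0.0, t_max), method="bounded")
    return res.x
```

Substituting this for the backtracking loop in the `newton` sketch gives the "exact l.s." curves; the plots show the extra per-step effort buys little.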


example in R^10000 (with sparse a_i)

\[
f(x) = -\sum_{i=1}^{10000} \log(1 - x_i^2) - \sum_{i=1}^{100000} \log(b_i - a_i^T x)
\]

[figure: f(x^(k)) − p⋆ versus k = 0, …, 20, decreasing from about 10^5 to below 10^−5]

 

• backtracking parameters α = 0.01, β = 0.5

• performance similar to that for the small examples
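What makes n = 10000 tractable is the sparsity of the Newton system. A sketch of one Newton step for this objective, assuming A is a SciPy sparse matrix with rows a_iᵀ (the helper name and solver choice are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def newton_step_sparse(x, A, b):
    """One Newton step for f(x) = -sum log(1 - x_i^2) - sum log(b - A x)."""
    s = b - A @ x                                  # slacks, must stay positive
    d = 1.0 / s
    grad = A.T @ d + 2*x / (1 - x**2)              # gradient of both sums
    # Hessian = diag((2 + 2x^2)/(1 - x^2)^2) + A^T diag(d^2) A, sparse if A is
    H = sp.diags((2 + 2*x**2) / (1 - x**2)**2) + A.T @ sp.diags(d**2) @ A
    return spsolve(H.tocsc(), -grad)               # sparse direct solve
```

With sparse a_i, forming and factoring H costs far less than the dense O(n³) flop count, which is what makes Newton's method practical at this scale.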

 


Self-concordance

shortcomings of classical convergence analysis

depends on unknown constants (m, L, . . . )

bound is not affinely invariant, although Newton's method is

convergence analysis via self-concordance (Nesterov and Nemirovski)

does not depend on any unknown constants

gives affine-invariant bound

applies to special class of convex functions (‘self-concordant’ functions)

developed to analyze polynomial-time interior-point methods for convex optimization

Unconstrained minimization

10–24

Self-concordant functions

definition

• convex f : R → R is self-concordant if |f′′′(x)| ≤ 2f′′(x)^{3/2} for all x ∈ dom f

• f : R^n → R is self-concordant if g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ R^n

examples on R

linear and quadratic functions

negative logarithm f(x) = − log x

negative entropy plus negative logarithm: f(x) = x log x − log x
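A short check that the negative logarithm meets the definition, in fact with equality:

\[
f(x) = -\log x \;(x > 0):\qquad f''(x) = \frac{1}{x^2},\quad f'''(x) = -\frac{2}{x^3},\quad
|f'''(x)| = \frac{2}{x^3} = 2\left(\frac{1}{x^2}\right)^{3/2} = 2\,f''(x)^{3/2}.
\]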

 

 

 

 

 

 

affine invariance: if f : R → R is s.c., then f̃(y) = f(ay + b) is s.c.:

\[
\tilde{f}'''(y) = a^3 f'''(ay + b), \qquad \tilde{f}''(y) = a^2 f''(ay + b)
\]


Self-concordant calculus

properties

preserved under positive scaling α ≥ 1, and sum

preserved under composition with affine function

• if g is convex with dom g = R++ and |g′′′(x)| ≤ 3g′′(x)/x then

f(x) = − log(−g(x)) − log x

is self-concordant

examples: properties can be used to show that the following are s.c.

• f(x) = − Σ_{i=1}^m log(b_i − a_i^T x) on {x | a_i^T x < b_i, i = 1, . . . , m}

• f(X) = − log det X on S^n_{++}

• f(x) = − log(y² − x^T x) on {(x, y) | ‖x‖₂ < y}
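The first example can be sanity-checked numerically along an arbitrary line, using central finite differences on g(t) = f(x + tv); a sketch with random data (everything here is illustrative, none of it from the slides):

```python
import numpy as np

# Finite-difference check of |g'''(t)| <= 2 g''(t)^{3/2} for g(t) = f(x + t v),
# with f(x) = -sum_i log(b_i - a_i^T x) the log barrier of a polyhedron.
rng = np.random.default_rng(0)
m, n = 50, 10
A = rng.standard_normal((m, n))          # rows are a_i^T
x = np.zeros(n)
b = rng.uniform(1.0, 2.0, m)             # b - A @ x > 0, so x is strictly feasible
v = rng.standard_normal(n)

g = lambda t: -np.log(b - A @ (x + t * v)).sum()
h = 1e-4
g2 = (g(h) - 2*g(0.0) + g(-h)) / h**2                     # g''(0), central difference
g3 = (g(2*h) - 2*g(h) + 2*g(-h) - g(-2*h)) / (2 * h**3)   # g'''(0), central difference
print(abs(g3), "<=", 2 * g2**1.5)        # inequality holds, up to discretization error
```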


Convergence analysis for self-concordant functions

summary: there exist constants η ∈ (0, 1/4], γ > 0 such that (λ(x) below denotes the Newton decrement, introduced earlier in the chapter)

• if λ(x) > η, then

f(x^(k+1)) − f(x^(k)) ≤ −γ

• if λ(x) ≤ η, then

\[
2\lambda(x^{(k+1)}) \le \left(2\lambda(x^{(k)})\right)^2
\]

(η and γ only depend on backtracking parameters α, β)

complexity bound: number of Newton iterations bounded by

\[
\frac{f(x^{(0)}) - p^\star}{\gamma} + \log_2 \log_2(1/\epsilon)
\]

for α = 0.1, β = 0.8, ε = 10⁻¹⁰, the bound evaluates to 375(f(x^(0)) − p⋆) + 6
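Where 375 and 6 come from (a sketch, assuming the constants derived in the Boyd–Vandenberghe textbook, η = (1 − 2α)/4 and γ = αβη²/(1 + η), which the slide itself does not state):

\[
\eta = \frac{1 - 2(0.1)}{4} = 0.2, \qquad
\frac{1}{\gamma} = \frac{1+\eta}{\alpha\beta\eta^2} = \frac{1.2}{(0.1)(0.8)(0.2)^2} = 375,
\]

and log₂ log₂(1/ε) = log₂ log₂ 10¹⁰ ≈ 5.1, which rounds up to the 6.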
