Optimal Control and Pontryagin Maximum Principle

by allenlu2007

 

Optimal control can be viewed as calculus of variations plus a dynamic-system constraint (the analog of a Lagrange multiplier), or as a "generalization" of Lagrangian/Hamiltonian mechanics. Through the work of Pontryagin and Bellman, optimal control has become a subject in its own right.

Optimal Control =

Calculus of Variations with Dynamic System Constraint =  

Lagrangian + Lagrange Multiplier =

Hamiltonian Optimization =

Pontryagin Maximum (or Minimum) Principle (PMP)


The Lagrange multiplier λ corresponds to the costate function p*(t) (or λ*(t)) in optimal control.


Optimal Control Problem Statement 

[Figure: optimal control problem statement]

Basically, an optimal control problem has two parts:

ODE: a differential equation that defines the dynamic system behavior

P: a payoff/loss functional, i.e. the objective function

The goal is to find the control a(t) that maximizes the payoff functional.
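In symbols, a minimal statement of the standard form (the terminal payoff g(x(T)) is an assumption, consistent with the form in the Berkeley notes cited below):

\dot{x}(t) = f(x(t), a(t)), \qquad x(0) = x^0

P[a(\cdot)] = \int_0^T \tau(x(t), a(t))\, \operatorname{d}t + g(x(T)) \rightarrow \max_{a(\cdot) \in \mathcal{A}}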

 

A few observations:

1. If x_dot = a(t) (i.e. f(x(t), a(t)) = a(t)), then tau(x(t), a(t)) = tau(x(t), x_dot(t)) and the problem degenerates into a calculus of variations problem (see the check after this list).

1a. With tau = L this becomes the least action problem, i.e. Lagrangian mechanics.

2. With tau = 1 it becomes a least-time optimization. How should a(t) be chosen?

2a. Another way is to parametrize time in x and x_dot.

3. x_dot = f(x, a) can also be viewed as a constraint, handled with a Lagrange multiplier.
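A quick check of observation 1, a sketch using the sign convention H = p f − tau adopted later in this post: with f(x, a) = a (so a = x_dot),

H(x, p, a) = p\, a - \tau(x, a)

\frac{\partial H}{\partial a} = 0 \;\Rightarrow\; p = \frac{\partial \tau}{\partial \dot{x}}, \qquad \dot{p} = -\frac{\partial H}{\partial x} = \frac{\partial \tau}{\partial x}

Combining the two conditions recovers the Euler-Lagrange equation \frac{d}{dt}\frac{\partial \tau}{\partial \dot{x}} = \frac{\partial \tau}{\partial x}, confirming the degeneration to calculus of variations.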

 

Two approaches (assume n degrees of freedom and m degrees of control):

1. Start from the control + optimization viewpoint, the brute-force approach, which ends up as a calculus of variations problem: solving something like the Euler-Lagrange equations, n 2nd-order differential equations (assuming such equations can even be written down). Very tedious and hard to solve.

2. Start from Hamiltonian mechanics. Via the Legendre transform, the Hamiltonian converts the (calculus of variations / Euler-Lagrange) optimization problem into 2n 1st-order differential equations. Pontryagin was probably inspired by this when he defined the control Hamiltonian and the costate p. The problem finally becomes another optimization problem (not an integral optimization, but something like convex optimization: the gradient of H w.r.t. u equals 0).

Hamiltonian mechanics -> solve 2n 1st-order differential equations

Optimal control -> instead of directly solving differential equations, it becomes another optimization problem. There are still 2n 1st-order differential equations (with the m control variables unknown), plus an ordinary optimization problem (m more equations from the gradient of H w.r.t. u = 0; these stationarity conditions are algebraic, not differential).

In summary, method 1 solves n 2nd-order differential equations plus the original m (?) constraint equations.

Method 2 solves 2n 1st-order differential equations coupled with m algebraic stationarity equations, (2n + m) equations in total.
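Concretely, a sketch of method 2's conditions under the convention H(x, p, u) = p \cdot f(x, u) - L(x, u):

\dot{x} = \frac{\partial H}{\partial p} = f(x, u) \qquad (n \text{ equations})

\dot{p} = -\frac{\partial H}{\partial x} \qquad (n \text{ equations})

\frac{\partial H}{\partial u} = 0 \qquad (m \text{ equations})

The last line is the stationarity form; the full PMP condition is H(x^*, p^*, u^*) = \max_{u \in A} H(x^*, p^*, u), which reduces to \partial H / \partial u = 0 when the maximum lies in the interior of the control set.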

 

An equivalent statement

The following part is cited from an excellent optimal control note from Berkeley. 

[Figure: problem statement from the Berkeley notes, equations (4.1)-(4.3)]

 

After introducing the control Hamiltonian (with the convention that F is positive), this becomes another optimization problem.

(4.1) -> (4.13); (4.2) -> (4.11); (4.3) -> (4.12). The two statements are completely equivalent.

F is the analog of the Lagrangian in mechanics; H is the analog of the Hamiltonian in mechanics (with flipped sign).

[Figure: control Hamiltonian formulation, equations (4.11)-(4.13)]

 

Why Control (Pseudo) Hamiltonian?

Introducing the Hamiltonian (and the costate) is not merely for aesthetic reasons (three very concise, symmetric equations):

(1) In classical mechanics, Hamiltonian mechanics turns the n 2nd-order equations of Lagrangian mechanics into 2n 1st-order equations.

(2) After the Legendre transform, the (q_dot, q) coordinates become (p, q) coordinates (p = \partial L / \partial \dot{q}). For some reason (one explanation is energy conservation, but what about lossy dynamic systems? Is there a more general, Legendre-transform-based explanation?) we can use the phase plot/diagram to explain and predict the evolution of the dynamic system.

(3) In the optimal control problem, the control Hamiltonian leads to a constrained optimization problem rather than a phase plot. It corresponds directly to a general constrained optimization problem (solved using a Lagrange multiplier).

Although (2) and (3) look similar, they are not the same. A few differences:

In (2), Hamiltonian mechanics, we seemingly only perform a coordinate transformation (q_dot -> p) to obtain the phase plot. Both p and H are clearly defined: p = \partial L / \partial \dot{q} and H = p \dot{q} - L.

The Lagrangian has no constraint, and the resulting Hamiltonian has no corresponding optimization problem (meaning the Hamiltonian, not the Lagrangian; the Lagrangian does correspond to the least action optimization). From another angle, the Hamiltonian involves only p and q, but no u, so we only need to solve the (p, q) differential equations. (TBA: let u = q_dot to work through this special case.)

In (3), p is not clearly defined; only H is clearly defined. p is obtained only by solving an equation, with H = p f(x, u) - L(x, u).

Since there is no clearly defined p, there should be no phase plot either (really?), but the Hamiltonian becomes an optimization problem, and the extra stationary-point condition in u can be used to solve the (p, q, u) system: 2n 1st-order differential equations (for p and q) plus m stationarity equations (for u).

Boscain's article provides the following table.

[Table: comparison table from Boscain's article]

 

Optimal Control = Calculus of Variations with Constraint =  Lagrangian + Lagrange Multiplier = Hamiltonian Optimization = Pontryagin Maximum (or Minimum) Principle (PMP)

The following explains that x_dot = f(x, u) can be viewed as a constraint, with the costate p as the Lagrange multiplier. Optimal control can therefore be solved with Lagrangian + Lagrange multiplier, and the result finally reduces to Hamiltonian optimization with the state (x) and its conjugate costate (p).
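A sketch of the multiplier calculation (using the H = p \cdot f + L sign convention for algebraic convenience; both conventions are discussed in the maximum-vs-minimum section below). Adjoin the dynamics to the payoff with a multiplier function p(t):

J_{aug} = \int_0^T \left[ L(x, u) + p^{\top}(f(x, u) - \dot{x}) \right] \operatorname{d}t = \int_0^T \left[ H(x, p, u) - p^{\top}\dot{x} \right] \operatorname{d}t

Integrating p^{\top}\dot{x} by parts and requiring the first variation with respect to x and u to vanish gives

\dot{p} = -\frac{\partial H}{\partial x}, \qquad \frac{\partial H}{\partial u} = 0, \qquad p(T) = 0 \;\;(\text{free endpoint, no terminal payoff})

so the costate equation is exactly the optimality condition for the multiplier p.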

 

(Is this similar to the primal and dual problems in convex optimization? State x -> primal; conjugate costate -> dual?)

[Figures: derivation of the costate as a Lagrange multiplier]

 

The following is the Pontryagin maximum principle, which arguably took its inspiration from Hamiltonian mechanics.

[Figure: statement of the Pontryagin maximum principle]

Calculus of variations: Lagrangian + Lagrange multiplier -> Pontryagin maximum principle -> convex optimization? (steepest coordinate descent, x and p coordinates, primal and dual optimization)

 

Is PMP the Pontryagin Maximum Principle or the Pontryagin Minimum Principle?

Some references call it the Pontryagin Maximum principle, others the Pontryagin Minimum principle. The Pontryagin principle is applied to the control Hamiltonian. The name really depends on two definitions:

1. Whether the payoff function, loss function tau, or Lagrangian is maximized or minimized. In general, a payoff function is maximized, a loss function is minimized, and the Lagrangian is minimized (least action principle).

2. Whether H is defined as H = p*f - L or H = p*f + L. With "-", H and L point in opposite directions; with "+", H and L point in the same direction.

For example, if L is minimized (the Lagrangian convention) and H is opposite to L, it is a Maximum Principle [ref].

If L is a payoff function to be maximized and H points the same way as L, it is also a Maximum Principle [ref].

If L is a loss function to be minimized and H points the same way as L, it becomes a Minimum Principle [ref].

I prefer the first definition, because it is directly consistent with the definitions of Lagrangian and Hamiltonian mechanics in physics. In physics, however, there is only the least action principle (for the Lagrangian); the Hamiltonian in a conservative field is basically a constant. How about motion in a damped/dissipative field?

  

Some Examples

Example 0: u = x_dot, going back to the Lagrangian mechanics and Hamiltonian mechanics case

Conservative field, Noether's theorem, etc. TBA

 

Example 1a: Linear quadratic regulator (LQR) 

quadratic cost functional

J=\tfrac{1}{2} \mathbf{x}^{\text{T}}(t_f)\mathbf{S}_f\mathbf{x}(t_f) + \tfrac{1}{2} \int_{t_0}^{t_f} [\,\mathbf{x}^{\text{T}}(t)\mathbf{Q}(t)\mathbf{x}(t) + \mathbf{u}^{\text{T}}(t)\mathbf{R}(t)\mathbf{u}(t)\,]\, \operatorname{d}t

Subject to the linear first-order dynamic constraints

\dot{\mathbf{x}}(t)=\mathbf{A}(t) \mathbf{x}(t) + \mathbf{B}(t) \mathbf{u}(t),

and the initial condition

 \mathbf{x}(t_0) = \mathbf{x}_0

 

A particular form of the LQ problem that arises in many control system problems is that of the linear quadratic regulator (LQR), where all of the matrices (i.e., \mathbf{A}, \mathbf{B}, \mathbf{Q}, and \mathbf{R}) are constant, the initial time is arbitrarily set to zero, and the terminal time is taken in the limit t_f\rightarrow\infty (this last assumption is what is known as infinite horizon).

The LQR problem is stated as follows. Minimize the infinite horizon quadratic continuous-time cost functional

J=\tfrac{1}{2} \int_{0}^{\infty}[\,\mathbf{x}^{\text{T}}(t)\mathbf{Q}\mathbf{x}(t) + \mathbf{u}^{\text{T}}(t)\mathbf{R}\mathbf{u}(t)\,]\, \operatorname{d}t

Subject to the linear time-invariant first-order dynamic constraints

\dot{\mathbf{x}}(t)=\mathbf{A} \mathbf{x}(t) + \mathbf{B} \mathbf{u}(t),

and the initial condition

\mathbf{x}(0) = \mathbf{x}_0
 

It has been shown in classical optimal control theory that the LQ (or LQR) optimal control has the feedback form

\mathbf{u}(t)=-\mathbf{K}(t)\mathbf{x}(t)

where \mathbf{K}(t) is a properly dimensioned matrix, given as

\mathbf{K}(t)=\mathbf{R}^{-1}\mathbf{B}^{\text{T}}\mathbf{S}(t),

and \mathbf{S}(t) is the solution of the differential Riccati equation. The differential Riccati equation is given as

\dot{\mathbf{S}}(t) = -\mathbf{S}(t)\mathbf{A}-\mathbf{A}^{\text{T}}\mathbf{S}(t)+\mathbf{S}(t)\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\text{T}}\mathbf{S}(t)-\mathbf{Q}

For the infinite horizon LQR problem, the differential Riccati equation is replaced with the algebraic Riccati equation (ARE) given as

\mathbf{0} = -\mathbf{S}\mathbf{A}-\mathbf{A}^{\text{T}}\mathbf{S}+\mathbf{S}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\text{T}}\mathbf{S}-\mathbf{Q}
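As a minimal numerical sketch of the ARE route, assuming SciPy's scipy.linalg.solve_continuous_are (the double-integrator plant and the weights below are made-up illustrative values):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double-integrator plant (illustrative values): x = [position, velocity]
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])   # state weight
R = np.array([[0.1]])     # control weight

# Solve the algebraic Riccati equation (SciPy's convention:
# A'S + SA - SBR^{-1}B'S + Q = 0, i.e. the ARE above multiplied by -1)
S = solve_continuous_are(A, B, Q, R)

# Optimal feedback gain u = -Kx with K = R^{-1} B' S
K = np.linalg.solve(R, B.T @ S)

print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))  # all in LHP
```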

 

Example 1b: Linear quadratic regulator (LQR) with cross term

Basically the original LQR cost plus a cross term 2 x(t)' N u(t), as shown below. With N = 0 it reduces back to plain LQR.

[Figure: LQR cost functional with the cross term]

The solution is as follows:

[Figures: cross-term LQR gain and the associated Riccati equation]

Setting N = 0 and P = S, Example 1b reduces to Example 1a.
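SciPy's solve_continuous_are also accepts a cross-weighting matrix via its s argument. A sketch assuming the standard cross-term gain K = R^{-1}(B'P + N') (the value of N below is illustrative):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Same illustrative plant as Example 1a, now with a cross term 2 x'N u in the cost
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.1]])
N = np.array([[0.05],
              [0.0]])    # cross-weighting matrix (illustrative)

# SciPy handles the cross term through the `s` argument
P = solve_continuous_are(A, B, Q, R, s=N)

# Gain with cross term: K = R^{-1} (B'P + N')
K = np.linalg.solve(R, B.T @ P + N.T)
print("K =", K)   # with N = 0 this reduces to the plain LQR gain of Example 1a
```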

Closed-loop system:

[Figure: the closed-loop system dynamics]

 

Another question concerns the control Hamiltonian (what is it?)

[Figure: the Hamiltonian matrix of the LQR problem]

The stable eigenvalues of the Hamiltonian matrix are exactly the closed-loop eigenvalues under optimal control.

[Figure: Hamiltonian-matrix eigenvalues vs. closed-loop eigenvalues]
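A quick numerical check of this claim (a sketch reusing the illustrative LQR plant above; the block form of the Hamiltonian matrix is the standard LQR one):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.1]])

S = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ S)

# Hamiltonian matrix of the LQR problem
H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])

print("Hamiltonian eigenvalues: ", np.sort_complex(np.linalg.eigvals(H)))
print("closed-loop eigenvalues:", np.sort_complex(np.linalg.eigvals(A - B @ K)))
# The Hamiltonian eigenvalues come in +/- pairs; the stable half matches A - BK.
```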


Example 2: Production and Consumption (Bang-Bang controller)

[Figures: production and consumption problem statement and its PMP analysis]

 

The optimal solution is a bang-bang controller, i.e. the control switches between 1 and 0 with no intermediate values.

[Figure: the bang-bang optimal control u*(t)]
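A numerical sketch of the bang-bang structure, assuming Evans' production/consumption model (x_dot = u x, payoff \int_0^T (1 - u) x \, \operatorname{d}t, u \in [0, 1]) and restricting to one-switch policies, which the PMP analysis shows is enough:

```python
import numpy as np

# Production/consumption model: x' = u*x, payoff = integral of (1-u)*x over [0, T].
# One-switch policy: u = 1 (reinvest) for t < s, then u = 0 (consume) for t >= s.
# Then x(t) = x0 * exp(min(t, s)), so payoff(s) = x0 * exp(s) * (T - s).
x0, T = 1.0, 3.0

def payoff(s):
    return x0 * np.exp(s) * (T - s)

switch_times = np.linspace(0.0, T, 3001)
best = switch_times[np.argmax(payoff(switch_times))]
print("best switch time:", best)  # ~ T - 1 = 2.0, matching the PMP prediction
```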

 

Example 3: least-time optimization of a damped pendulum

[Figures: damped pendulum least-time problem and its phase-plane solution]

 

Example 4:

[Figure: Example 4 problem statement]

 

Example: LQR in an LQC dynamic system

?? Lagrangian mechanics cannot deal with friction force.

How about optimal control?
