
Unify Framework for VAE and GAN

 

 

 

Reference:

1. Implicit generative models: dual vs. primal approaches

2. The Information-Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Modeling

3. Training Generative Adversarial Network Via Primal-Dual Subgradient Method

 

There is nothing new under the sun: the Lagrangian primal-dual formulation reappears in every corner of optimization.


In Ian Goodfellow's taxonomy, VAE and GAN are classified as:

VAE: Maximum Likelihood -> Explicit density -> Approximate density

GAN: Maximum Likelihood -> Implicit density -> Direct

At first glance, apart from both being maximum-likelihood methods, the two approaches look completely different. Both claims are problematic:

1. GANs are not necessarily likelihood-based; likelihood-free variants exist (reference 2).

2. VAE and GAN can be viewed as two faces of the same Lagrangian primal-dual problem.

 




 

VAE: Lagrangian primal formulation to minimize KL-divergence.

GAN and f-GAN: Lagrangian dual formulation to minimize f-divergence (including KL-divergence).

So VAE and GAN are the two faces (primal and dual) of the same Lagrangian.


Review of Lagrangian Primal and Dual

For the complete theory of the Lagrangian primal and dual, see Boyd's "Convex Optimization" (Stanford).

See also the earlier post on this blog.


Setup: given a data distribution Px, we want a generator distribution PG that approximates it.

Px can be, for example, the distribution of an image dataset.

How do we make PG approach Px? At first glance we should solve min Df(PG ‖ Px), where Df is a divergence, a distance-like quantity.

The problems are: (1) we do not have the distribution Px, only samples from it; (2) to use an NN we must introduce a latent variable z.


PG(x|z) in the variational autoencoder is in fact the decoder of the generative model, usually implemented as an NN.


First consider f-GAN, where G is the generator and T is the discriminator. Define

V(G, T) = Ex[ T(X) ] − Ez[ f*( T(G(Z)) ) ]   =>   solve min_G max_T V(G, T)

where f* is the Fenchel conjugate of the convex function f that defines the f-divergence.

 

 

 



KL(Px ‖ PG) = ∫ Px log(Px / PG) dx = ∫ Px log Px dx − ∫ Px log PG dx = C − E_Px[ log PG ]

Since the constant C does not depend on PG, minimizing this KL divergence is equivalent to maximizing the expected log-likelihood, which is exactly what the VAE does.
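A quick numerical sanity check of this identity on a toy discrete distribution (my own illustration; the two arrays are arbitrary):

import numpy as np

# KL(Px || PG) = sum Px log(Px/PG) = C - E_Px[log PG], with C = sum Px log Px
Px = np.array([0.5, 0.3, 0.2])         # stand-in for the data distribution
PG = np.array([0.4, 0.4, 0.2])         # stand-in for the model distribution

kl = np.sum(Px * np.log(Px / PG))
C = np.sum(Px * np.log(Px))            # negative entropy of Px, independent of PG
expected_ll = np.sum(Px * np.log(PG))  # E_Px[log PG]
assert np.isclose(kl, C - expected_ll)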



 

Mutual information and KL divergence are related by

I(X; Z) = KL( P(X, Z) ‖ P(X) P(Z) )

i.e., mutual information is the KL divergence between the joint distribution and the product of the marginals.

 










 

 

 

 

 

 

 


Lagrangian Primal and Dual

 

Reference:

1. Stanford EE364A lecture notes.

2. University of Alberta, "Convex Analysis, Duality and Optimization".

 

Lagrangian primal-dual theory runs through every optimization problem.

A few questions to address:

1. The mathematical principle

2. The geometric intuition

3. The relation to Lagrangian and Hamiltonian mechanics

The Lagrangian primal-dual theorem is essentially convex optimization plus minimax.

 

The standard form of a convex optimization problem (Stanford EE364A) is:


minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, …, m
            hi(x) = 0,  i = 1, …, p

Note that f0(x) and the inequality-constraint functions fi(x) are all convex.

But the equality constraints hi(x) must be affine (linear) functions!

This is crucial for the Lagrangian dual form: it is what guarantees the dual objective is concave in (λ, ν), so that the dual problem is again a convex optimization problem!

 

 

Lagrange Primal Form:

L(x, λ, ν) = f0(x) + Σi λi fi(x) + Σi νi hi(x),   λi ≥ 0

Solving the Lagrangian primal form uses the Lagrange multiplier method, i.e., differentiation. In practice one usually uses numerical solvers such as CVX.

 

Lagrange Dual Form:

The essential idea of the dual form is to transform the convex optimization problem into another convex optimization problem over a different domain (λ, ν): the dual objective is a concave function of (λ, ν), and λ ≥ 0 are the only constraints.

g(λ, ν) = inf_x L(x, λ, ν) = inf_x [ f0(x) + Σi λi fi(x) + Σi νi hi(x) ]



At first glance this looks magical. How can it be done? A ready-made example is the Legendre transform, or the (Fenchel) conjugate.

****************************************************************************

Let f(x) be a convex function. Its conjugate is F*(ν) = max_x( νx − f(x) ), or more precisely sup_x( νx − f(x) ).

F*(ν) is also a convex function, and (F*)* = f, i.e.

f(x) = max_ν( νx − F*(ν) )  or  sup_ν( νx − F*(ν) )

Since f(x) is convex, min f(x) is well defined. There are two ways to solve min f(x):

1. Differentiation: solve f′(x) = 0. In conjugate terms, F*(ν=0) = max_x( −f(x) ) = −min_x f(x), so

min f(x) = −F*(ν=0)

2. Minimax method:

min_x f(x) = min_x max_ν( νx − F*(ν) )

Assume min_x and max_ν can be exchanged (justified later):

≈ max_ν min_x( νx − F*(ν) ) = max_ν { −F*(ν) + min_x(νx) }

Because min_x(νx) = −∞ except when ν = 0!!

max_ν { −F*(ν) + min_x(νx) } = −F*(ν=0)

****************************************************************************
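A tiny numerical sketch of this identity (my own illustration, computing the conjugate on a grid rather than analytically):

import numpy as np

# f(x) = x^2 has conjugate F*(v) = sup_x (v*x - f(x)) = v^2/4,
# so min_x f(x) = -F*(0) = 0, matching the derivation above.
x = np.linspace(-5, 5, 10001)
f = x**2

def conjugate(v):
    return np.max(v * x - f)          # sup over the grid approximates F*(v)

for v in [0.0, 1.0, -2.0]:
    assert np.isclose(conjugate(v), v**2 / 4, atol=1e-3)
print(-conjugate(0.0))                # ~ 0 = min_x f(x)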


Comparing the Legendre conjugate with the Lagrangian dual function, the differences are:
1. The Legendre conjugate is max_x( νx − f(x) ); its f(x) term carries the opposite sign from the one in the Lagrangian dual function.
2. The linear term of the Legendre conjugate is νx; the Lagrangian dual function's linear term is νi hi(x) ~ νi ai′x, which differs by a scaling factor. Also, hi(x) = 0 is an equality constraint, whereas in the Legendre transform νx is an open term.
3. The Legendre transform maps a convex function to a convex function (in the ν domain), while the Lagrangian dual function yields a concave function!!!!

Conclusion: setting aside the λi, the Lagrangian dual function and the Legendre transform are essentially the same kind of transformation. But the Lagrangian dual function becomes a concave function (due to the negative sign), while the Legendre transform stays convex!
 
Therefore, to find the minimum of a convex optimization problem:
Legendre conjugate: min_x f(x) = −F*(ν=0), derived above.
Lagrangian dual form: with p* = min f0(x) subject to the constraints, solve

d* = max g(λ, ν)   subject to λ ≥ 0

Because g(λ, ν) is a concave function, the goal is to find d* = max g(λ, ν), which is also a lower bound of the primal problem. p* − d* is called the duality gap.
Under certain conditions (e.g., Slater's condition, with the KKT conditions characterizing the optimum), d* = p*, i.e., the duality gap is 0.
Note that min f(x) ≠ g(ν=0) even when there are no inequality constraints!!


p* ≥ d*

 

Combine Lagrange dual function and Legendre conjugate

Although the Legendre conjugate and the Lagrange dual function differ, they can be combined: the Legendre conjugate often simplifies the computation of the Lagrange dual function.
 
 
 

Minimax is the question of when min max f(x, y) can be exchanged for max min f(x, y).

 

Geometric interpretation

Supporting tangent lines paired with a convex function; TBA


Minimax Theorem

Weak Duality (p* ≽ d*)

Weak duality always holds, even for non-convex problems.

 

Strong Duality (p* = d*)

Strong duality holds under a constraint qualification such as Slater's condition (a convex problem with a strictly feasible point).

 

Let’s Prove Weak Duality  (p* ≽ d*)

p* = min f0(x) subject to the constraints = min_x max_{λ≥0, ν} L(x, λ, ν) ≥ max_{λ≥0, ν} min_x L(x, λ, ν) = max_{λ≥0, ν} g(λ, ν) = d*

The first equality holds because max_{λ≥0, ν} L(x, λ, ν) equals f0(x) when x is feasible and +∞ otherwise; the inequality is the max-min inequality.
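A one-dimensional worked example (my own illustration, not from the references): minimize x² subject to 1 − x ≤ 0, so p* = 1 at x = 1. The Lagrangian is L(x, λ) = x² + λ(1 − x); minimizing over x gives x = λ/2, hence g(λ) = λ − λ²/4. Maximizing over λ ≥ 0 gives λ = 2 and d* = g(2) = 1 = p*: the duality gap is zero, as expected since Slater's condition holds (x = 2 is strictly feasible).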

 

The key point: solving min max L is solving a saddle-point problem!!
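A toy run (my own illustration) of why saddle-point problems are harder than plain minimization: simultaneous gradient descent-ascent on L(x, y) = x·y, whose saddle point is (0, 0), spirals outward instead of converging.

import numpy as np

# descend in x, ascend in y on L(x, y) = x*y
x, y, lr = 1.0, 1.0, 0.1
for step in range(100):
    gx, gy = y, x                     # dL/dx = y, dL/dy = x
    x, y = x - lr * gx, y + lr * gy   # simultaneous updates

print(np.hypot(x, y))                 # radius grows past the initial sqrt(2): a diverging spiral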

 

 

Saddle Point Problem 


Consider Lagrangian mechanics:

L(q, q̇, t)  and  S = ∫ L(q, q̇, t) dt   =>  the trajectory solves min S = min ∫ L

H(p, q, t) = max_q̇( p q̇ − L )   where p = ∂L/∂q̇   (a Legendre transform)

min ∫ L = min ∫ max_p( p q̇ − H ) = min_q max_p ∫ ( p q̇ − H ) ~ max min ∫ ( p q̇ − H )   ??   A fight between p and q??


GAN

The analogous fight is between G and D.


 

 

 

 

 

What Is a Probabilistic Generative Model Good For?


The usual goal of a probabilistic generative model is to find a PDF PG(x) that approximates Pdata(X).

X mostly lives in a high-dimensional space. What is Pdata(X) good for in practice?

 

1. Sampling: draw (random) samples from PG(x) for later use, e.g., in RL or gaming.

2. Estimation: given {x1, x2, …, xn}, find statistics of the distribution.

3. Point-wise likelihood evaluation: given x, evaluate its likelihood Q(x).

The f-GAN Theoretical Framework

Reference: 

1. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

2. Hung-yi Lee (李宏毅): GAN lectures on YouTube

3. Hung-yi Lee (李宏毅): Improved GAN

 

f-GAN introduces a general mathematical framework, summarized in reference 3.

 

Basic Idea of GAN

1. The game-theoretic, adversarial concept:

Start from a normal or uniform distribution and pass it through a generator NN (low dimension to high dimension).

Step 0: First the NN "weak generator" generates some "fake images" with distribution PG(x).

Step 1: Pass the real images (from Pdata(x)) and the fake images to a discriminative NN (high dimension to low dimension) and train it. This produces version 1 of the "weak discriminator".

Step 2: Connect the NN generator and the NN discriminator. Freeze the discriminator and train the generator so that the version 1 discriminator outputs 1. The "weak generator" now becomes a little stronger.

Repeat steps 0-2, alternately improving the discriminator NN and then the generator NN; a runnable toy sketch follows below.

Intuition: PG(x) —> Pdata(x), guided by D(x).
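A minimal runnable sketch of steps 0-2 (my own toy illustration in TensorFlow 1.x, matching the snippets later in this post; the tiny MLP architecture and the 1-D "data" N(4, 0.5²) are arbitrary choices, not from any reference):

import numpy as np
import tensorflow as tf

def mlp(x, sizes, scope):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        for i, n in enumerate(sizes[:-1]):
            x = tf.layers.dense(x, n, tf.nn.relu, name='h%d' % i)
        return tf.layers.dense(x, sizes[-1], name='out')

z = tf.placeholder(tf.float32, [None, 1])        # low-dimensional noise
x_real = tf.placeholder(tf.float32, [None, 1])   # "real" samples

x_fake = mlp(z, [16, 1], 'G')                    # generator: z -> x
d_real = tf.sigmoid(mlp(x_real, [16, 1], 'D'))   # discriminator on real data
d_fake = tf.sigmoid(mlp(x_fake, [16, 1], 'D'))   # discriminator on fakes

# original GAN value function V(G, D): D maximizes it, G minimizes it
d_loss = -tf.reduce_mean(tf.log(d_real + 1e-8) + tf.log(1 - d_fake + 1e-8))
g_loss = -tf.reduce_mean(tf.log(d_fake + 1e-8))  # non-saturating G loss

g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'G')
d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'D')
d_step = tf.train.AdamOptimizer(1e-3).minimize(d_loss, var_list=d_vars)
g_step = tf.train.AdamOptimizer(1e-3).minimize(g_loss, var_list=g_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for it in range(2000):                               # repeat steps 1-2
        zb = np.random.randn(64, 1)
        xb = np.random.randn(64, 1) * 0.5 + 4.0
        sess.run(d_step, {z: zb, x_real: xb})            # step 1: improve D
        sess.run(g_step, {z: np.random.randn(64, 1)})    # step 2: improve G
    # mean of generated samples should drift toward 4
    print(sess.run(x_fake, {z: np.random.randn(1000, 1)}).mean())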

 

Below is the original GAN proposed by Ian Goodfellow, with cost function V: the discriminator D maximizes V, while the generator G minimizes V.

min_G max_D V(G, D) = E_{x~Pdata}[ log D(x) ] + E_z[ log( 1 − D(G(z)) ) ]

The tug-of-war of min max V (a saddle point) is the fundamental difficulty of the original GAN.

 

 

 F-divergence and Legendre transform/Fenchel conjugate


f-divergence: for any convex function f with f(1) = 0,

Df(P‖Q) = ∫ q(x) f( p(x) / q(x) ) dx

Df(P‖Q) = 0 if P == Q

Df(P‖Q) ≥ 0, which follows from Jensen's inequality: Df(P‖Q) = E_Q[ f(p/q) ] ≥ f( E_Q[p/q] ) = f(1) = 0.
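As a standard sanity check (a textbook example, not specific to this post), taking f(u) = u log u recovers the KL divergence:

Df(P‖Q) = ∫ q(x) (p(x)/q(x)) log( p(x)/q(x) ) dx = ∫ p(x) log( p(x)/q(x) ) dx = KL(P‖Q)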

 


 

Convex function => use the Legendre transform and Fenchel conjugate: since f is convex, f(u) = sup_t( t·u − f*(t) ), therefore

Df(P‖Q) = ∫ q(x) sup_t( t · p(x)/q(x) − f*(t) ) dx ≥ sup_T { E_{x~P}[ T(x) ] − E_{x~Q}[ f*( T(x) ) ] }

where the supremum on the right is over functions T(x) (the discriminator); replacing the pointwise sup with a single function T yields a tractable lower bound on the divergence.


 

V(G, T) = E_{x~Pdata}[ T(x) ] − E_z[ f*( T(G(z)) ) ]

Clearly min_G max_T V is a saddle-point optimization.

Vanilla GAN: the special case f(u) = u log u − (u + 1) log(u + 1); with T parameterized as T(x) = log D(x), the bound becomes E_{x~Pdata}[ log D(x) ] + E_z[ log( 1 − D(G(z)) ) ].
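Filling in the standard result from the original GAN paper: for a fixed G, the inner maximum over D is attained at D*(x) = Pdata(x) / ( Pdata(x) + PG(x) ), and substituting back gives V(G, D*) = 2·JSD( Pdata ‖ PG ) − log 4, so the generator is in effect minimizing the Jensen-Shannon divergence.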

 

 

 


 

In summary: f(x) is a convex function —> Legendre-transform dual —> min max —> tractable bound.

 

The Principles of the Variational Autoencoder

 

Reference:

Variational autoencoder:

https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

http://kvfrans.com/variational-autoencoders-explained/

https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/

Excellent article: https://arxiv.org/pdf/1606.05908.pdf

 

The three main families of generative models:

1. PixelRNN/PixelCNN; 2. Variational autoencoder; 3. GAN.

PixelRNN/CNN requires heavy computation and suits relatively small-scale problems.

This post focuses on the theory of the VAE.


 

Subdividing further, (unsupervised) probabilistic generative models split along explicit vs. implicit density, as in Goodfellow's taxonomy quoted earlier.

 

Variational Autoencoder 

As the name suggests, a variational autoencoder is variation + autoencoder.

An autoencoder uses a neural network for feature extraction, which corresponds exactly to the latent variable of a probabilistic graphical model (PGM).

Variation, simply put, means approximating the latent variable distribution (Z) with a Gaussian, i.e., approximating the encoded X by Gaussian samples of Z.

X is the observed output (image, waveform, etc.) with an unknown, complicated distribution. A nonlinear NN maps X to Z (or Z to X).

 

 

P(X) = ∫ P(X|z; θ) P(z) dz      (1)

This is the general formula. P(X|z; θ) can be any probability distribution; θ denotes the parameters (weights) of the NN decoder network.

Equivalently, P(X) = E_{z~P(z)}[ P(X|z; θ) ].

 

In a conventional autoencoder, the NN decoder network z → X is a deterministic function f(z; θ).

This can be viewed as the special case where P(X|z; θ) is a delta function δ( X − f(z; θ) ): the distribution has mass only at X = f(z; θ).

 

 

 

The key point of the variational autoencoder (VAE) is to approximate P(X|z; θ) by N( f(z; θ), σ²I )!!!!

 

1. If σ → 0, then N( f(z; θ), σ²I ) → a delta function, and we recover the conventional autoencoder.

 

2. The extra σ provides extra flexibility: (a) it lets us force z ~ N(0, I); and (b) any z drawn from N(0, I) can generate a reasonable output image.

 

3. With P(X|z) ~ N( f(z; θ), σ²I ), we can use gradient ascent to maximize (1).

 


 

From (1), the naive Monte Carlo estimate is

P(X) ≈ (1/n) Σ_i P(X | z_i),   z_i ~ P(z)

Problems: 1. it needs an enormous number of samples; 2. it gives no good metric for gradient ascent.
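A toy run (my own illustration; the decoder f(z) = z and the constants are arbitrary) showing why the naive estimate fails: in even modestly high dimension, almost every sampled z contributes essentially zero probability mass.

import numpy as np

# Naive Monte Carlo estimate of P(X) = E_z[ P(X|z) ], with P(X|z) = N(X; f(z), sigma^2 I)
d, sigma, n = 20, 0.1, 100000
X = np.ones(d)                               # one observed data point
z = np.random.randn(n, d)                    # z ~ N(0, I); toy decoder f(z) = z
log_pxz = (-0.5 * np.sum((X - z)**2, 1) / sigma**2
           - 0.5 * d * np.log(2 * np.pi * sigma**2))
w = np.exp(log_pxz)                          # P(X|z_i); underflows to 0 for nearly all z_i

print((w > 0).mean())                        # fraction of useful samples: ~0
print(w.mean())                              # crude, high-variance estimate of P(X)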

 

 

 

 

 

At this point we introduce an objective from the completely opposite direction.

From (1), for most z, P(X|z) is essentially zero; only z within the reach of N( f(z; θ), σ²I ) contributes.

 

The key idea behind the variational autoencoder is to attempt to sample values of z that are likely to have produced X, and compute P(x) from those!!

 


 

So we need a new function Q(z|X) that finds the z's likely to have produced X, instead of integrating over all z as in (1): we compute the expectation E_{z~Q}[ P(X|z) ] over Q(z). The catch is that the distribution Q(z) is not N(0, I); it is the prior of z that is N(0, I).

 

Using the following definition of KL divergence and applying Bayes' rule,

D[ Q(z) ‖ P(z|X) ] = E_{z~Q}[ log Q(z) − log P(z|X) ]
                   = E_{z~Q}[ log Q(z) − log P(X|z) − log P(z) ] + log P(X)

where Q(z) can be any distribution. (One can substitute Q(z) = P(z) to verify it is correct.)

 

Now take Q(z) to be Q(z|X), since Q(z|X) is a legitimate probability distribution. Rearranging gives

log P(X) − D[ Q(z|X) ‖ P(z|X) ] = E_{z~Q(z|X)}[ log P(X|z) ] − D[ Q(z|X) ‖ P(z) ]      (5)

Eq (5) is the core of the variational autoencoder! It is a general identity that makes no assumptions.
P(X|z) is the "decoder network", but not the deterministic NN decoder network of a conventional autoencoder!
Q(z|X) is naturally the "encoder network". We want Q(z|X) to approach P(z|X) so that the LHS approximates log P(X) (the log-likelihood).

We usually assume: Q(z|X) = N( z | μ(X), Σ(X) ), where the mean μ(X) and (diagonal) covariance Σ(X) are produced by the encoder network.

 

Therefore maximizing the log-likelihood can be achieved by maximizing the RHS.

 

The RHS contains two terms:

The first term maximizes E[ log P(X|z) ]; for a normal distribution this is equivalent to minimizing the L2 distance between f(z; θ) and X.

The second term minimizes D[ Q(z|X) ‖ P(z) ], the KL divergence between two normal distributions, which has a closed form:

D[ N(μ(X), Σ(X)) ‖ N(0, I) ] = ½ Σ_k ( μ_k² + Σ_kk − log Σ_kk − 1 )


Loss Function for VAE

Because D ≥ 0, the RHS of (5) is a lower bound on log P(X); equivalently, the loss below (its negation) is an upper bound on the negative log-likelihood.


 

generation_loss = mean(square(generated_image - real_image))   # reconstruction term, from E[log P(X|z)]
latent_loss = KL_divergence(latent_variable, unit_gaussian)    # regularization term, D[Q(z|X) || P(z)]
loss = generation_loss + latent_loss

Expanding further:

# z_mean and z_stddev are two vectors generated by the encoder network;
# closed-form KL divergence between N(z_mean, z_stddev^2) and N(0, I)
latent_loss = 0.5 * tf.reduce_sum(tf.square(z_mean) + tf.square(z_stddev) - tf.log(tf.square(z_stddev)) - 1, 1)


# reparameterization trick: the random sampling stays off the backprop path
samples = tf.random_normal([batchsize, n_z], 0, 1, dtype=tf.float32)
sampled_z = z_mean + (z_stddev * samples)
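Wiring the pieces together, a minimal sketch of the full VAE loss in the same TensorFlow 1.x style (my own assembly; `encoder` and `decoder` are assumed placeholders for your networks, not a fixed API):

import tensorflow as tf

def vae_loss(x, encoder, decoder, batchsize, n_z):
    z_mean, z_stddev = encoder(x)                    # both shaped [batchsize, n_z]
    eps = tf.random_normal([batchsize, n_z], 0, 1, dtype=tf.float32)
    sampled_z = z_mean + z_stddev * eps              # reparameterization trick
    x_rec = decoder(sampled_z)
    generation_loss = tf.reduce_sum(tf.square(x - x_rec), 1)
    latent_loss = 0.5 * tf.reduce_sum(
        tf.square(z_mean) + tf.square(z_stddev)
        - tf.log(tf.square(z_stddev) + 1e-8) - 1, 1)
    return tf.reduce_mean(generation_loss + latent_loss)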

 

 

 

————  The following is to be modified —————– 

(a) Conventional Autoencoder

First consider the conventional autoencoder (purely deterministic):

x -> encoder -> low-dimensional feature vector -> decoder -> x'

with loss function = L2(x, x') + optional regularization.

 

By creating a bottleneck (the encoder conv layers) we obtain the latent vector/variable, i.e., the feature vector.

The decoder network (deconv) then becomes a generator that can produce images.

The problem is that we neither know nor control the distribution of the latent vector, so we cannot generate new images from it.

 

Neural Network Role in Probabilistic Machine Learning

Ref: https://taweihuang.hpd.io/2017/03/21/mlbayes/

Variational autoencoder:

https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

http://kvfrans.com/variational-autoencoders-explained/

https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/

Excellent article: https://arxiv.org/pdf/1606.05908.pdf

 

 

 

One thing has always puzzled me in machine learning.

A neural network (NN) is deterministic, essentially a nonlinear mapping/function y = G(x).

A probabilistic graphical model (PGM) is probabilistic, estimating various probabilities (joint, conditional) with a graph model.

Why can we solve machine learning problems such as image classification and image generation using both techniques?

 

To be more specific, a machine learning problem can be viewed in a general probabilistic framework.

Machine learning learns from a pile of known big data and uses it to make inferences about new data; that is fundamentally a probabilistic problem.

For example, the (image) classification problem looks deterministic on the surface, but it is more adequately described with probability.

Why, then, can an NN, a deterministic mapping, be used so widely in machine learning?

 

We can look at this question through discriminative and generative problems/models.

A discriminative problem estimates the posterior probability (distribution) p(Y|X),

where Y is the label and X is the input image or feature vector.

X has much higher dimension than Y (binary, or 10-1000 categories).

In a discriminative model we estimate p(Y|X) directly. How is this done with an NN?

There are two parts:

(a) Inference: the NN can be seen as preprocessing that transforms X into a deterministic, usually lower-dimensional vector suitable for a linear classifier (e.g., logistic or softmax).

The NN does not directly produce the posterior probability (distribution)!

It is the final logistic or softmax layer that delivers the posterior probability (distribution). See https://taweihuang.hpd.io/2017/03/21/mlbayes/ or Andrew Ng's paper http://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf for the role of logistic regression in discriminative classifiers.

(b) Training: training does not care about the global data distribution; the point is to make the training posterior accuracy p(Yi|Xi) as high as possible without overfitting. Cost function = min(training error) + regularization loss.

In summary, in a discriminative problem the NN can be seen as preprocessing that maps X (a very high-dimensional distribution) to a (low-dimensional) linearly separable distribution! The final logistic or softmax layer then delivers the posterior probability (distribution).

 

A generative problem estimates a joint probability distribution: P(X), or P(X, Y), or P(X|Y), or P(Y|X) = P(X, Y) / P(X).

P(X): e.g., after seeing a pile of unlabeled images (Xi), find the distribution of X in a very high-dimensional space, or generate a new image based on that distribution (unsupervised learning).

P(X, Y): e.g., after seeing a pile of labeled images (Xi, Yi), find the joint distribution of (X, Y) in a very high-dimensional space, or generate a new image from P(X|Y) based on the label (supervised learning).

What role does the NN play in a generative problem? Exactly the opposite of the discriminative case: it maps a simple (low-dimensional) distribution Z (normal or uniform) through G(Z) to a (very high-dimensional) complicated distribution Pg, approximating the X distribution P(X) as closely as possible!!

The hardest part of a generative problem is training, because it is hard to define the error of a high-dimensional (image) distribution from individual samples. (By contrast, a (labeled) discriminative problem easily defines the low-dimensional classification error from training samples, individually labeled and summable!)


There are two common generative models:

1. Variational approach: the variational autoencoder.

2. GAN approach: define a G and a D, and use the GAN game to iteratively approach the sampled PDF.

 

Variational approach 

(a) Training: the goal is to make Pg(Z) as close to P(X) as possible, i.e., to minimize the distance (divergence) between Pg(Z) and P(X).

Cost function = divergence(Pg(Z), P(X))   ==> Wrong!! The problem is that we do not have the distribution P(X)!

But we do have X samples, so instead we should maximize the log-likelihood of P(X) over θ, as follows:

maximize Σ_i log P(X_i),   with P(X) = ∫ P(X|z; θ) P(z) dz      (1)

This is the general formula; P(X|z; θ) can be any probability distribution.
P(X) = E_z[ P(X|z) ]

(1) can be rewritten (by applying Bayes' rule) as

log P(X) − D[ Q(z|X) ‖ P(z|X) ] = E_{z~Q}[ log P(X|z) ] − D[ Q(z|X) ‖ P(z) ]      (5)

Eq (5) is the core of the variational autoencoder!
P(X|z) is the "decoder network", but not the deterministic NN decoder network of a conventional autoencoder!
Q(z|X) is naturally the "encoder network".

The cost function is the RHS of (5).
The first term is the reconstruction (or generation) loss.
Physical insight: the variational approach assumes P(X|z) is a normal distribution. Taking the log and minimizing the expectation over Q is equivalent to minimizing an L2 norm, schematically: X -> Q -> μ/Σ -> sample -> P -> L2 norm.

The second term is the latent (or regularization) loss.
Physical insight: it constrains Q(z|X) to approximate N(0, I), schematically: X -> Q -> μ/Σ -> sample -> KL.

BTW, for backpropagation to work, there must be no sampling node on the backprop path, hence the sampling is moved outside the path (the reparameterization trick)!!
??? Are Q(z|X) and P(X|z) ultimately deterministic functions or distributions in the encoder/decoder ???
Ans: if Q and P are deterministic NNs, the PDFs of Q(z|X) and P(X|z) are delta functions.
But if the NN has transition probabilities, random dropout, or noise injection, then Q(z|X) and P(X|z) can be approximated by normal distributions.


 
 
 
(b) Inference: inference in a generative model is trivial. To draw a (new) sample from P(X_new_image) or P(X_new_image | car), just draw a random sample from the simple (low-dimensional) distribution and map it through the NN to a (high-dimensional) image or speech signal.

Thus in generative-model inference (the variational approach), the NN's role is to transform a probability distribution. Variational inference only needs the decoder NN P, which maps a normal distribution to a distribution approximating P(X), from low dimension to high dimension; exactly the opposite of the discriminative model.

The encoder Q, however, maps the X distribution to a normal distribution, i.e., from high dimension to low dimension, much like a discriminative model.
 
 
In summary, the NN's roles:

Discriminative model:
Training model: the NN maps a high-dimensional distribution to a low-dimensional LINEARLY SEPARABLE distribution!
Training cost function: minimal discriminative error (classification error)!

Inference model: same as the training model. The NN maps a high-dimensional distribution to a low-dimensional linearly separable distribution.

Generative model (variational):
Training model: the encoder NN maps high dimension to a low-dimensional pre-determined distribution (normal distribution) + the decoder NN maps low dimension to a high-dimensional distribution.
Training cost function: maximum likelihood => generation loss + regularization loss.

Inference: the decoder NN maps a low-dimensional (normal) distribution to a high-dimensional distribution.

Generative model (GAN):
Training model: the discriminative NN maps high dimension to a low-dimensional binary separable distribution + the generative NN maps low dimension to a high-dimensional distribution.
Training cost function: maximum likelihood => minimize divergence!

Inference: the generative NN maps a low-dimensional (uniform or normal) distribution to a high-dimensional distribution.



 

 

 

Note:

Whether the generative learning is supervised or unsupervised, the point is to find the very high-dimensional PDF of P(X) or P(X, Y), i.e., density estimation. In practice nobody cares about the distribution itself! What matters are the generated samples!



———————  The following sentence is to be modified ——————– 

Unlike a discriminative problem (many to one, or to a few), a generative problem is typically one-to-many, a harder problem.
The key is to find the (very high-dimensional) distribution P(X) (not just a probability, since it lives in a very high dimension; this is not a simple classification problem).
The final result of a generative problem is typically a 2D graph of P(X, y) or P(X|y), or lots of pictures with different parameters.

For the discriminative type of question, we only need to add a logistic or softmax function to convert deterministic NN outputs into probabilities.

How about the other way around? If the problem is purely probabilistic, for example we want to generate the PDF of a random variable (vector), how can an NN come to the rescue?

It's not difficult: we use a latent random variable Z whose PDF is a normal, uniform, or any other distribution.

We define Y = G(Z) to produce the final PDF, where G() is implemented with an NN (possibly a very complicated function mapping).

The PDF of Y is very complicated and most likely intractable.

However, we can use learning methods to push the PDF of Y toward a given distribution, as follows.

 

GAN and Lagrangian Mechanics

 

GAN can be regarded as the f-divergence and its Legendre conjugate fighting each other (a saddle point, the Nash equilibrium):

min max V, where V is the loss function.

Can Lagrangian mechanics be regarded as a similar process? Time and energy?

Conjugate pairs p and q (position and momentum, time and energy, angle and angular momentum, etc.)? The trajectory is a balance between p and q, between E and t?

One direction maximizes V, the other minimizes V, iteratively?

 

Derivation:

1. Ian Goodfellow's NIPS GAN tutorial

2. Prof. Lee's GAN derivation

 

 

0. A probability density function sums to one and is positive, but is not convex!!

1. Start from the f-divergence; f is forced to be convex because f encodes a concept of distance!!! TBA

2. Use the Legendre conjugate to find f*(t).

3. Define V; then D* = arg max_D V, and G* = arg min_G max_D V.

How can the backprop be LEARNED automatically?

So far the backprop is generated directly by math (differentiation), because we min or max the cost function.

Can we automatically figure out or LEARN the backprop, as if we lived in the age prior to calculus??????

 

Use GAN? Or reinforcement learning?

 

Why do this? ???? ===> to avoid the explicit backprop using differentiation!!!!!!!!!!!!!

For a binary-weight network, the new method could LEARN how to do the backprop automatically!!!!

 

Another neural network to help do the backprop learning?

Build Speech Command APK from Android Studio

Reference:

1. https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android#prebuilt-components

2. Android Studio

3. Android tensorflow: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/mobile/android_build.md

 

Build the demo using Android Studio

Prerequisites

If you haven’t already, do the following two things:

  • Install Android Studio, following the instructions on their website.

  • Clone the TensorFlow repository from Github:

     git clone https://github.com/tensorflow/tensorflow

Building

  1. Open Android Studio, and from the Welcome screen, select Open an existing Android Studio project.

  2. From the Open File or Project window that appears, navigate to and select the tensorflow/examples/android directory from wherever you cloned the TensorFlow Github repo. Click OK.

    If it asks you to do a Gradle Sync, click OK.

You may also need to install various platforms and tools, if you get errors like "Failed to find target with hash string 'android-23'" and similar.

     

    Go to Gradle sync window and install missing platform!!!

 

 

Step 2: Building in Android Studio using the TensorFlow AAR from JCenter

The simplest way to compile the demo app yourself and try out changes to the project code is to use Android Studio. Simply set this android directory as the project root.

 

 

The prerequisites and build steps 1-2 are the same as above. Then:

  3. Open the build.gradle file (you can go to 1:Project in the side panel and find it under the Gradle Scripts zippy under Android). Look for the nativeBuildSystem variable and set it to none if it isn’t already:

     
     
    // set to 'bazel', 'cmake', 'makefile', 'none'
    def nativeBuildSystem ='none'
  4. Click the Run button (the green arrow) or use Run -> Run ‘android’ from the top menu.

    If it asks you to use Instant Run, click Proceed Without Instant Run.

    Also, you need to have an Android device plugged in with developer options enabled at this point. See here for more details on setting up developer devices.

This installs three apps on your phone that are all part of the TensorFlow Demo. See Android Sample Apps for more information about them.

 

Build Tensorflow from Source Code with GPU

Reference: 

1. Build tensor flow from source 

2. tensorflow 官網

 

In earlier posts TensorFlow was installed with pip (under Anaconda, or directly in the shell).

One problem is that tensorflow/examples and models/research/tutorials are missing some directories,

and speech_commands fails to run. So here we build from source instead.

 

Step 0: Install desktop sharing (vnc) and emacs and git

* For desktop sharing (VNC) setup, see the earlier post or "How to Setup A Ubuntu Remote Desktop".

* sudo apt install emacs

* sudo apt install git

ps. edit /etc/default/locale to change the date format from lzh_TW -> en_US.UTF-8

 

Step 1: Install Ubuntu LTS 16.04 and Nvidia driver

Reference: 

* Install the Ubuntu desktop version. The X-window problem needs solving when installing the Nvidia driver.

* Settings -> Software & Updates -> Additional Drivers -> GTX 1080 -> choose Nvidia driver 384.90 to install the Nvidia driver.

 

Step 2: Install Nvidia CUDA8 and cuDNN6.0

The TensorFlow website recommends CUDA 8 (not CUDA 9) and cuDNN 6.0.

Reference:

* CUDA 8.0 

  1. sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  2. sudo apt-get update
  3. sudo apt-get install cuda
  4. sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_amd64.deb 

$ nvcc --version   (may first need: $ sudo apt install nvidia-cuda-toolkit)

$nvidia-smi


Setup environment variables

export CUDA_HOME=/usr/local/cuda-8.0 
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64 
 
PATH=${CUDA_HOME}/bin:${PATH} 
export PATH

 

To verify:

$ cp -rf /usr/local/cuda-8.0/samples .

$ make

make fails with the following error:

/usr/bin/ld: cannot find -lnvcuvid

collect2: error: ld returned 1 exit status

Makefile:381: recipe for target ‘cudaDecodeGL’ failed

make[1]: *** [cudaDecodeGL] Error 1

make[1]: Leaving directory ‘/home/allen/samples/3_Imaging/cudaDecodeGL’

Makefile:52: recipe for target ‘3_Imaging/cudaDecodeGL/Makefile.ph_build’ failed

make: *** [3_Imaging/cudaDecodeGL/Makefile.ph_build] Error 2

 
Solution:
The sample makefiles contain wrong nvidia-xxx driver version numbers. Substitute them with sed -i "s/nvidia-367/nvidia-375/g" `grep "nvidia-367" -r ./ -l` and run make again.

 

  

* cuDNN: 6.0 

  • Navigate to your <cudnnpath> directory containing cuDNN Debian file.
  • Install the runtime library, for example:
    sudo dpkg -i libcudnn6_7.0.2.43-1+cuda8.0_amd64.deb
  • Install the developer library, for example:
    sudo dpkg -i libcudnn6-dev_7.0.2.43-1+cuda9.0_amd64.deb
  • Install the code samples and the cuDNN Library User Guide, for example:
    sudo dpkg -i libcudnn6-doc_7.0.2.43-1+cuda9.0_amd64.deb

Note: after installing cuDNN, compile the mnistCUDNN example following the Nvidia cuDNN installation guide (reference 3).

$ cp -rf /usr/src/cudnn_sample_v6 .

Compile mnistCUDNN; it fails with a compile error (screenshot omitted).

Solution

Open the file:

/usr/include/cudnn.h

 

Try changing the line:

#include "driver_types.h"

to:

#include <driver_types.h>

 

 

 

-------------------------------- 

Section 2.3.1 of the guide, installing from a tar file:

Copy the following files into the CUDA Toolkit directory.

$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include

$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64

$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

However, with section 2.3.2, installing from a Debian file (as I did), the directories above (/usr/local/cuda/include and lib) contain no cudnn or libcudnn files.

They are instead under /usr/include/x86_64-linux-gnu (include and lib). Why?


 

1.2. https://developer.nvidia.com/cuda-downloads  : CUDA

1.3. http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html : cuDNN

 

 

Step 3: Install Python and related packages

In one word: use Anaconda. I use Anaconda2 (Python 2.7.14) to avoid any compatibility problem.

Download Anaconda2, then

$ bash Anaconda2.xxxx.sh

 

Step 4: Build tensorflow from source!

Reference:

https://www.tensorflow.org/install/install_sources

Prepare environment for Linux

Step 4.1: Install bazel.

Use conda:

conda install -c conda-forge bazel


 

Step 4.2:

$ sudo apt-get install libcupti-dev

 

Step 4.3: Clone the source code

$ git clone https://github.com/tensorflow/tensorflow

Also clone the many TensorFlow models:

$ git clone https://github.com/tensorflow/models


 

$ cd tensorflow

$ ./configure

and answer all the configuration options.

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package  
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda //tensorflow/tools/pip_package:build_pip_package


Next step:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
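The remaining steps (per the tensorflow.org source-install guide) are to install the generated wheel and verify that the GPU is visible; the exact wheel filename varies by version, hence the glob:

$ pip install /tmp/tensorflow_pkg/tensorflow-*.whl
$ python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"

The device list should include a GPU entry if the CUDA build succeeded.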

 

Step 5: Download Android Studio and build the demo

=> see the article "Build Speech Command APK from Android Studio" above.

 

 

Step 6: Conda OpenCV (v3.3.0) and Keras (v2.0.9)

reference: https://anaconda.org/conda-forge/keras

conda install -c conda-forge opencv

conda install -c conda-forge keras