Variational Autoencoder – VAE

by allenlu2007

Reference: http://ithelp.ithome.com.tw/articles/10188453

Let us first go back to the original multi-layer fully connected autoencoder, not the multi-layer convolutional autoencoder.

An autoencoder by itself is a non-probabilistic model. Adding "variational", however, is a different story: be aware that it introduces a Gaussian probability density model!

Back to fundamentals: the pros and cons of probabilistic models.

The con first: mainly complexity, both in understanding the model and in the actual computation. The pro is a sounder mathematical foundation (based on probability and statistics) and more flexibility: it can explain, handle, and simulate more complex problems, e.g. prediction, missing data, small samples, etc. One (imperfect) analogy is quantum mechanics vs. classical mechanics.

1. Find the distribution of the source → can tolerate missing data (application: reconstruction) → only needs a small number of samples (applications: reconstruction and semi-supervised learning) → think about how the brain works: probability based, using imagination to patch the missing data!

2. If the prior is known → Bayesian model.

 

 

A variational autoencoder is a probabilistic (generative) model, and it assumes a Gaussian approximation. For a Gaussian pdf, only the mean and the covariance are needed.
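For reference, a Gaussian density is fully specified by its mean vector \mu and covariance matrix \Sigma:

\[ \mathcal{N}(x;\,\mu,\Sigma) \;=\; \frac{1}{\sqrt{(2\pi)^{k}\,|\Sigma|}} \exp\!\Big( -\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu) \Big) \]

In the VAE below the covariance is additionally assumed to be diagonal, so the encoder only has to produce a mean and a (log) variance for each latent dimension.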


 

Probabilistic models can be subdivided further as follows. Variational methods mostly fall under explicit density → approximate density (Gaussian density).

[Figure: taxonomy of probabilistic (generative) models]

 

The beginning of the encoder is the same as in a plain autoencoder: the dimension is reduced step by step. Here it goes from 784 to 500 to 200 (and finally to 2).

Encoder

    W_e_1 = weights([784, 500], "w_encoder_1")
    W.append(W_e_1)
    b_e_1 = bias([500], "b_encoder_1")
    h_e_1 = tf.nn.relu(tf.add(tf.matmul(X, W_e_1),b_e_1))
    
    W_e_2 = weights([500, 200], "w_encoder_2")
    W.append(W_e_2)
    b_e_2 = bias([200], "b_encoder_2")
    h_e_2 = tf.nn.relu(tf.add(tf.matmul(h_e_1, W_e_2),b_e_2))

Now comes the interesting part (the variational bit). Before entering the 2-dimensional space, the low-dimensional output of the previous layer is projected into two vectors that define a Gaussian: one becomes the mean and the other the (log) standard deviation ...

It is then mapped into the 2-dimensional code layer; Z is the resulting Gaussian variable.

    # Two separate linear projections of h_e_2: one for the mean of q(z|x),
    # one for its log variance (despite the name, z_log_sigma holds log(sigma^2)).
    W_latent_mean = weights([200, n_z], "w_latent_mean")
    W.append(W_latent_mean)
    b_latent_mean = bias([n_z], "b_latent_mean")
    z_mean = tf.add(tf.matmul(h_e_2, W_latent_mean), b_latent_mean)
    
    W_latent_sigma = weights([200, n_z], "w_latent_sigma")
    W.append(W_latent_sigma)
    b_latent_sigma = bias([n_z], "b_latent_sigma")
    z_log_sigma = tf.add(tf.matmul(h_e_2, W_latent_sigma), b_latent_sigma)
    
    # Reparameterization trick: Z = mean + sigma * eps, with eps ~ N(0, I)
    eps = tf.random_normal((batch_size, n_z), 0, 1, dtype = tf.float32)
    Z = tf.add(z_mean, tf.multiply(tf.sqrt(tf.exp(z_log_sigma)), eps))
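Written out, the reparameterization trick used above draws the latent code as

\[ z \;=\; \mu \;+\; \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \sigma = \sqrt{\exp(z\_log\_sigma)} \]

so the randomness is isolated in \epsilon and gradients can flow back through z_mean and z_log_sigma during training.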
 

Decoder

    W_d_1 = weights([n_z, 200], "w_decoder_1")
    W.append(W_d_1)
    b_d_1 = bias([200], "b_decoder_1")
    h_d_1 = tf.nn.relu(tf.add(tf.matmul(Z, W_d_1), b_d_1))
    
    W_d_2 = weights([200, 500], "w_decoder_2")
    W.append(W_d_2)
    b_d_2 = bias([500], "b_decoder_2")
    h_d_2 = tf.nn.relu(tf.add(tf.matmul(h_d_1, W_d_2), b_d_2))
    
    W_d_3 = weights([500, 784], "w_decoder_3")
    W.append(W_d_3)
    b_d_3 = bias([784], "b_decoder_3")
    h_d_3 = tf.nn.sigmoid(tf.add(tf.matmul(h_d_2, W_d_3), b_d_3))
    
    X_reconstruct = h_d_3

Cost function

The cost function contains two terms: the reconstruction loss (as expected) plus a special term, latent_loss.

reconstruct_loss = -tf.reduce_sum(X * tf.log(1e-10 + X_reconstruct) + (1-X) * tf.log(1e-10 + 1 - X_reconstruct), 1)

latent_loss = -0.5 * tf.reduce_sum(1 + z_log_sigma - tf.square(z_mean) - tf.exp(z_log_sigma), 1)

cost = tf.reduce_mean(reconstruct_loss + latent_loss)
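The latent_loss is the closed-form KL divergence between the encoder's approximate posterior \mathcal{N}(\mu, \mathrm{diag}(\sigma^2)) and the standard normal prior \mathcal{N}(0, I):

\[ D_{\mathrm{KL}} \;=\; -\tfrac{1}{2}\sum_{j=1}^{n_z}\Big( 1 + \log\sigma_j^{2} - \mu_j^{2} - \sigma_j^{2} \Big) \]

with \mu_j given by z_mean and \log\sigma_j^{2} by z_log_sigma, which is exactly the expression in the code; the total cost is thus the negative evidence lower bound (ELBO) averaged over the batch.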

Results

We vary the values in the code layer to see what the corresponding decoder output looks like (the code for this scan is at the end of the post).

https://i2.wp.com/ithelp.ithome.com.tw/upload/images/20170105/20103494HQm826jBNP.png

 

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
def weights(shape, name):
    # Truncated-normal initialization for weight matrices
    initial = tf.truncated_normal(shape = shape, stddev = 0.1)
    return tf.Variable(initial, name = name)

def bias(shape, name):
    # Small constant initialization for biases
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial, name = name)
 

Build the model

Encoder

tf.reset_default_graph()
sess = tf.InteractiveSession()

n_z = 2

X = tf.placeholder(tf.float32, shape = [None, 784])
batch_size = 50

def build_vae():
    W = []
    W_e_1 = weights([784, 500], "w_encoder_1")
    W.append(W_e_1)
    b_e_1 = bias([500], "b_encoder_1")
    h_e_1 = tf.nn.relu(tf.add(tf.matmul(X, W_e_1),b_e_1))
    
    W_e_2 = weights([500, 200], "w_encoder_2")
    W.append(W_e_2)
    b_e_2 = bias([200], "b_encoder_2")
    h_e_2 = tf.nn.relu(tf.add(tf.matmul(h_e_1, W_e_2),b_e_2))
    
    # Separate linear projections for the posterior mean and log variance
    W_latent_mean = weights([200, n_z], "w_latent_mean")
    W.append(W_latent_mean)
    b_latent_mean = bias([n_z], "b_latent_mean")
    z_mean = tf.add(tf.matmul(h_e_2, W_latent_mean), b_latent_mean)
    
    W_latent_sigma = weights([200, n_z], "w_latent_sigma")
    W.append(W_latent_sigma)
    b_latent_sigma = bias([n_z], "b_latent_sigma")
    z_log_sigma = tf.add(tf.matmul(h_e_2, W_latent_sigma), b_latent_sigma)
    
    # Reparameterization trick: Z = mean + sigma * eps, eps ~ N(0, I)
    eps = tf.random_normal((batch_size, n_z), 0, 1, dtype = tf.float32)
    Z = tf.add(z_mean, tf.multiply(tf.sqrt(tf.exp(z_log_sigma)), eps))
    
    W_d_1 = weights([n_z, 200], "w_decoder_1")
    W.append(W_d_1)
    b_d_1 = bias([200], "b_decoder_1")
    h_d_1 = tf.nn.relu(tf.add(tf.matmul(Z, W_d_1), b_d_1))
    
    W_d_2 = weights([200, 500], "w_decoder_2")
    W.append(W_d_2)
    b_d_2 = bias([500], "b_decoder_2")
    h_d_2 = tf.nn.relu(tf.add(tf.matmul(h_d_1, W_d_2), b_d_2))
    
    W_d_3 = weights([500, 784], "w_decoder_3")
    W.append(W_d_3)
    b_d_3 = bias([784], "b_decoder_3")
    h_d_3 = tf.nn.sigmoid(tf.add(tf.matmul(h_d_2, W_d_3), b_d_3))
    
    X_reconstruct = h_d_3
    
    reconstruct_loss = -tf.reduce_sum(X * tf.log(1e-10 + X_reconstruct) + (1-X) * tf.log(1e-10 + 1 - X_reconstruct), 1)
    latent_loss = -0.5 * tf.reduce_sum(1 + z_log_sigma - tf.square(z_mean) - tf.exp(z_log_sigma), 1)
    # L2 penalty over all weight matrices (computed here but not added to the cost)
    l2_loss = tf.add_n([tf.nn.l2_loss(w) for w in W])
    cost = tf.reduce_mean(reconstruct_loss + latent_loss)
    
    return Z, X_reconstruct, cost
Z, X_reconstruct, loss = build_vae()
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)

Loss

init_op = tf.global_variables_initializer()
sess.run(init_op)
for i in range(20000):
    batch = mnist.train.next_batch(batch_size)
    if i%100 == 0:
        print("step %d, loss %g"%(i, loss.eval(feed_dict={X:batch[0]})))
    optimizer.run(feed_dict={X: batch[0]})
step 0, loss 605.198
step 100, loss 207.343
step 200, loss 179.588
step 300, loss 203.732
step 400, loss 177.644
step 500, loss 183.692
step 600, loss 178.643
step 700, loss 180.738
step 800, loss 177.984
step 900, loss 170.849
step 1000, loss 190.2
...
step 18900, loss 163.14
step 19000, loss 166.592
step 19100, loss 168.936
step 19200, loss 159.927
step 19300, loss 162.26
step 19400, loss 166.656
step 19500, loss 164.643
step 19600, loss 167.563
step 19700, loss 154.026
step 19800, loss 155.236
step 19900, loss 163.766
d = np.zeros([batch_size,2],dtype='float32')
nx = ny = 20
x_values = np.linspace(-8, 2, nx)
y_values = np.linspace(-8, 2, ny)
canvas = np.empty((28*ny, 28*nx))
for i, yi in enumerate(x_values):
    for j, xi in enumerate(y_values):
        z_mu = np.array([[xi, yi]])
        d[0] = z_mu
        x_mean = sess.run(X_reconstruct, feed_dict={Z: d})
        canvas[(nx-i-1)*28:(nx-i)*28, j*28:(j+1)*28] = x_mean[0].reshape(28, 28)

plt.figure(figsize=(8, 10))
Xi, Yi = np.meshgrid(x_values, y_values)
plt.imshow(canvas, origin="upper", vmin=0, vmax=1,interpolation='none',cmap=plt.get_cmap('gray'))
plt.tight_layout()
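As a quick sanity check of the reconstruction path, we can also push a batch of test digits through the trained graph. This is a minimal sketch (not in the original post), assuming the session and the X, X_reconstruct, batch_size, and mnist objects defined above are still available; note the reconstructions are stochastic because Z is sampled.

# Reconstruct a batch of MNIST test digits with the trained VAE.
# batch_size must match the value used when building the graph (50 here).
test_batch = mnist.test.next_batch(batch_size)[0]
recon = sess.run(X_reconstruct, feed_dict={X: test_batch})

n_show = 8
plt.figure(figsize=(2 * n_show, 4))
for k in range(n_show):
    # Top row: original test digit
    plt.subplot(2, n_show, k + 1)
    plt.imshow(test_batch[k].reshape(28, 28), cmap='gray')
    plt.axis('off')
    # Bottom row: (stochastic) VAE reconstruction
    plt.subplot(2, n_show, n_show + k + 1)
    plt.imshow(recon[k].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.tight_layout()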
 
  
 