MNIST with Autoencoder

by allenlu2007

References:

MNIST example using an autoencoder: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/autoencoder.ipynb

Stanford UFLDL: http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/

Chinese translation: https://read01.com/JD5Qg5.html

TensorFlow study notes: http://ithelp.ithome.com.tw/articles/10188120

Hinton paper: Reducing the Dimensionality of Data with Neural Networks (Science, 2006)

Geoffrey Hinton, one of the masters of deep learning, was among the few researchers who kept working on neural networks when most others had given up on them.

His early work on multi-layer RBMs (restricted Boltzmann machines) and autoencoders turned out to become an important foundation for later deep learning.

When deep learning is discussed, most of the credit goes to big-data training and GPUs, while the network architecture itself and how it actually works are often overlooked. A simple observation: a deep neural network is highly nonlinear, prone to overfitting, and full of local minima. Overfitting can be solved or mitigated with regularization, dropout, pruning, and big data.

But the question that has always puzzled me is: how do we ensure that training a deep neural network converges to good values instead of getting stuck in a bad local minimum? Are the same techniques used for that as well?

It turns out the autoencoder offers one path. What is even more interesting is that the autoencoder is unsupervised learning, while deep neural networks are usually applied to supervised learning problems. This gives a glimpse of the importance of unsupervised learning in machine learning / deep learning, sort of.

What Is an Autoencoder?

An autoencoder is trained in an unsupervised way, meaning no target is given during training. How does it manage that? Unsupervised learning has no labels (ground truth); its main purpose is to explore the structure of the input data. Famous examples include PCA (principal component analysis) for dimension reduction, and GMM (Gaussian mixture model) or K-means for clustering.
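As a point of reference, here is a minimal scikit-learn sketch of the two clustering examples just mentioned; the random data, the cluster count of 10, and the library choice are my own illustrative assumptions, not part of the original examples.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 50)  # stand-in for unlabeled data (e.g. flattened images)

# K-means: hard cluster assignments
kmeans_labels = KMeans(n_clusters=10).fit_predict(X)

# GMM: fit a mixture of Gaussians, then take the most likely component per sample
gmm = GaussianMixture(n_components=10, covariance_type='diag').fit(X)
gmm_labels = gmm.predict(X)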

The autoencoder can be viewed as an unsupervised dimension-reduction algorithm. PCA, by comparison, is a dimension-reduction algorithm whose basic principle is to maximize the input data variance (information/variation) that is retained.

The basic principle of the autoencoder is interesting: first reduce the dimension (encoder), then expand the dimension again to regenerate the original data (decoder). The algorithm minimizes the difference between the input data and the encoded-then-decoded data. In the end we use only the encoder part; the whole scheme is called an autoencoder.
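Stated as code, the objective looks roughly like the following NumPy-style sketch; encoder and decoder here are just placeholders for the two halves of the network, not a particular implementation.

import numpy as np

def reconstruction_loss(X, encoder, decoder):
    # Autoencoder objective: make decoder(encoder(X)) as close to X as possible.
    X_hat = decoder(encoder(X))        # compress to the code, then reconstruct
    return np.mean((X - X_hat) ** 2)   # mean squared reconstruction error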

Hinton's pioneering autoencoder paper was published in the famous journal Science, and one of its examples is MNIST. The concept can be illustrated by the figure below. An autoencoder has two parts, an encoder and a decoder, which share a code layer in the middle. The model input goes into the encoder, and the (reconstructed) output comes out of the decoder. The encoder consists of several hidden layers whose node counts decrease (dimension reduction); the corresponding decoder layers increase (dimension expansion).

[Figure: autoencoder structure with encoder, shared code layer, and decoder]

In this symmetric network structure, the training criterion is that the output should be as close to the input as possible.

Input face dataset (Olivetti face data set: http://www.cs.nyu.edu/~roweis/data.html)

Grayscale faces 8-bit [0-255], 64×64 size, 400 total images.

So the input image dimension is 64×64 = 4096.

Encoder dimensions decrease: 4096 -> 2000 -> 1000 -> 500 -> 30, roughly halving per stage (except the last stage).

Decoder dimensions increase: 30 -> 500 -> 1000 -> 2000 -> 4096, roughly doubling per stage (except the first stage).

Hinton's paper actually writes 625 -> 2000 -> 1000 -> 500 -> 30 (why 625? likely because the paper works on 25×25 = 625-pixel face images).

The training procedure is described later. The autoencoder is essentially a dimension reduction from 4096 to 30!
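Just to make the mirrored layer sizes concrete, here is a hypothetical tf.keras sketch of the 4096 -> 30 -> 4096 stack. It only shows the network shape as a plain feed-forward model trained on reconstruction error; it is not the layer-wise RBM pretraining Hinton actually used (described later).

import tensorflow as tf

def build_face_autoencoder():
    inputs = tf.keras.Input(shape=(4096,))                 # flattened 64x64 face
    h = inputs
    for units in (2000, 1000, 500):                        # encoder: dimensions shrink
        h = tf.keras.layers.Dense(units, activation='sigmoid')(h)
    code = tf.keras.layers.Dense(30, name='code')(h)       # 30-D bottleneck
    h = code
    for units in (500, 1000, 2000):                        # decoder: dimensions expand
        h = tf.keras.layers.Dense(units, activation='sigmoid')(h)
    outputs = tf.keras.layers.Dense(4096, activation='sigmoid')(h)
    return tf.keras.Model(inputs, outputs)

model = build_face_autoencoder()
model.compile(optimizer='adam', loss='mse')                # minimize reconstruction error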

This can be compared directly with PCA performing the same dimension reduction from 4096 to 30. The results are as follows:

The first row shows the original face images (4096-D),

the second row shows the autoencoder result after reduction to 30-D,

the third row shows the PCA result (30-D).

[Figure: original faces (top), 30-D autoencoder reconstruction (middle), 30-D PCA reconstruction (bottom)]

Clearly the autoencoder's dimension reduction works much better than PCA's. Possible reasons (a scikit-learn sketch of the PCA baseline follows this list):

1. The autoencoder is a nonlinear dimension reduction, so perhaps it keeps more of the features? PCA (without the kernel trick) is a linear dimension reduction. Although PCA maximizes the retained variance, it inevitably loses some variance (information). Perhaps the autoencoder can retain more of the information?

2. The autoencoder's training procedure seems to avoid getting stuck in bad local minima, which also suggests a way to train deep neural networks.
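For reference, the linear PCA side of this comparison can be reproduced with scikit-learn along the following lines; faces is a hypothetical (400, 4096) array of flattened Olivetti images, and none of this code comes from the paper.

import numpy as np
from sklearn.decomposition import PCA

def pca_reconstruct(faces, n_components=30):
    # Linear dimension reduction 4096 -> 30, then back to 4096-D for display.
    pca = PCA(n_components=n_components)
    codes = pca.fit_transform(faces)       # project onto the top 30 principal components
    return pca.inverse_transform(codes)    # linear reconstruction from the 30-D codes

# faces = ...  # (400, 4096) array of flattened 64x64 grayscale faces
# recon = pca_reconstruct(faces)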

MNIST Autoencoder

The MNIST image size is 28×28 = 784.

Encoder dimensions: 784 -> 1000 -> 500 -> 250 -> 30

Decoder dimensions: 30 -> 250 -> 500 -> 1000 -> 784

The first row shows the original MNIST images (784-D),

the second row shows the autoencoder result after reduction to 30-D,

the third row shows the logistic (kernel) PCA result (30-D),

the fourth row shows the standard (linear) PCA result (30-D).

Again the autoencoder gives the best result.

[Figure: original MNIST digits and their 30-D autoencoder, logistic PCA, and linear PCA reconstructions]

Next, look directly at the autoencoder reducing the dimension all the way to 2-D (784 -> 1000 -> 500 -> 250 -> 2).

In the figure below, panel A is the PCA result and panel B is the autoencoder's 2-D result. Clearly the autoencoder keeps the structure of the original data set even when reducing all the way to 2-D. Adding a classifier (e.g. softmax) on top of the autoencoder (unsupervised learning) should therefore work better than softmax alone; see the section below. The accuracy improves from 91% to 97.9%! The computational cost is high, though, since every layer is a fully connected RBM (sigmoid); this also helped motivate convolutional neural networks.

[Figure: 2-D codes of MNIST digits; panel A: PCA, panel B: autoencoder]
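A minimal sketch of the "encoder codes + softmax" idea, in the same TensorFlow 1.x style as the listing below. The placeholder names, the 30-D code size, and the plain gradient-descent training are my own illustrative assumptions; this is not the exact setup behind the 97.9% figure quoted above.

import tensorflow as tf

def softmax_head(code_dim=30, n_classes=10, lr=0.5):
    # A softmax classifier on top of the (already trained) encoder output.
    codes = tf.placeholder(tf.float32, [None, code_dim])     # encoder codes fed in here
    labels = tf.placeholder(tf.float32, [None, n_classes])   # one-hot labels
    W = tf.Variable(tf.zeros([code_dim, n_classes]))
    b = tf.Variable(tf.zeros([n_classes]))
    logits = tf.matmul(codes, W) + b
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)
    accuracy = tf.reduce_mean(tf.cast(
        tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1)), tf.float32))
    return codes, labels, train_op, accuracy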

The example below is a simplified MNIST autoencoder.

(1) 784 -> 256 -> 128. Every layer is a fully connected RBM with a sigmoid function.

This differs from today's (convolutional) deep learning in two ways: (a) fully connected layers instead of convolution kernels (3x3, 5x5, etc.) with shared weights;

(b) sigmoid instead of ReLU.

(2) Minimize the difference between the input data and the encoded/decoded data, i.e. minimize (X - decoder(encoder(X)))^2.

The back propagation for this part is the same as in ordinary deep learning.

 
%matplotlib inline
from __future__ import division, print_function, absolute_import

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
# Parameters
learning_rate = 0.01
training_epochs = 20
batch_size = 256
display_step = 1
examples_to_show = 10

# Network Parameters
n_hidden_1 = 256 # 1st layer num features
n_hidden_2 = 128 # 2nd layer num features
n_input = 784 # MNIST data input (img shape: 28*28)

# tf Graph input (only pictures)
X = tf.placeholder("float", [None, n_input])

weights = {
    'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
    'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}
# Building the encoder
def encoder(x):
    # Encoder Hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    # Encoder Hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    return layer_2


# Building the decoder
def decoder(x):
    # Decoder Hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    # Decoder Hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    return layer_2

# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

# Prediction
y_pred = decoder_op
# Targets (Labels) are the input data.
y_true = X

# Define loss and optimizer, minimize the squared error
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
# Using InteractiveSession (more convenient while using Notebooks)
sess = tf.InteractiveSession()
sess.run(init)

total_batch = int(mnist.train.num_examples/batch_size)
# Training cycle
for epoch in range(training_epochs):
    # Loop over all batches
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop) and cost op (to get loss value)
        _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch+1),
              "cost=", "{:.9f}".format(c))

print("Optimization Finished!")

# Applying encode and decode over test set
encode_decode = sess.run(
    y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
    a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)), cmap="seismic")
    a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)), cmap="seismic")
f.show()
plt.draw()
Epoch: 0001 cost= 0.216608047
Epoch: 0002 cost= 0.179501235
Epoch: 0003 cost= 0.157594383
Epoch: 0004 cost= 0.151753381
Epoch: 0005 cost= 0.139330998
Epoch: 0006 cost= 0.136263803
Epoch: 0007 cost= 0.133583695
Epoch: 0008 cost= 0.124246269
Epoch: 0009 cost= 0.119985826
Epoch: 0010 cost= 0.113663390
Epoch: 0011 cost= 0.110967830
Epoch: 0012 cost= 0.107760429
Epoch: 0013 cost= 0.108637080
Epoch: 0014 cost= 0.107707120
Epoch: 0015 cost= 0.105424456
Epoch: 0016 cost= 0.103843495
Epoch: 0017 cost= 0.101778753
Epoch: 0018 cost= 0.103673883
Epoch: 0019 cost= 0.096094199
Epoch: 0020 cost= 0.097751543
Optimization Finished!

[Figure: original test digits (top row) and their autoencoder reconstructions (bottom row)]
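With the session from the listing above still open, the encoder half by itself acts as a 784 -> 128 dimension reducer. A small usage sketch (not part of the original notebook), reusing sess, encoder_op, X, and mnist defined above:

# Compress the first 1000 test digits to their 128-D codes
codes = sess.run(encoder_op, feed_dict={X: mnist.test.images[:1000]})
print(codes.shape)   # (1000, 128)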

Complete code: 2-layer (784->200->200) autoencoder + softmax

http://eric-yuan.me/ufldl-exercise-deep-networks/

http://deeplearning.stanford.edu/wiki/index.php/Exercise:_Implement_deep_networks_for_digit_classification

% stackedAEExercise.m
%% CS294A/CS294W Stacked Autoencoder Exercise
% Instructions
% ------------
% 
% This file contains code that helps you get started on the
% stacked autoencoder exercise. You will need to complete code in
% stackedAECost.m
% You will also need to have implemented sparseAutoencoderCost.m and 
% softmaxCost.m from previous exercises. You will need the initializeParameters.m
% loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
% 
% For the purpose of completing the assignment, you do not need to
% change the code in this file. 
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
% allow your sparse autoencoder to get good filters; you do not need to 
% change the parameters below.
inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200; % Layer 1 Hidden Size
hiddenSizeL2 = 200; % Layer 2 Hidden Size
sparsityParam = 0.1; % desired average activation of the hidden units.
% (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
% in the lecture notes). 
lambda = 3e-3; % weight decay parameter 
beta = 3; % weight of sparsity penalty term 
maxIter = 200;
%%======================================================================
%% STEP 1: Load data from the MNIST database
%
% This loads our training data from the MNIST database files.
% Load MNIST database files
trainData = loadMNISTImages('mnist/train-images-idx3-ubyte');
trainLabels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');
trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1
%%======================================================================
%% STEP 2: Train the first sparse autoencoder
% This trains the first sparse autoencoder on the unlabelled STL training
% images.
% If you've correctly implemented sparseAutoencoderCost.m, you don't need
% to change anything here.
% Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the first layer sparse autoencoder, this layer has
% an hidden size of "hiddenSizeL1"
% You should store the optimal parameters in sae1OptTheta

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
% function. Generally, for minFunc to work, you
% need a function pointer with two outputs: the
% function value and the gradient. In our problem,
% sparseAutoencoderCost.m satisfies this.
options.maxIter = maxIter;  % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[sae1OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                inputSize, hiddenSizeL1, ...
                                lambda, sparsityParam, ...
                                beta, trainData), ...
                                sae1Theta, options);
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 2: Train the second sparse autoencoder
% This trains the second sparse autoencoder on the first autoencoder
% features.
% If you've correctly implemented sparseAutoencoderCost.m, you don't need
% to change anything here.

[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

% Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the second layer sparse autoencoder, this layer has
% an hidden size of "hiddenSizeL2" and an inputsize of
% "hiddenSizeL1"
%
% You should store the optimal parameters in sae2OptTheta

[sae2OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                hiddenSizeL1, hiddenSizeL2, ...
                                lambda, sparsityParam, ...
                                beta, sae1Features), ...
                                sae2Theta, options);
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 3: Train the softmax classifier
% This trains the sparse autoencoder on the second autoencoder features.
% If you've correctly implemented softmaxCost.m, you don't need
% to change anything here.

[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

% Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the softmax classifier, the classifier takes in
% input of dimension "hiddenSizeL2" corresponding to the
% hidden layer size of the 2nd layer.
%
% You should store the optimal parameters in saeSoftmaxOptTheta 
%
% NOTE: If you used softmaxTrain to complete this part of the exercise,
% set saeSoftmaxOptTheta = softmaxModel.optTheta(:);
softmaxModel = struct;
options2.maxIter = maxIter;
lambda = 1e-4;
softmaxModel = softmaxTrain(hiddenSizeL2, numClasses, lambda, ...
                            sae2Features, trainLabels, options2);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 5: Finetune softmax model

% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% Initialize the stack using the parameters learned
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the deep network, hidden size here refers to the
% dimension of the input to the classifier, which corresponds 
% to "hiddenSizeL2".
%
%
[stackedAEOptTheta, cost] = minFunc( @(p) stackedAECost(p, ...
                                     inputSize, hiddenSizeL2, ...
                                     numClasses, netconfig, ...
                                     lambda, trainData, ...
                                     trainLabels), ...
                                     stackedAETheta, options);
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 6: Test 
% Instructions: You will need to complete the code in stackedAEPredict.m
% before running this part of the code
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
testLabels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy: 97.6%
%
% If your values are too low (accuracy less than 95%), you should check 
% your code for errors, and make sure you are training on the 
% entire data set of 60000 28x28 training images 
% (unless you modified the loading code, this should be the case)
% stackedAEPredict.m
function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.

% theta: trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example.

% Your code should produce the prediction matrix 
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute pred using theta assuming that the labels start 
% from 1.
[nfeatures, nsamples] = size(data);
depth = numel(stack);
a = cell(depth + 1, 1);
a{1} = data;

for layer = 1 : depth
    a{layer + 1} = bsxfun(@plus, stack{layer}.w * a{layer}, stack{layer}.b);
    a{layer + 1} = sigmoid(a{layer + 1});
end

M = softmaxTheta * a{depth + 1};
M = bsxfun(@minus, M, max(M));
p = bsxfun(@rdivide, exp(M), sum(exp(M)));
[Max, pred] = max(log(p));

% -----------------------------------------------------------

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
 

 

% stackedAECost.m
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                        numClasses, netconfig, ...
                                        lambda, data, labels)

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.

% theta: trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% netconfig: the network configuration of the stack
% lambda: the weight regularization penalty
% data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example. 
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example

%% Unroll softmaxTheta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));

%% --------------------------- YOUR CODE HERE -----------------------------
% Instructions: Compute the cost function and gradient vector for 
% the stacked autoencoder.
%
% You are given a stack variable which is a cell-array of
% the weights and biases for every layer. In particular, you
% can refer to the weights of Layer d, using stack{d}.w and
% the biases using stack{d}.b . To get the total number of
% layers, you can use numel(stack).
%
% The last layer of the network is connected to the softmax
% classification layer, softmaxTheta.
%
% You should compute the gradients for the softmaxTheta,
% storing that in softmaxThetaGrad. Similarly, you should
% compute the gradients for each layer in the stack, storing
% the gradients in stackgrad{d}.w and stackgrad{d}.b
% Note that the size of the matrices in stackgrad should
% match exactly that of the size of the matrices in stack.
%
[nfeatures, nsamples] = size(data);
depth = numel(stack);
a = cell(depth + 1, 1);
a{1} = data;
for layer = 1 : depth
    a{layer + 1} = bsxfun(@plus, stack{layer}.w * a{layer}, stack{layer}.b);
    a{layer + 1} = sigmoid(a{layer + 1});
end

M = softmaxTheta * a{depth + 1};
M = bsxfun(@minus, M, max(M));
p = bsxfun(@rdivide, exp(M), sum(exp(M)));

cost = - sum(sum(groundTruth .* log(p))) ./ nsamples;
cost = cost + sum(sum(softmaxTheta .^ 2)) .* lambda ./ 2;

temp = (groundTruth - p) * a{depth+1}';
temp = - temp ./ nsamples;
softmaxThetaGrad = temp + lambda .* softmaxTheta;

delta = cell(depth + 1);
delta{depth+1} = -(softmaxTheta' * (groundTruth - p)) .* dsigmoid(a{depth+1});
for layer = depth : -1 : 2
    delta{layer} = (stack{layer}.w' * delta{layer+1}) .* dsigmoid(a{layer});
end
for layer = depth : -1 : 1
    stackgrad{layer}.w = delta{layer+1} * a{layer}';
    stackgrad{layer}.b = sum(delta{layer+1}, 2);
    stackgrad{layer}.w = stackgrad{layer}.w ./ nsamples;
    stackgrad{layer}.b = stackgrad{layer}.b ./ nsamples;
end
% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

%-------------------------------------------------------------------
% This function calculate dSigmoid
%
function dsigm = dsigmoid(a)
    dsigm = a .* (1.0 - a);
end

        197        203     1.00000e+00     4.54840e-03     2.60206e-02
        198        204     1.00000e+00     4.52363e-03     2.42239e-02
        199        205     1.00000e+00     4.49705e-03     2.45634e-02
        200        206     1.00000e+00     4.45694e-03     2.69277e-02
Exceeded Maximum Number of Iterations
Before Finetuning Test Accuracy: 10.090%
After Finetuning Test Accuracy: 97.230%

How Is the Autoencoder Trained?

Autoencoder training generally has three stages: pretraining, unrolling, and fine-tuning.

Pretraining: the network is first split into separate RBMs that are pretrained individually. Unrolling then stitches them together into the complete multi-layer encoder/decoder. Finally the whole network is fine-tuned.

[Figure: pretraining (separate RBMs), unrolling, and fine-tuning of the deep autoencoder]
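As a rough illustration of the pretraining stage, the sketch below trains plain sigmoid autoencoder layers greedily, each on the codes of the previous one; Hinton's paper uses RBMs here, so this is only an analogy, and every name in it is hypothetical. After pretraining, the learned encoder weights would be unrolled into the deep encoder/decoder and fine-tuned end to end with back propagation.

import tensorflow as tf

def pretrain_layer(data, n_hidden, epochs=10, lr=0.01, batch=256):
    # Train one shallow autoencoder (n_in -> n_hidden -> n_in) on `data`;
    # return the hidden codes and the learned encoder weights.
    n_in = data.shape[1]
    g = tf.Graph()
    with g.as_default():
        x = tf.placeholder(tf.float32, [None, n_in])
        w_enc = tf.Variable(tf.random_normal([n_in, n_hidden], stddev=0.01))
        b_enc = tf.Variable(tf.zeros([n_hidden]))
        w_dec = tf.Variable(tf.random_normal([n_hidden, n_in], stddev=0.01))
        b_dec = tf.Variable(tf.zeros([n_in]))
        code = tf.nn.sigmoid(tf.matmul(x, w_enc) + b_enc)
        recon = tf.nn.sigmoid(tf.matmul(code, w_dec) + b_dec)
        loss = tf.reduce_mean(tf.square(x - recon))           # reconstruction error
        train = tf.train.RMSPropOptimizer(lr).minimize(loss)
        with tf.Session(graph=g) as sess:
            sess.run(tf.global_variables_initializer())
            for _ in range(epochs):
                for i in range(0, data.shape[0], batch):
                    sess.run(train, feed_dict={x: data[i:i + batch]})
            return sess.run(code, feed_dict={x: data}), sess.run([w_enc, b_enc])

# Greedy stacking, e.g. 784 -> 256 -> 128 on MNIST:
# codes1, enc1 = pretrain_layer(mnist.train.images, 256)
# codes2, enc2 = pretrain_layer(codes1, 128)
# enc1 and enc2 would then initialize the deep encoder before fine-tuning.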
