Convolutional Autoencoder for Denoising

by allenlu2007

Reference: 利用卷積自編碼器對圖片進行降噪 (Using a Convolutional Autoencoder to Denoise Images)

Autoencoder: the main purposes are dimensionality reduction, sparsity, and (in this post) denoising.

 

BasicAE: a single hidden layer (e.g. size=64), fully connected to the input and output layers, serving as the baseline autoencoder.

[Figure: BasicAE architecture]
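A minimal TF 1.x sketch of such a baseline (illustrative variable names, assuming flattened 28×28 MNIST inputs; the post's actual code starts further below):

import tensorflow as tf

# Baseline autoencoder: one fully connected hidden layer (size 64)
inputs_flat = tf.placeholder(tf.float32, (None, 784), name='inputs_flat')  # flattened 28x28 images
hidden = tf.layers.dense(inputs_flat, 64, activation=tf.nn.relu)           # the single hidden layer
logits = tf.layers.dense(hidden, 784, activation=None)                     # reconstruct all 784 pixels
outputs = tf.nn.sigmoid(logits)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=inputs_flat, logits=logits))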

Test results:

[Figure: BasicAE test results]

 

EasyDAE: use the autoencoder above for denoising, but with a hidden layer of size 32. The key to denoising lies in the training.

The input is a noisy image; the autoencoder's output (encoder+decoder) is trained against the clean image.
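In code, one training step looks like this (a sketch reusing noise_factor, inputs_, targets_, optimizer, and sess from the full listing later in this post):

noisy_imgs = imgs + noise_factor * np.random.randn(*imgs.shape)  # corrupt the clean batch
noisy_imgs = np.clip(noisy_imgs, 0., 1.)
sess.run(optimizer, feed_dict={inputs_: noisy_imgs,   # noisy image as input
                               targets_: imgs})       # clean image as target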

 

Test results:

[Figure: EasyDAE test results]

 

ConvDAE: uses (1) multiple layers and (2) 3×3 convolution kernels. The architecture is as follows:

[Figure: ConvDAE architecture]

The nonlinear activation function is ReLU in both the encoder and the decoder (activation=tf.nn.relu).

Is this to keep the encoder and the decoder symmetric?

Likewise, each pooling layer in the encoder corresponds to an upsampling layer in the decoder, again preserving symmetry.

tf.layers.max_pooling2d (encoder) vs. tf.image.resize_nearest_neighbor (decoder)
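For example, a 2×2 max pooling halves the spatial size in the encoder, and the matching nearest-neighbor resize restores it in the decoder (illustrative snippet; conv stands for any feature map):

pooled = tf.layers.max_pooling2d(conv, (2,2), (2,2), padding='same')  # e.g. (28,28) -> (14,14)
upsampled = tf.image.resize_nearest_neighbor(pooled, (28,28))         # (14,14) -> (28,28)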

 

Test results:

[Figure: ConvDAE test results]

 

The code is listed below.

 

 

ConvDAE

Convolutional Autoencoder

Previously, we implemented a simple autoencoder to reconstruct images. Now we add convolutional layers to improve the autoencoder's reconstruction ability.

import numpy as np
import tensorflow as tf
print("TensorFlow Version: %s" % tf.__version__)
TensorFlow Version: 1.0.0
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', validation_size=0, one_hot=False)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
img = mnist.train.images[20]
plt.imshow(img.reshape((28, 28)), cmap='jet')
<matplotlib.image.AxesImage at 0x7f8f36d83390>

Build the model

Inputs

inputs_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='inputs_')
targets_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='targets_')
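# inputs_ will be fed the noisy images and targets_ the clean originals (see the training loop below).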

Encoder

Three convolutional layers, each followed by 2×2 max pooling.

conv1 = tf.layers.conv2d(inputs_, 64, (3,3), padding='same', activation=tf.nn.relu)
conv1 = tf.layers.max_pooling2d(conv1, (2,2), (2,2), padding='same')

conv2 = tf.layers.conv2d(conv1, 64, (3,3), padding='same', activation=tf.nn.relu)
conv2 = tf.layers.max_pooling2d(conv2, (2,2), (2,2), padding='same')

conv3 = tf.layers.conv2d(conv2, 32, (3,3), padding='same', activation=tf.nn.relu)
conv3 = tf.layers.max_pooling2d(conv3, (2,2), (2,2), padding='same')
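# With padding='same', the spatial size shrinks 28 -> 14 -> 7 -> 4, so the encoded representation conv3 has shape (batch, 4, 4, 32).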

Decoder

conv4 = tf.image.resize_nearest_neighbor(conv3, (7,7))
conv4 = tf.layers.conv2d(conv4, 32, (3,3), padding='same', activation=tf.nn.relu)

conv5 = tf.image.resize_nearest_neighbor(conv4, (14,14))
conv5 = tf.layers.conv2d(conv5, 64, (3,3), padding='same', activation=tf.nn.relu)

conv6 = tf.image.resize_nearest_neighbor(conv5, (28,28))
conv6 = tf.layers.conv2d(conv6, 64, (3,3), padding='same', activation=tf.nn.relu)
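# The feature map is back to (batch, 28, 28, 64); the final 3x3 convolution below maps it to a single channel.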

Logits and outputs

logits_ = tf.layers.conv2d(conv6, 1, (3,3), padding='same', activation=None)

outputs_ = tf.nn.sigmoid(logits_, name='outputs_')

Loss and Optimizer

loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets_, logits=logits_)
cost = tf.reduce_mean(loss)

optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)
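tf.nn.sigmoid_cross_entropy_with_logits applies a per-pixel cross-entropy in the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)). A minimal NumPy check of that formula (illustrative, not part of the original notebook):

import numpy as np

def sigmoid_xent(x, z):
    # Stable elementwise sigmoid cross-entropy between logits x and targets z
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))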

Training

sess = tf.Session()
noise_factor = 0.5
epochs = 10
#epochs = 2
batch_size = 128
sess.run(tf.global_variables_initializer())

for e in range(epochs):
    for idx in range(mnist.train.num_examples//batch_size):
        batch = mnist.train.next_batch(batch_size)
        imgs = batch[0].reshape((-1, 28, 28, 1))
        
        # Add Gaussian noise, then clip back into [0, 1]
        noisy_imgs = imgs + noise_factor * np.random.randn(*imgs.shape)
        noisy_imgs = np.clip(noisy_imgs, 0., 1.)
        batch_cost, _ = sess.run([cost, optimizer],
                           feed_dict={inputs_: noisy_imgs,
                                     targets_: imgs})
        
        print("Epoch: {}/{} ".format(e+1, epochs),
             "Training loss: {:.4f}".format(batch_cost))
('Epoch: 1/10 ', 'Training loss: 0.6948')
('Epoch: 1/10 ', 'Training loss: 0.6551')
('Epoch: 1/10 ', 'Training loss: 0.5925')
('Epoch: 1/10 ', 'Training loss: 0.5083')
('Epoch: 1/10 ', 'Training loss: 0.4955')
('Epoch: 1/10 ', 'Training loss: 0.5577')
('Epoch: 1/10 ', 'Training loss: 0.5016')
('Epoch: 1/10 ', 'Training loss: 0.4584')
('Epoch: 1/10 ', 'Training loss: 0.4388')
('Epoch: 1/10 ', 'Training loss: 0.4718')
….
('Epoch: 10/10 ', 'Training loss: 0.0982')
('Epoch: 10/10 ', 'Training loss: 0.1035')
('Epoch: 10/10 ', 'Training loss: 0.0975')
('Epoch: 10/10 ', 'Training loss: 0.1015')
('Epoch: 10/10 ', 'Training loss: 0.0979')
('Epoch: 10/10 ', 'Training loss: 0.0939')
('Epoch: 10/10 ', 'Training loss: 0.0985')
('Epoch: 10/10 ', 'Training loss: 0.1043')
('Epoch: 10/10 ', 'Training loss: 0.0975')
('Epoch: 10/10 ', 'Training loss: 0.1002')
('Epoch: 10/10 ', 'Training loss: 0.0984')
('Epoch: 10/10 ', 'Training loss: 0.0995')
('Epoch: 10/10 ', 'Training loss: 0.1015')
('Epoch: 10/10 ', 'Training loss: 0.0963')
('Epoch: 10/10 ', 'Training loss: 0.0983')
('Epoch: 10/10 ', 'Training loss: 0.0972')
('Epoch: 10/10 ', 'Training loss: 0.1023')
('Epoch: 10/10 ', 'Training loss: 0.0979')
('Epoch: 10/10 ', 'Training loss: 0.0963')
('Epoch: 10/10 ', 'Training loss: 0.1016')
('Epoch: 10/10 ', 'Training loss: 0.1004')
('Epoch: 10/10 ', 'Training loss: 0.0994')
('Epoch: 10/10 ', 'Training loss: 0.0970')
('Epoch: 10/10 ', 'Training loss: 0.0973')
fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(20,4))
in_imgs = mnist.test.images[10:20]
noisy_imgs = in_imgs + noise_factor * np.random.randn(*in_imgs.shape)
noisy_imgs = np.clip(noisy_imgs, 0., 1.)

reconstructed = sess.run(outputs_,
                         feed_dict={inputs_: noisy_imgs.reshape((10, 28, 28, 1))})

for images, row in zip([noisy_imgs, reconstructed], axes):
    for img, ax in zip(images, row):
        ax.imshow(img.reshape((28, 28)), cmap='jet')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

fig.tight_layout(pad=0.1)
[Figure: noisy images (top row) vs. reconstructions (bottom row)]
sess.close()