Caffe CNN Example: CIFAR-10 Image Classification

by allenlu2007

 

The previous post covered MNIST handwritten digit recognition. Caffe ships with another example: CIFAR-10 image classification.

CIFAR-10 is a subset of the University of Toronto 80M tiny images dataset (32×32 pixels).

It was collected mainly by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

CIFAR-10 contains 60K (60,000) 32×32-pixel color images in 10 classes, with 6,000 images per class. 50K are training images.

10K are test images. (Does a 3-layer CNN overfit this?)

The dataset is split into 5 training batches and 1 test batch, each containing 10,000 images.
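Each batch file is a Python pickle of a dict whose `data` entry is a (10000, 3072) uint8 array (each row is the R, G, B planes flattened in that order) and whose `labels` entry is a list of 10,000 class indices. A minimal loader sketch (the helper name and file path are mine, not part of the Caffe example):

```python
import pickle

import numpy as np


def load_cifar10_batch(path):
    """Load one CIFAR-10 batch file into (images, labels).

    Each batch is a pickled dict: b'data' is a (10000, 3072) uint8
    array whose rows are the R, G, B planes flattened in that order;
    b'labels' is a list of 10000 class indices in [0, 9].
    """
    with open(path, "rb") as f:
        # encoding="bytes" lets Python 3 read batches pickled by Python 2
        batch = pickle.load(f, encoding="bytes")
    images = batch[b"data"].reshape(-1, 3, 32, 32)  # N x C x H x W
    labels = np.asarray(batch[b"labels"])
    return images, labels
```

Reshaping to N×C×H×W matches the blob layout Caffe expects after conversion.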


The 10 classes are listed below. The classes are mutually exclusive: automobile and truck do not overlap.

airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck

 

Baseline performance: 11%–13% error for 2 convolution layers + 2 local layers (same as convolution but without weight sharing) + 1 fully connected layer. The baseline layer definition (cuda-convnet config format):

[data]
type=data
dataIdx=0
 
[labels]
type=data
dataIdx=1
 
[conv1]
type=conv
inputs=data
channels=3
filters=64
padding=2
stride=1
filterSize=5
neuron=relu
initW=0.0001
partialSum=4
sharedBiases=1
 
[pool1]
type=pool
pool=max
inputs=conv1
start=0
sizeX=3
stride=2
outputsX=0
channels=64
 
[rnorm1]
type=cmrnorm
inputs=pool1
channels=64
size=9
 
[conv2]
type=conv
inputs=rnorm1
filters=64
padding=2
stride=1
filterSize=5
channels=64
neuron=relu
initW=0.01
partialSum=8
sharedBiases=1
 
[rnorm2]
type=cmrnorm
inputs=conv2
channels=64
size=9
 
[pool2]
type=pool
pool=max
inputs=rnorm2
start=0
sizeX=3
stride=2
outputsX=0
channels=64
 
[local3]
type=local
inputs=pool2
filters=64
padding=1
stride=1
filterSize=3
channels=64
neuron=relu
initW=0.04
 
[local4]
type=local
inputs=local3
filters=32
padding=1
stride=1
filterSize=3
channels=64
neuron=relu
initW=0.04
 
[fc10]
type=fc
outputs=10
inputs=local4
initW=0.01
 
[probs]
type=softmax
inputs=fc10
 
[logprob]
type=cost.logreg
inputs=labels,probs
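The final two layers, `[probs]` (softmax) and `[logprob]` (`cost.logreg`), turn the 10 `fc10` outputs into class probabilities and a cross-entropy cost. In numpy terms (a sketch of the math, not cuda-convnet's actual code):

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def logreg_cost(probs, labels):
    """Mean negative log-probability of the true class
    (what cost.logreg computes from [probs] and [labels])."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels]))


# Example: 2 samples, 10 classes
logits = np.zeros((2, 10))
logits[0, 3] = 5.0                  # confident about class 3
probs = softmax(logits)             # each row sums to 1
loss = logreg_cost(probs, np.array([3, 7]))
```

For the second sample the logits are uniform, so its per-sample cost is log(10) ≈ 2.30, the chance-level loss for 10 classes.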
 

 

Now let's look at Caffe's layer definition.

The CIFAR-10 model is a CNN that composes layers of convolution, pooling, rectified linear unit (ReLU) nonlinearities, and local contrast normalization with a linear classifier on top of it all. We have defined the model in the CAFFE_ROOT/examples/cifar10 directory’s cifar10_quick_train_test.prototxt. 

 

CIFAR10_quick

3 convolution layers + 2 fully connected layers (for classification): roughly 24% error (vs. 13% for the baseline).

Compared with the baseline, cifar10_quick has no rnorm (contrast normalization?) layers; it also drops the 2 local layers (same as convolution layers but without weight sharing) but adds one more convolution layer and one more fully connected layer.

==> Is the gap caused by the missing rnorm layers or the missing local layers? My guess is rnorm. ==> Worth experimenting.
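The baseline's `cmrnorm` layers are Krizhevsky-style local response normalization across adjacent channels: each activation is divided by a factor that grows with the summed squares of its channel neighbors. A hedged numpy sketch (the k, alpha, beta values below are illustrative defaults, not the baseline's actual settings):

```python
import numpy as np


def channel_lrn(x, size=5, k=1.0, alpha=1e-4, beta=0.75):
    """Local response normalization across channels.

    x: (C, H, W) activations. Each value is divided by
    (k + (alpha / size) * sum of squares over a window of `size`
    adjacent channels) ** beta. Parameter values are illustrative,
    not taken from the cuda-convnet baseline config.
    """
    C = x.shape[0]
    out = np.empty_like(x, dtype=float)
    half = size // 2
    for c in range(C):
        lo, hi = max(0, c - half), min(C, c + half + 1)
        sq = (x[lo:hi] ** 2).sum(axis=0)     # squared sum over the window
        out[c] = x[c] / (k + (alpha / size) * sq) ** beta
    return out
```

The effect is a mild competition between feature maps at the same spatial position, which is what the quick model gives up.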

name: "CIFAR10_quick"
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/cifar10/cifar10_train_leveldb"
    batch_size: 100
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/cifar10/cifar10_test_leveldb"
    batch_size: 100
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  include: { phase: TEST }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "pool1"
  top: "pool1"
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
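To see where the `ip1` input size comes from, trace the spatial dimensions through cifar10_quick: Caffe convolutions compute out = floor((in + 2·pad − k)/stride) + 1, while Caffe pooling rounds the division up. A quick sanity-check sketch:

```python
import math


def conv_out(n, k, pad, stride):
    """Caffe convolution output size (floor division)."""
    return (n + 2 * pad - k) // stride + 1


def pool_out(n, k, stride, pad=0):
    """Caffe pooling output size (rounds up, unlike convolution)."""
    return math.ceil((n + 2 * pad - k) / stride) + 1


n = 32                    # CIFAR-10 input: 3 x 32 x 32
n = conv_out(n, 5, 2, 1)  # conv1 -> 32 x 32, 32 channels
n = pool_out(n, 3, 2)     # pool1 (max) -> 16 x 16
n = conv_out(n, 5, 2, 1)  # conv2 -> 16 x 16, 32 channels
n = pool_out(n, 3, 2)     # pool2 (ave) -> 8 x 8
n = conv_out(n, 5, 2, 1)  # conv3 -> 8 x 8, 64 channels
n = pool_out(n, 3, 2)     # pool3 (ave) -> 4 x 4
flat = 64 * n * n         # ip1 flattens 64 x 4 x 4 = 1024 inputs
```

So `ip1` maps 1024 inputs down to 64, and `ip2` maps those to the 10 class scores.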

 

 

 
