For object recognition we use transfer learning from VGG16, but there is a problem: the generated model is very large. The cause of VGG16's large model size is its fully connected layers.

Below is the summary of VGG16. The final stage contains a 25088 × 4096 matrix product, whose weights alone consume nearly 392 MB. The total model size is 528 MB.

# imports added for completeness (standard Keras 2 paths)
from keras.applications.vgg16 import VGG16
from keras.layers import Input

N_CATEGORIES = 64
IMAGE_SIZE = 224
input_tensor = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
original_model = VGG16(weights='imagenet', include_top=True, input_tensor=input_tensor)
original_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
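
These figures follow directly from the parameter counts in the summary, assuming 4 bytes (float32) per weight; the short calculation below is only a sanity check and is not part of the original code.

# fc1 alone: 25088 inputs x 4096 outputs plus 4096 biases, stored as float32
fc1_params = 25088 * 4096 + 4096
print(fc1_params * 4 / 1024 ** 2)      # ~392 MB of weights for fc1

# the full model: 138,357,544 parameters from the summary above
print(138357544 * 4 / 1024 ** 2)       # ~528 MB; the same arithmetic gives the
                                       # ~154 MB and ~56 MB figures quoted below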

When doing transfer learning with VGG16, the usual approach is to replace the fully connected layers. Trained this way on 64 categories, the model size becomes 154 MB.

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Flatten, Dense
from keras.models import Model

N_CATEGORIES = 64
IMAGE_SIZE = 224
input_tensor = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
base_model = VGG16(weights='imagenet', include_top=False, input_tensor=input_tensor)
x = base_model.output
x = Flatten()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(N_CATEGORIES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              25691136  
_________________________________________________________________
dense_2 (Dense)              (None, 64)                65600     
=================================================================
Total params: 40,471,424
Trainable params: 40,471,424
Non-trainable params: 0
_________________________________________________________________
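
A common refinement when fine-tuning (not shown in the snippet above) is to freeze the pretrained convolution blocks so that only the newly added Dense layers are trained. A minimal sketch, assuming the `base_model` and `model` defined just above:

# Hypothetical fine-tuning setup: freeze the pretrained VGG16 convolution
# blocks so only the new Dense layers receive gradient updates.
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])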

GlobalAveragePooling was proposed in 2014 (in the Network in Network paper). Using it lets you drop the fully connected layers entirely and shrink the model further. GlobalAveragePooling is a layer that averages each feature map over its spatial dimensions, producing one value per channel. If you first reduce the feature maps to one per category with a 1×1 convolution, then average them and apply softmax, this stands in for the fully connected layers. The model size becomes 56 MB.
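
To make the averaging concrete, here is a small check (not from the original article) that GlobalAveragePooling2D simply takes the per-channel mean over height and width; the 1×4×4×2 random input is an arbitrary example.

# Toy check: GlobalAveragePooling2D == per-channel mean over the spatial axes.
import numpy as np
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model

x = np.random.rand(1, 4, 4, 2).astype('float32')   # (batch, height, width, channels)

inp = Input(shape=(4, 4, 2))
gap = Model(inputs=inp, outputs=GlobalAveragePooling2D()(inp))

print(gap.predict(x))        # shape (1, 2)
print(x.mean(axis=(1, 2)))   # same two values: the mean of each channel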

[Figure: class_activation_mapping (Source: Global Average Pooling Layers for Object Localization)]

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Convolution2D, Activation, GlobalAveragePooling2D
from keras.models import Model

N_CATEGORIES = 64
IMAGE_SIZE = 224
input_tensor = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
base_model = VGG16(weights='imagenet', include_top=False, input_tensor=input_tensor)
x = base_model.output
x = Convolution2D(N_CATEGORIES, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
predictions = Activation('softmax', name='loss')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
conv10 (Conv2D)              (None, 7, 7, 64)          32832     
_________________________________________________________________
relu_conv10 (Activation)     (None, 7, 7, 64)          0         
_________________________________________________________________
global_average_pooling2d_1 ( (None, 64)                0         
_________________________________________________________________
loss (Activation)            (None, 64)                0         
=================================================================
Total params: 14,747,520
Trainable params: 14,747,520
Non-trainable params: 0
_________________________________________________________________

Because GlobalAveragePooling does nothing more than average the feature maps, the maps fed into it are automatically trained to act as heatmaps. As a result, you can inspect those feature maps to see which part of the image the model looked at to reach its decision. GlobalAveragePooling is also said to be less prone to overfitting than a fully connected (dense) layer, since it adds no trainable parameters.

[Figure: dog_localization (Source: Global Average Pooling Layers for Object Localization)]
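
As an illustration of reading off such a heatmap, the sketch below (my own, not from the article) takes the GAP-based `model` defined earlier, pulls out the 7×7 feature map that feeds GlobalAveragePooling2D, and keeps the channel for the predicted class; `img` stands for a hypothetical preprocessed batch of shape (1, IMAGE_SIZE, IMAGE_SIZE, 3).

# Class-activation-map sketch for the GAP model above (illustrative only).
import numpy as np
from keras.models import Model

cam_model = Model(inputs=model.input,
                  outputs=[model.get_layer('relu_conv10').output, model.output])

feature_maps, probs = cam_model.predict(img)   # (1, 7, 7, 64) and (1, 64)
class_idx = np.argmax(probs[0])                # predicted category
heatmap = feature_maps[0, :, :, class_idx]     # the 7x7 map for that category
heatmap /= heatmap.max() + 1e-8                # normalize; upsample to 224x224 to overlay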

If you want an even smaller model, the next step is SqueezeNet. Instead of VGG's wide 3×3 convolutions, each Fire module first squeezes the channel count with 1×1 convolutions and then expands with a mix of 1×1 and 3×3 convolutions, which sharply reduces the convolution weights. The whole model comes to roughly 3 MB (755,328 parameters in the summary below).
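
To make that structure concrete, here is a minimal sketch of a single Fire module (my own illustration of the idea from the SqueezeNet paper, not the keras_squeezenet source); the default squeeze/expand widths match fire2 in the summary below.

# Fire module sketch: a thin 1x1 "squeeze" convolution, then parallel 1x1 and
# 3x3 "expand" convolutions whose outputs are concatenated. Keeping the
# squeeze layer thin is what keeps the 3x3 weights small.
from keras.layers import Convolution2D, concatenate

def fire_module(x, squeeze=16, expand=64):
    s = Convolution2D(squeeze, (1, 1), padding='valid', activation='relu')(x)
    e1 = Convolution2D(expand, (1, 1), padding='valid', activation='relu')(s)
    e3 = Convolution2D(expand, (3, 3), padding='same', activation='relu')(s)
    return concatenate([e1, e3])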

import sys
sys.path.append('./keras-squeezenet-master')
from keras_squeezenet import SqueezeNet
from keras.layers import Input, Dropout, Convolution2D, Activation, GlobalAveragePooling2D
from keras.models import Model

N_CATEGORIES = 64
IMAGE_SIZE = 227
input_tensor = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
base_model = SqueezeNet(weights="imagenet", include_top=False, input_tensor=input_tensor)
x = base_model.output
x = Dropout(0.5, name='drop9')(x)
x = Convolution2D(N_CATEGORIES, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
predictions = Activation('softmax', name='loss')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            (None, 227, 227, 3)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 113, 113, 64) 1792        input_2[0][0]                    
__________________________________________________________________________________________________
relu_conv1 (Activation)         (None, 113, 113, 64) 0           conv1[0][0]                      
__________________________________________________________________________________________________
pool1 (MaxPooling2D)            (None, 56, 56, 64)   0           relu_conv1[0][0]                 
__________________________________________________________________________________________________
fire2/squeeze1x1 (Conv2D)       (None, 56, 56, 16)   1040        pool1[0][0]                      
__________________________________________________________________________________________________
fire2/relu_squeeze1x1 (Activati (None, 56, 56, 16)   0           fire2/squeeze1x1[0][0]           
__________________________________________________________________________________________________
fire2/expand1x1 (Conv2D)        (None, 56, 56, 64)   1088        fire2/relu_squeeze1x1[0][0]      
__________________________________________________________________________________________________
fire2/expand3x3 (Conv2D)        (None, 56, 56, 64)   9280        fire2/relu_squeeze1x1[0][0]      
__________________________________________________________________________________________________
fire2/relu_expand1x1 (Activatio (None, 56, 56, 64)   0           fire2/expand1x1[0][0]            
__________________________________________________________________________________________________
fire2/relu_expand3x3 (Activatio (None, 56, 56, 64)   0           fire2/expand3x3[0][0]            
__________________________________________________________________________________________________
fire2/concat (Concatenate)      (None, 56, 56, 128)  0           fire2/relu_expand1x1[0][0]       
                                                                 fire2/relu_expand3x3[0][0]       
...
__________________________________________________________________________________________________
fire9/squeeze1x1 (Conv2D)       (None, 13, 13, 64)   32832       fire8/concat[0][0]               
__________________________________________________________________________________________________
fire9/relu_squeeze1x1 (Activati (None, 13, 13, 64)   0           fire9/squeeze1x1[0][0]           
__________________________________________________________________________________________________
fire9/expand1x1 (Conv2D)        (None, 13, 13, 256)  16640       fire9/relu_squeeze1x1[0][0]      
__________________________________________________________________________________________________
fire9/expand3x3 (Conv2D)        (None, 13, 13, 256)  147712      fire9/relu_squeeze1x1[0][0]      
__________________________________________________________________________________________________
fire9/relu_expand1x1 (Activatio (None, 13, 13, 256)  0           fire9/expand1x1[0][0]            
__________________________________________________________________________________________________
fire9/relu_expand3x3 (Activatio (None, 13, 13, 256)  0           fire9/expand3x3[0][0]            
__________________________________________________________________________________________________
fire9/concat (Concatenate)      (None, 13, 13, 512)  0           fire9/relu_expand1x1[0][0]       
                                                                 fire9/relu_expand3x3[0][0]       
__________________________________________________________________________________________________
drop9 (Dropout)                 (None, 13, 13, 512)  0           fire9/concat[0][0]               
__________________________________________________________________________________________________
conv10 (Conv2D)                 (None, 13, 13, 64)   32832       drop9[0][0]                      
__________________________________________________________________________________________________
relu_conv10 (Activation)        (None, 13, 13, 64)   0           conv10[0][0]                     
__________________________________________________________________________________________________
global_average_pooling2d_2 (Glo (None, 64)           0           relu_conv10[0][0]                
__________________________________________________________________________________________________
loss (Activation)               (None, 64)           0           global_average_pooling2d_2[0][0] 
==================================================================================================
Total params: 755,328
Trainable params: 755,328
Non-trainable params: 0
__________________________________________________________________________________________________