[NeuralScale] 2020-CVPR-NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks (paper reading notes)

NeuralScale

2020-CVPR-NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

Source: ChenBong, cnblogs (博客园)
- Institute: National Chiao Tung University
- Author: Eugene Lee, Chen-Yi Lee (H40)
- GitHub: https://github.com/eugenelet/NeuralScale
- Citation: 3
Introduction

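The paper proposes scaling each layer according to its sensitivity (layer-wise scaling) until the network reaches a target parameter count, in contrast to uniform scaling, which applies the same ratio to every layer.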

Motivation

Contribution

Method

Pre-train the model for P epochs, then start iterative pruning from the pre-trained model.

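After each pruning iteration, every layer yields one data point \(\xi_{l}=\left\{\tau, \phi_{l}\right\}\), where \(\tau\) is the total number of parameters of the model and \(\phi_{l}\) is the number of filters in layer \(l\).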

After N iterations, every layer has N data points: \(\boldsymbol{\xi}_{l}=\left\{\left\{\tau^{(n)}, \phi_{l}^{(n)}\right\}_{n=1}^{N}\right\}\)

Iterative filter pruning stops once the total number of filters falls below a fraction \(\epsilon=0.05\) of the original total number of filters.
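The data collection loop described above can be sketched roughly as follows. This is only a toy illustration: the notes do not say which pruning criterion is used, so the sketch removes filters from randomly chosen layers as a placeholder, and `count_params`, `collect_points`, `per_step`, and the plain chain of 3x3 convolutions are all hypothetical names and assumptions, not the authors' code.

```python
import random

def count_params(filters, in_channels=3, k=3):
    """Parameters of a plain chain of k x k conv layers (no bias); a toy stand-in for the real model."""
    total, c_in = 0, in_channels
    for c_out in filters:
        total += c_in * c_out * k * k
        c_in = c_out
    return total

def collect_points(filters, eps=0.05, per_step=4, seed=0):
    """Prune until total filters < eps * original total; record (tau, phi_l) per layer after each step."""
    rng = random.Random(seed)
    filters = list(filters)
    original_total = sum(filters)
    points = [[] for _ in filters]          # points[l] holds the data points xi_l
    while sum(filters) >= eps * original_total:
        removed = 0
        for _ in range(per_step):
            # Placeholder criterion: pick a random layer that still has more than one filter.
            candidates = [i for i, f in enumerate(filters) if f > 1]
            if not candidates:
                break
            filters[rng.choice(candidates)] -= 1
            removed += 1
        if removed == 0:
            break                            # every layer is already down to one filter
        tau = count_params(filters)          # total parameter count after this iteration
        for l, phi in enumerate(filters):
            points[l].append((tau, phi))
    return points

points = collect_points([16, 32, 64])
print(len(points[0]), points[0][:3])
```

In a real run the placeholder would be replaced by an importance-based filter ranking on the pre-trained network, and `count_params` by the parameter count of the actual architecture.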

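Plotting the data points \(\boldsymbol{\xi}_{l}\) of each layer gives that layer's sensitivity curve, i.e. its number of filters as a function of the total parameter count.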

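Each curve is fitted with a power-law function:

\(\phi_{l}\left(\tau \mid \alpha_{l}, \beta_{l}\right)=\alpha_{l} \tau^{\beta_{l}}\),

which becomes linear after taking the logarithm of both sides:

\(\ln \phi_{l}\left(\tau \mid \alpha_{l}, \beta_{l}\right)=\ln \alpha_{l}+\beta_{l} \ln \tau\)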
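Because the log form is linear in \(\ln \tau\), one straightforward way to estimate \(\alpha_{l}\) and \(\beta_{l}\) is ordinary least squares on the log-transformed data points. The sketch below assumes that approach; `fit_power_law` is a hypothetical helper name, and the sample points are synthetic rather than taken from the paper.

```python
import math

def fit_power_law(points):
    """Least-squares fit of ln(phi) = ln(alpha) + beta * ln(tau) for one layer.

    points: list of (tau, phi) pairs with tau > 0 and phi > 0.
    Returns (alpha, beta).
    """
    xs = [math.log(tau) for tau, _ in points]
    ys = [math.log(phi) for _, phi in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope in log-log space
    alpha = math.exp(mean_y - beta * mean_x)  # intercept is ln(alpha)
    return alpha, beta

# Synthetic points generated from alpha = 0.5, beta = 0.4 (illustrative values only).
sample = [(tau, 0.5 * tau ** 0.4) for tau in (1e3, 1e4, 1e5, 1e6)]
alpha, beta = fit_power_law(sample)
print(alpha, beta)
```

Fitting in log space weights relative errors rather than absolute ones, so layers with few filters are not dominated by the large-\(\tau\) points.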

The layer-wise filter counts of all layers are written as \(\Phi(\tau \mid \Theta)=\{\phi_1, \phi_2, \ldots, \phi_l\}\), with parameters \(\Theta=\{\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots, \alpha_l, \beta_l\}\)

Once the fitted functions \(\Phi(\tau \mid \Theta)=\{\phi_1, \phi_2, \ldots, \phi_l\}\) are available, the layer-wise filter counts for a target parameter count \(\hat{\tau}\) are obtained simply by evaluating \(\Phi(\hat{\tau} \mid \Theta)\).
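As a small illustration of this evaluation step, the sketch below plugs a target \(\hat{\tau}\) into each layer's power law and rounds to an integer filter count. The function name `filters_at`, the rounding rule, the floor of one filter per layer, and the values in `theta` are assumptions made for the example.

```python
def filters_at(tau, theta):
    """Evaluate Phi(tau | Theta): one rounded filter count per layer.

    theta: list of (alpha_l, beta_l) pairs, one per layer.
    """
    # Keep at least one filter per layer so the network stays connected.
    return [max(1, round(alpha * tau ** beta)) for alpha, beta in theta]

theta = [(2.0, 0.5), (1.0, 0.5)]   # illustrative fitted parameters
print(filters_at(10_000, theta))
```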

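However, the actual total parameter count of the resulting model, \(h(f(\boldsymbol{x} \mid \boldsymbol{W}, \boldsymbol{\Phi}(\hat{\tau} \mid \boldsymbol{\Theta})))\), differs from the target \(\hat{\tau}\). The authors therefore initialize \(\tau=\hat{\tau}\) and run gradient descent on \(\tau\) to find a value for which the actual parameter count \(h(f)\) equals \(\hat{\tau}\) exactly. They call this process Architecture Descent.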
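A minimal sketch of the search over \(\tau\) is given below, under several assumptions that are not in the notes: the model is a plain chain of 3x3 convolutions, the filter counts are kept real-valued (so the match is exact only up to a tolerance, before rounding), and the descent is run on \(u=\ln \tau\) with the loss \(\tfrac{1}{2}(\ln h-\ln \hat{\tau})^2\) to keep the step size well behaved. The names `continuous_params` and `search_tau` are hypothetical.

```python
import math

def continuous_params(tau, theta, in_channels=3, k=3):
    """Toy h(.): parameters of a conv chain whose widths are alpha_l * tau**beta_l (real-valued)."""
    total, c_in = 0.0, float(in_channels)
    for alpha, beta in theta:
        c_out = alpha * tau ** beta
        total += c_in * c_out * k * k
        c_in = c_out
    return total

def search_tau(target, theta, lr=0.5, steps=200, tol=1e-9, d=1e-4):
    """Gradient descent on u = ln(tau), starting from tau = target."""
    log_target = math.log(target)
    u = log_target
    for _ in range(steps):
        residual = math.log(continuous_params(math.exp(u), theta)) - log_target
        if abs(residual) < tol:
            break
        # Central finite difference for d ln(h) / d u.
        slope = (math.log(continuous_params(math.exp(u + d), theta))
                 - math.log(continuous_params(math.exp(u - d), theta))) / (2 * d)
        u -= lr * residual * slope   # gradient of 0.5 * residual**2 with respect to u
    return math.exp(u)

theta = [(0.5, 0.4), (0.8, 0.45), (1.2, 0.5)]   # illustrative fitted parameters
tau = search_tau(1e5, theta)
print(tau, continuous_params(tau, theta))
```

In this toy setting the found \(\tau\) is smaller than \(\hat{\tau}\), because plugging \(\hat{\tau}\) in directly yields a network with more parameters than the target.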

Experiments

Setup
- GPU: single 1080ti
- CIFAR10 / CIFAR100
  - pre-train: 10 epochs
  - iterative pruning
  - fine-tune?
    
    300 epochs
    
    lr=0.1, divided by 10 at epochs 100, 200, and 250
    
    weight decay=\(5^{-4}\), ≈0.0016
- TinyImageNet
  - pre-train: 10 epochs
  - iterative pruning
  - fine-tune?
    
    150 epochs
    
    lr=0.1, divided by 10 at epochs 50 and 100
    
    weight decay=\(5^{-4}\), ≈0.0016
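The step learning-rate schedules listed above can be written as a small function. This is a sketch under the assumption that epochs are counted from 0 and that the rate drops at the start of each listed epoch; `step_lr` and its parameter names are illustrative.

```python
def step_lr(epoch, base_lr=0.1, milestones=(100, 200, 250), gamma=0.1):
    """Learning rate at a given epoch: base_lr multiplied by gamma once per milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# CIFAR10 / CIFAR100 setting: 300 epochs, drops at 100, 200, 250.
print([step_lr(e) for e in (0, 100, 200, 250)])

# TinyImageNet setting: 150 epochs, drops at 50 and 100.
print([step_lr(e, milestones=(50, 100)) for e in (0, 50, 100)])
```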
Importance of Architecture Descent

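In the corresponding figure, the horizontal axis is the number of SGD iterations on \(\tau\), the vertical axis is the layer index, and the color encodes the number of filters in that layer.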

Benchmarking of NeuralScale

param vs acc

latency vs acc

main result

Conclusion

Summary

Reference
Original post: https://www.cnblogs.com/chenbong/p/14801135.html
