How to Estimate Model Training T(FL)OPS Efficiency

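Naive method
Take Torch Vision ResNet50-v1.5 as an example.
- Step 1: Get the model's theoretical forward-pass MACs (multiply-accumulate operations)
  The forward MACs can be obtained with thop. For a single 224x224 input, the following code reports 4.112G forward MACs for Torch Vision ResNet50-v1.5.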
  
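      from torchvision.models import resnet50
      from thop import profile, clever_format
      import torch

      model = resnet50()
      # One dummy 224x224 RGB image, so the result is MACs per example.
      dummy_input = torch.randn(1, 3, 224, 224)
      macs, params = profile(model, inputs=(dummy_input,))
      # Print human-readable values, e.g. with a G or M suffix.
      print(clever_format([macs, params], "%.3f"))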
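To make the MAC count less opaque, here is a minimal sketch of how the MACs of a single convolution layer are conventionally counted: one multiply-accumulate per kernel weight per output element. The helper names `conv2d_output_size` and `conv2d_macs` are hypothetical, and the layer shape used is the first convolution of ResNet50 (7x7 kernel, stride 2, padding 3, 3 to 64 channels). It illustrates the counting convention only, bias terms are ignored, and it is not taken from thop's source.

```python
def conv2d_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(c_in: int, c_out: int, kernel: int,
                out_h: int, out_w: int, groups: int = 1) -> int:
    """MACs of one conv layer for a single example (bias ignored).

    Each output element needs (c_in / groups) * kernel * kernel
    multiply-accumulates, and there are c_out * out_h * out_w output elements.
    """
    return c_out * out_h * out_w * (c_in // groups) * kernel * kernel


# First convolution of ResNet50 on a 224x224 RGB image.
out = conv2d_output_size(224, kernel=7, stride=2, padding=3)
conv1_macs = conv2d_macs(c_in=3, c_out=64, kernel=7, out_h=out, out_w=out)
print(out, conv1_macs)
```

Summing this kind of count over every layer is what produces a whole-model figure such as the 4.112G above.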
- Step 2: Estimate the T(FL)OPS the model requires per second at a measured throughput
  The estimate is based on the formula from OpenAI's AI and Compute:
  
  required_T(FL)OPS = (MACs per forward pass) * (2 (FL)OPs/MAC) * (3 for forward and backward pass) * (number of examples per second)
  
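  Then take the measured performance data:

  | Accelerator | Data type | Batch size (bs) | IPS (images/s) |
  |---|---|---|---|
  | V100 | FP16 | 256 | 1325 |
  | V100 | FP32 | 128 | 303.1 |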
  
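  Taking V100 FP16 training as an example:
  MACs per forward pass = 4.112G
  number of examples per second = 1325
  required_T(FL)OPS = 4.112G * 2 * 3 * 1325 = 32.69 T
  The results are summarized below:

  | Accelerator | Data type | Batch size (bs) | IPS (images/s) | Required T(FL)OPS |
  |---|---|---|---|---|
  | V100 | FP16 | 256 | 1325 | 32.69 |
  | V100 | FP32 | 128 | 303.1 | 7.478 |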
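The Step 2 arithmetic can be sketched in a few lines of Python. The function name `required_tflops` and the constant names are illustrative choices, not from the original post. The factor 2 counts one multiply and one add per MAC, and the factor 3 is the forward-plus-backward multiplier from the formula above.

```python
FLOPS_PER_MAC = 2    # one multiply + one add per MAC
FWD_BWD_FACTOR = 3   # forward pass plus backward pass, per the formula above


def required_tflops(forward_macs: float, examples_per_second: float) -> float:
    """T(FL)OPS needed to sustain the given training throughput."""
    flops_per_second = (
        forward_macs * FLOPS_PER_MAC * FWD_BWD_FACTOR * examples_per_second
    )
    return flops_per_second / 1e12


FORWARD_MACS = 4.112e9  # ResNet50-v1.5 forward MACs reported by thop

fp16 = required_tflops(FORWARD_MACS, examples_per_second=1325)
fp32 = required_tflops(FORWARD_MACS, examples_per_second=303.1)
print(f"FP16: {fp16:.2f} T(FL)OPS, FP32: {fp32:.3f} T(FL)OPS")
```

With the measured throughputs from the table, this gives the 32.69 and 7.478 values listed there.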
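- Step 3: Estimate the model's utilization of theoretical peak compute

  Theoretical peak compute: take the accelerator's peak T(FL)OPS for the data type in use from the vendor's specification. The ratios below are consistent with peaks of roughly 112 T(FL)OPS (FP16 Tensor Core) and 14 T(FL)OPS (FP32), which match NVIDIA's published figures for the V100 PCIe.

  Theoretical peak compute utilization:

  peak ratio = required_T(FL)OPS / peak_T(FL)OPS

  | Accelerator | Data type | Batch size (bs) | IPS (images/s) | Required T(FL)OPS | Peak ratio |
  |---|---|---|---|---|---|
  | V100 | FP16 | 256 | 1325 | 32.69 | 29.2% |
  | V100 | FP32 | 128 | 303.1 | 7.478 | 53% |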
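The utilization ratio from Step 3 can be sketched as follows. The peak values in `ASSUMED_PEAK_TFLOPS` are assumptions: they are back-calculated from the reported ratios and agree with NVIDIA's published V100 PCIe figures, but they should be replaced with the numbers for the actual accelerator. The function and variable names are illustrative.

```python
# Assumed peak T(FL)OPS (V100 PCIe): back-calculated from the reported
# ratios, not stated explicitly in the original post.
ASSUMED_PEAK_TFLOPS = {"FP16": 112.0, "FP32": 14.0}

# Required T(FL)OPS from Step 2.
REQUIRED_TFLOPS = {"FP16": 32.69, "FP32": 7.478}


def peak_utilization(required_tflops: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak compute that the workload uses."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return required_tflops / peak_tflops


utilization = {
    dtype: peak_utilization(REQUIRED_TFLOPS[dtype], ASSUMED_PEAK_TFLOPS[dtype])
    for dtype in REQUIRED_TFLOPS
}
for dtype, ratio in utilization.items():
    print(f"V100 {dtype}: {ratio:.1%} of peak")
```

Under these assumed peaks, the FP16 run lands near 29% of peak and the FP32 run near 53%, in line with the table.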
Original post: https://www.cnblogs.com/Matrix_Yao/p/15747398.html
