Deep Learning Model Zoo

1. Image Classification

Dataset: ImageNet (1000 classes)

1.1 Quantization

Classification model latency on Paddle Lite (ms)

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qualcomm 835 | MobileNetV1 | FP32 baseline | 96.1942 | 53.2058 | 32.4468 | 88.4955 | 47.95 | 27.5189 |
| Qualcomm 835 | MobileNetV1 | quant_aware | 60.8186 | 32.1931 | 16.4275 | 56.4311 | 29.5446 | 15.1053 |
| Qualcomm 835 | MobileNetV1 | quant_post | 60.5615 | 32.4016 | 16.6596 | 56.5266 | 29.7178 | 15.1459 |
| Qualcomm 835 | MobileNetV2 | FP32 baseline | 65.715 | 38.1346 | 25.155 | 61.3593 | 36.2038 | 22.849 |
| Qualcomm 835 | MobileNetV2 | quant_aware | 48.3655 | 30.2021 | 21.9303 | 46.1487 | 27.3146 | 18.3053 |
| Qualcomm 835 | MobileNetV2 | quant_post | 48.3495 | 30.3069 | 22.1506 | 45.8715 | 27.4105 | 18.2223 |
| Qualcomm 835 | ResNet50 | FP32 baseline | 526.811 | 319.6486 | 205.8345 | 506.1138 | 335.1584 | 214.8936 |
| Qualcomm 835 | ResNet50 | quant_aware | 475.4538 | 256.8672 | 139.699 | 461.7344 | 247.9506 | 145.9847 |
| Qualcomm 835 | ResNet50 | quant_post | 476.0507 | 256.5963 | 139.7266 | 461.9176 | 248.3795 | 149.353 |
| Qualcomm 855 | MobileNetV1 | FP32 baseline | 33.5086 | 19.5773 | 11.7534 | 31.3474 | 18.5382 | 10.0811 |
| Qualcomm 855 | MobileNetV1 | quant_aware | 36.7067 | 21.628 | 11.0372 | 14.0238 | 8.199 | 4.2588 |
| Qualcomm 855 | MobileNetV1 | quant_post | 37.0498 | 21.7081 | 11.0779 | 14.0947 | 8.1926 | 4.2934 |
| Qualcomm 855 | MobileNetV2 | FP32 baseline | 25.0396 | 15.2862 | 9.6609 | 22.909 | 14.1797 | 8.8325 |
| Qualcomm 855 | MobileNetV2 | quant_aware | 28.1583 | 18.3317 | 11.8103 | 16.9158 | 11.1606 | 7.4148 |
| Qualcomm 855 | MobileNetV2 | quant_post | 28.1631 | 18.3917 | 11.8333 | 16.9399 | 11.1772 | 7.4176 |
| Qualcomm 855 | ResNet50 | FP32 baseline | 185.3705 | 113.0825 | 87.0741 | 177.7367 | 110.0433 | 74.4114 |
| Qualcomm 855 | ResNet50 | quant_aware | 327.6883 | 202.4536 | 106.243 | 243.5621 | 150.0542 | 78.4205 |
| Qualcomm 855 | ResNet50 | quant_post | 328.2683 | 201.9937 | 106.744 | 242.6397 | 150.0338 | 79.8659 |
| Kirin 970 | MobileNetV1 | FP32 baseline | 101.2455 | 56.4053 | 35.6484 | 94.8985 | 51.7251 | 31.9511 |
| Kirin 970 | MobileNetV1 | quant_aware | 62.5012 | 32.1863 | 16.6018 | 57.7477 | 29.2116 | 15.0703 |
| Kirin 970 | MobileNetV1 | quant_post | 62.4412 | 32.2585 | 16.6215 | 57.825 | 29.2573 | 15.1206 |
| Kirin 970 | MobileNetV2 | FP32 baseline | 70.4176 | 42.0795 | 25.1939 | 68.9597 | 39.2145 | 22.6617 |
| Kirin 970 | MobileNetV2 | quant_aware | 52.9961 | 31.5323 | 22.1447 | 49.4858 | 28.0856 | 18.7287 |
| Kirin 970 | MobileNetV2 | quant_post | 53.0961 | 31.7987 | 21.8334 | 49.383 | 28.2358 | 18.3642 |
| Kirin 970 | ResNet50 | FP32 baseline | 586.8943 | 344.0858 | 228.2293 | 573.3344 | 351.4332 | 225.8006 |
| Kirin 970 | ResNet50 | quant_aware | 488.361 | 260.1697 | 142.416 | 479.5668 | 249.8485 | 138.1742 |
| Kirin 970 | ResNet50 | quant_post | 489.6188 | 258.3279 | 142.6063 | 480.0064 | 249.5339 | 138.5284 |
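To read the latency table as speedups, one can divide the FP32 baseline latency by the quantized latency for the same device, model, and thread count. The following is a minimal Python sketch using three rows copied from the table (armv8, 1 thread); the `speedup` helper is an illustrative name introduced here, not part of PaddleSlim or Paddle Lite.

```python
# Latencies in ms copied from the table above (armv8, Thread 1).
# Keys are (device, model); values map compression strategy -> latency.
latency_ms = {
    ("Qualcomm 835", "MobileNetV1"): {"fp32": 88.4955, "quant_aware": 56.4311},
    ("Qualcomm 855", "MobileNetV1"): {"fp32": 31.3474, "quant_aware": 14.0238},
    ("Qualcomm 855", "ResNet50"): {"fp32": 177.7367, "quant_aware": 243.5621},
}


def speedup(baseline_ms: float, compressed_ms: float) -> float:
    """Return how many times faster the compressed model is (>1 means faster)."""
    return baseline_ms / compressed_ms


for (device, model), row in latency_ms.items():
    ratio = speedup(row["fp32"], row["quant_aware"])
    print(f"{device} {model}: {ratio:.2f}x")
```

A ratio below 1, as for ResNet50 on the Qualcomm 855, means the quantized model ran slower than the FP32 baseline in this measurement.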

1.2 Pruning

Notes on PaddleLite inference latency:

Environment: Qualcomm Snapdragon 845 + armv8

Speed metric: latency with 1, 2, and 4 threads (Thread1/Thread2/Thread4)

PaddleLite version: v2.3

| Model | Compression method | Top-1/Top-5 Acc | Model size (MB) | GFLOPs | PaddleLite latency, Thread1/Thread2/Thread4 (ms) | TensorRT speed (FPS) |
| --- | --- | --- | --- | --- | --- | --- |
| MobileNetV1 | Baseline | 70.99%/89.68% | 17 | 1.11 | 66.052/35.8014/19.5762 | - |
| MobileNetV1 | uniform -50% | 69.4%/88.66% (-1.59%/-1.02%) | 9 | 0.56 | 33.5636/18.6834/10.5076 | - |
| MobileNetV1 | sensitive -30% | 70.4%/89.3% (-0.59%/-0.38%) | 12 | 0.74 | 46.5958/25.3098/13.6982 | - |
| MobileNetV1 | sensitive -50% | 69.8%/88.9% (-1.19%/-0.78%) | 9 | 0.56 | 37.9892/20.7882/11.3144 | - |
| MobileNetV2 | - | 72.15%/90.65% | 15 | 0.59 | 41.7874/23.375/13.3998 | - |
| MobileNetV2 | uniform -50% | 65.79%/86.11% (-6.35%/-4.47%) | 11 | 0.296 | 23.8842/13.8698/8.5572 | - |
| ResNet34 | - | 72.15%/90.65% | 84 | 7.36 | 217.808/139.943/96.7504 | 342.32 |
| ResNet34 | uniform -50% | 70.99%/89.95% (-1.36%/-0.87%) | 41 | 3.67 | 114.787/75.0332/51.8438 | 452.41 |
| ResNet34 | auto -55.05% | 70.24%/89.63% (-2.04%/-1.06%) | 33 | 3.31 | 105.924/69.3222/48.0246 | 457.25 |
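The PaddleLite latency column packs three measurements (1, 2, and 4 threads) into one cell. As a rough illustration, the sketch below computes how well latency scales with thread count, reading the MobileNetV1 baseline cell as 66.052, 35.8014, and 19.5762 ms. `parallel_efficiency` is a hypothetical helper introduced for this example.

```python
# Thread1 / Thread2 / Thread4 latency in ms for the MobileNetV1 baseline row.
threads = (1, 2, 4)
latency_ms = (66.052, 35.8014, 19.5762)


def parallel_efficiency(t1_ms: float, tn_ms: float, n_threads: int) -> float:
    """Speedup over one thread divided by the thread count (1.0 = ideal scaling)."""
    return (t1_ms / tn_ms) / n_threads


for n, ms in zip(threads, latency_ms):
    eff = parallel_efficiency(latency_ms[0], ms, n)
    print(f"{n} thread(s): {ms:.2f} ms, efficiency {eff:.0%}")
```

Efficiency below 100% means that doubling the threads did not halve the latency, which is typical for small mobile networks.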

1.3 Distillation

| Model | Compression method | Top-1/Top-5 Acc | Model size (MB) |
| --- | --- | --- | --- |
| MobileNetV1 | student | 70.99%/89.68% | 17 |
| ResNet50_vd | teacher | 79.12%/94.44% | 99 |
| MobileNetV1 | ResNet50_vd distill | 72.77%/90.68% (+1.78%/+1.00%) | 17 |
| MobileNetV2 | student | 72.15%/90.65% | 15 |
| MobileNetV2 | ResNet50_vd distill | 74.28%/91.53% (+2.13%/+0.88%) | 15 |
| ResNet50 | student | 76.50%/93.00% | 99 |
| ResNet101 | teacher | 77.56%/93.64% | 173 |
| ResNet50 | ResNet101 distill | 77.29%/93.65% (+0.79%/+0.65%) | 99 |

Note: the "_vd" suffix indicates that the pretrained model was trained with Mixup. For background on Mixup, see "mixup: Beyond Empirical Risk Minimization".
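For readers unfamiliar with Mixup, the idea in the cited paper is to train on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta(α, α) distribution. The sketch below is a generic, standard-library illustration of that formula, not the training code used for the `_vd` models; the function name `mixup_pair` and the value `alpha=0.2` are assumptions chosen for the example.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (input, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy "images" (flattened pixels) with one-hot labels for 3 classes.
x_a, y_a = [0.0, 0.5, 1.0], [1.0, 0.0, 0.0]
x_b, y_b = [1.0, 0.5, 0.0], [0.0, 0.0, 1.0]

random.seed(0)
x_mix, y_mix, lam = mixup_pair(x_a, y_a, x_b, y_b)
print(f"lambda={lam:.3f}", x_mix, y_mix)
```

The mixed label stays a valid probability vector because the two weights, `lam` and `1 - lam`, sum to one.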

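1.4 Search

Dataset: ImageNet1000

| Model | Compression method | Top-1/Top-5 Acc | Model size (MB) | GFLOPs |
| --- | --- | --- | --- | --- |
| MobileNetV2 | - | 72.15%/90.65% | 15 | 0.59 |
| MobileNetV2 | SANAS | 71.518%/90.208% (-0.632%/-0.442%) | 14 | 0.295 |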

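Dataset: Cifar10

| Model | Compression method | Acc | Model parameters (MB) | Download |
| --- | --- | --- | --- | --- |
| Darts | - | 97.135% | 3.767 | - |
| Darts_SA (based on the Darts search space) | SANAS | 97.276% (+0.141%) | 3.344 (-11.2%) | - |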

Note: the token of MobileNetV2_NAS is [4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6, 2, 0, 3, 4, 5, 0, 4, 5, 5, 1, 4, 8, 0, 0]. The token of Darts_SA is [5, 5, 0, 5, 5, 10, 7, 7, 5, 7, 7, 11, 10, 12, 10, 0, 5, 3, 10, 8].

2. Object Detection

2.1 Quantization

Dataset: COCO 2017

| Model | Compression method | Dataset | Image/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) | TensorRT latency (V100, ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.1 | 95 | - |
| MobileNet-V1-YOLOv3 | quant_post | COCO | 8 | 27.9 (-1.4) | 28.0 (-1.3) | 26.0 (-1.0) | 25 | - |
| MobileNet-V1-YOLOv3 | quant_aware | COCO | 8 | 28.1 (-1.2) | 28.2 (-1.1) | 25.8 (-1.2) | 26.3 | - |
| R34-YOLOv3 | - | COCO | 8 | 36.2 | 34.3 | 31.4 | 162 | - |
| R34-YOLOv3 | quant_post | COCO | 8 | 35.7 (-0.5) | - | - | 42.7 | - |
| R34-YOLOv3 | quant_aware | COCO | 8 | 35.2 (-1.0) | 33.3 (-1.0) | 30.3 (-1.1) | 44 | - |
| R50-dcn-YOLOv3 obj365_pretrain | - | COCO | 8 | 41.4 | - | - | 177 | 18.56 |
| R50-dcn-YOLOv3 obj365_pretrain | quant_aware | COCO | 8 | 40.6 (-0.8) | 37.5 | 34.1 | 66 | 14.64 |

Dataset: WIDER-FACE

| Model | Compression method | Image/GPU | Input size | Easy/Medium/Hard | Model size (KB) |
| --- | --- | --- | --- | --- | --- |
| BlazeFace | - | 8 | 640 | 91.5/89.2/79.7 | 815 |
| BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 |
| BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 |
| BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 |
| BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 |
| BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 |
| BlazeFace-NAS | quant_aware | 8 | 640 | 83.1/79.7/64.2 (-0.6/-1.0/-1.6) | 71 |

2.2 Pruning

Dataset: Pascal VOC & COCO 2017

Notes on PaddleLite inference latency:

Environment: Qualcomm Snapdragon 845 + armv8

Speed metric: latency with 1, 2, and 4 threads (Thread1/Thread2/Thread4)

PaddleLite version: v2.3

| Model | Compression method | Dataset | Image/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) | GFLOPs (608*608) | PaddleLite latency, Thread1/Thread2/Thread4 (ms) (608*608) | TensorRT speed (FPS) (608*608) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNet-V1-YOLOv3 | Baseline | Pascal VOC | 8 | 76.2 | 76.7 | 75.3 | 94 | 40.49 | 1238/796.943/520.101 | 60.04 |
| MobileNet-V1-YOLOv3 | sensitive -52.88% | Pascal VOC | 8 | 77.6 (+1.4) | 77.7 (+1.0) | 75.5 (+0.2) | 31 | 19.08 | 602.497/353.759/222.427 | 99.36 |
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.0 | 95 | 41.35 | - | - |
| MobileNet-V1-YOLOv3 | sensitive -51.77% | COCO | 8 | 26.0 (-3.3) | 25.1 (-4.2) | 22.6 (-4.4) | 32 | 19.94 | - | 73.93 |
| R50-dcn-YOLOv3 | - | COCO | 8 | 39.1 | - | - | 177 | 89.60 | - | 27.68 |
| R50-dcn-YOLOv3 | sensitive -9.37% | COCO | 8 | 39.3 (+0.2) | - | - | 150 | 81.20 | - | 30.08 |
| R50-dcn-YOLOv3 | sensitive -24.68% | COCO | 8 | 37.3 (-1.8) | - | - | 113 | 67.48 | - | 34.32 |
| R50-dcn-YOLOv3 obj365_pretrain | - | COCO | 8 | 41.4 | - | - | 177 | 89.60 | - | - |
| R50-dcn-YOLOv3 obj365_pretrain | sensitive -9.37% | COCO | 8 | 40.5 (-0.9) | - | - | 150 | 81.20 | - | - |
| R50-dcn-YOLOv3 obj365_pretrain | sensitive -24.68% | COCO | 8 | 37.8 (-3.3) | - | - | 113 | 67.48 | - | - |
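The percentage in the compression-method column appears to match the relative FLOPs reduction against the unpruned model, which can be checked from the GFLOPs column. Here is a small Python sketch using the two MobileNet-V1-YOLOv3 Pascal VOC rows; `relative_change` is a name introduced for illustration.

```python
def relative_change(baseline: float, value: float) -> float:
    """Signed relative change of value against baseline (e.g. -0.5 = halved)."""
    return (value - baseline) / baseline


# MobileNet-V1-YOLOv3 on Pascal VOC: baseline vs. "sensitive -52.88%".
gflops_base, gflops_pruned = 40.49, 19.08
fps_base, fps_pruned = 60.04, 99.36

print(f"GFLOPs change: {relative_change(gflops_base, gflops_pruned):+.2%}")
print(f"TensorRT FPS change: {relative_change(fps_base, fps_pruned):+.2%}")
```

The GFLOPs change comes out at -52.88%, the same figure as the row label, while the measured TensorRT throughput rises by about 65%. Halving the FLOPs therefore does not double the speed in this measurement.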

2.3 Distillation

Dataset: Pascal VOC & COCO 2017

| Model | Compression method | Dataset | Image/GPU | Box AP (input 608) | Box AP (input 416) | Box AP (input 320) | Model size (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNet-V1-YOLOv3 | - | Pascal VOC | 8 | 76.2 | 76.7 | 75.3 | 94 |
| ResNet34-YOLOv3 | - | Pascal VOC | 8 | 82.6 | 81.9 | 80.1 | 162 |
| MobileNet-V1-YOLOv3 | ResNet34-YOLOv3 distill | Pascal VOC | 8 | 79.0 (+2.8) | 78.2 (+1.5) | 75.5 (+0.2) | 94 |
| MobileNet-V1-YOLOv3 | - | COCO | 8 | 29.3 | 29.3 | 27.0 | 95 |
| ResNet34-YOLOv3 | - | COCO | 8 | 36.2 | 34.3 | 31.4 | 163 |
| MobileNet-V1-YOLOv3 | ResNet34-YOLOv3 distill | COCO | 8 | 31.4 (+2.1) | 30.0 (+0.7) | 27.1 (+0.1) | 95 |

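2.4 Search

Dataset: WIDER-FACE

| Model | Compression method | Image/GPU | Input size | Easy/Medium/Hard | Model size (KB) | Hardware latency (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| BlazeFace | - | 8 | 640 | 91.5/89.2/79.7 | 815 | 71.862 |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | 21.117 |
| BlazeFace-NASV2 | SANAS | 8 | 640 | 87.0/83.7/68.5 | 389 | 22.558 |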

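Note: the hardware latency is derived from the provided hardware latency table, which was measured with PaddleLite on the 855 chip. See here for the detailed configuration of BlazeFace-NASV2.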
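The note says the latency is estimated from a hardware latency table rather than timed end to end. One common way to do this is to look up each operator's measured latency and sum over the network. The sketch below illustrates that idea only: the table keys, the operator descriptions, and every latency value are made up for the example and are not taken from the actual PaddleLite table.

```python
# Hypothetical latency table: operator configuration -> measured latency (ms).
# Real tables are keyed by much more detail; these entries are invented.
latency_table = {
    ("conv", 3, 16, 3, 2): 0.42,       # (op, in_ch, out_ch, kernel, stride)
    ("dw_conv", 16, 16, 3, 1): 0.15,
    ("conv", 16, 32, 1, 1): 0.21,
}


def estimate_latency(ops, table):
    """Sum per-operator latencies; fail loudly if an operator is not in the table."""
    missing = [op for op in ops if op not in table]
    if missing:
        raise KeyError(f"no latency entry for: {missing}")
    return sum(table[op] for op in ops)


# A toy network described as a sequence of operator configurations.
network = [
    ("conv", 3, 16, 3, 2),
    ("dw_conv", 16, 16, 3, 1),
    ("dw_conv", 16, 16, 3, 1),
    ("conv", 16, 32, 1, 1),
]
print(f"estimated latency: {estimate_latency(network, latency_table):.2f} ms")
```

A lookup of this kind lets a search procedure score many candidate architectures without deploying each one to the device.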

3. Image Segmentation

Dataset: Cityscapes

3.1 Quantization

| Model | Compression method | mIoU | Model size (MB) |
| --- | --- | --- | --- |
| DeepLabv3+/MobileNetv1 | - | 63.26 | 6.6 |
| DeepLabv3+/MobileNetv1 | quant_post | 58.63 (-4.63) | 1.8 |
| DeepLabv3+/MobileNetv1 | quant_aware | 62.03 (-1.23) | 1.8 |
| DeepLabv3+/MobileNetv2 | - | 69.81 | 7.4 |
| DeepLabv3+/MobileNetv2 | quant_post | 67.59 (-2.22) | 2.1 |
| DeepLabv3+/MobileNetv2 | quant_aware | 68.33 (-1.48) | 2.1 |
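The roughly 3.5x to 3.7x size reduction in these rows is in line with storing weights as 8-bit integers instead of 32-bit floats. As a generic illustration (an assumption, since the source does not describe the exact scheme behind `quant_post` and `quant_aware`), the sketch below applies symmetric abs-max int8 quantization to a small weight list. All names are introduced here for the example.

```python
def quantize_abs_max(weights, num_bits=8):
    """Map floats to signed integers using one scale derived from max |w|."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from integers and the scale."""
    return [v * scale for v in q]


weights = [0.82, -0.41, 0.05, -1.27, 0.33]
q, scale = quantize_abs_max(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print(f"scale={scale:.4f}, max abs error={max_error:.4f}")
```

Each weight then needs one byte plus a shared scale, and the rounding error per weight is bounded by half the scale. That rounding is one source of the mIoU drop reported for the quantized models.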

Segmentation model latency on Paddle Lite (ms), input size 769x769

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | FP32 baseline | 1227.9894 | 734.1922 | 527.9592 | 1109.96 | 699.3818 | 479.0818 |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | quant_aware | 848.6544 | 512.785 | 382.9915 | 752.3573 | 455.0901 | 307.8808 |
| Qualcomm 835 | Deeplabv3-MobileNetV1 | quant_post | 840.2323 | 510.103 | 371.9315 | 748.9401 | 452.1745 | 309.2084 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | FP32 baseline | 1282.8126 | 793.2064 | 653.6538 | 1193.9908 | 737.1827 | 593.4522 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | quant_aware | 976.0495 | 659.0541 | 513.4279 | 892.1468 | 582.9847 | 484.7512 |
| Qualcomm 835 | Deeplabv3-MobileNetV2 | quant_post | 981.44 | 658.4969 | 538.6166 | 885.3273 | 586.1284 | 484.0018 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | FP32 baseline | 568.8748 | 339.8578 | 278.6316 | 420.6031 | 281.3197 | 217.5222 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | quant_aware | 608.7578 | 347.2087 | 260.653 | 241.2394 | 177.3456 | 143.9178 |
| Qualcomm 855 | Deeplabv3-MobileNetV1 | quant_post | 609.0142 | 347.3784 | 259.9825 | 239.4103 | 180.1894 | 139.9178 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | FP32 baseline | 639.4425 | 390.1851 | 322.7014 | 477.7667 | 339.7411 | 262.2847 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | quant_aware | 703.7275 | 497.689 | 417.1296 | 394.3586 | 300.2503 | 239.9204 |
| Qualcomm 855 | Deeplabv3-MobileNetV2 | quant_post | 705.7589 | 474.4076 | 427.2951 | 394.8352 | 297.4035 | 264.6724 |
| Kirin 970 | Deeplabv3-MobileNetV1 | FP32 baseline | 1682.1792 | 1437.9774 | 1181.0246 | 1261.6739 | 1068.6537 | 690.8225 |
| Kirin 970 | Deeplabv3-MobileNetV1 | quant_aware | 1062.3394 | 1248.1014 | 878.3157 | 774.6356 | 710.6277 | 528.5376 |
| Kirin 970 | Deeplabv3-MobileNetV1 | quant_post | 1109.1917 | 1339.6218 | 866.3587 | 771.5164 | 716.5255 | 500.6497 |
| Kirin 970 | Deeplabv3-MobileNetV2 | FP32 baseline | 1771.1301 | 1746.0569 | 1222.4805 | 1448.9739 | 1192.4491 | 760.606 |
| Kirin 970 | Deeplabv3-MobileNetV2 | quant_aware | 1320.2905 | 921.4522 | 676.0732 | 1145.8801 | 821.5685 | 590.1713 |
| Kirin 970 | Deeplabv3-MobileNetV2 | quant_post | 1320.386 | 918.5328 | 672.2481 | 1020.753 | 820.094 | 591.4114 |

3.2 Pruning

Notes on PaddleLite inference latency:

Environment: Qualcomm Snapdragon 845 + armv8

Speed metric: latency with 1, 2, and 4 threads (Thread1/Thread2/Thread4)

PaddleLite version: v2.3

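| Model | Compression method | mIoU | Model size (MB) | GFLOPs | PaddleLite latency, Thread1/Thread2/Thread4 (ms) | TensorRT speed (FPS) |
| --- | --- | --- | --- | --- | --- | --- |
| fast-scnn | baseline | 69.64 | 11 | 14.41 | 1226.36/682.96/415.664 | 39.53 |
| fast-scnn | uniform -17.07% | 69.58 (-0.06) | 8.5 | 11.95 | 1140.37/656.612/415.888 | 42.01 |
| fast-scnn | sensitive -47.60% | 66.68 (-2.96) | 5.7 | 7.55 | 866.693/494.467/291.748 | 51.48 |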

     
Original article: https://www.cnblogs.com/wujianming-110117/p/14424097.html