• Overall Architecture of the TVM Modules



      

     Deploy Deep Learning Everywhere

     

     Existing Deep Learning Frameworks

     

     Limitations of Existing Approach

     

     Learning-based Learning System

     

     Problem Setting

     

     Example Instance in a Search Space

     

     

     Optimization Choices in a Search Space

     Problem Formalization

     

     Black-box Optimization

     

     Cost-model Driven Approach

     

     Statistical Cost Model

     

     Unique Problem Characteristics

     

     Vanilla Cost Modeling

     

     Program-aware Modeling: Tree-based Approach

     

     Program-aware Modeling: Neural Approach

     

     Comparisons of Models

     

     Unique Problem Characteristics

     

     Transferable Cost Model

     

     Impact of Transfer Learning

     

     Learning to Optimize Tensor Programs

     

     Device Fleet: Distributed Test Bed for AutoTVM

     

     TVM: End to End Deep Learning Compiler

     

     Tensor Expression and Optimization Search Space

     

     Search Space for CPUs

     

     Hardware-aware Search Space

     

     Search Space for GPUs

     

     Search Space for TPU-like Specialized Accelerators

     

     Tensorization Challenge

     

     Search Space for TPU-like Specialized Accelerators

     

     Software Support for Latency Hiding

     

     

     Summary: Hardware-aware Search Space

     

     VTA: Open & Flexible Deep Learning Accelerator

     

     TVM: End to End Deep Learning Compiler

     

     Need for More Dynamism

     

     Relay Virtual Machine

     

     uTVM: TVM on bare-metal Devices

     

     Core Infrastructure

     

     TSIM: Support for Future Hardware

     

     Unified Runtime For Heterogeneous Devices

     

     Unified Runtime Benefit

     

     Effectiveness of ML based Model

     

     Comparisons of Models

     

     Device Fleet in Action

     

     End to End Inference Performance (Nvidia Titan X)

     

     Portable Performance Across Hardware Platforms

     

  • Original article: https://www.cnblogs.com/wujianming-110117/p/14878746.html