• Swin Transformer


    Paper title: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

    Swin Transformer has three main features:

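    • First, the image is divided into windows, and self-attention is computed only inside each window. Because the window size is fixed, the computational complexity of self-attention grows linearly with image size rather than quadratically. (Swin Transformer builds hierarchical feature maps by merging image patches in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window.)
    • Second, patches in later layers are formed by merging neighboring patches from earlier layers. The deeper the layer, the larger the image region each patch covers and the larger the receptive field, which is how the hierarchical feature maps are built. (Swin Transformer constructs a hierarchical representation by starting from small-sized patches (outlined in gray) and gradually merging neighboring patches in deeper Transformer layers.)
    • Third, shifted windows: the window partition is offset between two consecutive layers. Swin Transformer blocks are used in consecutive pairs: the first uses W-MSA (window multi-head self-attention) and the second uses SW-MSA (shifted window multi-head self-attention). The shifted partition spans the boundaries of the previous layer's windows, so patches that were split into different windows in one layer become connected in the next. (The shifted windows bridge the windows of the preceding layer, providing connections among them that significantly enhance modeling power.)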
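The three features above can be illustrated with a small NumPy sketch. This is a shape-level illustration under stated assumptions, not the official implementation. The function names (`window_partition`, `cyclic_shift`, `patch_merging`, `msa_flops`, `wmsa_flops`) are chosen here for illustration. The cost formulas follow the complexity estimates given in the Swin Transformer paper. The sketch leaves out the attention computation itself, the attention mask used with shifted windows, and the linear layer that follows patch merging.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    Self-attention would then be computed independently inside each window.
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, C)

def cyclic_shift(x, shift):
    """Roll the feature map up and to the left by `shift` positions,
    so that a regular partition of the result is a shifted partition of x."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

def patch_merging(x):
    """Concatenate each 2x2 group of neighboring patches along channels:
    (H, W, C) -> (H/2, W/2, 4C). The linear projection is omitted here."""
    x0 = x[0::2, 0::2]
    x1 = x[1::2, 0::2]
    x2 = x[0::2, 1::2]
    x3 = x[1::2, 1::2]
    return np.concatenate([x0, x1, x2, x3], axis=-1)

def msa_flops(h, w, C):
    """Cost estimate of global self-attention on h*w tokens (quadratic in h*w)."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C

def wmsa_flops(h, w, C, M):
    """Cost estimate of window attention with M x M windows (linear in h*w)."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

if __name__ == "__main__":
    x = np.arange(64).reshape(8, 8, 1)      # toy 8x8 map, one channel
    windows = window_partition(x, 4)        # regular partition (W-MSA)
    shifted = window_partition(cyclic_shift(x, 2), 4)  # shifted partition (SW-MSA)
    print(windows.shape, shifted.shape)
    print(patch_merging(x).shape)
    print(msa_flops(56, 56, 96), wmsa_flops(56, 56, 96, 7))
```

In the toy example, the first shifted window covers rows 2 to 5 and columns 2 to 5 of the original map, so it contains tokens from all four regular windows. That overlap is the cross-window connection described in the third point.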

    Explanatory material on the paper:

    Zhihu: CV+Transformer之Swin Transformer (CV+Transformer: Swin Transformer)

  • Original article: https://www.cnblogs.com/picassooo/p/16725718.html