Paper title: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer has three main features:
- First, the image is partitioned into windows, and self-attention is computed only within each window. The advantage is that self-attention's computational complexity grows linearly with image size rather than quadratically: for an h×w patch grid with channel dimension C and window size M, global MSA costs 4hwC² + 2(hw)²C, while window-based W-MSA costs 4hwC² + 2M²hwC. (Swin Transformer builds hierarchical feature maps by merging image patches in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window.)
- Second, each layer's patches are formed by merging patches from the preceding layer, so deeper layers have larger patch sizes and larger receptive fields, which builds hierarchical feature maps. (Swin Transformer constructs a hierarchical representation by starting from small-sized patches (outlined in gray) and gradually merging neighboring patches in deeper Transformer layers.)
- Third, shifted windows: the window partition is offset between consecutive layers. Each Swin Transformer block contains two sub-layers, the first using W-MSA (window multi-head self-attention) and the second using SW-MSA (shifted window multi-head self-attention). The shifted partition in each layer straddles the window boundaries of the previous layer, connecting windows that were previously separate. (The shifted windows bridge the windows of the preceding layer, providing connections among them that significantly enhance modeling power)
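The three ideas above can be sketched with plain array operations. This is a minimal illustration, not the paper's implementation: it assumes a toy feature map of shape (H, W, C) and window size M, and omits the attention computation and the learned linear projection that follows patch merging.

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows.
    Self-attention would then be computed within each window independently."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C)
    # -> (num_windows, M*M, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M * M, C)

def patch_merging(x):
    """Concatenate each 2x2 group of neighboring patches, halving the
    resolution and quadrupling the channels; in the real model a learned
    linear layer (omitted here) then reduces 4C channels to 2C."""
    return np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )

def cyclic_shift(x, M):
    """SW-MSA offsets the window grid by (M//2, M//2); in practice this is
    done efficiently with a cyclic roll of the feature map."""
    return np.roll(x, shift=(-(M // 2), -(M // 2)), axis=(0, 1))

# Toy 8x8 feature map with 3 channels, window size 4.
x = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
windows = window_partition(x, 4)  # (4, 16, 3): four 4x4 windows
merged = patch_merging(x)         # (4, 4, 12): half resolution, 4x channels
shifted = cyclic_shift(x, 4)      # rolled by (2, 2) before re-partitioning
```

Re-partitioning the shifted map into windows yields a grid straddling the original window boundaries, which is what lets information flow between the previous layer's windows.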
Paper explanation resources: