1. Darknet-53

Backbone
  Input: 416x416x3
  3x3 Conv2d, stride=1, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 32, 416, 416]
  Five stages, with the residual block repeated [1, 2, 8, 8, 4] times; total downsampling is 32x.

  layer1: residual block repeated 1 time
    3x3 Conv2d, stride=2, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 64, 208, 208]    downsample, double the channels
    Residual block (x1)
      1x1 Conv2d, stride=1, padding=0 -> BatchNorm2d -> LeakyReLU : [1, 32, 208, 208]    1x1 conv reduces channels
      3x3 Conv2d, stride=1, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 64, 208, 208]    3x3 conv restores channels

  layer2: residual block repeated 2 times
    3x3 Conv2d, stride=2, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 128, 104, 104]   downsample, double the channels
    Residual block (x2, the same two convolutions each time)
      1x1 Conv2d, stride=1, padding=0 -> BatchNorm2d -> LeakyReLU : [1, 64, 104, 104]    1x1 conv reduces channels
      3x3 Conv2d, stride=1, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 128, 104, 104]   3x3 conv restores channels

  layer3: residual block repeated 8 times. This feature map is used by the head; its size is 52x52x256.
    3x3 Conv2d, stride=2, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 256, 52, 52]     downsample, double the channels
    Residual block (x8, the same two convolutions each time)
      1x1 Conv2d, stride=1, padding=0 -> BatchNorm2d -> LeakyReLU : [1, 128, 52, 52]     1x1 conv reduces channels
      3x3 Conv2d, stride=1, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 256, 52, 52]     3x3 conv restores channels

  layer4: residual block repeated 8 times. This feature map is used by the head; its size is 26x26x512.
    3x3 Conv2d, stride=2, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 512, 26, 26]     downsample, double the channels
    Residual block (x8, the same two convolutions each time)
      1x1 Conv2d, stride=1, padding=0 -> BatchNorm2d -> LeakyReLU : [1, 256, 26, 26]     1x1 conv reduces channels
      3x3 Conv2d, stride=1, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 512, 26, 26]     3x3 conv restores channels

  layer5: residual block repeated 4 times. This feature map is used by the head; its size is 13x13x1024.
    3x3 Conv2d, stride=2, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 1024, 13, 13]    downsample, double the channels
    Residual block (x4, the same two convolutions each time)
      1x1 Conv2d, stride=1, padding=0 -> BatchNorm2d -> LeakyReLU : [1, 512, 13, 13]     1x1 conv reduces channels
      3x3 Conv2d, stride=1, padding=1 -> BatchNorm2d -> LeakyReLU : [1, 1024, 13, 13]    3x3 conv restores channels

Head
  In the shapes below, C_out is the computed output channel count, 3 x (5 + number of classes).

  yolo branch 0 (out0)
    out0_branch, input x0: 1x1024x13x13 -> 1x512x13x13 -> 1x1024x13x13 -> 1x512x13x13 -> 1x1024x13x13 -> 1x512x13x13
    out0: out0_branch -> 1x1024x13x13 -> 1 x C_out x 13 x 13

  yolo branch 1 (out1)
    1x512x13x13 -> 1x256x13x13                              1x1 conv adjusts the channel count (applied to the out0_branch output)
    1x256x13x13 -> 1x256x26x26                              upsample
    1x256x26x26 + 1x512x26x26 -> 1x768x26x26                feature-map fusion (channel concatenation with the layer4 output)
    out1_branch, input x1_in: 1x768x26x26 -> 1x256x26x26 -> 1x512x26x26 -> 1x256x26x26 -> 1x512x26x26 -> 1x256x26x26
    out1: out1_branch -> 1x512x26x26 -> 1 x C_out x 26 x 26

  yolo branch 2 (out2)
    1x256x26x26 -> 1x128x26x26                              1x1 conv adjusts the channel count
    1x128x26x26 -> 1x128x52x52                              upsample
    1x128x52x52 + 1x256x52x52 -> 1x384x52x52                feature-map fusion (channel concatenation with the layer3 output)
    out2: 1x384x52x52 -> 1x128x52x52 -> 1x256x52x52 -> 1x128x52x52 -> 1x256x52x52 -> 1x128x52x52 -> 1x256x52x52 -> 1 x C_out x 52 x 52

Loss function
  GroundTruth:
    images:  8x3x416x416 (8 = batch size, 3 = channels, 416 = height, 416 = width)
    targets: 8 x (n1 + n2 + ... + n8) x 5, i.e. 8 images, where image k has n_k ground-truth boxes

  Network output (outputs), one tensor per scale, with 3 anchors per scale:
    3 x (8 x (5 + num_classes) x 13 x 13)
    3 x (8 x (5 + num_classes) x 26 x 26)
    3 x (8 x (5 + num_classes) x 52 x 52)

  First feature map: 3 x (8 x (5 + num_classes) x 13 x 13), used to detect large objects
    predicted box center x:          [8, 3, 13, 13]      sigmoid
    predicted box center y:          [8, 3, 13, 13]      sigmoid
    predicted box width w:           [8, 3, 13, 13]
    predicted box height h:          [8, 3, 13, 13]
    objectness confidence conf:      [8, 3, 13, 13]      sigmoid
    class prediction:                [8, 3, 13, 13, 6]   sigmoid; 6 is the number of classes (6 kinds of objects to recognize)

  Key point 1: find which prior boxes (anchors) contain an object
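    (1) Compute the GroundTruth coordinates x, y, w, h on the 13x13 feature map, then round x and y down to see which grid cell the center falls in.
    (2) Move the top-left corners of the ground-truth box and of each anchor to the origin, and compute IoU from the widths and heights alone.
    (3) Compute the offset of the box center x, y from the top-left corner of its grid cell, and encode the width w and height h relative to the anchor width and height on the 13x13 feature map.

  Key point 2: decode the predictions and measure how much each predicted box overlaps the ground truth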
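
Steps (1) to (3) of key point 1, together with the inverse mapping that key point 2 relies on, can be sketched in plain Python. This is a minimal illustration rather than the training code behind these notes. The anchor sizes in ANCHORS_PX are the commonly used YOLOv3 anchors for the 13x13 scale and are an assumption, as is the log-ratio encoding of width and height. The function names wh_iou, encode_target, and decode_box are hypothetical. decode_box expects tx and ty to be the values after the sigmoid has been applied.

```python
import math

# Assumed values (not given in the notes): the commonly used YOLOv3 anchors
# for the 13x13 scale, in pixels of the 416x416 input.
ANCHORS_PX = [(116, 90), (156, 198), (373, 326)]
INPUT_SIZE = 416
GRID = 13


def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes whose top-left corners both sit at the origin."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def encode_target(cx, cy, w, h, anchors_px=ANCHORS_PX,
                  grid=GRID, input_size=INPUT_SIZE):
    """Encode one ground-truth box given as (cx, cy, w, h) normalized to [0, 1).

    Returns (col, row, best_anchor_index, (tx, ty, tw, th)).
    """
    stride = input_size / grid  # 32 for a 416 input and a 13x13 grid

    # Step (1): express the box in feature-map units and floor the center
    # to find the responsible grid cell.
    gx, gy, gw, gh = cx * grid, cy * grid, w * grid, h * grid
    col, row = int(math.floor(gx)), int(math.floor(gy))

    # Step (2): compare shapes only; both boxes are anchored at the origin.
    anchors = [(aw / stride, ah / stride) for aw, ah in anchors_px]
    ious = [wh_iou(gw, gh, aw, ah) for aw, ah in anchors]
    best = max(range(len(anchors)), key=lambda k: ious[k])

    # Step (3): center offset inside the cell, and width/height relative to
    # the matched anchor (log-ratio encoding is an assumption).
    aw, ah = anchors[best]
    tx, ty = gx - col, gy - row
    tw, th = math.log(gw / aw), math.log(gh / ah)
    return col, row, best, (tx, ty, tw, th)


def decode_box(col, row, anchor_wh, tx, ty, tw, th):
    """Invert the encoding; returns (gx, gy, gw, gh) in feature-map units."""
    aw, ah = anchor_wh
    return col + tx, row + ty, aw * math.exp(tw), ah * math.exp(th)


if __name__ == "__main__":
    col, row, best, t = encode_target(0.5, 0.5, 0.3, 0.25)
    print("cell:", (col, row), "anchor index:", best, "targets:", t)
```

For a box centered in the image with normalized size 0.3 x 0.25, the center lands in cell (6, 6) and the smallest of the three assumed anchors gives the highest shape-only IoU. Decoding the encoded targets with the same cell and anchor recovers the original box in feature-map units, which is the form in which a predicted box can then be compared against the ground truth.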