    Single Shot Multibox Detection (SSD) in Practice (Part 2)

    2. Training


    2.1. Data Reading and Initialization


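    from d2l import mxnet as d2l
    from mxnet import autograd, gluon, image, init, np, npx

    npx.set_np()

    batch_size = 32
    train_iter, _ = d2l.load_data_pikachu(batch_size)

    # TinySSD is the model defined in Part 1 (not repeated here)
    ctx, net = d2l.try_gpu(), TinySSD(num_classes=1)
    net.initialize(init=init.Xavier(), ctx=ctx)
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.2, 'wd': 5e-4})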
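    The trainer uses plain minibatch SGD with a learning rate of 0.2 and weight decay `wd=5e-4`. As we understand MXNet's SGD update, weight decay enters as an extra `wd * w` term added to the gradient, i.e. w <- w - lr * (grad + wd * w). The NumPy sketch below illustrates one such step under that assumption. It ignores the rescaling of the gradient by `1 / batch_size` that `trainer.step(batch_size)` performs, as well as gradient clipping, and `sgd_step` is a hypothetical helper rather than a Gluon API.

```python
import numpy as np


def sgd_step(w, grad, lr=0.2, wd=5e-4):
    """One SGD step with L2 weight decay (simplified sketch).

    Assumes the update w <- w - lr * (grad + wd * w); gradient
    rescaling and clipping are deliberately left out.
    """
    return w - lr * (grad + wd * w)


w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.0])
w_new = sgd_step(w, grad)
# Even with a zero gradient, the second weight shrinks slightly toward 0.
print(w_new)
```

    With a small `wd`, the decay term barely changes a single step, but over many iterations it keeps the weights from growing without bound.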

    2.2. Defining Loss and Evaluation Functions


    cls_loss = gluon.loss.SoftmaxCrossEntropyLoss()
    bbox_loss = gluon.loss.L1Loss()


    def calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
        cls = cls_loss(cls_preds, cls_labels)
        # The masks zero out the offsets of negative (background) anchor boxes
        bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks)
        return cls + bbox
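    To our understanding, Gluon's `SoftmaxCrossEntropyLoss` and `L1Loss` both average over every axis except the batch axis, so `calc_loss` returns one loss value per example. The NumPy sketch below mirrors that behavior on a tiny made-up batch (one image, two anchor boxes, two classes, the second anchor being background). `calc_loss_np` and `log_softmax` are hypothetical helper names, not part of d2l or MXNet.

```python
import numpy as np


def log_softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def calc_loss_np(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
    """NumPy sketch of calc_loss; returns one loss value per example."""
    # cls_preds: (batch, anchors, classes); cls_labels: (batch, anchors) ints
    logp = log_softmax(cls_preds)
    picked = np.take_along_axis(logp, cls_labels[..., None], axis=-1)[..., 0]
    cls = -picked.mean(axis=1)  # mean cross-entropy over anchors
    # Masking zeros out the offsets of background anchors before the L1 loss.
    diff = np.abs(bbox_preds * bbox_masks - bbox_labels * bbox_masks)
    bbox = diff.mean(axis=1)  # mean absolute error over all offset entries
    return cls + bbox


cls_preds = np.zeros((1, 2, 2))      # uniform scores -> log(2) per anchor
cls_labels = np.array([[1, 0]])
bbox_preds = np.ones((1, 8))         # 2 anchors x 4 offsets
bbox_labels = np.zeros((1, 8))
bbox_masks = np.array([[1, 1, 1, 1, 0, 0, 0, 0]], dtype=float)
loss = calc_loss_np(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks)
print(loss)  # log(2) + 0.5
```

    Only the four offsets of the positive anchor contribute to the L1 term, yet the mean is taken over all eight entries, which is why the bbox part is 0.5 rather than 1.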


    def cls_eval(cls_preds, cls_labels):
        # Because the category prediction results are placed in the final
        # dimension, argmax must specify this dimension
        return float((cls_preds.argmax(axis=-1) == cls_labels).sum())


    def bbox_eval(bbox_preds, bbox_labels, bbox_masks):
        # Sum of absolute offset errors, counting only positive anchor boxes
        return float((np.abs((bbox_labels - bbox_preds) * bbox_masks)).sum())
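    The two evaluation helpers only count things; the training loop turns those counts into a class error (1 - correct / number of class labels) and a mean absolute error (absolute error sum / number of offset labels). The NumPy sketch below reproduces that bookkeeping on a tiny made-up batch (one image, two anchor boxes, two classes); the `_np` suffixes mark hypothetical NumPy stand-ins for the MXNet functions.

```python
import numpy as np


def cls_eval_np(cls_preds, cls_labels):
    # Count anchors whose highest-scoring class matches the label.
    return float((cls_preds.argmax(axis=-1) == cls_labels).sum())


def bbox_eval_np(bbox_preds, bbox_labels, bbox_masks):
    # Sum of absolute offset errors, counting only positive anchors.
    return float(np.abs((bbox_labels - bbox_preds) * bbox_masks).sum())


cls_preds = np.array([[[0.9, 0.1], [0.6, 0.4]]])   # (1 image, 2 anchors, 2 classes)
cls_labels = np.array([[0, 1]])                    # second anchor is misclassified
bbox_preds = np.array([[0.5, 0.5, 0.5, 0.5, 9.0, 9.0, 9.0, 9.0]])
bbox_labels = np.zeros((1, 8))
bbox_masks = np.array([[1, 1, 1, 1, 0, 0, 0, 0]], dtype=float)

correct = cls_eval_np(cls_preds, cls_labels)
abs_err = bbox_eval_np(bbox_preds, bbox_labels, bbox_masks)
class_err = 1 - correct / cls_labels.size
bbox_mae = abs_err / bbox_labels.size
print(class_err, bbox_mae)
```

    The large errors on the masked-out (background) anchor are ignored, while the MAE still divides by every offset entry, including the masked ones.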

    2.3. Training the Model


    num_epochs, timer = 20, d2l.Timer()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['class error', 'bbox mae'])
    for epoch in range(num_epochs):
        # Number of correct class predictions, number of class labels,
        # sum of absolute offset errors, number of offset labels
        metric = d2l.Accumulator(4)
        train_iter.reset()  # Read data from the start.
        for batch in train_iter:
            timer.start()
            X = batch.data[0].as_in_ctx(ctx)
            Y = batch.label[0].as_in_ctx(ctx)
            with autograd.record():
                # Generate multiscale anchor boxes and predict the category and
                # offset of each
                anchors, cls_preds, bbox_preds = net(X)
                # Label the category and offset of each anchor box
                bbox_labels, bbox_masks, cls_labels = npx.multibox_target(
                    anchors, Y, cls_preds.transpose(0, 2, 1))
                # Calculate the loss function using the predicted and labeled
                # category and offset values
                l = calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels,
                              bbox_masks)
            l.backward()
            trainer.step(batch_size)
            metric.add(cls_eval(cls_preds, cls_labels), cls_labels.size,
                       bbox_eval(bbox_preds, bbox_labels, bbox_masks),
                       bbox_labels.size)
        cls_err, bbox_mae = 1-metric[0]/metric[1], metric[2]/metric[3]
        animator.add(epoch+1, (cls_err, bbox_mae))
    print('class err %.2e, bbox mae %.2e' % (cls_err, bbox_mae))
    print('%.1f examples/sec on %s' % (train_iter.num_image/timer.stop(), ctx))

    class err 2.35e-03, bbox mae 2.68e-03

    4315.5 examples/sec on gpu(0)
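    In the loop above, `npx.multibox_target` assigns each anchor box a class label and an offset target. The following is a simplified single-image NumPy sketch of that labeling step, not MXNet's implementation. It assumes an IoU threshold of 0.5 and offset variances (0.1, 0.1, 0.2, 0.2), which we believe are MXNet's defaults, uses class 0 for background (so object classes are shifted by 1), and skips negative mining. `box_iou`, `to_center`, and `label_anchors` are hypothetical names.

```python
import numpy as np


def box_iou(boxes1, boxes2):
    """Pairwise IoU for corner-format boxes (xmin, ymin, xmax, ymax)."""
    def area(b):
        return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    return inter / (area(boxes1)[:, None] + area(boxes2)[None, :] - inter)


def to_center(b):
    # Convert (xmin, ymin, xmax, ymax) to (cx, cy, w, h).
    return np.stack([(b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2,
                     b[:, 2] - b[:, 0], b[:, 3] - b[:, 1]], axis=1)


def label_anchors(anchors, gt_boxes, gt_classes, iou_thresh=0.5,
                  variances=(0.1, 0.1, 0.2, 0.2)):
    """Return (offsets, masks, class labels) for one image (sketch)."""
    iou = box_iou(anchors, gt_boxes)             # (num_anchors, num_gt)
    best_gt = iou.argmax(axis=1)
    matched = iou.max(axis=1) >= iou_thresh
    # Make sure every ground-truth box keeps its best-overlapping anchor.
    best_anchor = iou.argmax(axis=0)
    best_gt[best_anchor] = np.arange(len(gt_boxes))
    matched[best_anchor] = True
    cls_labels = np.where(matched, gt_classes[best_gt] + 1, 0)
    a, g = to_center(anchors), to_center(gt_boxes[best_gt])
    v = np.array(variances)
    # Center offsets scaled by anchor size, log-ratio of widths and heights.
    offsets = np.concatenate([(g[:, :2] - a[:, :2]) / a[:, 2:] / v[:2],
                              np.log(g[:, 2:] / a[:, 2:]) / v[2:]], axis=1)
    masks = np.repeat(matched[:, None], 4, axis=1).astype(float)
    return offsets * masks, masks, cls_labels


anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.5, 0.5, 1.0, 1.0],
                    [0.1, 0.1, 0.6, 0.6]])
gt_boxes = np.array([[0.05, 0.05, 0.55, 0.55]])
gt_classes = np.array([0])
offsets, masks, cls_labels = label_anchors(anchors, gt_boxes, gt_classes)
print(cls_labels)  # anchors 0 and 2 overlap the object; anchor 1 is background
```

    The real operator works on whole batches and returns the offsets and masks flattened to shape (batch, num_anchors * 4), which is the layout `calc_loss` and `bbox_eval` receive.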

    3. Prediction

    img = image.imread('../img/pikachu.jpg')
    feature = image.imresize(img, 256, 256).astype('float32')
    # (height, width, channel) -> (1, channel, height, width)
    X = np.expand_dims(feature.transpose(2, 0, 1), axis=0)


    def predict(X):
        anchors, cls_preds, bbox_preds = net(X.as_in_ctx(ctx))
        # multibox_detection expects (batch, num_classes + 1, num_anchors)
        cls_probs = npx.softmax(cls_preds).transpose(0, 2, 1)
        output = npx.multibox_detection(cls_probs, bbox_preds, anchors)
        # Rows with class index -1 were discarded (background or suppressed)
        idx = [i for i, row in enumerate(output[0]) if row[0] != -1]
        return output[0, idx]


    output = predict(X)
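    `npx.multibox_detection` converts the predicted offsets back into boxes and then removes overlapping predictions with non-maximum suppression (NMS); the rows it discards are marked with class index -1, which is why `predict` filters them out. A minimal greedy NMS sketch in NumPy follows. The function name `nms` and the 0.5 IoU threshold are illustrative assumptions, not MXNet's exact implementation.

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU between the current best box and every remaining box.
        lt = np.maximum(boxes[i, :2], boxes[rest, :2])
        rb = np.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = np.clip(rb - lt, 0, None).prod(axis=1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[rest, 2] - boxes[rest, 0])
                  * (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_thresh]
    return keep


boxes = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.05, 0.05, 1.0, 1.0],
                  [2.0, 2.0, 3.0, 3.0]])
scores = np.array([0.9, 0.8, 0.7])
keep = nms(boxes, scores)
print(keep)  # [0, 2]: the second box overlaps the first too much
```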


    def display(img, output, threshold):
        d2l.set_figsize((5, 5))
        fig = d2l.plt.imshow(img.asnumpy())
        for row in output:
            score = float(row[1])
            if score < threshold:
                continue
            # Box corners are normalized to [0, 1]; scale them to pixels
            h, w = img.shape[0:2]
            bbox = [row[2:6] * np.array((w, h, w, h), ctx=row.ctx)]
            d2l.show_bboxes(fig.axes, bbox, '%.2f' % score, 'w')


    display(img, output, threshold=0.3)
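    Because the detections hold box corners normalized to [0, 1], `display` multiplies them by (w, h, w, h) to get pixel coordinates and skips rows whose score is below the threshold. The same filtering and scaling can be sketched in NumPy, assuming each output row is (class index, score, xmin, ymin, xmax, ymax) as `display` reads it. `to_pixel_boxes` is a hypothetical helper, and the 512x512 image size and the two detections are made up.

```python
import numpy as np


def to_pixel_boxes(output, img_h, img_w, threshold=0.3):
    """Keep detections at or above threshold and scale them to pixels."""
    # Each row: (class_id, score, xmin, ymin, xmax, ymax), coords in [0, 1].
    kept = output[output[:, 1] >= threshold]
    scale = np.array([img_w, img_h, img_w, img_h])
    return kept[:, 1], kept[:, 2:6] * scale


output = np.array([[0, 0.95, 0.1, 0.2, 0.5, 0.6],
                   [0, 0.10, 0.0, 0.0, 0.3, 0.3]])
scores, boxes = to_pixel_boxes(output, img_h=512, img_w=512)
print(scores, boxes)  # only the 0.95 detection survives the threshold
```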

    4. Loss Function


    For the predicted offsets, we can replace the L1 norm loss with the smooth L1 loss. Near zero, this loss uses a square function for greater smoothness; the size of this quadratic region is controlled by the hyperparameter σ:

    $$f(x) = \begin{cases} (\sigma x)^2/2, & \text{if } |x| < 1/\sigma^2,\\ |x| - 0.5/\sigma^2, & \text{otherwise.} \end{cases}$$

    When σ is large, this loss is similar to the L1 norm loss. When σ is small, the loss function is smoother.

    sigmas = [10, 1, 0.5]
    lines = ['-', '--', '-.']
    x = np.arange(-2, 2, 0.1)

    for l, s in zip(lines, sigmas):
        y = npx.smooth_l1(x, scalar=s)
        d2l.plt.plot(x.asnumpy(), y.asnumpy(), l, label='sigma=%.1f' % s)
    d2l.plt.legend()
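    To make the piecewise definition concrete, the NumPy function below implements the formula for f(x) directly. It is our own sketch for checking values, and we assume it agrees with `npx.smooth_l1(x, scalar=sigma)`. Both pieces equal 0.5/σ² at |x| = 1/σ², so the loss is continuous, and for σ = 10 it is already almost |x|.

```python
import numpy as np


def smooth_l1(x, sigma):
    """Smooth L1: quadratic for |x| < 1/sigma**2, linear elsewhere."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1.0 / sigma ** 2
    return np.where(small, 0.5 * (sigma * x) ** 2, np.abs(x) - 0.5 / sigma ** 2)


print(smooth_l1([0.0, 0.5, 2.0], sigma=1.0))  # [0.    0.125 1.5  ]
print(smooth_l1(2.0, sigma=10.0))             # close to |x| = 2
```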


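    For the category predictions, the cross-entropy loss can likewise be replaced with the focal loss, $-(1 - p_j)^{\gamma} \log p_j$, where $p_j$ is the predicted probability of the true class $j$. Increasing γ reduces the loss on anchor boxes whose class is already predicted with high probability:

    def focal_loss(gamma, x):
        return -(1 - x) ** gamma * np.log(x)


    x = np.arange(0.01, 1, 0.01)
    for l, gamma in zip(lines, [0, 1, 5]):
        d2l.plt.plot(x.asnumpy(), focal_loss(gamma, x).asnumpy(), l,
                     label='gamma=%.1f' % gamma)
    d2l.plt.legend()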
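    To plug the focal loss into `calc_loss` in place of `cls_loss`, it has to be computed from the raw class scores rather than from a probability. A NumPy sketch under that assumption follows: `softmax_focal_loss` is a hypothetical helper that, like `focal_loss` above, uses no extra class-weighting factor, and averaging over anchors is left to the caller. With γ = 0 it reduces to the cross-entropy loss; with γ = 2 a confidently correct anchor keeps only 1% of its cross-entropy loss, while a badly misclassified one keeps 81%.

```python
import numpy as np


def softmax_focal_loss(logits, labels, gamma=2.0):
    """Per-anchor focal loss computed from raw class scores (sketch)."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Probability assigned to the true class of each anchor.
    p = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return -(1 - p) ** gamma * np.log(p)


# Two anchors whose true class is 0: one easy (p = 0.9), one hard (p = 0.1).
logits = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
labels = np.array([0, 0])
ce = softmax_focal_loss(logits, labels, gamma=0.0)  # plain cross-entropy
fl = softmax_focal_loss(logits, labels, gamma=2.0)
print(fl / ce)  # fraction of the cross-entropy loss that remains
```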


    Training and Prediction

    • When objects are relatively small compared with the image, the model normally adopts a larger input image size.
    • Labeling anchor box categories generally produces a large number of negative anchor boxes. We can sample the negative anchor boxes to better balance the data categories by setting the negative_mining_ratio parameter of the MultiBoxTarget function (npx.multibox_target in the code above).
    • Use hyperparameters to assign different weights to the anchor box category loss and the positive anchor box offset loss in the loss function.
    • Refer to the SSD paper. What methods can be used to evaluate the precision of object detection models?
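    The negative-sampling idea in the second item is usually called hard negative mining. The SSD paper keeps all positive anchor boxes and only the negatives with the highest classification loss, so that negatives outnumber positives by at most 3:1. A NumPy sketch of that selection for one image follows. `hard_negative_mask` is a hypothetical helper, not MXNet's implementation of `negative_mining_ratio`; multiplying the per-anchor class loss by the returned mask would restrict training to the selected anchors.

```python
import numpy as np


def hard_negative_mask(cls_labels, anchor_loss, ratio=3):
    """Select all positives plus the hardest negatives (sketch).

    cls_labels: (num_anchors,) ints, 0 = background.
    anchor_loss: (num_anchors,) classification loss of each anchor.
    """
    pos = cls_labels > 0
    num_neg = min(int(ratio * pos.sum()), int((~pos).sum()))
    # Rank negatives by loss; positives are pushed to the end with -inf.
    neg_loss = np.where(pos, -np.inf, anchor_loss)
    hardest = np.argsort(neg_loss)[::-1][:num_neg]
    mask = pos.copy()
    mask[hardest] = True
    return mask


cls_labels = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
anchor_loss = np.array([0.2, 0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.01, 0.02, 0.03])
mask = hard_negative_mask(cls_labels, anchor_loss)
print(np.flatnonzero(mask))  # the positive plus the 3 highest-loss negatives
```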

    5. Summary

    • SSD is a multiscale object detection model. This model generates different numbers of anchor boxes of different sizes based on the base network block and each multiscale feature block and predicts the categories and offsets of the anchor boxes to detect objects of different sizes.
    • During SSD model training, the loss function is calculated using the predicted and labeled category and offset values.


