• Detectron2 Beginner's Tutorial


    https://colab.research.google.com/drive/14MAOR5dy7EEl8s7vAc4r68_x67G77Yip

    Detectron2 Beginner's Tutorial

    Welcome to detectron2! This is the official colab tutorial of detectron2. Here, we will go through some basic usage of detectron2, including the following:

    • Run inference on images or videos, with an existing detectron2 model
    • Train a detectron2 model on a new dataset

    You can make a copy of this tutorial by "File -> Open in playground mode" and play with it yourself. DO NOT request access to this tutorial.

    Install detectron2

    # install dependencies: 
    !pip install pyyaml==5.1
    import torch, torchvision
    print(torch.__version__, torch.cuda.is_available())
    !gcc --version
    # opencv is pre-installed on colab
    
    # install detectron2: (Colab has CUDA 10.1 + torch 1.7)
    # See https://detectron2.readthedocs.io/tutorials/install.html for instructions
    import torch
    assert torch.__version__.startswith("1.7")
    !pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
    # exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime
    
    # Some basic setup:
    # Setup detectron2 logger
    import detectron2
    from detectron2.utils.logger import setup_logger
    setup_logger()
    
    # import some common libraries
    import numpy as np
    import os, json, cv2, random
    from google.colab.patches import cv2_imshow
    
    # import some common detectron2 utilities
    from detectron2 import model_zoo
    from detectron2.engine import DefaultPredictor
    from detectron2.config import get_cfg
    from detectron2.utils.visualizer import Visualizer
    from detectron2.data import MetadataCatalog, DatasetCatalog
    

    Run a pre-trained detectron2 model

    We first download an image from the COCO dataset:

    !wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O input.jpg
    im = cv2.imread("./input.jpg")
    cv2_imshow(im)
    

    Then, we create a detectron2 config and a detectron2 DefaultPredictor to run inference on this image.

    cfg = get_cfg()
    # add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
    cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
    # Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    predictor = DefaultPredictor(cfg)
    outputs = predictor(im)
    
    # look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
    print(outputs["instances"].pred_classes)
    print(outputs["instances"].pred_boxes)
    
    tensor([17,  0,  0,  0,  0,  0,  0,  0, 25,  0, 25, 25,  0,  0, 24],
           device='cuda:0')
    Boxes(tensor([[126.6035, 244.8977, 459.8291, 480.0000],
            [251.1083, 157.8127, 338.9731, 413.6379],
            [114.8496, 268.6864, 148.2352, 398.8111],
            [  0.8217, 281.0327,  78.6072, 478.4210],
            [ 49.3954, 274.1229,  80.1545, 342.9808],
            [561.2248, 271.5816, 596.2755, 385.2552],
            [385.9072, 270.3125, 413.7130, 304.0397],
            [515.9295, 278.3744, 562.2792, 389.3802],
            [335.2409, 251.9167, 414.7491, 275.9375],
            [350.9300, 269.2060, 386.0984, 297.9081],
            [331.6292, 230.9996, 393.2759, 257.2009],
            [510.7349, 263.2656, 570.9865, 295.9194],
            [409.0841, 271.8646, 460.5582, 356.8722],
            [506.8767, 283.3257, 529.9403, 324.0392],
            [594.5663, 283.4820, 609.0577, 311.4124]], device='cuda:0'))
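
    The class IDs printed above index into the dataset's list of class names. As a minimal sketch, the snippet below tallies them by name. The `COCO_NAMES` mapping is a hand-copied subset covering only the IDs that appear in this output, an assumption made to keep the example self-contained; in the notebook you would more likely read the names from `MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes`.

```python
from collections import Counter

# Class IDs copied from the `pred_classes` tensor printed above.
pred_classes = [17, 0, 0, 0, 0, 0, 0, 0, 25, 0, 25, 25, 0, 0, 24]

# Hand-copied subset of COCO class names (assumption: only the IDs seen above).
COCO_NAMES = {0: "person", 17: "horse", 24: "backpack", 25: "umbrella"}


def count_by_name(class_ids, names):
    """Count detections per class name, falling back to the raw ID if unknown."""
    return Counter(names.get(i, f"id_{i}") for i in class_ids)


counts = count_by_name(pred_classes, COCO_NAMES)
print(dict(counts))
```

    The 15 predictions above break down into ten persons, three umbrellas, one horse and one backpack.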
    
    
    
    # We can use `Visualizer` to draw the predictions on the image.
    v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
    

    Train on a custom dataset

    In this section, we show how to train an existing detectron2 model on a custom dataset in a new format.

    We use the balloon segmentation dataset,
    which has only one class: balloon.
    We'll train a balloon segmentation model from an existing model pre-trained on the COCO dataset, available in detectron2's model zoo.

    Note that the COCO dataset does not have the "balloon" category. We'll be able to recognize this new class in a few minutes.

    Prepare the dataset

    # download, decompress the data
    !wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
    !unzip balloon_dataset.zip > /dev/null
    

    Register the balloon dataset to detectron2, following the detectron2 custom dataset tutorial.
    Here, the dataset is in a custom format, so we write a function to parse it and convert it into detectron2's standard format. Users should write such a function when using a dataset in a custom format. See the tutorial for more details.

    # if your dataset is in COCO format, this cell can be replaced by the following three lines:
    # from detectron2.data.datasets import register_coco_instances
    # register_coco_instances("my_dataset_train", {}, "json_annotation_train.json", "path/to/image/dir")
    # register_coco_instances("my_dataset_val", {}, "json_annotation_val.json", "path/to/image/dir")
    
    from detectron2.structures import BoxMode
    
    def get_balloon_dicts(img_dir):
        json_file = os.path.join(img_dir, "via_region_data.json")
        with open(json_file) as f:
            imgs_anns = json.load(f)
    
        dataset_dicts = []
        for idx, v in enumerate(imgs_anns.values()):
            record = {}
            
            filename = os.path.join(img_dir, v["filename"])
            height, width = cv2.imread(filename).shape[:2]
            
            record["file_name"] = filename
            record["image_id"] = idx
            record["height"] = height
            record["width"] = width
          
            annos = v["regions"]
            objs = []
            for _, anno in annos.items():
                assert not anno["region_attributes"]
                anno = anno["shape_attributes"]
                px = anno["all_points_x"]
                py = anno["all_points_y"]
                poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
                poly = [p for x in poly for p in x]
    
                obj = {
                    "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                    "bbox_mode": BoxMode.XYXY_ABS,
                    "segmentation": [poly],
                    "category_id": 0,
                }
                objs.append(obj)
            record["annotations"] = objs
            dataset_dicts.append(record)
        return dataset_dicts
    
    for d in ["train", "val"]:
        DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
        MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
    balloon_metadata = MetadataCatalog.get("balloon_train")
    

    To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:

    dataset_dicts = get_balloon_dicts("balloon/train")
    for d in random.sample(dataset_dicts, 3):
        img = cv2.imread(d["file_name"])
        visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
        out = visualizer.draw_dataset_dict(d)
        cv2_imshow(out.get_image()[:, :, ::-1])
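
    Visual inspection can be complemented by a programmatic check of each record. The sketch below is a hypothetical helper (`check_record` is not part of detectron2) that assumes boxes are in XYXY absolute coordinates, as in `get_balloon_dicts` above, and allows polygons to exceed the box by the 0.5 pixel-center offset applied there.

```python
def check_record(record, tol=0.5):
    """Return a list of human-readable problems found in one dataset dict.

    Hypothetical helper: assumes `bbox` is [x0, y0, x1, y1] in absolute pixels
    and each segmentation polygon is a flat list [x0, y0, x1, y1, ...].
    """
    problems = []
    width, height = record["width"], record["height"]
    for i, obj in enumerate(record["annotations"]):
        x0, y0, x1, y1 = obj["bbox"]
        # The box must be non-degenerate and lie inside the image.
        if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
            problems.append(f"annotation {i}: bbox {obj['bbox']} is degenerate or outside the image")
        for poly in obj["segmentation"]:
            # A polygon needs at least 3 points, stored as an even-length flat list.
            if len(poly) < 6 or len(poly) % 2 != 0:
                problems.append(f"annotation {i}: polygon has invalid length {len(poly)}")
                continue
            xs, ys = poly[0::2], poly[1::2]
            # Polygon vertices should stay within the box, up to the tolerance.
            if (min(xs) < x0 - tol or max(xs) > x1 + tol
                    or min(ys) < y0 - tol or max(ys) > y1 + tol):
                problems.append(f"annotation {i}: polygon extends beyond its bbox")
    return problems


# Hand-made records for illustration (not taken from the balloon dataset).
good = {"width": 100, "height": 80, "annotations": [
    {"bbox": [10, 20, 50, 60], "segmentation": [[10.5, 20.5, 50.5, 20.5, 50.5, 60.5]]}]}
bad = {"width": 100, "height": 80, "annotations": [
    {"bbox": [10, 20, 150, 60], "segmentation": [[1, 2, 3]]}]}

print(check_record(good))
print(check_record(bad))
```

    In the notebook, the same helper could be run over every entry of `dataset_dicts` to list suspicious annotations before training.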
    

    Train!

    Now, let's fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the balloon dataset. It takes ~6 minutes to train 300 iterations on Colab's K80 GPU, or ~2 minutes on a P100 GPU.
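
    To put the 300 iterations in perspective, the short calculation below estimates how many passes over the training set they amount to and how long they take. The inputs are taken from this notebook run: `MAX_ITER = 300` and `IMS_PER_BATCH = 2` from the config, and 61 training images and roughly 0.45 s per iteration from the training log. The helper names are made up for this sketch.

```python
def epochs_seen(max_iter, ims_per_batch, num_images):
    """Approximate number of passes over the dataset during training."""
    return max_iter * ims_per_batch / num_images


def training_seconds(num_iters, seconds_per_iter):
    """Estimated wall-clock training time, ignoring startup and hook overhead."""
    return num_iters * seconds_per_iter


passes = epochs_seen(max_iter=300, ims_per_batch=2, num_images=61)
seconds = training_seconds(num_iters=298, seconds_per_iter=0.4499)
print(f"{passes:.1f} passes over the data, about {seconds:.0f} s of training")
```

    This works out to just under 10 passes over the 61 images and about 134 seconds, in line with the "298 iterations in 0:02:14" reported at the end of the log.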

    from detectron2.engine import DefaultTrainer
    
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("balloon_train",)
    cfg.DATASETS.TEST = ()
    cfg.DATALOADER.NUM_WORKERS = 2
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
    cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
    # NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
    
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    trainer = DefaultTrainer(cfg) 
    trainer.resume_or_load(resume=False)
    trainer.train()
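
    The note on `NUM_CLASSES` can be checked against the model printout and the "Skip loading parameter" messages in this cell's output. The sketch below is inferred from those shapes rather than from the detectron2 source: the classifier has one extra output for background, the box regressor has four outputs per class, and the mask predictor has one channel per class. `roi_head_output_sizes` is a hypothetical helper name.

```python
def roi_head_output_sizes(num_classes):
    """Output sizes of the prediction layers, inferred from the logged shapes."""
    return {
        "cls_score": num_classes + 1,   # one extra logit for the background class
        "bbox_pred": 4 * num_classes,   # four box deltas per foreground class
        "mask_predictor": num_classes,  # one mask channel per foreground class
    }


coco_sizes = roi_head_output_sizes(80)    # the COCO-pretrained checkpoint
balloon_sizes = roi_head_output_sizes(1)  # the balloon model configured above
print(coco_sizes)
print(balloon_sizes)
```

    With 80 classes this gives 81, 320 and 80, the checkpoint shapes in the log; with one class it gives 2, 4 and 1, the shapes of the new model. Because the background slot is added internally, `NUM_CLASSES` should be the number of foreground classes only.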
    
    [11/06 01:35:37 d2.engine.defaults]: Model:
    GeneralizedRCNN(
      (backbone): FPN(
        (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
        (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (top_block): LastLevelMaxPool()
        (bottom_up): ResNet(
          (stem): BasicStem(
            (conv1): Conv2d(
              3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
              (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
            )
          )
          (res2): Sequential(
            (0): BottleneckBlock(
              (shortcut): Conv2d(
                64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv1): Conv2d(
                64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
              )
              (conv2): Conv2d(
                64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
              )
              (conv3): Conv2d(
                64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
            )
            (1): BottleneckBlock(
              (conv1): Conv2d(
                256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
              )
              (conv2): Conv2d(
                64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
              )
              (conv3): Conv2d(
                64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
            )
            (2): BottleneckBlock(
              (conv1): Conv2d(
                256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
              )
              (conv2): Conv2d(
                64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
              )
              (conv3): Conv2d(
                64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
            )
          )
          (res3): Sequential(
            (0): BottleneckBlock(
              (shortcut): Conv2d(
                256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv1): Conv2d(
                256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv2): Conv2d(
                128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv3): Conv2d(
                128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
            )
            (1): BottleneckBlock(
              (conv1): Conv2d(
                512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv2): Conv2d(
                128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv3): Conv2d(
                128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
            )
            (2): BottleneckBlock(
              (conv1): Conv2d(
                512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv2): Conv2d(
                128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv3): Conv2d(
                128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
            )
            (3): BottleneckBlock(
              (conv1): Conv2d(
                512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv2): Conv2d(
                128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
              )
              (conv3): Conv2d(
                128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
            )
          )
          (res4): Sequential(
            (0): BottleneckBlock(
              (shortcut): Conv2d(
                512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
              (conv1): Conv2d(
                512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv2): Conv2d(
                256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv3): Conv2d(
                256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
            )
            (1): BottleneckBlock(
              (conv1): Conv2d(
                1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv2): Conv2d(
                256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv3): Conv2d(
                256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
            )
            (2): BottleneckBlock(
              (conv1): Conv2d(
                1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv2): Conv2d(
                256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv3): Conv2d(
                256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
            )
            (3): BottleneckBlock(
              (conv1): Conv2d(
                1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv2): Conv2d(
                256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv3): Conv2d(
                256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
            )
            (4): BottleneckBlock(
              (conv1): Conv2d(
                1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv2): Conv2d(
                256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv3): Conv2d(
                256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
            )
            (5): BottleneckBlock(
              (conv1): Conv2d(
                1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv2): Conv2d(
                256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
              )
              (conv3): Conv2d(
                256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
              )
            )
          )
          (res5): Sequential(
            (0): BottleneckBlock(
              (shortcut): Conv2d(
                1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
                (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
              )
              (conv1): Conv2d(
                1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv2): Conv2d(
                512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv3): Conv2d(
                512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
              )
            )
            (1): BottleneckBlock(
              (conv1): Conv2d(
                2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv2): Conv2d(
                512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv3): Conv2d(
                512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
              )
            )
            (2): BottleneckBlock(
              (conv1): Conv2d(
                2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv2): Conv2d(
                512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
              )
              (conv3): Conv2d(
                512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
                (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
              )
            )
          )
        )
      )
      (proposal_generator): RPN(
        (rpn_head): StandardRPNHead(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
          (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
        )
        (anchor_generator): DefaultAnchorGenerator(
          (cell_anchors): BufferList()
        )
      )
      (roi_heads): StandardROIHeads(
        (box_pooler): ROIPooler(
          (level_poolers): ModuleList(
            (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
            (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
            (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
            (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
          )
        )
        (box_head): FastRCNNConvFCHead(
          (flatten): Flatten(start_dim=1, end_dim=-1)
          (fc1): Linear(in_features=12544, out_features=1024, bias=True)
          (fc_relu1): ReLU()
          (fc2): Linear(in_features=1024, out_features=1024, bias=True)
          (fc_relu2): ReLU()
        )
        (box_predictor): FastRCNNOutputLayers(
          (cls_score): Linear(in_features=1024, out_features=2, bias=True)
          (bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
        )
        (mask_pooler): ROIPooler(
          (level_poolers): ModuleList(
            (0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=0, aligned=True)
            (1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=0, aligned=True)
            (2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
            (3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
          )
        )
        (mask_head): MaskRCNNConvUpsampleHead(
          (mask_fcn1): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
            (activation): ReLU()
          )
          (mask_fcn2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
            (activation): ReLU()
          )
          (mask_fcn3): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
            (activation): ReLU()
          )
          (mask_fcn4): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
            (activation): ReLU()
          )
          (deconv): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2))
          (deconv_relu): ReLU()
          (predictor): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1))
        )
      )
    )
    [11/06 01:35:39 d2.data.build]: Removed 0 images with no usable annotations. 61 images left.
    [11/06 01:35:39 d2.data.build]: Distribution of instances among all 1 categories:
    |  category  | #instances   |
    |:----------:|:-------------|
    |  balloon   | 255          |
    |            |              |
    [11/06 01:35:39 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
    [11/06 01:35:39 d2.data.build]: Using training sampler TrainingSampler
    [11/06 01:35:39 d2.data.common]: Serializing 61 elements to byte tensors and concatenating them all ...
    [11/06 01:35:39 d2.data.common]: Serialized dataset takes 0.17 MiB
    Skip loading parameter 'roi_heads.box_predictor.cls_score.weight' to the model due to incompatible shapes: (81, 1024) in the checkpoint but (2, 1024) in the model! You might want to double check if this is expected.
    Skip loading parameter 'roi_heads.box_predictor.cls_score.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (2,) in the model! You might want to double check if this is expected.
    Skip loading parameter 'roi_heads.box_predictor.bbox_pred.weight' to the model due to incompatible shapes: (320, 1024) in the checkpoint but (4, 1024) in the model! You might want to double check if this is expected.
    Skip loading parameter 'roi_heads.box_predictor.bbox_pred.bias' to the model due to incompatible shapes: (320,) in the checkpoint but (4,) in the model! You might want to double check if this is expected.
    Skip loading parameter 'roi_heads.mask_head.predictor.weight' to the model due to incompatible shapes: (80, 256, 1, 1) in the checkpoint but (1, 256, 1, 1) in the model! You might want to double check if this is expected.
    Skip loading parameter 'roi_heads.mask_head.predictor.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
    [11/06 01:35:44 d2.engine.train_loop]: Starting training from iteration 0
    [11/06 01:35:53 d2.utils.events]:  eta: 0:02:04  iter: 19  total_loss: 2.046  loss_cls: 0.6769  loss_box_reg: 0.5966  loss_mask: 0.685  loss_rpn_cls: 0.02908  loss_rpn_loc: 0.008902  time: 0.4490  data_time: 0.0259  lr: 4.9953e-06  max_mem: 2724M
    [11/06 01:36:02 d2.utils.events]:  eta: 0:01:53  iter: 39  total_loss: 2.083  loss_cls: 0.6601  loss_box_reg: 0.7223  loss_mask: 0.6551  loss_rpn_cls: 0.02151  loss_rpn_loc: 0.007349  time: 0.4379  data_time: 0.0069  lr: 9.9902e-06  max_mem: 2724M
    [11/06 01:36:11 d2.utils.events]:  eta: 0:01:46  iter: 59  total_loss: 1.829  loss_cls: 0.5824  loss_box_reg: 0.5576  loss_mask: 0.6056  loss_rpn_cls: 0.03342  loss_rpn_loc: 0.007773  time: 0.4432  data_time: 0.0068  lr: 1.4985e-05  max_mem: 2724M
    [11/06 01:36:19 d2.utils.events]:  eta: 0:01:36  iter: 79  total_loss: 1.664  loss_cls: 0.495  loss_box_reg: 0.612  loss_mask: 0.5251  loss_rpn_cls: 0.0398  loss_rpn_loc: 0.008393  time: 0.4408  data_time: 0.0078  lr: 1.998e-05  max_mem: 2724M
    [11/06 01:36:28 d2.utils.events]:  eta: 0:01:28  iter: 99  total_loss: 1.653  loss_cls: 0.4368  loss_box_reg: 0.6527  loss_mask: 0.4676  loss_rpn_cls: 0.02758  loss_rpn_loc: 0.005817  time: 0.4405  data_time: 0.0076  lr: 2.4975e-05  max_mem: 2724M
    [11/06 01:36:37 d2.utils.events]:  eta: 0:01:19  iter: 119  total_loss: 1.604  loss_cls: 0.4126  loss_box_reg: 0.7145  loss_mask: 0.4098  loss_rpn_cls: 0.0417  loss_rpn_loc: 0.00942  time: 0.4400  data_time: 0.0060  lr: 2.997e-05  max_mem: 2724M
    [11/06 01:36:46 d2.utils.events]:  eta: 0:01:10  iter: 139  total_loss: 1.46  loss_cls: 0.3711  loss_box_reg: 0.6621  loss_mask: 0.3871  loss_rpn_cls: 0.02824  loss_rpn_loc: 0.01492  time: 0.4409  data_time: 0.0072  lr: 3.4965e-05  max_mem: 2724M
    [11/06 01:36:55 d2.utils.events]:  eta: 0:01:02  iter: 159  total_loss: 1.283  loss_cls: 0.2932  loss_box_reg: 0.6513  loss_mask: 0.3031  loss_rpn_cls: 0.01643  loss_rpn_loc: 0.003501  time: 0.4432  data_time: 0.0076  lr: 3.996e-05  max_mem: 2724M
    [11/06 01:37:04 d2.utils.events]:  eta: 0:00:53  iter: 179  total_loss: 1.307  loss_cls: 0.29  loss_box_reg: 0.724  loss_mask: 0.2765  loss_rpn_cls: 0.01487  loss_rpn_loc: 0.01075  time: 0.4436  data_time: 0.0080  lr: 4.4955e-05  max_mem: 2724M
    [11/06 01:37:13 d2.utils.events]:  eta: 0:00:44  iter: 199  total_loss: 1.181  loss_cls: 0.2553  loss_box_reg: 0.6373  loss_mask: 0.2437  loss_rpn_cls: 0.02367  loss_rpn_loc: 0.008626  time: 0.4436  data_time: 0.0078  lr: 4.995e-05  max_mem: 2724M
    [11/06 01:37:22 d2.utils.events]:  eta: 0:00:35  iter: 219  total_loss: 1.048  loss_cls: 0.213  loss_box_reg: 0.625  loss_mask: 0.2106  loss_rpn_cls: 0.02227  loss_rpn_loc: 0.005251  time: 0.4452  data_time: 0.0057  lr: 5.4945e-05  max_mem: 2724M
    [11/06 01:37:31 d2.utils.events]:  eta: 0:00:26  iter: 239  total_loss: 1.049  loss_cls: 0.2045  loss_box_reg: 0.6159  loss_mask: 0.184  loss_rpn_cls: 0.01542  loss_rpn_loc: 0.008343  time: 0.4462  data_time: 0.0071  lr: 5.994e-05  max_mem: 2832M
    [11/06 01:37:41 d2.utils.events]:  eta: 0:00:17  iter: 259  total_loss: 0.9736  loss_cls: 0.1754  loss_box_reg: 0.5704  loss_mask: 0.162  loss_rpn_cls: 0.01074  loss_rpn_loc: 0.006589  time: 0.4480  data_time: 0.0069  lr: 6.4935e-05  max_mem: 2832M
    [11/06 01:37:50 d2.utils.events]:  eta: 0:00:08  iter: 279  total_loss: 0.8728  loss_cls: 0.151  loss_box_reg: 0.5358  loss_mask: 0.1624  loss_rpn_cls: 0.01978  loss_rpn_loc: 0.009639  time: 0.4488  data_time: 0.0075  lr: 6.993e-05  max_mem: 2832M
    [11/06 01:38:00 d2.utils.events]:  eta: 0:00:00  iter: 299  total_loss: 0.7729  loss_cls: 0.1192  loss_box_reg: 0.4951  loss_mask: 0.1248  loss_rpn_cls: 0.01628  loss_rpn_loc: 0.003562  time: 0.4498  data_time: 0.0080  lr: 7.4925e-05  max_mem: 2832M
    [11/06 01:38:01 d2.engine.hooks]: Overall training speed: 298 iterations in 0:02:14 (0.4499 s / it)
    [11/06 01:38:01 d2.engine.hooks]: Total training time: 0:02:16 (0:00:02 on hooks)
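
    The `lr` column in the log rises steadily instead of starting at `BASE_LR = 0.00025`, which is consistent with a linear learning-rate warmup. A minimal sketch, assuming a linear schedule with a warmup factor of 0.001 over 1000 iterations: these two values are not set anywhere in this notebook and are assumed here because they reproduce the logged learning rates. Any decay after warmup is ignored.

```python
def linear_warmup_lr(iteration, base_lr=0.00025, warmup_iters=1000, warmup_factor=0.001):
    """Learning rate under linear warmup (assumed schedule, see lead-in)."""
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters
    # Interpolate the multiplier from warmup_factor (at iteration 0) up to 1.0.
    return base_lr * (warmup_factor * (1 - alpha) + alpha)


# Compare against the `lr` values printed at these iterations in the log above.
for it in (19, 99, 299):
    print(it, linear_warmup_lr(it))
```

    Under these assumptions, training stops at iteration 299 while the learning rate is still at roughly 30% of `BASE_LR`, so the 300-iteration run never reaches the full learning rate.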
    
    
    
    # Look at training curves in tensorboard:
    %load_ext tensorboard
    %tensorboard --logdir output
    

    Inference & evaluation using the trained model

    Now, let's run inference with the trained model on the balloon validation dataset. First, let's create a predictor using the model we just trained:

    # Inference should use the same config parameters that were used in training.
    # cfg already contains everything we set previously; we change it slightly for inference:
    cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
    predictor = DefaultPredictor(cfg)
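
    Raising `SCORE_THRESH_TEST` from 0.5 to 0.7 trades recall for precision: fewer, more confident detections are returned. The sketch below illustrates the idea on made-up scores. It assumes the threshold simply drops detections whose score is not strictly above it; the strict comparison and the `filter_by_score` helper are assumptions of this example, not a description of the predictor's internals.

```python
def filter_by_score(detections, score_thresh):
    """Keep detections whose score is strictly above the threshold."""
    return [d for d in detections if d["score"] > score_thresh]


# Made-up detections for illustration only.
detections = [
    {"label": "balloon", "score": 0.98},
    {"label": "balloon", "score": 0.72},
    {"label": "balloon", "score": 0.55},
]

kept_at_05 = filter_by_score(detections, 0.5)
kept_at_07 = filter_by_score(detections, 0.7)
print(len(kept_at_05), len(kept_at_07))
```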
    

    Then, we randomly select several samples to visualize the prediction results.

    from detectron2.utils.visualizer import ColorMode
    dataset_dicts = get_balloon_dicts("balloon/val")
    for d in random.sample(dataset_dicts, 3):    
        im = cv2.imread(d["file_name"])
        outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
        v = Visualizer(im[:, :, ::-1],
                       metadata=balloon_metadata, 
                       scale=0.5, 
                       instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
        )
        out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
        cv2_imshow(out.get_image()[:, :, ::-1])

    We can also evaluate its performance using the AP metric implemented in the COCO API.
    This gives an AP of ~70. Not bad!
    
    from detectron2.evaluation import COCOEvaluator, inference_on_dataset
    from detectron2.data import build_detection_test_loader
    evaluator = COCOEvaluator("balloon_val", ("bbox", "segm"), False, output_dir="./output/")
    val_loader = build_detection_test_loader(cfg, "balloon_val")
    print(inference_on_dataset(trainer.model, val_loader, evaluator))
    # another equivalent way to evaluate the model is to use `trainer.test`
    
    
    [07/08 22:50:43 d2.evaluation.coco_evaluation]: 'balloon_val' is not registered by `register_coco_instances`. Therefore trying to convert it to COCO format ...
    [07/08 22:50:43 d2.data.datasets.coco]: Converting annotations of dataset 'balloon_val' to COCO format ...)
    [07/08 22:50:43 d2.data.datasets.coco]: Converting dataset dicts into COCO format
    [07/08 22:50:43 d2.data.datasets.coco]: Conversion finished, #images: 13, #annotations: 50
    [07/08 22:50:43 d2.data.datasets.coco]: Caching COCO format annotations at './output/balloon_val_coco_format.json' ...
    [07/08 22:50:44 d2.data.build]: Distribution of instances among all 1 categories:
    |  category  | #instances   |
    |:----------:|:-------------|
    |  balloon   | 50           |
    |            |              |
    [07/08 22:50:44 d2.data.common]: Serializing 13 elements to byte tensors and concatenating them all ...
    [07/08 22:50:44 d2.data.common]: Serialized dataset takes 0.04 MiB
    [07/08 22:50:44 d2.data.dataset_mapper]: Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
    [07/08 22:50:44 d2.evaluation.evaluator]: Start inference on 13 images
    [07/08 22:50:52 d2.evaluation.evaluator]: Inference done 11/13. 0.1931 s / img. ETA=0:00:00
    [07/08 22:50:53 d2.evaluation.evaluator]: Total inference time: 0:00:02.607042 (0.325880 s / img per device, on 1 devices)
    [07/08 22:50:53 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:01 (0.186243 s / img per device, on 1 devices)
    [07/08 22:50:53 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
    [07/08 22:50:53 d2.evaluation.coco_evaluation]: Saving results to ./output/coco_instances_results.json
    [07/08 22:50:53 d2.evaluation.coco_evaluation]: Evaluating predictions ...
    Loading and preparing results...
    DONE (t=0.00s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    COCOeval_opt.evaluate() finished in 0.01 seconds.
    Accumulating evaluation results...
    COCOeval_opt.accumulate() finished in 0.00 seconds.
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.668
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.847
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.797
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.239
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.795
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.222
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.704
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.766
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.567
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.659
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
    [07/08 22:50:53 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
    |   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
    |:------:|:------:|:------:|:------:|:------:|:------:|
    | 66.758 | 84.719 | 79.685 | 23.917 | 54.933 | 79.514 |
    Loading and preparing results...
    DONE (t=0.01s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *segm*
    COCOeval_opt.evaluate() finished in 0.01 seconds.
    Accumulating evaluation results...
    COCOeval_opt.accumulate() finished in 0.00 seconds.
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.768
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.842
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.840
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.058
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.565
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.936
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.248
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.782
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.842
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.600
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.688
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.953
    [07/08 22:50:53 d2.evaluation.coco_evaluation]: Evaluation results for segm: 
    |   AP   |  AP50  |  AP75  |  APs  |  APm   |  APl   |
    |:------:|:------:|:------:|:-----:|:------:|:------:|
    | 76.799 | 84.203 | 83.958 | 5.838 | 56.506 | 93.572 |
    OrderedDict([('bbox',
                  {'AP': 66.7575984802854,
                   'AP50': 84.71906024215401,
                   'AP75': 79.6850976022887,
                   'APl': 79.51426848548515,
                   'APm': 54.933394319629045,
                   'APs': 23.917443214909724}),
                 ('segm',
                  {'AP': 76.79883944079043,
                   'AP50': 84.20295316611471,
                   'AP75': 83.95779282808186,
                   'APl': 93.57150630750836,
                   'APm': 56.50588544163433,
                   'APs': 5.8381956414264895})])
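
    The `IoU=0.50:0.95` rows above report AP averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes the IoU of two boxes in plain Python and lists the thresholds at which a prediction would still count as overlapping enough. It is an illustration only, not the COCO API's code: `box_iou` is a hypothetical helper and the two boxes are made-up values.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds behind "IoU=0.50:0.95": 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (0, 0, 100, 100)   # made-up ground-truth box
prediction = (10, 10, 110, 110)   # made-up prediction, shifted by 10 pixels

iou = box_iou(ground_truth, prediction)
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
print(round(iou, 4), matched_at)
```

    A prediction shifted by only 10 pixels already fails the stricter thresholds, which is why AP75 and the averaged AP are lower than AP50 in the tables above.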
    

    Other types of builtin models

    # Inference with a keypoint detection model
    cfg = get_cfg()   # get a fresh new config
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set threshold for this model
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
    predictor = DefaultPredictor(cfg)
    outputs = predictor(im)
    v = Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
    
    # Inference with a panoptic segmentation model
    
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml")
    predictor = DefaultPredictor(cfg)
    panoptic_seg, segments_info = predictor(im)["panoptic_seg"]
    v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
    out = v.draw_panoptic_seg_predictions(panoptic_seg.to("cpu"), segments_info)
    cv2_imshow(out.get_image()[:, :, ::-1])
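
    The panoptic model returns a segment-id map together with `segments_info`, a list with one dict per segment. A minimal sketch of splitting that list into "thing" (countable objects) and "stuff" (background regions) entries is shown below. The sample data is made up and `split_things_and_stuff` is a hypothetical helper; only the keys `id`, `isthing`, and `category_id` are assumed.

```python
# Hypothetical segments_info; real values come from predictor(im)["panoptic_seg"].
segments_info = [
    {"id": 1, "isthing": True, "category_id": 0},
    {"id": 2, "isthing": True, "category_id": 0},
    {"id": 3, "isthing": False, "category_id": 40},
]


def split_things_and_stuff(segments):
    """Separate instance segments ("things") from region segments ("stuff")."""
    things = [s for s in segments if s["isthing"]]
    stuff = [s for s in segments if not s["isthing"]]
    return things, stuff


things, stuff = split_things_and_stuff(segments_info)
print(len(things), "thing segments,", len(stuff), "stuff segments")
```

    Each `id` corresponds to a pixel value in the `panoptic_seg` map, so the same ids can be used to select the pixels of one segment.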
    

    Run panoptic segmentation on a video

    # This is the video we're going to process
    from IPython.display import YouTubeVideo, display
    video = YouTubeVideo("ll8TgCZ0plk", width=500)
    display(video)
    
    # Install dependencies, download the video, and crop a 6-second clip for processing
    !pip install youtube-dl
    !pip uninstall -y opencv-python-headless opencv-contrib-python
    !apt install python3-opencv  # the pre-installed one has some issues
    !youtube-dl https://www.youtube.com/watch?v=ll8TgCZ0plk -f 22 -o video.mp4
    !ffmpeg -i video.mp4 -t 00:00:06 -c:v copy video-clip.mp4
    
    # Run frame-by-frame inference demo on this video (takes 3-4 minutes) with the "demo.py" tool we provided in the repo.
    !git clone https://github.com/facebookresearch/detectron2
    !python detectron2/demo/demo.py --config-file detectron2/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml --video-input video-clip.mp4 --confidence-threshold 0.6 --output video-output.mkv \
      --opts MODEL.WEIGHTS detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl
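
    Long shell commands like the one above are easy to break when a line continuation is lost. One alternative, sketched here under the assumption that the repository has been cloned into `detectron2/` as above, is to build the same command as a Python argument list. The flags and paths are copied from the command above; `demo_cmd` is just an illustrative variable name.

```python
import shlex

CONFIG = "detectron2/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"
WEIGHTS = (
    "detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/"
    "139514519/model_final_cafdb1.pkl"
)

# Each argument is its own list element, so no shell quoting or
# line-continuation characters are needed.
demo_cmd = [
    "python", "detectron2/demo/demo.py",
    "--config-file", CONFIG,
    "--video-input", "video-clip.mp4",
    "--confidence-threshold", "0.6",
    "--output", "video-output.mkv",
    "--opts", "MODEL.WEIGHTS", WEIGHTS,
]

# Print the equivalent single-line shell command for inspection.
print(shlex.join(demo_cmd))

# To actually run it (requires the cloned repo and the video clip):
# import subprocess
# subprocess.run(demo_cmd, check=True)
```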
    
    
    # Download the results
    from google.colab import files
    files.download('video-output.mkv')
    
  • Original article: https://www.cnblogs.com/xuehuiping/p/14110160.html