I. Installation
https://github.com/open-mmlab/mmdetection/blob/master/docs/INSTALL.md
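For reference, a condensed sketch of the steps from that guide (the Python/PyTorch versions here are illustrative; follow the linked INSTALL.md for the exact requirements):

conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
conda install pytorch torchvision -c pytorch
pip install mmcv
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .  # or: python setup.py develop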
II. Training on Your Own Data
1. Data
MMDetection's native annotation format is COCO; this walkthrough uses the VOC format instead. Arrange the folders under data/ as shown in the layout sketch below.
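A typical layout, consistent with data_root = 'data/VOCdevkit/' and the VOC2007 paths used in the config further below (only the directories relevant to training are shown):

data
└── VOCdevkit
    └── VOC2007
        ├── Annotations        # one XML annotation file per image
        ├── ImageSets
        │   └── Main           # trainval.txt / test.txt split files
        └── JPEGImages         # the images themselves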
2. Training
(1) Modify a config file under configs/
Copy an existing config and give it a name of your own, e.g. retinanet_x101_64x4d_fpn_1x.py. The main part to change is the dataset settings section, which can be modeled directly on pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py (shown below). The other change is num_classes in the same file, which should be set to the number of classes + 1 (the extra class is the background); see the sketch after the listing.
# dataset settings
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[data_root + 'VOC2007/ImageSets/Main/trainval.txt'],
            img_prefix=[data_root + 'VOC2007/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='mAP')
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[3])  # LR drops at epoch 3 = 3 * 3 = 9 passes over the data
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 100  # with RepeatDataset(times=3), this is 100 * 3 = 300 passes over the data
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_voc0712'
load_from = None
resume_from = None
workflow = [('train', 1)]
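The num_classes change lives in the model section of the config, which is not included in the listing above. A minimal sketch of the relevant fragment, assuming a Faster R-CNN style bbox_head as in the stock mmdetection 1.x configs (keep every other model field exactly as copied):

# model settings (fragment): only num_classes changes
model = dict(
    type='FasterRCNN',
    # ... backbone / neck / rpn_head and the remaining bbox_head fields
    # stay exactly as they are in the config you copied ...
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_classes=21))  # = number of your classes + 1 for background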
(2) Change CLASSES in mmdet/datasets/voc.py to your own classes, as sketched below.
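A minimal sketch of that edit, assuming three hypothetical classes named 'cat', 'dog' and 'person' (use your own names; they must match the <name> fields in your VOC XML annotations):

# mmdet/datasets/voc.py (fragment)
class VOCDataset(XMLDataset):

    CLASSES = ('cat', 'dog', 'person')  # replaces the original 20 VOC class names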
(3) Train
python tools/train.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712_my.py
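For multi-GPU training, mmdetection also provides a distributed launcher; for example, with 4 GPUs (adjust the count to your machine):

./tools/dist_train.sh configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712_my.py 4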
3. Testing
(1) Report mAP
Modify voc_classes() under mmdet/core/evaluation (in class_names.py) so that it returns your own classes; a sketch follows.
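A minimal sketch of that edit, using the same hypothetical class names as above (the returned names must match CLASSES in mmdet/datasets/voc.py):

# mmdet/core/evaluation/class_names.py (fragment)
def voc_classes():
    return ['cat', 'dog', 'person']  # replaces the original 20 VOC class names

Then evaluate with: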
python3 tools/test.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712_my.py work_dirs/faster_rcnn_r50_fpn_1x_voc0712/latest.pth --eval mAP --show
(2) Test a single image
Modeled on demo/webcam_demo.py, the script demo/img_demo.py (full listing after the command) runs inference on one image:
python demo/img_demo.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712_my.py work_dirs/faster_rcnn_r50_fpn_1x_voc0712/latest.pth demo/2017-09-05-161908.jpg
import argparse

import torch

from mmdet.apis import inference_detector, init_detector, show_result


def parse_args():
    parser = argparse.ArgumentParser(description='MMDetection image demo')
    parser.add_argument('config', help='test config file path')
    parser.add_argument('checkpoint', help='checkpoint file')
    parser.add_argument('imagepath', help='the path of image to test')
    parser.add_argument('--device', type=int, default=0, help='CUDA device id')
    parser.add_argument(
        '--score-thr', type=float, default=0.5, help='bbox score threshold')
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    model = init_detector(
        args.config, args.checkpoint, device=torch.device('cuda', args.device))
    result = inference_detector(model, args.imagepath)
    show_result(
        args.imagepath,
        result,
        model.CLASSES,
        score_thr=args.score_thr,
        wait_time=0)


if __name__ == '__main__':
    main()
References
https://zhuanlan.zhihu.com/p/101202864
https://blog.csdn.net/laizi_laizi/article/details/104256781