• A very simple, up-to-date walkthrough of torchvision


    torchvision

    https://pytorch.org/docs/stable/torchvision/index.html#module-torchvision

    The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

    torchvision.get_image_backend(): Gets the name of the package used to load images.

    torchvision.set_image_backend(backend): Specifies the package used to load images.

    torchvision.set_video_backend(backend): Specifies the package used to decode videos.

    torchvision.datasets provides the following datasets: MNIST, Fashion-MNIST, KMNIST, EMNIST, QMNIST, FakeData, COCO, LSUN, ImageFolder, DatasetFolder, ImageNet, CIFAR, STL10, SVHN, PhotoTour, SBU, Flickr, VOC, Cityscapes, SBD, USPS, Kinetics-400, HMDB51, UCF101.

    Video

    torchvision.io.read_video(filename, start_pts=0, end_pts=None, pts_unit='pts')

    Reads a video from a file, returning both the video frames as well as the audio frames.

    Classification

    The models subpackage contains definitions for the following model architectures for image classification:

    AlexNet

    VGG

    ResNet

    SqueezeNet

    DenseNet

    Inception v3

    GoogLeNet

    ShuffleNet v2

    MobileNet v2

    ResNeXt

    Wide ResNet

    MNASNet

    You can construct a model with random weights by calling its constructor:

    import torchvision.models as models

    resnet18 = models.resnet18()

    alexnet = models.alexnet()

    vgg16 = models.vgg16()

    squeezenet = models.squeezenet1_0()

    densenet = models.densenet161()

    inception = models.inception_v3()

    googlenet = models.googlenet()

    shufflenet = models.shufflenet_v2_x1_0()

    mobilenet = models.mobilenet_v2()

    resnext50_32x4d = models.resnext50_32x4d()

    wide_resnet50_2 = models.wide_resnet50_2()

    mnasnet = models.mnasnet1_0()

    Pre-trained models are provided through the PyTorch torch.utils.model_zoo. They can be constructed by passing pretrained=True:

    import torchvision.models as models

    resnet18 = models.resnet18(pretrained=True)

    alexnet = models.alexnet(pretrained=True)

    squeezenet = models.squeezenet1_0(pretrained=True)

    vgg16 = models.vgg16(pretrained=True)

    densenet = models.densenet161(pretrained=True)

    inception = models.inception_v3(pretrained=True)

    googlenet = models.googlenet(pretrained=True)

    shufflenet = models.shufflenet_v2_x1_0(pretrained=True)

    mobilenet = models.mobilenet_v2(pretrained=True)

    resnext50_32x4d = models.resnext50_32x4d(pretrained=True)

    wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)

    mnasnet = models.mnasnet1_0(pretrained=True)

    Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

    Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.
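
    The difference between the two modes can be observed on a single batch normalization layer. This is a small sketch using torch.nn.BatchNorm1d on made-up data rather than a full torchvision model: in training mode the layer updates its running statistics, in evaluation mode it leaves them untouched.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(16, 4) * 3.0 + 5.0   # batch with mean around 5 and std around 3

bn.train()
before = bn.running_mean.clone()
bn(x)                                # training mode updates the running stats
after_train = bn.running_mean.clone()

bn.eval()
bn(x)                                # eval mode uses the stored stats, no update
after_eval = bn.running_mean.clone()

print(torch.equal(before, after_train))      # False
print(torch.equal(after_train, after_eval))  # True
```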

    All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

    from torchvision import transforms

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],

                                     std=[0.229, 0.224, 0.225])

     

    Semantic Segmentation

    The models subpackage contains definitions for the following model architectures for semantic segmentation:

    FCN ResNet101

    DeepLabV3 ResNet101

    As with image classification models, all pre-trained models expect input images normalized in the same way. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. They have been trained on images resized such that their minimum size is 520.

    The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset. You can see more information on how the subset has been selected in references/segmentation/coco_utils.py. The classes that the pre-trained model outputs are the following, in order:

    ['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',

     'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',

     'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
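
    To turn the per-pixel scores into class names, take the argmax over the class dimension and index into the list above. The sketch below assumes the segmentation models return a dict whose 'out' entry has shape (N, 21, H, W); a hand-made tensor stands in for that output so that no weights are needed.

```python
import torch

VOC_CLASSES = [
    '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

# Stand-in for model(batch)['out']: logits of shape (N, 21, H, W).
logits = torch.zeros(1, len(VOC_CLASSES), 4, 4)
logits[0, 0] = 1.0                 # background wins everywhere by default
logits[0, 8, 1:3, 1:3] = 5.0       # a 2 x 2 patch where 'cat' scores highest

mask = logits.argmax(dim=1)        # (N, H, W) map of class indices
present = [VOC_CLASSES[i] for i in mask.unique().tolist()]
print(present)
```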

     

    Object Detection, Instance Segmentation and Person Keypoint Detection

    The models subpackage contains definitions for the following model architectures for detection:

    Faster R-CNN ResNet-50 FPN

    Mask R-CNN ResNet-50 FPN

    The pre-trained models for detection, instance segmentation and keypoint detection are initialized with the classification models in torchvision.

    The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images so that they have a minimum size of 800. This option can be changed by passing the option min_size to the constructor of the models.
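
    The internal resize can be approximated with a few lines of arithmetic. This is a rough sketch, not the library's code: resized_hw is a hypothetical helper, and the upper bound max_size=1333 on the longer side is an assumption about the default that the text above does not mention.

```python
def resized_hw(height, width, min_size=800, max_size=1333):
    """Approximate (height, width) after the detection models' internal resize."""
    scale = min_size / min(height, width)       # bring the shorter side to min_size
    if max(height, width) * scale > max_size:   # but keep the longer side bounded
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)

print(resized_hw(480, 640))    # shorter side reaches min_size
print(resized_hw(500, 2000))   # longer side is capped at max_size
```

    The exact output size in torchvision may differ by a pixel because of rounding.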

    For object detection and instance segmentation, the pre-trained models return the predictions of the following classes:

    COCO_INSTANCE_CATEGORY_NAMES = [

        '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',

        'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',

        'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',

        'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',

        'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',

        'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',

        'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',

        'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',

        'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',

        'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',

        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',

        'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'

    ]
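
    The label indices returned by the detection models map directly into this list. The sketch below uses a hand-written prediction dict in place of real model output, a list truncated to its first five entries, and an arbitrary score threshold of 0.5.

```python
import torch

# First entries of COCO_INSTANCE_CATEGORY_NAMES, truncated for brevity.
NAMES = ['__background__', 'person', 'bicycle', 'car', 'motorcycle']

# Hypothetical prediction for one image, in the dict layout the models return.
prediction = {
    'boxes': torch.tensor([[10., 20., 110., 220.],
                           [50., 60., 150., 120.],
                           [0., 0., 30., 30.]]),   # (x1, y1, x2, y2)
    'labels': torch.tensor([1, 3, 2]),
    'scores': torch.tensor([0.98, 0.75, 0.10]),
}

keep = prediction['scores'] > 0.5                  # drop low-confidence detections
detections = [
    (NAMES[label], round(score, 2))
    for label, score in zip(prediction['labels'][keep].tolist(),
                            prediction['scores'][keep].tolist())
]
print(detections)
```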

     

    For person keypoint detection, the pre-trained model returns the keypoints in the following order:

    COCO_PERSON_KEYPOINT_NAMES = [

        'nose',

        'left_eye',

        'right_eye',

        'left_ear',

        'right_ear',

        'left_shoulder',

        'right_shoulder',

        'left_elbow',

        'right_elbow',

        'left_wrist',

        'right_wrist',

        'left_hip',

        'right_hip',

        'left_knee',

        'right_knee',

        'left_ankle',

        'right_ankle'

    ]
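
    Because the order is fixed, predicted keypoints can be paired with their names by position. The sketch below assumes the keypoint output has shape (num_people, 17, 3) with (x, y, visibility) per keypoint, and fills it with made-up coordinates.

```python
import torch

COCO_PERSON_KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

# Stand-in for prediction['keypoints']: (num_people, 17, 3) with (x, y, visibility).
keypoints = torch.zeros(1, 17, 3)
keypoints[0, :, 0] = torch.arange(17, dtype=torch.float32)        # fake x coordinates
keypoints[0, :, 1] = torch.arange(17, dtype=torch.float32) * 2.0  # fake y coordinates
keypoints[0, :, 2] = 1.0                                          # all marked visible

person = keypoints[0]
named = {
    name: (x, y)
    for name, (x, y, _visible) in zip(COCO_PERSON_KEYPOINT_NAMES, person.tolist())
}
print(named['nose'], named['right_ankle'])
```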

     

    Video classification

    We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.

    All pre-trained models expect input videos normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is the number of video frames in a clip. The frames have to be loaded into a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].

    NOTE

    The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.

    NOTE

    For now, normalization code can be found in references/video_classification/transforms.py; see the Normalize function there. Note that it differs from standard normalization for images because it assumes the video is 4D.
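
    The idea behind 4D normalization is to broadcast the per-channel statistics over the time and spatial dimensions. The function below is a minimal sketch of that idea, not the code from the references folder; normalize_clip is a hypothetical name.

```python
import torch

KINETICS_MEAN = [0.43216, 0.394666, 0.37645]
KINETICS_STD = [0.22803, 0.22145, 0.216989]

def normalize_clip(clip, mean=KINETICS_MEAN, std=KINETICS_STD):
    """Normalize a (C, T, H, W) clip channel by channel."""
    # Reshape the statistics to (C, 1, 1, 1) so they broadcast over T, H and W.
    mean = torch.tensor(mean).view(-1, 1, 1, 1)
    std = torch.tensor(std).view(-1, 1, 1, 1)
    return (clip - mean) / std

clip = torch.rand(3, 16, 112, 112)   # 16 random RGB frames with values in [0, 1]
normalized = normalize_clip(clip)
print(normalized.shape)
```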

    Kinetics 1-crop accuracies for clip length 16 (16x112x112)

    Network          Clip acc@1    Clip acc@5

    ResNet 3D 18     52.75         75.45

    ResNet MC 18     53.90         76.29

    ResNet (2+1)D    57.50         78.81

     

    torchvision.ops implements operators that are specific for Computer Vision.

    Supported functions include:

    torchvision.ops.nms(boxes, scores, iou_threshold): Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

    torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1): Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

    torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0): Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

    torchvision.utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0): Makes a grid of images.

    torchvision.utils.save_image(tensor, fp, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0, format=None): Saves a given Tensor into an image file.

  • Original article: https://www.cnblogs.com/jeshy/p/12048463.html