• [dpdk] Reading the Official Docs (2)


    Continuing from the previous section. Picking up where we left off:

    First, the docs mention three kernel modules, uio_pci_generic, igb_uio, and vfio_pci, which I did not understand at all. They also mention dpdk-devbind.py for checking NIC binding status; running it gave me the output below:

    [root@dpdk tools]# ./dpdk-devbind.py --status
    Network devices using DPDK-compatible driver
    ============================================
    <none>
    Network devices using kernel driver
    ===================================
    0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused= 
    Other network devices
    =====================
    <none>
    [root@dpdk tools]# 
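    For reference, the same script also performs the (re)binding. A minimal sketch, assuming the virtio NIC at 0000:00:03.0 from the status output above and uio_pci_generic (one of the three modules mentioned); the DRY_RUN wrapper is my own addition so the commands can be previewed without root:

    ```shell
    # Hypothetical sketch: bind the guest's virtio NIC to a DPDK-compatible driver.
    # DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and run
    # as root to actually execute them.
    BDF=0000:00:03.0            # PCI address from `dpdk-devbind.py --status`
    DRIVER=uio_pci_generic      # or igb_uio / vfio-pci
    run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }
    run modprobe "$DRIVER"
    run ./dpdk-devbind.py --bind="$DRIVER" "$BDF"
    run ./dpdk-devbind.py --status
    ```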

    So, first I need to learn qemu's NIC configuration and sort out the "hardware" before coming back (off I sadly went to man qemu...).

    Until now I had only ever used one qemu networking setup: a tap device on the outside, virtio on the inside.

    Back from the man page: the guest's NIC model can be emulated with "-net nic,model=xxx", but I still don't know how to do passthrough.

    1. With virtio as the guest's frontend driver, how do I make the backend use vhost-user?

    I suddenly realized this topic is complex enough to deserve its own post. Moved to "[qemu] 在前端驱动使用virtio的情况下,如何让后端使用vhost-user".

    2. Direct device access: PCI passthrough

    http://blog.csdn.net/qq123386926/article/details/47757089

    http://blog.csdn.net/halcyonbaby/article/details/37776211

    http://blog.csdn.net/richardysteven/article/details/9008971

    There are two approaches, pci-stub and VFIO; I will only use the newer one, VFIO. The plan: hand my physical NIC over to the VM for direct access.

    1. Make sure the CPU supports VT-d and that it is enabled in the BIOS.

    My CPU does support it: http://ark.intel.com/products/85214/Intel-Core-i7-5500U-Processor-4M-Cache-up-to-3_00-GHz

    2. Edit grub so the kernel boots with intel_iommu=on. (There is a pitfall here; keep reading, it is covered further down.)

    [tong@T7 dpdk]$ zcat /proc/config.gz  |grep -i intel_iommu
    CONFIG_INTEL_IOMMU=y
    CONFIG_INTEL_IOMMU_SVM=y
    # CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
    CONFIG_INTEL_IOMMU_FLOPPY_WA=y
    [tong@T7 dpdk]$ 
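    The grub edit itself is one line in /etc/default/grub followed by regenerating grub.cfg. A sketch, demonstrated on a scratch copy (the sed pattern assumes the usual GRUB_CMDLINE_LINUX="..." form; paths and the mkconfig command name vary by distro):

    ```shell
    # Demonstrate the edit on a scratch file; for real, run the same sed on
    # /etc/default/grub as root, regenerate grub.cfg, and reboot.
    tmp=$(mktemp)
    printf 'GRUB_CMDLINE_LINUX="quiet"\n' > "$tmp"
    sed -i 's/^GRUB_CMDLINE_LINUX="/&intel_iommu=on /' "$tmp"
    cat "$tmp"    # GRUB_CMDLINE_LINUX="intel_iommu=on quiet"
    # Real steps (not run here):
    #   sed -i 's/^GRUB_CMDLINE_LINUX="/&intel_iommu=on /' /etc/default/grub
    #   grub-mkconfig -o /boot/grub/grub.cfg     # grub2-mkconfig on RHEL/CentOS
    ```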

    3. Load the vfio-pci driver into the kernel.

    [tong@T7 dpdk]$ sudo modprobe vfio-pci
    [tong@T7 dpdk]$ lsmod |grep vfio
    vfio_pci               36864  0
    vfio_iommu_type1       20480  0
    vfio_virqfd            16384  1 vfio_pci
    vfio                   24576  2 vfio_iommu_type1,vfio_pci
    irqbypass              16384  2 kvm,vfio_pci
    [tong@T7 dpdk]$ 

    4. Inspect the NIC.

    [root@T7 0000:00:19.0]# lspci -vv -nn -d 8086:15a3
    00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
            Subsystem: Lenovo Device [17aa:2227]
            Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Interrupt: pin A routed to IRQ 20
            Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
            Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
            Region 2: I/O ports at 4080 [size=32]
            Capabilities: [c8] Power Management version 2
                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                    Address: 0000000000000000  Data: 0000
            Capabilities: [e0] PCI Advanced Features
                    AFCap: TP+ FLR+
                    AFCtrl: FLR-
                    AFStatus: TP-
            Kernel modules: e1000e

     5. bind / unbind

    [root@T7 0000:00:19.0]# echo "0000:00:19.0" > /sys/bus/pci/devices/0000:00:19.0/driver/unbind 
    [root@T7 0000:00:19.0]# echo "8086 15a3" > /sys/bus/pci/drivers/vfio-pci/new_id  

    *** Here comes the problem. Per the docs something is already off: I have no iommu_group. What on earth is that? ***

    [tong@T7 dpdk]$ ls /dev/vfio/
    vfio
    [tong@T7 dpdk]$ dmesg |grep vfio
    [20355.407062] vfio-pci: probe of 0000:00:19.0 failed with error -22
    [20593.172116] vfio-pci: probe of 0000:00:19.0 failed with error -22
    [20684.750370] vfio-pci: probe of 0000:00:19.0 failed with error -22
    [tong@T7 dpdk]$ 

    I started the VM like this, and it errored out:

    [tong@T7 dpdk]$ cat start.sh
    sudo qemu-system-x86_64 -enable-kvm \
            -m 2G -cpu Nehalem -smp cores=2,threads=2,sockets=2 \
            -numa node,mem=1G,cpus=0-3,nodeid=0 \
            -numa node,mem=1G,cpus=4-7,nodeid=1 \
            -drive file=disk.img,if=virtio \
            -net nic,model=virtio,macaddr='00:00:00:00:00:03' \
            -device vfio-pci,host='0000:00:19.0' \
            -net tap,ifname=tap0 &
    [tong@T7 dpdk]$ ./start.sh
    [tong@T7 dpdk]$ qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: error no iommu_group for device
    qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: Device initialization failed

    The answer:

    To answer this I read the kernel documentation, plus a particularly good IBM article, and finally understood what an iommu group actually is. But the answer was not there.

    https://www.kernel.org/doc/Documentation/vfio.txt

    https://www.ibm.com/developerworks/community/blogs/5144904d-5d75-45ed-9d2b-cf1754ee936a/entry/vfio?lang=en

    So why was there no iommu_group? Because I was being dumb! I never actually added the intel_iommu=on kernel parameter to grub as step (2) says. Why not? Because I had run zcat /proc/config.gz, seen the option set to y, and assumed that meant it was enabled. Then, after adding the parameter, I ran zcat /proc/config.gz again, and the two outputs were identical. So I had completely misunderstood what this file is for: it only records the options the kernel was compiled with and says nothing about the runtime state. (Indeed, CONFIG_INTEL_IOMMU_DEFAULT_ON is not set above, so the feature is compiled in but off unless the boot parameter enables it.)
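    The lesson, condensed: /proc/config.gz answers "was it compiled in?", while /proc/cmdline answers "was it switched on this boot?". A quick check of both:

    ```shell
    # Build-time: was the IOMMU driver compiled in? (config.gz is an optional
    # kernel feature, so guard for its absence.)
    if [ -r /proc/config.gz ]; then zcat /proc/config.gz | grep 'INTEL_IOMMU='; fi
    # Runtime: did this particular boot actually get the switch?
    grep -o intel_iommu=on /proc/cmdline || echo "intel_iommu=on NOT set this boot"
    ```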

    So, with the parameter fixed, right after boot it looks like this, which means it took effect:

    [tong@T7 ~]$ ll /sys/bus/pci/devices/0000:00:19.0/ |grep io
    lrwxrwxrwx 1 root root      0 Sep 27 23:44 iommu -> ../../virtual/iommu/dmar1
    lrwxrwxrwx 1 root root      0 Sep 27 23:44 iommu_group -> ../../../kernel/iommu_groups/5
    [tong@T7 ~]$ 
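    A handy loop (my own helper, not from the docs) to dump every IOMMU group and its member devices; an empty result means the IOMMU is still off. The optional root argument exists only so the function can be exercised against a scratch tree:

    ```shell
    # List each IOMMU group and the devices in it. Defaults to the real sysfs
    # path; pass a directory argument to run it against a test tree instead.
    list_iommu_groups() {
        root=${1:-/sys/kernel/iommu_groups}
        for g in "$root"/*; do
            [ -d "$g" ] || continue
            printf 'group %s: ' "${g##*/}"
            ls "$g/devices"
        done
        return 0
    }
    list_iommu_groups
    ```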

    With that problem popped off the stack, back to step 5 (unbind / bind). The device I want to pass through to the VM is my physical NIC, lan0:

    Before the unbind the link LED is on; status:

    [tong@T7 ~]$ lspci -vv -nn -s 00:19.0
    00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
            Subsystem: Lenovo Device [17aa:2227]
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Latency: 0
            Interrupt: pin A routed to IRQ 46
            Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
            Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
            Region 2: I/O ports at 4080 [size=32]
            Capabilities: <access denied>
            Kernel driver in use: e1000e
            Kernel modules: e1000e
    
    [tong@T7 ~]$ sudo ip link show dev lan0
    2: lan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 50:7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
    [tong@T7 ~]$ ll /sys/bus/pci/devices/0000:00:19.0/ |grep driver
    lrwxrwxrwx 1 root root      0 Sep 27 23:42 driver -> ../../../bus/pci/drivers/e1000e
    -rw-r--r-- 1 root root   4096 Sep 27 23:44 driver_override
    [tong@T7 ~]$ 

    unbind (I don't know why the first attempt below fails; maybe someone will tell me one day. It doesn't matter much):

    [tong@T7 ~]$ sudo echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind 
    bash: /sys/bus/pci/devices/0000:00:19.0/driver/unbind: Permission denied
    [tong@T7 ~]$ sudo su -
    [root@T7 ~]# echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind
    [root@T7 ~]# 
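    Actually the failure has a mundane cause: sudo elevates echo, but the `>` redirection is opened by the calling non-root shell before sudo ever runs. The usual workarounds let a privileged process do the write, classically `echo $BDF | sudo tee .../unbind` or `sudo sh -c 'echo ... > ...'`. The tee pattern, demonstrated here on an ordinary scratch file standing in for the sysfs node:

    ```shell
    # Same shape as `echo $BDF | sudo tee /sys/.../driver/unbind`, minus sudo,
    # with a scratch file in place of the root-owned sysfs attribute.
    tmp=$(mktemp)
    echo 0000:00:19.0 | tee "$tmp" > /dev/null
    cat "$tmp"    # 0000:00:19.0
    ```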

    After a successful unbind, the state compares as follows (the NIC's link LED is still on):

    [root@T7 ~]# lspci -vv -nn -s 00:19.0
    00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
            Subsystem: Lenovo Device [17aa:2227]
            Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Interrupt: pin A routed to IRQ 20
            Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
            Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
            Region 2: I/O ports at 4080 [size=32]
            Capabilities: [c8] Power Management version 2
                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                    Address: 0000000000000000  Data: 0000
            Capabilities: [e0] PCI Advanced Features
                    AFCap: TP+ FLR+
                    AFCtrl: FLR-
                    AFStatus: TP-
            Kernel modules: e1000e
    
    [root@T7 ~]# ip link show dev lan0
    Device "lan0" does not exist.
    [root@T7 ~]# ip link show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
        link/ether dc:53:60:6c:b5:7e brd ff:ff:ff:ff:ff:ff
    4: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 26:4a:07:a1:4f:06 brd ff:ff:ff:ff:ff:ff
    [root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/ |grep driver
    -rw-r--r-- 1 root root   4096 Sep 27 23:44 driver_override
    [root@T7 ~]# 

    bind to vfio:

    [root@T7 ~]# modprobe vfio_pci
    [root@T7 ~]# lsmod |grep vfio
    vfio_pci               36864  0
    vfio_iommu_type1       20480  0
    vfio_virqfd            16384  1 vfio_pci
    vfio                   24576  2 vfio_iommu_type1,vfio_pci
    irqbypass              16384  2 kvm,vfio_pci
    [root@T7 ~]# echo 8086 15a3 > /sys/bus/pci/drivers/vfio-pci/new_id
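    Worth knowing: new_id has a side effect, in that vfio-pci will also try to claim any other device with the same vendor:device ID. Newer kernels offer a per-device alternative, the driver_override file (it already shows up in the sysfs listings above). A hedged sketch of a helper using it; the SYSFS variable is purely my own addition so the function can be tested against a scratch tree:

    ```shell
    # Bind one specific device to vfio-pci via driver_override instead of new_id.
    # Run as root against the real sysfs; SYSFS exists only for testability.
    bind_vfio() {
        bdf=$1
        sysfs=${SYSFS:-/sys}
        echo vfio-pci > "$sysfs/bus/pci/devices/$bdf/driver_override"
        echo "$bdf"   > "$sysfs/bus/pci/drivers_probe"
    }
    # usage (as root, after unbinding the old driver): bind_vfio 0000:00:19.0
    ```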

    After a successful bind, the various states:

    [root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/iommu_group/devices/
    total 0
    lrwxrwxrwx 1 root root 0 Sep 28 00:09 0000:00:19.0 -> ../../../../devices/pci0000:00/0000:00:19.0
    [root@T7 ~]# ll /dev/vfio/
    total 0
    crw------- 1 root root 242,   0 Sep 28 00:08 5
    crw-rw-rw- 1 root root  10, 196 Sep 28 00:06 vfio
    [root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/iom*
    lrwxrwxrwx 1 root root 0 Sep 27 23:44 /sys/bus/pci/devices/0000:00:19.0/iommu -> ../../virtual/iommu/dmar1
    lrwxrwxrwx 1 root root 0 Sep 27 23:44 /sys/bus/pci/devices/0000:00:19.0/iommu_group -> ../../../kernel/iommu_groups/5
    [root@T7 ~]# dmesg |tail
    ... ...
    [ 1027.806155] e1000e 0000:00:19.0 lan0: removed PHC
    [ 1394.134555] VFIO - User Level meta-driver version: 0.3
    [root@T7 ~]# lspci -vv -nn -s 00:19.0
    00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
            Subsystem: Lenovo Device [17aa:2227]
            Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Interrupt: pin A routed to IRQ 20
            Region 0: Memory at f2200000 (32-bit, non-prefetchable) [disabled] [size=128K]
            Region 1: Memory at f223e000 (32-bit, non-prefetchable) [disabled] [size=4K]
            Region 2: I/O ports at 4080 [disabled] [size=32]
            Capabilities: [c8] Power Management version 2
                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                    Address: 0000000000000000  Data: 0000
            Capabilities: [e0] PCI Advanced Features
                    AFCap: TP+ FLR+
                    AFCtrl: FLR-
                    AFStatus: TP-
            Kernel driver in use: vfio-pci
            Kernel modules: e1000e
    
    [root@T7 ~]# ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
        link/ether dc:53:60:6c:b5:7e brd ff:ff:ff:ff:ff:ff
    4: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 26:4a:07:a1:4f:06 brd ff:ff:ff:ff:ff:ff
    [root@T7 ~]# 

    6. Boot the VM and test. Inside the guest there is an extra NIC; it receives layer-2 broadcasts from the switch and can get an address via DHCP:

    [root@dpdk ~]# lspci -nn
    00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
    00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
    00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010]
    00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
    00:02.0 VGA compatible controller [0300]: Device [1234:1111] (rev 02)
    00:03.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
    00:04.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
    00:05.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]
    [root@dpdk ~]# ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
        link/ether 00:00:00:00:00:03 brd ff:ff:ff:ff:ff:ff
    3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
        link/ether 50:7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
    [root@dpdk ~]# tcpdump -i ens4 -nn -c 10
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on ens4, link-type EN10MB (Ethernet), capture size 65535 bytes
    00:17:32.969547 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length 46
    00:17:33.970617 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length 46

    7. Can the device be shared? I tried starting a second VM to find out.

    [tong@T7 CentOS7]$ ./start.sh 
    [tong@T7 CentOS7]$ qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: error opening /dev/vfio/5: Device or resource busy
    qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: failed to get group 5
    qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: Device initialization failed
    ^C
    [tong@T7 CentOS7]$ 

    The answer is no!

    And with that, PCI NIC passthrough via vfio is done! :)

  • Original post: https://www.cnblogs.com/hugetong/p/5904024.html