Rook: Customizing and Managing a Ceph Cluster


    I. Ceph OSD Configuration

    By default, a Ceph cluster created from cluster.yaml uses filestore on the /var/lib/rook/osd-<id> directories. This is clearly not how Ceph is normally deployed, so the following describes how to configure the Ceph OSDs to use bluestore and specific disks.

    1. Use all available disks

    If we want the Ceph OSDs on specific nodes to use all available devices, all with bluestore, the configuration looks similar to the following:

    ...
    ---
    apiVersion: ceph.rook.io/v1beta1
    kind: Cluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: ceph/ceph:v13
        allowUnsupported: false
      dataDirHostPath: /var/lib/rook
      serviceAccount: rook-ceph-cluster
      mon:
        count: 3
        allowMultiplePerNode: true
      dashboard:
        enabled: true
      network:
        hostNetwork: false
      storage: # cluster level storage configuration and selection
        useAllNodes: false
        useAllDevices: true
        deviceFilter:
        location:
        config:
          storeType: bluestore
        nodes:
        - name: "ke-dev1-worker1"
        - name: "ke-dev1-worker3"
        - name: "ke-dev1-worker4"

    2. Use specific disks

    To restrict each node to specific disks, configure the storage section as follows:

    storage:
      useAllNodes: false
      useAllDevices: false
      deviceFilter:
      location:
      config:
        storeType: bluestore
      nodes:
      - name: "ke-dev1-worker1"
        devices:
        - name: "vde"
      - name: "ke-dev1-worker3"
        devices:
        - name: "vde"
      - name: "ke-dev1-worker4"
        devices:
        - name: "vdf"

    The specified disks must have a GPT header!

    Partitions cannot be specified! (The logs show that partition configuration is not passed through to the ceph-osd-prepare step.)
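
    Whether a disk already carries a GPT header can be checked on the node beforehand, for example (assuming the device is /dev/vde):

    # PTTYPE should report "gpt" (recent util-linux); sgdisk prints the GPT table or complains if none exists
    lsblk -o NAME,SIZE,TYPE,PTTYPE /dev/vde
    sgdisk --print /dev/vde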

    II. Modifying the Ceph Cluster

    After the Ceph cluster has been deployed, its configuration (for example adding or removing OSDs) can be changed with the following command:

    # kubectl -n rook-ceph edit cluster rook-ceph
    ...
    spec:
      cephVersion:
        image: ceph/ceph:v13
      dashboard:
        enabled: true
      dataDirHostPath: /var/lib/rook
      mon:
        allowMultiplePerNode: true
        count: 3
      network:
        hostNetwork: false
      serviceAccount: rook-ceph-cluster
      storage:
        config:
          storeType: bluestore
        nodes:
        - config: null
          devices:
          - FullPath: ""
            config: null
            name: vde
          name: ke-dev1-worker1
          resources: {}
        - config: null
          devices:
          - FullPath: ""
            config: null
            name: vde
          name: ke-dev1-worker3
          resources: {}
        - config: null
          devices:
          - FullPath: ""
            config: null
            name: vdf
          name: ke-dev1-worker4
          resources: {}
        useAllDevices: false
    ...

    Edit as needed, then save and exit.
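
    After saving, the rook-ceph-operator reconciles the new spec. A quick way to confirm the change was picked up (the operator pod name below is just an example) is:

    # Follow the operator log while it re-runs the OSD orchestration
    kubectl -n rook-ceph-system logs -f rook-ceph-operator-5dc97f5c79-vq7xs
    # Watch osd / osd-prepare pods being created or removed
    kubectl -n rook-ceph get pods -w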

    Problems Encountered

    If problems occur during deployment, the cause can be analyzed by checking logs in the following ways:

    • the log of the rook-ceph-operator pod
    • kubectl describe <pod> (example commands below)
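
    For example (pod names are placeholders):

    # Operator log (the operator runs in the rook-ceph-system namespace)
    kubectl -n rook-ceph-system logs <rook-ceph-operator-pod>
    # Events and status of a failing pod in the cluster namespace
    kubectl -n rook-ceph describe pod <failing-pod>
    kubectl -n rook-ceph logs <failing-pod>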

    ceph-mon never reaches the Running state

    Two situations were observed in which ceph-mon never reaches the Running state:

    • stale data left under /var/lib/rook/ from a previous deployment
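
    A minimal cleanup sketch for the stale-data case, assuming the default dataDirHostPath (run on every affected node; this destroys local Ceph state):

    # Remove leftover state from a previous Rook/Ceph deployment before redeploying
    rm -rf /var/lib/rook/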
    

    Specifying disks for the OSDs has no effect

    With the storage section of cluster.yaml configured as follows, the OSDs were not deployed on the configured devices:

    storage:
      useAllNodes: false
      useAllDevices: false
      deviceFilter:
      location:
      config:
        storeType: bluestore
      nodes:
      - name: "ke-dev1-worker1"
        devices:
        - name: "vde"
      - name: "ke-dev1-worker3"
        devices:
        - name: "vde"
      - name: "ke-dev1-worker4"
        devices:
        - name: "vdf"

    The rook-ceph-operator pod's log shows that the configured vde/vdf devices were recognized:

    # kubectl -n rook-ceph-system log rook-ceph-operator-5dc97f5c79-vq7xs
    ...
    2018-11-29 03:28:30.239119 I | exec: nodeep-scrub is set
    2018-11-29 03:28:30.252166 I | op-osd: 3 of the 3 storage nodes are valid
    2018-11-29 03:28:30.252192 I | op-osd: checking if orchestration is still in progress
    2018-11-29 03:28:30.259012 I | op-osd: start provisioning the osds on nodes, if needed
    2018-11-29 03:28:30.338514 I | op-osd: avail devices for node ke-dev1-worker1: [{Name:vde FullPath: Config:map[]}]
    2018-11-29 03:28:30.354912 I | op-osd: osd provision job started for node ke-dev1-worker1
    2018-11-29 03:28:31.050925 I | op-osd: avail devices for node ke-dev1-worker3: [{Name:vde FullPath: Config:map[]}]
    2018-11-29 03:28:31.071399 I | op-osd: osd provision job started for node ke-dev1-worker3
    2018-11-29 03:28:32.253394 I | op-osd: avail devices for node ke-dev1-worker4: [{Name:vdf FullPath: Config:map[]}]
    2018-11-29 03:28:32.269271 I | op-osd: osd provision job started for node ke-dev1-worker4
    ...

    Check the log of the ceph-osd-prepare job:

    # kubectl -n rook-ceph get pods -a -o wide
    NAME                                          READY     STATUS      RESTARTS   AGE       IP                NODE
    rook-ceph-mgr-a-959d64b9d-hfntv               1/1       Running     0          9m        192.168.32.184    ke-dev1-worker1
    rook-ceph-mon-a-b79d8687d-qwcnp               1/1       Running     0          10m       192.168.53.210    ke-dev1-master3
    rook-ceph-mon-b-66b895d57d-prfdp              1/1       Running     0          9m        192.168.32.150    ke-dev1-worker1
    rook-ceph-mon-c-8489c4bc8b-jwm8v              1/1       Running     0          9m        192.168.2.76      ke-dev1-worker3
    rook-ceph-osd-prepare-ke-dev1-worker1-bbm9t   0/2       Completed   0          8m        192.168.32.170    ke-dev1-worker1
    rook-ceph-osd-prepare-ke-dev1-worker3-xg2pc   0/2       Completed   0          8m        192.168.2.122     ke-dev1-worker3
    rook-ceph-osd-prepare-ke-dev1-worker4-mjlg7   0/2       Completed   0          8m        192.168.217.153   ke-dev1-worker4
    
    # kubectl -n rook-ceph log rook-ceph-osd-prepare-ke-dev1-worker1-bbm9t provision
    ...
    2018-11-29 03:28:36.533532 I | exec: Running command: lsblk /dev/vde --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
    2018-11-29 03:28:36.537270 I | exec: Running command: sgdisk --print /dev/vde
    2018-11-29 03:28:36.547839 W | inventory: skipping device vde with an unknown uuid. Failed to complete 'get disk vde uuid': exit status 2. ^GCaution: invalid main GPT header, but valid backup; regenerating main header
    from backup!
    
    Invalid partition data!

    The log reveals why device vde was not recognized: invalid main GPT header.

    These disks were newly added and had no GPT partition information. After manually creating a GPT header on each disk, the OSDs deployed normally!
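
    A minimal sketch of what "manually creating a GPT header" looks like, assuming a blank new disk /dev/vde (this wipes any existing partition table):

    # Write an empty GPT label onto the new disk, then re-read the partition table
    sgdisk --zap-all /dev/vde
    sgdisk --clear /dev/vde
    partprobe /dev/vde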

    III. Extended Features

    This section records some extended requirements encountered when deploying Ceph with Rook.

    1. How to configure partitions?

    Rook currently does not support using partitions as OSD devices; the code that checks configured disk partitions needs improvement.

    Operator discover check

    File: pkg/operator/ceph/cluster/osd/osd.go
    
    func (c *Cluster) startProvisioning(config *provisionConfig) {
        config.devicesToUse = make(map[string][]rookalpha.Device, len(c.Storage.Nodes))
    
        // start with nodes currently in the storage spec
        for _, node := range c.Storage.Nodes {
            ...
            availDev, deviceErr := discover.GetAvailableDevices(c.context, n.Name, c.Namespace, n.Devices, n.Selection.DeviceFilter, n.Selection.GetUseAllDevices())
            ...
        }
        ...
    }
    File: pkg/operator/discover/discover.go
    
    // GetAvailableDevices conducts outer join using input filters with free devices that a node has. It marks the devices from join result as in-use.
    func GetAvailableDevices(context *clusterd.Context, nodeName, clusterName string, devices []rookalpha.Device, filter string, useAllDevices bool) ([]rookalpha.Device, error) {
        ...
        // find those on the node
        nodeAllDevices, ok := allDevices[nodeName]
        if !ok {
            return results, fmt.Errorf("node %s has no devices", nodeName)
        }
        // find those in use on the node
        devicesInUse, err := ListDevicesInUse(context, namespace, nodeName)
        if err != nil {
            return results, err
        }
        
        nodeDevices := []sys.LocalDisk{}
        for _, nodeDevice := range nodeAllDevices {
            // TODO: Filter out devices that are in use by another cluster.
            // We need to retain the devices in use for this cluster so the provisioner will continue to configure the same OSDs.
            for _, device := range devicesInUse {
                if nodeDevice.Name == device.Name {
                    break
                }
            }
            nodeDevices = append(nodeDevices, nodeDevice)
        }
        claimedDevices := []sys.LocalDisk{}
        // now those left are free to use
        if len(devices) > 0 {
            for i := range devices {
                for j := range nodeDevices {
                    // When devices are specified as partitions,
                    // devices[i].Name is "sdk1" while nodeDevices[j].Name is "sdk",
                    // so the list of available devices returned to the caller is empty!!
                    if devices[i].Name == nodeDevices[j].Name {
                        results = append(results, devices[i])
                        claimedDevices = append(claimedDevices, nodeDevices[j])
                    }
                }
            }
        } else if len(filter) >= 0 {
            ...
        } else if useAllDevices {
            ...
        }
        ...
    }

    The disks returned by the ListDevices function look like this:

    {Name:sdk ... Partitions:[{Name:sdk1 Size:4000785964544 Label: Filesystem:}] ...}
    
    // ListDevices lists all devices discovered on all nodes or specific node if node name is provided.
    func ListDevices(context *clusterd.Context, namespace, nodeName string) (map[string][]sys.LocalDisk, error) {
    ...
    }

    OSD daemon check

    Once a disk passes the Ceph operator's discover checks, it is passed as an argument to the OSD prepare job, as shown below:

    File: rook-ceph-osd-prepare-ceph0-bphlv-ceph0.log
    
    2018-12-04 10:18:51.959163 I | rookcmd: starting Rook v0.8.0-320.g3135b1d with arguments '/rook/rook ceph osd provision'
    2018-12-04 10:18:51.993500 I | rookcmd: flag values: --cluster-id=c6434de9-f7ad-11e8-bec3-6c92bf2db856, --data-device-filter=, --data-devices=sdk,sdl, --data-directories=, --force-format=false, --help=false, --location=, --log-level=INFO, --metadata-device=, --node-name=ceph0, --osd-database-size=20480, --osd-journal-size=5120, --osd-store=bluestore, --osd-wal-size=576
    ...

    The job above was given --data-devices=sdk,sdl.

    File: pkg/daemon/ceph/osd/daemon.go
    
    func getAvailableDevices(context *clusterd.Context, desiredDevices string, metadataDevice string, usingDeviceFilter bool) (*DeviceOsdMapping, error) {
        ...
        for _, device := range context.Devices {
            ownPartitions, fs, err := sys.CheckIfDeviceAvailable(context.Executor, device.Name)
            if err != nil {
                return nil, fmt.Errorf("failed to get device %s info. %+v", device.Name, err)
            }
    
            // This shows that the configured disk must have no filesystem and no existing partition information!
            if fs != "" || !ownPartitions {
                // not OK to use the device because it has a filesystem or rook doesn't own all its partitions
                logger.Infof("skipping device %s that is in use (not by rook). fs: %s, ownPartitions: %t", device.Name, fs, ownPartitions)
                continue
            }
            ...
        }
        ...
    }

    So at present there is no way to configure a Ceph OSD on a specific disk partition.
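
    A possible workaround, sketched here under the assumption that the v0.8 storage spec's directories field behaves as documented: format and mount the partition on the host and let Rook create a directory-based OSD on it (/dev/sdk1 and /rook/osd-dir are example names):

    # Prepare the partition on the host (destroys any existing data on it)
    mkfs.xfs /dev/sdk1
    mkdir -p /rook/osd-dir
    mount /dev/sdk1 /rook/osd-dir
    # then, in cluster.yaml, reference the mount point under the node entry:
    #   directories:
    #   - path: "/rook/osd-dir"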

    2. How to configure BlueStore with HDD + SSD?

    To have a node's OSDs use an HDD + SSD layout, cluster.yaml can be modified as follows:

    storage:
      useAllNodes: false
      useAllDevices: false
      location:
      config:
        storeType: bluestore
      nodes:
      ...
      - name: "ke-dev1-worker4"
        devices:
        - name: "vdf"
        - name: "vdg"
        config:
          metadataDevice: "vdh"

    During deployment, the ceph-osd-prepare log can be used to verify that the configuration is correct:

    # kubectl -n rook-ceph log rook-ceph-osd-prepare-ke-dev1-worker4-456nj provision
    2018-11-30 03:30:37.118716 I | rookcmd: starting Rook v0.8.0-304.g0a8e109 with arguments '/rook/rook ceph osd provision'
    2018-11-30 03:30:37.124652 I | rookcmd: flag values: --cluster-id=072418f4-f450-11e8-bb3e-fa163e65e579, --data-device-filter=, --data-devices=vdf,vdg, --data-directories=, --force-format=false, --help=false, --location=, --log-level=INFO, --metadata-device=vdh, --node-name=ke-dev1-worker4, --osd-database-size=20480, --osd-journal-size=5120, --osd-store=bluestore, --osd-wal-size=576
    ...

    As shown in the log above, the correct parameters passed in should be:

    • --data-devices=vdf,vdg
    • --metadata-device=vdh

    To specify the size of the WAL/DB partitions provided by the SSD, add the following configuration:

    ...
    - name: "ke-dev1-worker4"
      devices:
      - name: "vdf"
      - name: "vdg"
      config:
        metadataDevice: "vdh"
        databaseSizeMB: "10240"
        walSizeMB: "10240"
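
    Whether the WAL/DB sizes took effect can be checked after provisioning by inspecting the partitions Rook created on the metadata device (assuming vdh; run on the node):

    # The DB/WAL partitions carved out of the SSD should match the configured sizes
    lsblk -o NAME,SIZE,TYPE /dev/vdh
    sgdisk --print /dev/vdh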

    3. How to customize ceph.conf?

    The configuration parameters of a newly created Ceph cluster are hard-coded in Rook and generated at Cluster creation time; refer to the sections above.

    If users want to customize the Ceph cluster's configuration parameters, they can do so by modifying the rook-config-override ConfigMap.

    The default rook-config-override looks like this:

    # kubectl -n rook-ceph get configmap rook-config-override -o yaml
    apiVersion: v1
    data:
      config: ""
    kind: ConfigMap
    metadata:
      creationTimestamp: 2018-12-03T05:34:58Z
      name: rook-config-override
      namespace: rook-ceph
      ownerReferences:
      - apiVersion: v1beta1
        blockOwnerDeletion: true
        kind: Cluster
        name: rook-ceph
        uid: 229e7106-f6bd-11e8-bec3-6c92bf2db856
      resourceVersion: "40803738"
      selfLink: /api/v1/namespaces/rook-ceph/configmaps/rook-config-override
      uid: 2c489850-f6bd-11e8-bec3-6c92bf2db856

    Modifying the configuration of an existing Ceph cluster

    1. Edit rook-config-override:

    # kubectl -n rook-ceph edit configmap rook-config-override -o yaml
    apiVersion: v1
    data:
      config: |
        [global]
        osd crush update on start = false
        osd pool default size = 2
        [osd]
        bluefs_buffered_io = false
        bluestore_csum_type = none
    kind: ConfigMap
    metadata:
      creationTimestamp: 2018-12-03T05:34:58Z
      name: rook-config-override
      namespace: rook-ceph
      ownerReferences:
      - apiVersion: v1beta1
        blockOwnerDeletion: true
        kind: Cluster
        name: rook-ceph
        uid: 229e7106-f6bd-11e8-bec3-6c92bf2db856
      resourceVersion: "40803738"
      selfLink: /api/v1/namespaces/rook-ceph/configmaps/rook-config-override
      uid: 2c489850-f6bd-11e8-bec3-6c92bf2db856

    2. Restart the Ceph components one by one

    # kubectl -n rook-ceph get pods
    NAME                               READY     STATUS    RESTARTS   AGE
    rook-ceph-mgr-a-5699bb7984-kpxgp   1/1       Running   0          2h
    rook-ceph-mon-a-66854cfb5-m5d9x    1/1       Running   0          15m
    rook-ceph-mon-b-c6f58986f-xpnc4    1/1       Running   0          2h
    rook-ceph-mon-c-97669b7ff-kgdbp    1/1       Running   0          2h
    rook-ceph-osd-0-54bdd844b-wfqk6    1/1       Running   0          2h
    rook-ceph-osd-1-789cdb4c5b-rddhh   1/1       Running   0          2h
    rook-ceph-osd-2-57c8644749-djs98   1/1       Running   0          2h
    rook-ceph-osd-3-7566d48f85-k5mw6   1/1       Running   0          2h
    
    # kubectl -n rook-ceph delete pod rook-ceph-mgr-a-5699bb7984-kpxgp
    
    # kubectl -n rook-ceph delete pod rook-ceph-mon-a-66854cfb5-m5d9x
    ...
    
    # kubectl -n rook-ceph delete pod rook-ceph-osd-0-54bdd844b-wfqk6

    Delete the ceph-mon and ceph-osd pods one at a time; wait for the cluster to return to HEALTH_OK before deleting the next one.
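
    A minimal sketch of checking cluster health from the toolbox between deletions (the toolbox pod name is an example):

    # Confirm HEALTH_OK before deleting the next mon/osd pod
    kubectl -n rook-ceph exec rook-ceph-tools-79954fdf9d-s65wm -- ceph health
    kubectl -n rook-ceph exec -it rook-ceph-tools-79954fdf9d-s65wm -- ceph -s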

    3. Check the Ceph components' configuration

    # cat /var/lib/rook/osd2/rook-ceph.config
    [global]
    run dir                   = /var/lib/rook/osd2
    mon initial members       = a b c
    mon host                  = 10.96.195.188:6790,10.96.128.73:6790,10.96.51.21:6790
    log file                  = /dev/stderr
    mon cluster log file      = /dev/stderr
    public addr               = 192.168.150.252
    cluster addr              = 192.168.150.252
    mon keyvaluedb            = rocksdb
    mon_allow_pool_delete     = true
    mon_max_pg_per_osd        = 1000
    debug default             = 0
    debug rados               = 0
    debug mon                 = 0
    debug osd                 = 0
    debug bluestore           = 0
    debug filestore           = 0
    debug journal             = 0
    debug leveldb             = 0
    filestore_omap_backend    = rocksdb
    osd pg bits               = 11
    osd pgp bits              = 11
    osd pool default size     = 2
    osd pool default min size = 1
    osd pool default pg num   = 100
    osd pool default pgp num  = 100
    osd objectstore           = bluestore
    crush location            = root=default host=ceph5
    rbd_default_features      = 3
    fatal signal handlers     = false
    osd crush update on start = false
    
    [osd.2]
    keyring                  = /var/lib/rook/osd2/keyring
    bluestore block path     = /dev/disk/by-partuuid/bad8c220-d4f7-40de-b7ff-fcc2e492ea64
    bluestore block wal path = /dev/disk/by-partuuid/5315d8be-f80b-4351-95b5-026889d1dd19
    bluestore block db path  = /dev/disk/by-partuuid/6d3d494f-0021-4e95-b45f-59a326976cf8
    
    [osd]
    bluefs_buffered_io  = false
    bluestore_csum_type = none
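
    Besides reading the rendered config file, the live value can be queried from the running daemon over its admin socket, for example inside the OSD pod (the pod name is an example; use the .asok filename that ls actually shows):

    # Locate the admin socket under the OSD's run dir, then query the running daemon
    kubectl -n rook-ceph exec rook-ceph-osd-2-57c8644749-djs98 -- ls /var/lib/rook/osd2
    kubectl -n rook-ceph exec rook-ceph-osd-2-57c8644749-djs98 -- \
      ceph --admin-daemon /var/lib/rook/osd2/<cluster>-osd.2.asok config get bluestore_csum_type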

    Specifying configuration parameters before creating the Ceph cluster

    To set configuration parameters before the Ceph cluster is created, first manually create a ConfigMap named rook-config-override, then create the Ceph cluster.

    1. Create the ConfigMap, then create the cluster

    # cat ceph-override-conf.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: rook-config-override
      namespace: rook-ceph
    data:
      config: |
        [global]
        osd crush update on start = false
        osd pool default size = 2
        [osd]
        bluefs_buffered_io = false
        bluestore_csum_type = none
    
    # kubectl create -f ceph-override-conf.yaml
    # kubectl create -f cluster.yaml
    serviceaccount "rook-ceph-cluster" created
    role "rook-ceph-cluster" created
    rolebinding "rook-ceph-cluster-mgmt" created
    rolebinding "rook-ceph-cluster" created
    configmap "rook-config-override" created
    cluster "rook-ceph" created

    2. Check the configuration of the started Ceph components

    # cat /var/lib/rook/mon-a/rook-ceph.config
    [global]
    fsid                      = e963975a-fe17-4806-b1b1-d4a6fcebd710
    run dir                   = /var/lib/rook/mon-a
    mon initial members       = a
    mon host                  = 10.96.0.239:6790
    log file                  = /dev/stderr
    mon cluster log file      = /dev/stderr
    public addr               = 10.96.0.239
    cluster addr              = 192.168.239.137
    mon keyvaluedb            = rocksdb
    mon_allow_pool_delete     = true
    mon_max_pg_per_osd        = 1000
    debug default             = 0
    debug rados               = 0
    debug mon                 = 0
    debug osd                 = 0
    debug bluestore           = 0
    debug filestore           = 0
    debug journal             = 0
    debug leveldb             = 0
    filestore_omap_backend    = rocksdb
    osd pg bits               = 11
    osd pgp bits              = 11
    osd pool default size     = 2
    osd pool default min size = 1
    osd pool default pg num   = 100
    osd pool default pgp num  = 100
    rbd_default_features      = 3
    fatal signal handlers     = false
    osd crush update on start = false
    
    [mon.a]
    keyring          = /var/lib/rook/mon-a/keyring
    public bind addr = 192.168.239.137:6790
    
    [osd]
    bluefs_buffered_io  = false
    bluestore_csum_type = none

    4. How to customize CRUSH rules?

    Rook does not provide an API kind for CRUSH rules, so a rule cannot be created the way a Pool is. Since CRUSH rules are also highly customizable, manage them via the CLI or by editing the CRUSH map.
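
    For example, a replicated rule can be created and attached to a pool from the toolbox with the standard Ceph CLI (the rule, root, device class, and pool names below are examples):

    # Create a replicated rule selecting hosts under the "default" root, limited to the ssd device class
    ceph osd crush rule create-replicated ssd-rule default host ssd
    # Switch an existing pool to the new rule
    ceph osd pool set replicapool crush_rule ssd-rule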

    5. How to upgrade the Ceph cluster?

    As an example, create a Cluster with Ceph v12:

    # vim cluster.yaml
    ...
    spec:
      cephVersion:
        image: ceph/ceph:v12
        allowUnsupported: false
    ...

    After creation, the Ceph version is 12.2.9:

    [root@rook-ceph-mgr-a-558d49cf8c-dk49n /]# ceph -v
    ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)
    
    # kubectl create -f toolbox.yaml
    deployment "rook-ceph-tools" created
    # kubectl -n rook-ceph exec -it rook-ceph-tools-79954fdf9d-s65wm bash
    [root@ceph0 /]# ceph -v
    ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)

    Edit the Cluster and set the image to Ceph v13:

    # kubectl -n rook-ceph edit cluster rook-ceph
    ...
    spec:
      cephVersion:
        image: ceph/ceph:v13
    ...
    
    cluster "rook-ceph" edited

    Afterwards, the Ceph OSD pods are deleted and recreated one by one, upgrading to the specified Ceph version:

    # kubectl -n rook-ceph get pods -o wide
    NAME                               READY     STATUS        RESTARTS   AGE       IP                NODE
    rook-ceph-mgr-a-558d49cf8c-dk49n   1/1       Running       0          29m       192.168.239.130   ceph0
    rook-ceph-mon-a-6c99f7fc49-rw556   1/1       Running       0          30m       192.168.239.171   ceph0
    rook-ceph-mon-b-77bbdd8676-rj22f   1/1       Running       0          29m       192.168.152.189   ceph4
    rook-ceph-mon-c-c7dd7bb4b-8qclr    1/1       Running       0          29m       192.168.150.217   ceph5
    rook-ceph-osd-0-c5d865db6-5dgl4    1/1       Running       0          1m        192.168.152.190   ceph4
    rook-ceph-osd-1-785b4f8c6d-qf9lc   1/1       Running       0          55s       192.168.150.237   ceph5
    rook-ceph-osd-2-6679497484-hjf85   0/1       Terminating   0          28m       <none>            ceph5
    rook-ceph-osd-3-87f8d69db-tmrl5    1/1       Running       0          2m        192.168.239.184   ceph0
    rook-ceph-tools-79954fdf9d-s65wm   1/1       Running       0          23m       100.64.0.20       ceph0

    During the upgrade, the noscrub and nodeep-scrub flags are set automatically:

    [root@ceph0 /]# ceph -s
      cluster:
        id:     adb3db57-6f09-4c4a-a3f9-171d6cfe167a
        health: HEALTH_WARN
                noscrub,nodeep-scrub flag(s) set
                1 osds down
                Reduced data availability: 6 pgs inactive, 18 pgs down
                Degraded data redundancy: 2/10 objects degraded (20.000%), 2 pgs degraded
    ...

    Once all OSDs have been upgraded, the cluster returns to HEALTH_OK. The Ceph mgr, mon and mds components are not upgraded automatically:

    # kubectl -n rook-ceph get pods -o wide
    NAME                               READY     STATUS    RESTARTS   AGE       IP                NODE
    rook-ceph-mgr-a-558d49cf8c-dk49n   1/1       Running   0          32m       192.168.239.130   ceph0
    rook-ceph-mon-a-6c99f7fc49-rw556   1/1       Running   0          33m       192.168.239.171   ceph0
    rook-ceph-mon-b-77bbdd8676-rj22f   1/1       Running   0          32m       192.168.152.189   ceph4
    rook-ceph-mon-c-c7dd7bb4b-8qclr    1/1       Running   0          32m       192.168.150.217   ceph5
    rook-ceph-osd-0-c5d865db6-5dgl4    1/1       Running   0          4m        192.168.152.190   ceph4
    rook-ceph-osd-1-785b4f8c6d-qf9lc   1/1       Running   0          3m        192.168.150.237   ceph5
    rook-ceph-osd-2-86bb5594df-tdhx4   1/1       Running   0          2m        192.168.150.244   ceph5
    rook-ceph-osd-3-87f8d69db-tmrl5    1/1       Running   0          5m        192.168.239.184   ceph0
    rook-ceph-tools-79954fdf9d-s65wm   1/1       Running   0          26m       100.64.0.20       ceph0

    In Rook v0.9.0, mgr and mon are upgraded automatically.

    Next, try upgrading the remaining Ceph components individually:

    # kubectl -n rook-ceph delete pod rook-ceph-mgr-a-558d49cf8c-dk49n
    # kubectl -n rook-ceph delete pod rook-ceph-mon-a-6c99f7fc49-rw556
    ...

    However, after these pods restart, they still run the old Ceph version!

    The Ceph mgr, mon and mds components can instead be upgraded by editing their Deployments:

    # kubectl -n rook-ceph get deployment
    NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    rook-ceph-mds-cephfs-a   1         1         1            1           22m
    rook-ceph-mds-cephfs-b   1         1         1            1           22m
    rook-ceph-mds-cephfs-c   1         1         1            1           22m
    rook-ceph-mds-cephfs-d   1         1         1            1           22m
    rook-ceph-mgr-a          1         1         1            1           25m
    rook-ceph-mon-a          1         1         1            1           27m
    rook-ceph-mon-b          1         1         1            1           26m
    rook-ceph-mon-c          1         1         1            1           26m
    rook-ceph-osd-0          1         1         1            1           25m
    rook-ceph-osd-1          1         1         1            1           25m
    rook-ceph-osd-2          1         1         1            1           25m
    rook-ceph-tools          1         1         1            1           14m
    
    # kubectl -n rook-ceph edit deployment rook-ceph-mon-a
    ...
            image: ceph/ceph:v13
    ...
    deployment "rook-ceph-mon-a" edited

    When upgrading the Ceph MDS components, upgrade all of them: MDSs running different Ceph versions cannot form a multi-active MDS cluster.
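
    After all components have been restarted with the new image, the versions actually running can be confirmed from the toolbox:

    # Summarize the Ceph version each daemon type is running (Luminous and later)
    ceph versions
    # Per-OSD detail
    ceph osd versions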

    Summary

    Rook's positioning

    The official documentation positions Rook as a storage provider framework on Kubernetes, offering Kubernetes-based deployment of multiple storage systems such as Ceph, Minio, CockroachDB, Cassandra, and NFS.

    Ceph is simply the first storage provider it offers, currently at beta quality.

    Reference: Storage Provider Framework

    Rook's strengths

    1. Integrates with Kubernetes; deployment is essentially one command
    2. Pools, CephFS, RGW, monitoring, etc. can be created from YAML files
    3. Simple scale-out and minor-version upgrades are convenient: just kubectl edit

    Rook's shortcomings

    1. Rook is still a young project and its code is not yet mature
    2. OSDs cannot be configured on partitions, so OSD disk usage cannot be controlled precisely
    3. Rook can delete a Ceph pool / CephFS / RGW or the whole Ceph cluster with a single command and no confirmation, which is risky
    4. Containerization adds another layer to each Ceph component's I/O path, costing some performance
    5. Operating Ceph now also means operating Kubernetes, raising the knowledge bar for Ceph operators

    Use-case summary

    Overall:

    Scenarios where Rook is a good fit

    • PoC and test environments
    • Mixed Kubernetes + Ceph deployments
    • Environments without strong Ceph performance requirements
    • Environments that do not need to track upstream Ceph releases frequently

    Scenarios where Rook is not a good fit

    • Standalone Ceph cluster deployments
    • Environments with strong Ceph performance requirements
    • Environments that track upstream Ceph releases

    Reposted from: https://blog.csdn.net/wangshuminjava/article/details/90603382
