• Old Ceph client fails to mount a newer Ceph cluster


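    Problem

    Description

    When an older rbd or cephfs client mounts a newer Ceph server, the mount fails with a missing-feature error such as 1000000000000, 200000000000000 or 400000000000000.

    Pain point: the client is built into the Linux kernel, so it is updated far less often than the server side, which follows the community release cadence.
    Yet without upgrading the Ceph server, some features stay unavailable and some bugs stay unfixed.

    Error log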

    sudo mount -t ceph 10.10.1.11:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.key;
    sudo tail /var/log/messages
    Fri May  6 22:31:14 MSK 2016
    mount error 5 = Input/output error
    May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
    May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
    May  6 22:31:34 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
    May  6 22:31:34 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
    May  6 22:31:44 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
    May  6 22:31:44 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
    May  6 22:31:54 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
    May  6 22:31:54 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
    May  6 22:32:04 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
    May  6 22:32:04 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features

    As I guessed I need to switch off the "require_feature_tunables5" to remove the error messages.

    Can somebody tell me how to do that?

    Many thanks in advance.
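
The three hexadecimal numbers in each "feature set mismatch" line are related: the "missing" mask is the set of bits the server requires that the client does not have. The following is a minimal Python sketch of that relationship, checked only against the log line quoted above. The function name `parse_mismatch` and the regular expression are illustrative, not part of any Ceph tool.

```python
import re

# Matches the libceph "feature set mismatch" kernel message quoted above.
MISMATCH_RE = re.compile(
    r"feature set mismatch, my ([0-9a-f]+) < server's ([0-9a-f]+), "
    r"missing ([0-9a-f]+)"
)


def parse_mismatch(line):
    """Return (mine, server, missing) as integers, or None if no match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


line = (
    "May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 "
    "feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, "
    "missing 400000000000000"
)
mine, server, missing = parse_mismatch(line)

# Bits required by the server that the client lacks.
print(hex(server & ~mine))
print(missing == server & ~mine)
```

For this log line the computed mask equals the reported one, so only the "missing" value needs to be decoded to find the feature at fault.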
    

    Feature-to-kernel table

    How client and server capabilities match up

    CEPH_FEATURE table and kernel versions
    You can look up the missing feature in the table below.
    
    For example, missing 2040000 means that CEPH_FEATURE_CRUSH_TUNABLES (40000) and CEPH_FEATURE_CRUSH_TUNABLES2 (2000000) are missing on the kernel client.
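
The decomposition in that example is plain bit arithmetic: the mask is hexadecimal, and every set bit is one feature. A small Python sketch (the helper name `feature_bits` is made up for illustration):

```python
def feature_bits(mask_hex):
    """Return the positions of the bits set in a hexadecimal feature mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(feature_bits("2040000"))          # [18, 25]
print(feature_bits("400000000000000"))  # [58]
```

The bit positions are what the BIT column of the table refers to.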
    
    'R': required, 'S': supported, '-X-': feature is new in this version
    
    Feature                               BIT  HEX              3.8  3.9  3.10 3.14 3.15 3.18 4.1  4.5  4.6
    CEPH_FEATURE_NOSRCADDR                1    2                R    R    R    R    R    R    R    R    R
    CEPH_FEATURE_SUBSCRIBE2               4    10                                                       -R-
    CEPH_FEATURE_RECONNECT_SEQ            6    40                         -R-  R    R    R    R    R    R
    CEPH_FEATURE_PGID64                   9    200                   R    R    R    R    R    R    R    R
    CEPH_FEATURE_PGPOOL3                  11   800                   R    R    R    R    R    R    R    R
    CEPH_FEATURE_OSDENC                   13   2000                  R    R    R    R    R    R    R    R
    CEPH_FEATURE_CRUSH_TUNABLES           18   40000            S    S    S    S    S    S    S    S    S
    CEPH_FEATURE_MSG_AUTH                 23   800000                                    -S-  S    S    S
    CEPH_FEATURE_CRUSH_TUNABLES2          25   2000000               S    S    S    S    S    S    S    S
    CEPH_FEATURE_REPLY_CREATE_INODE       27   8000000               S    S    S    S    S    S    S    S
    CEPH_FEATURE_OSDHASHPSPOOL            30   40000000              S    S    S    S    S    S    S    S
    CEPH_FEATURE_OSD_CACHEPOOL            35   800000000                       -S-  S    S    S    S    S
    CEPH_FEATURE_CRUSH_V2                 36   1000000000                      -S-  S    S    S    S    S
    CEPH_FEATURE_EXPORT_PEER              37   2000000000                      -S-  S    S    S    S    S
    CEPH_FEATURE_OSD_ERASURE_CODES***     38   4000000000
    CEPH_FEATURE_OSDMAP_ENC               39   8000000000                           -S-  S    S    S    S
    CEPH_FEATURE_CRUSH_TUNABLES3          41   20000000000                          -S-  S    S    S    S
    CEPH_FEATURE_OSD_PRIMARY_AFFINITY     41*  20000000000                          -S-  S    S    S    S
    CEPH_FEATURE_CRUSH_V4 ****            48   1000000000000                                  -S-  S    S
    CEPH_FEATURE_CRUSH_TUNABLES5          58   400000000000000                                     -S-  S
    CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING  58*  400000000000000                                     -S-  S
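
As a rough aid, the lookup can be scripted. The sketch below copies a few rows of the table into a dictionary and reports, for a given "missing" mask, the feature name and the first kernel column in which the table marks it. The dictionary `FIRST_KERNEL` and the function `explain_missing` are hypothetical names, and the kernel versions are my reading of the table, not an authoritative list.

```python
# bit -> (feature name, first kernel column marked in the table above).
# Only a handful of rows are copied here; extend as needed.
FIRST_KERNEL = {
    35: ("CEPH_FEATURE_OSD_CACHEPOOL", "3.14"),
    36: ("CEPH_FEATURE_CRUSH_V2", "3.14"),
    41: ("CEPH_FEATURE_CRUSH_TUNABLES3", "3.15"),
    48: ("CEPH_FEATURE_CRUSH_V4", "4.1"),
    # Bit 58 is shared with CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING.
    58: ("CEPH_FEATURE_CRUSH_TUNABLES5", "4.5"),
}


def explain_missing(mask_hex):
    """Return (bit, feature, first_kernel) for every bit set in the mask."""
    mask = int(mask_hex, 16)
    rows = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            name, kernel = FIRST_KERNEL.get(bit, ("unknown", "unknown"))
            rows.append((bit, name, kernel))
    return rows


for mask in ("1000000000000", "400000000000000"):
    print(mask, explain_missing(mask))
```

Distribution kernels often backport Ceph features, so the upstream version in the table is a hint rather than a hard limit.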

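    Solution

    Description

    The simplest fix is to upgrade the client, but anyone who runs into this problem is usually someone who cannot upgrade the client.
    The only alternative is to lower the feature level that the server requires.
    The example below uses Ceph Nautilus 14.2.9.
    First, display the current tunables: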

    $ ceph osd crush show-tunables
    {
        "choose_local_tries": 0,
        "choose_local_fallback_tries": 0,
        "choose_total_tries": 50,
        "chooseleaf_descend_once": 1,
        "chooseleaf_vary_r": 1,
        "chooseleaf_stable": 0,
        "straw_calc_version": 1,
        "allowed_bucket_algs": 54,
        "profile": "hammer",
        "optimal_tunables": 0,
        "legacy_tunables": 0,
        "minimum_required_version": "jewel",
        "require_feature_tunables": 1,
        "require_feature_tunables2": 1,
        "has_v2_rules": 0,
        "require_feature_tunables3": 0,
        "has_v3_rules": 0,
        "has_v4_buckets": 1,
        "require_feature_tunables5": 1,
        "has_v5_rules": 0
    }
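
To see at a glance which of these flags are switched on, the JSON can be filtered. This is a sketch that works on the sample output above pasted in as a string; on a live cluster the same text would presumably come from `ceph osd crush show-tunables`. The function name `enabled_feature_flags` is illustrative.

```python
import json

# The show-tunables output from above, pasted verbatim.
SAMPLE = """
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "hammer",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 0,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}
"""


def enabled_feature_flags(tunables_json):
    """List the require_feature_* and has_v* keys whose value is 1."""
    tunables = json.loads(tunables_json)
    return [
        key
        for key, value in tunables.items()
        if key.startswith(("require_feature_", "has_v")) and value == 1
    ]


print(enabled_feature_flags(SAMPLE))
```

For the sample this lists `require_feature_tunables`, `require_feature_tunables2`, `has_v4_buckets` and `require_feature_tunables5`; the last two are the ones dealt with below.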
    

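    My client runs kernel 3.10, so both errors, 1000000000000 and 400000000000000, are reported.
    In the end the mount only succeeded after turning off two capabilities: require_feature_tunables5 and has_v4_buckets.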

    Turning off require_feature_tunables5

    List the available tunables profiles:

    $ ceph osd crush tunables --help
    .....
    osd crush tunables legacy|argonaut|bobtail|firefly|hammer|jewel|optimal|default
    

    Set the profile to firefly (changing tunables can trigger data movement, so expect some rebalancing):

    $ ceph osd crush tunables firefly
    $ ceph osd crush reweight-all
    

    Turning off has_v4_buckets

    Even after trying every profile, has_v4_buckets stays at 1.
    Eventually someone found that changing every straw2 bucket in the CRUSH map to straw solves it.

    # Fetch the CRUSH map (the output is the compiled, binary map)
    $ sudo ceph osd getcrushmap -o crushmap.txt
    # Decompile the CRUSH map into editable text
    $ crushtool -d crushmap.txt -o crushmap-decompile
    # Back it up before editing
    $ cp crushmap-decompile bakcrushmap
    # Change every straw2 to straw
    $ sed -i "s/straw2/straw/g" crushmap-decompile
    # Compile the edited CRUSH map
    $ crushtool -c crushmap-decompile -o crushmap-compiled
    # Inject the new CRUSH map into the cluster
    $ sudo ceph osd setcrushmap -i crushmap-compiled
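
The sed call replaces the string wherever it appears. A slightly more careful variant, sketched here in Python, touches only the `alg straw2` lines of the decompiled map and reports how many buckets were changed. The function name `straw2_to_straw` is illustrative, and the sample bucket is a made-up fragment in the style of crushtool's decompiled output, not taken from a real cluster.

```python
import re

# Matches the bucket algorithm line only, e.g. "\talg straw2".
ALG_STRAW2 = re.compile(r"^([ \t]*alg[ \t]+)straw2\b", re.MULTILINE)


def straw2_to_straw(crushmap_text):
    """Return (new_text, number_of_buckets_changed)."""
    return ALG_STRAW2.subn(r"\g<1>straw", crushmap_text)


# Hypothetical fragment of a decompiled CRUSH map.
sample = """# buckets (straw2 is the newer algorithm)
host node1 {
\tid -2
\talg straw2
\thash 0
\titem osd.0 weight 1.000
}
"""

new_text, changed = straw2_to_straw(sample)
print(changed)
print(new_text)
```

The result would then be written back to `crushmap-decompile` and compiled with `crushtool -c` as in the steps above.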
    

    References

    http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client/
    http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009634.html
    https://ceph.io/planet/ceph-的crush算法-straw/
    https://blog.csdn.net/tiankai517/article/details/50221931?locationNum=3&fps=1

  • Original post: https://www.cnblogs.com/bugutian/p/14218388.html