• ClickHouse: Cluster Setup and Data Replication


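    Earlier posts gave a brief introduction to ClickHouse and ran some simple performance tests. This one covers cluster setup and data replication. Replication requires ZooKeeper.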

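    Environment:

    1. Three machines; in my case, three virtual machines, each with ClickHouse installed.

    2. Add hosts entries. This is optional; you can write the IPs directly in the configuration file instead. (All three machines get the same hosts entries, shown below.)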

    192.168.0.10 db_server_yayun_01
    192.168.0.20 db_server_yayun_02
    192.168.0.30 db_server_yayun_03

    3. Create the substitutions file. It does not exist by default. /etc/clickhouse-server/config.xml contains this hint:
    If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
    By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
    Values for substitutions are specified in /yandex/name_of_substitution elements in that file.

    The contents of /etc/metrika.xml are:

    <yandex>
    <clickhouse_remote_servers>
        <perftest_3shards_1replicas>
            <shard>
                 <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_01</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_02</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_03</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>
    
    
    <zookeeper-servers>
      <node index="1">
        <host>192.168.0.30</host>
        <port>2181</port>
      </node>
    </zookeeper-servers>
    
    <macros>
        <replica>192.168.0.10</replica>
    </macros>
    
    
    <networks>
       <ip>::/0</ip>
    </networks>
    
    
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
    </yandex>

    The configuration file is identical on all three machines except for this part:

    <macros>
        <replica>192.168.0.10</replica>
    </macros>

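    Set this to each server's own IP. It does not actually have to be an IP; any value works as long as it is unique across the three machines. Replication relies on this setting. The ZooKeeper configuration is: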

    <zookeeper-servers>
      <node index="1">
        <host>192.168.0.30</host>
        <port>2181</port>
      </node>
    </zookeeper-servers>
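
Since the three files differ only in the `<macros>` value, that fragment can be generated per host instead of edited by hand. A minimal sketch in Python, assuming the host-to-IP mapping from the hosts entries above; the `HOSTS` dict and the helper names `macros_fragment` and `check_unique` are made up for this example and are not part of ClickHouse:

```python
import xml.etree.ElementTree as ET

# Hostname -> replica macro value, taken from the hosts entries above.
# Any other value would do, as long as it is unique per machine.
HOSTS = {
    "db_server_yayun_01": "192.168.0.10",
    "db_server_yayun_02": "192.168.0.20",
    "db_server_yayun_03": "192.168.0.30",
}


def macros_fragment(replica_value):
    """Return the <macros> element for one server as an XML string."""
    macros = ET.Element("macros")
    ET.SubElement(macros, "replica").text = replica_value
    return ET.tostring(macros, encoding="unicode")


def check_unique(values):
    """Raise ValueError if two machines would share a replica value."""
    values = list(values)
    if len(set(values)) != len(values):
        raise ValueError("replica macro values must be unique per machine")


check_unique(HOSTS.values())
for host, value in HOSTS.items():
    print(host, macros_fragment(value))
```

The uniqueness check is the part that matters: two servers with the same replica value would collide once replicated tables are created.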

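    My ZooKeeper runs on the .30 machine as a single instance. In production it should run on dedicated machines and be configured as an ensemble. After editing the configuration file, restart all three servers.
    The steps given in the official documentation are: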

    ClickHouse deployment to cluster
    
    ClickHouse cluster is a homogenous cluster. Steps to set up:
    
    1. Install ClickHouse server on all machines of the cluster
    2. Set up cluster configs in configuration file
    3. Create local tables on each instance
    4. Create a Distributed table

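    The first two steps are done. Next, create the local table and then the Distributed table. (Create both on all three machines. DDL is not synchronized across nodes, which is a pain.)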

    CREATE TABLE ontime_local (FlightDate Date,Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
    CREATE TABLE ontime_all AS ontime_local ENGINE = Distributed(perftest_3shards_1replicas, default, ontime_local, rand())
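
The last argument, `rand()`, is the sharding key. A Distributed table takes the remainder of the key divided by the total shard weight and sends the row to the shard whose weight range contains that remainder. A rough Python model of that rule, assuming the default weight of 1 per shard (the config above sets no weights); `pick_shard` is a name made up for this sketch:

```python
import random
from collections import Counter


def pick_shard(key, weights):
    """Return the 0-based shard index for a sharding-key value."""
    remainder = key % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        # Shard `index` owns the remainders in [upper - weight, upper).
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is below the total weight")


weights = [1, 1, 1]  # three shards, default weight
rng = random.Random(0)
# rand() in ClickHouse yields a UInt32, so draw 32 random bits per row.
counts = Counter(pick_shard(rng.getrandbits(32), weights) for _ in range(30000))
print(counts)
```

With a random key the rows spread roughly evenly over the three shards, which is why the three inserts that follow can land on different machines.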

    Insert data (any one of the machines will do):

    :) insert into ontime_all (FlightDate,Year)values('2001-10-12',2001);
    
    INSERT INTO ontime_all (FlightDate, Year) VALUES
    
    Ok.
    
    1 rows in set. Elapsed: 0.013 sec. 
    
    :) insert into ontime_all (FlightDate,Year)values('2002-10-12',2002);
    
    INSERT INTO ontime_all (FlightDate, Year) VALUES
    
    Ok.
    
    1 rows in set. Elapsed: 0.004 sec. 
    
    :) insert into ontime_all (FlightDate,Year)values('2003-10-12',2003);
    
    INSERT INTO ontime_all (FlightDate, Year) VALUES
    
    Ok.

    I inserted three rows here. Now query them (from any machine):

    :) select * from  ontime_all;
    
    SELECT *
    FROM ontime_all 
    
    ┌─FlightDate─┬─Year─┐
    │ 2001-10-12 │ 2001 │
    └────────────┴──────┘
    ┌─FlightDate─┬─Year─┐
    │ 2002-10-12 │ 2002 │
    └────────────┴──────┘
    ┌─FlightDate─┬─Year─┐
    │ 2003-10-12 │ 2003 │
    └────────────┴──────┘
    → Progress: 3.00 rows, 12.00 B (48.27 rows/s., 193.08 B/s.) 
    3 rows in set. Elapsed: 0.063 sec. 
    
    :) 

    When the query runs on one machine, a packet capture on the other machines shows requests arriving.

    tcpdump -i any -s 0 -l -w - dst port 9000

    What happens if one of the machines is shut down?

    :) select * from ontime_all;
    
    SELECT *
    FROM ontime_all 
    
    ┌─FlightDate─┬─Year─┐
    │ 2001-10-12 │ 2001 │
    └────────────┴──────┘
    ┌─FlightDate─┬─Year─┐
    │ 2002-10-12 │ 2002 │
    └────────────┴──────┘
    ┌─FlightDate─┬─Year─┐
    │ 2003-10-12 │ 2003 │
    └────────────┴──────┘
    ↓ Progress: 6.00 rows, 24.00 B (292.80 rows/s., 1.17 KB/s.) Received exception from server:
    Code: 279. DB::Exception: Received from localhost:9000, ::1. DB::NetException. DB::NetException: All connection tries failed. Log: 
    
    Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
    Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
    Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException

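    As you can see, the query threw an exception. So this setup is not highly available? Each shard here has only one replica, so once that host is down nothing else can serve the shard. Later I found another configuration in the documentation: two nodes set up as two replicas. In my tests high availability worked with it, and queries still ran distributed and in parallel. Try it yourself if you are interested.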
    https://clickhouse.yandex/reference_en.html#Distributed
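
The failed query above can be reasoned about with a small model: a query on a Distributed table needs at least one reachable replica in every shard. A sketch in Python; the second layout (one shard with two replicas) is a hypothetical example of the alternative configuration, not a config I have shown, and `unreachable_shards` is a made-up helper name:

```python
def unreachable_shards(cluster, down_hosts):
    """Return the indexes of shards whose replicas are all down."""
    return [
        index
        for index, replicas in enumerate(cluster)
        if all(host in down_hosts for host in replicas)
    ]


# Layout used in this post: three shards, one replica each.
three_shards_one_replica = [
    ["db_server_yayun_01"],
    ["db_server_yayun_02"],
    ["db_server_yayun_03"],
]

# Hypothetical alternative: one shard with two replicas.
one_shard_two_replicas = [["db_server_yayun_01", "db_server_yayun_02"]]

down = {"db_server_yayun_02"}
print(unreachable_shards(three_shards_one_replica, down))  # [1]
print(unreachable_shards(one_shard_two_replicas, down))    # []
```

An empty result means every shard can still be read, so the query can complete.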

    Now for the replication test. ZooKeeper is already configured, so just create the table (on all three machines):

    CREATE TABLE ontime_replica (FlightDate Date,Year UInt16) ENGINE = ReplicatedMergeTree('/clickhouse_perftest/tables/ontime_replica','{replica}',FlightDate,(Year, FlightDate),8192);
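
The statement is textually identical on all three machines. What makes the three tables replicas of one another is that they share the same ZooKeeper path while `{replica}` expands to a different value on each server, taken from that server's `<macros>` section. A small Python illustration of the substitution; `expand_macros` is a made-up helper for this sketch, not how the server implements it:

```python
def expand_macros(text, macros):
    """Replace each {name} placeholder with its value from macros."""
    for name, value in macros.items():
        text = text.replace("{" + name + "}", value)
    return text


ZK_PATH = "/clickhouse_perftest/tables/ontime_replica"
REPLICA_ARG = "{replica}"

# Per-server macro values, as configured in /etc/metrika.xml above.
SERVER_MACROS = {
    "db_server_yayun_01": {"replica": "192.168.0.10"},
    "db_server_yayun_02": {"replica": "192.168.0.20"},
    "db_server_yayun_03": {"replica": "192.168.0.30"},
}

for host, macros in SERVER_MACROS.items():
    # Same path everywhere, different replica name per server.
    print(host, ZK_PATH, expand_macros(REPLICA_ARG, macros))
```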

    Insert some test data:

    insert into ontime_replica (FlightDate,Year)values('2018-10-12',2018);

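    The row can be queried from any of the machines. To be honest, I still have not fully figured out clustering versus replication, because Distributed tables can also replicate data, which leaves me a bit confused. If any experts are reading, I would be glad to discuss.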
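
One setting that bears on that question, as far as the ClickHouse documentation describes it, is `internal_replication` in the cluster config. With `true`, the Distributed table writes each block to one replica of the shard and leaves copying to the replicated table itself; with `false`, the Distributed table writes the block to every replica. A toy Python model of that difference; the two-replica shard and the function name are hypothetical:

```python
def distributed_write_targets(replicas, internal_replication):
    """Replicas that the Distributed table itself writes a block to."""
    if internal_replication:
        # Simplification: take the first replica. The server picks a
        # healthy one; ReplicatedMergeTree then copies the data onward.
        return list(replicas[:1])
    # The Distributed table writes the block to every replica itself.
    return list(replicas)


shard = ["replica_a", "replica_b"]  # hypothetical two-replica shard
print(distributed_write_targets(shard, True))   # ['replica_a']
print(distributed_write_targets(shard, False))  # ['replica_a', 'replica_b']
```

In the cluster defined in this post every shard has a single replica, so the setting makes no visible difference there.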

    References:

    https://clickhouse.yandex/reference_en.html#Distributed

    https://clickhouse.yandex/tutorial.html

  • Original article: https://www.cnblogs.com/gomysql/p/6708650.html