os: centos 7.4
pgxl:pg.version ‘10.3 (Postgres-XL 10alpha2)
添加节点、删除节点在日常运维中是很常见的操作。
本次记录的是 pgxl 添加 datanode 类型的节点,典型的横向扩展。
node4节点准备
安装依赖包
关闭防火墙,selinux
创建用户,修改环境变量
创建目录
git获取pgxl
编译安装pgxl
配置ssh信任
ntp同步时间
这些步骤都可以参考上一篇博客。
node4节点上 datanode
$ initdb -D /var/lib/pgxl/data --nodename datanode3 -E UTF8 --locale=C -U postgres -W
Success. You can now start the database server of the Postgres-XL coordinator using:
pg_ctl -D /var/lib/pgxl/data -l logfile start -Z coordinator
or
You can now start the database server of the Postgres-XL datanode using:
pg_ctl -D /var/lib/pgxl/data -l logfile start -Z datanode
$ vi /var/lib/pgxl/data/pg_hba.conf
host all all 192.168.56.101/32 trust
host all all 192.168.56.102/32 trust
host all all 192.168.56.103/32 trust
host all all 192.168.56.104/32 trust
$ vi /var/lib/pgxl/data/postgresql.conf
listen_addresses = '*'
port = 5432
max_connections = 100
pooler_port = 6667
max_pool_size = 100
gtm_host = 'node1'
gtm_port = 6668
pgxc_node_name = 'datanode3'
启动 datanode3
$ pg_ctl -D /var/lib/pgxl/data -l logfile start -Z datanode
添加 coordinator,datanode信息
postgres=# alter node datanode3 with (type=datanode,host='node4', port=5432);
postgres=# create node coordinator1 with (type=coordinator,host='node1', port=5432);
postgres=# create node coordinator2 with (type=coordinator,host='node1', port=5433);
postgres=# create node datanode1 with (type=datanode, host='node2',port=5432,primary,preferred);
postgres=# create node datanode2 with (type=datanode, host='node3',port=5432);
postgres=# select pgxc_pool_reload();
postgres=# select * from pgxc_node;
coordinator,datanode添加 datanode3信息
coordinator1,coordinator2,datanode1,datanode2 都需要添加
postgres=# create node datanode3 with (type=datanode, host='node4',port=5432);
postgres=# select pgxc_pool_reload();
postgres=# select * from pgxc_node;
如果使用pgxc_ctl就会方便很多
表重新sharding
登录node1节点的 coordinator1, redistribute tables
$ psql -p 5432
psql (PGXL 10alpha2, based on PG 10.3 (Postgres-XL 10alpha2))
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
--------------+-----------+-----------+-----------+----------------+------------------+-------------
coordinator2 | C | 5433 | node1 | f | f | -2089598990
datanode1 | D | 5432 | node2 | t | t | 888802358
datanode2 | D | 5432 | node3 | f | f | -905831925
coordinator1 | C | 5432 | node1 | f | f | 1938253334
datanode3 | D | 5432 | node4 | f | f | -1894792127
(5 rows)
postgres=# EXECUTE DIRECT ON (datanode3) 'create database peiybdb;';
postgres=# c peiybdb
peiybdb=# EXECUTE DIRECT ON (datanode3) ' create table tmp_t0 (c0 varchar(100),c1 varchar(100));';
peiybdb=# ALTER TABLE tmp_t0 ADD NODE (datanode3);
peiybdb=# select xc_node_id,count(1) from tmp_t0 group by xc_node_id;
xc_node_id | count
-------------+-------
-1894792127 | 3220
-905831925 | 3429
888802358 | 3351
(3 rows)
控制表数据分布的节点
ALTER TABLE tmp_t0 ADD NODE (datanode3);
ALTER TABLE tmp_t0 DELETE NODE (datanode3);
ALTER TABLE tmp_t0 DISTRIBUTE BY HASH(c0);
ALTER TABLE tmp_t0 DISTRIBUTE BY REPLICATION;
下面是帮助文档的截取
To change the distribution type and the list of nodes where table data is located:
ALTER TABLE distributors TO NODE (dn1, dn7), DISTRIBUTE BY HASH(dist_id);
To add a node where data of table is distributed:
ALTER TABLE distributors ADD NODE (dn9, dn14);
To remove a node where data of table is distributed:
ALTER TABLE distributors DELETE NODE (dn4, dn0);
下面是简单的建表分析
CREATE TABLE table_name(...)
DISTRIBUTE BY
HASH(col)|MODULO(col)|ROUNDROBIN|REPLICATION
TO NODE(nodename1,nodename2...)
可以看到,如果DISTRIBUTE BY 后面有如下选项:
REPLICATION,则是复制模式,其余则是分片模式,
HASH 指的是按照指定列的哈希值分布数据,
MODULO 指的是按照指定列的取摩运算分布数据,
ROUNDROBIN 指的是按照轮询的方式分布数据
TO NODE指定了数据分布的节点范围,如果没有指定则默认所有数据节点参与数据分布。如果没有指定分布模式,即使用普通的CREATE TABLE语句,PGXL会默认采用分片模式将数据分布到所有数据节点。
参考:
https://www.postgres-xl.org/documentation/tutorial-createcluster.html
https://www.postgres-xl.org/documentation/sql-createtable.html
https://www.postgres-xl.org/documentation/sql-altertable.html