• Fixing the Greenplum startup failure "Error occurred: non-zero rc: 1"


    One day a developer reported that the cluster in the test environment failed to start.

    The error output was as follows:

    [gpadmin@hadoop-test2:/root]
    $ gpstart
    20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Starting gpstart with args:
    20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Gathering information and validating the environment...
    20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 5.0.0 build dev'
    20181205:16:42:23:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Greenplum Catalog Version: '301705051'
    20181205:16:42:24:005451 gpstart:hadoop-test2:gpadmin-[INFO]:-Starting Master instance in admin mode
    20181205:16:52:24:005451 gpstart:hadoop-test2:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
    20181205:16:52:24:005451 gpstart:hadoop-test2:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
    Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log
    -w -t 600 -o " -p 2346 --gp_dbid=1 --gp_num_contents_in_cluster=0 --silent-mode=true -i -M master --gp_contentid=-1 -x 0 -c gp_role=utility " start
    ' rc=1, stdout='waiting for server to start...................................................................................................................................
    ...........................................................................................................................................................................
    ...........................................................................................................................................................................
    .................................................................................................................................. stopped waiting
    ', stderr='could not change directory to "/root"
    pg_ctl: could not start server
    Examine the log output.

    The startup log showed the following:

    vim /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log
    2018-12-05 08:42:24.067241 GMT,,,p5464,th-829482944,,,,0,,,seg-1,,,,,"WARNING","01000","""work_mem"": setting is deprecated, and may be removed in a future release.",,,,,,,,"set_config_option","guc.c",4666,
    2018-12-05 08:42:24.067612 GMT,,,p5464,th-829482944,,,,0,,,seg-1,,,,,"WARNING","01000","""work_mem"": setting is deprecated, and may be removed in a future release.",,,,,,,,"set_config_option","guc.c",4666,
    2018-12-05 08:42:24.083813 GMT,,,p5465,th-829482944,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",2046,
    2018-12-05 08:42:24.098673 GMT,,,p5465,th-829482944,,,,0,,,seg-1,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=2346001, size=177586016, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 177586016 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 253).
    If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
    The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,1

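    Roughly, the log says that the kernel parameter shmmax, set in /etc/sysctl.conf, is too small for the shared memory segment PostgreSQL requested (177586016 bytes), so the startup failed.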
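
The check described by the FATAL message can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the regular expression is written against the shmget() text shown in the log above, and /proc/sys/kernel/shmmax is where Linux exposes the limit currently in effect.

```python
import re
from pathlib import Path


def requested_shm_bytes(log_text: str) -> int:
    """Return the size argument of the failed shmget() call in the log."""
    match = re.search(r"shmget\(key=\d+,\s*size=(\d+),", log_text)
    if match is None:
        raise ValueError("no failed shmget() call found in the log text")
    return int(match.group(1))


def read_shmmax(path: str = "/proc/sys/kernel/shmmax") -> int:
    """Read the per-segment limit currently in effect (Linux only)."""
    return int(Path(path).read_text().strip())


def fits(requested: int, shmmax: int) -> bool:
    """A segment can only be created if it does not exceed shmmax."""
    return requested <= shmmax


if __name__ == "__main__":
    # Detail field copied from the FATAL entry in startup.log.
    detail = "Failed system call was shmget(key=2346001, size=177586016, 03600)."
    requested = requested_shm_bytes(detail)
    print("requested bytes:", requested)
    if Path("/proc/sys/kernel/shmmax").exists():
        print("fits current shmmax:", fits(requested, read_shmmax()))
```

Run on the affected host before the fix, the last line would be expected to report that the request does not fit.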

    Checking the configuration in /etc/sysctl.conf showed:

    kernel.shmmax = 20000000
    kernel.shmmni = 4096
    kernel.shmall = 40000000
    kernel.sem = 250 512000 100 2048
    kernel.sysrq = 1
    kernel.core_uses_pid = 1
    kernel.msgmnb = 65536
    kernel.msgmax = 65536
    kernel.msgmni = 2048
    net.ipv4.tcp_syncookies = 1
    net.ipv4.ip_forward = 0
    net.ipv4.conf.default.accept_source_route = 0
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_max_syn_backlog = 4096
    net.ipv4.conf.all.arp_filter = 1
    net.ipv4.ip_local_port_range = 1025 65535
    net.core.netdev_max_backlog = 10000
    net.core.rmem_max = 2097152
    net.core.wmem_max = 2097152
    vm.overcommit_memory = 2
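
To see which of these limits is the bottleneck, the settings can be compared with the size from the failed shmget() call. The Python sketch below is a rough illustration, not a Greenplum tool: `parse_sysctl` is a hypothetical helper, only the three shared memory lines are reproduced, and the 4096-byte page size is an assumption (kernel.shmall is counted in pages, kernel.shmmax in bytes).

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Shared memory lines copied from the /etc/sysctl.conf listing above.
SYSCTL_EXCERPT = """
kernel.shmmax = 20000000
kernel.shmmni = 4096
kernel.shmall = 40000000
"""

REQUESTED_BYTES = 177586016  # size from the failed shmget() call in startup.log
PAGE_SIZE = 4096             # assumption: 4 KiB pages, check with `getconf PAGE_SIZE`

settings = parse_sysctl(SYSCTL_EXCERPT)
# kernel.shmmax is the largest size of a single segment, in bytes.
shmmax = int(settings["kernel.shmmax"])
# kernel.shmall is the system-wide total, counted in pages, so convert to bytes.
shmall_bytes = int(settings["kernel.shmall"]) * PAGE_SIZE

print("shmmax large enough:", REQUESTED_BYTES <= shmmax)
print("shmall large enough:", REQUESTED_BYTES <= shmall_bytes)
```

With these values only the shmmax check fails, which matches the hint in the FATAL message.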

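    After comparing these values with the settings recommended in the official documentation, the parameter definitions, and the amount of data already in the cluster, it was clear that shmmax was indeed too small. I changed the parameters to the officially recommended settings and started the cluster again.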
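
The post does not list the exact values that were written, so the following Python sketch only shows the shape of such an edit. `set_sysctl_value` is a hypothetical helper that transforms the file text, and 500000000 is an illustrative number; take the real values from the official page listed in the references. After /etc/sysctl.conf has been edited, the new values are normally loaded with `sysctl -p` as root (or by rebooting) before running gpstart again.

```python
import re


def set_sysctl_value(text: str, key: str, value: str) -> str:
    """Replace the value of `key`, or append the setting if it is absent."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # Keep the original "key = " prefix and swap only the value.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        new_text = text.rstrip("\n") + f"\n{key} = {value}\n"
    return new_text


before = "kernel.shmmax = 20000000\nkernel.shmmni = 4096\n"
# 500000000 is an illustrative value only; use the number from the official page.
after = set_sysctl_value(before, "kernel.shmmax", "500000000")
print(after)
```

Working on the text rather than on the file keeps the sketch free of side effects; writing the result back to /etc/sysctl.conf requires root privileges.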

    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-   Successful segment starts                                            = 8
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-   Failed segment starts                                                = 0
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Successfully started 8 of 8 segment instances 
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-----------------------------------------------------
    20181205:17:54:28:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Starting Master instance hadoop-test2 directory /home/gpadmin/gpdata/gpmaster/gpseg-1 
    20181205:17:54:29:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Command pg_ctl reports Master hadoop-test2 instance active
    20181205:17:54:29:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-No standby master configured.  skipping...
    20181205:17:54:29:009711 gpstart:hadoop-test2:gpadmin-[INFO]:-Database successfully started

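    The cluster started successfully.

    Summary: when the kernel parameters that PostgreSQL (pg) startup depends on do not match the actual environment, startup fails. Reading the detailed messages in the log leads to the root cause and the fix.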
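
Finding the relevant entry in a long startup.log can be partly automated. The Python sketch below assumes the comma-separated layout seen in the excerpt above, where the severity is followed by the SQLSTATE code, the message, and the detail; that layout is inferred from this log, not taken from the Greenplum documentation. `fatal_entries` is a hypothetical helper, and the sample FATAL line is abridged (the long hint text is dropped).

```python
import csv
import io

SEVERITIES = {"FATAL", "PANIC"}


def fatal_entries(log_text: str):
    """Yield (timestamp, severity, message, detail) for severe log entries."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    reader = csv.reader(io.StringIO(log_text, newline=""))
    for row in reader:
        for i, field in enumerate(row):
            if field in SEVERITIES:
                # Assumed layout: severity, SQLSTATE, message, detail.
                message = row[i + 2] if i + 2 < len(row) else ""
                detail = row[i + 3] if i + 3 < len(row) else ""
                yield row[0], field, message, detail
                break


SAMPLE = '''2018-12-05 08:42:24.083813 GMT,,,p5465,th-829482944,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",2046,
2018-12-05 08:42:24.098673 GMT,,,p5465,th-829482944,,,,0,,,seg-1,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=2346001, size=177586016, 03600).",,,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,1
'''

for timestamp, severity, message, detail in fatal_entries(SAMPLE):
    print(timestamp, severity, message)
    print("  detail:", detail)
```

Matching on the severity value instead of a fixed column index keeps the sketch usable even if the number of leading columns differs between versions.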

    References:

    1. Officially recommended settings: http://gpdb.docs.pivotal.io/4380/prep_os-system-params.html#topic3

    2. Meaning of the kernel parameters: http://www.oicqzone.com/pc/2012091612901.html

  • Original article: https://www.cnblogs.com/chou1214/p/10072385.html