• Vertically Scaling PostgreSQL


    Reposted from: https://pgdash.io/blog/scaling-postgres.html

    PostgreSQL can scale rather well vertically. The more resources (CPU, memory, disk) that you can make available to your PostgreSQL server, the better it can perform. However, while some parts of Postgres can automatically make use of the increased resources, other parts need configuration changes before improvements can be noticed.

    Read on to learn more about how to ensure PostgreSQL makes full use of the system you’re running it on.

    CPU

    What Scales Automatically

    PostgreSQL has a traditional process architecture, consisting of a master process (called the postmaster) that spawns a new process (called a backend) for each new client connection. This means that if there are more CPU cores available, more processes can run simultaneously, and therefore backends do not have to contend as much for CPU availability. CPU-bound queries will complete faster.

    You may wish to adjust the maximum allowed simultaneous connections at a system-wide, per-database or per-user level:

    -- system level
    ALTER SYSTEM SET max_connections = 200;
    
    -- database level
    ALTER DATABASE dbname CONNECTION LIMIT 200;
    
    -- user level
    ALTER ROLE username CONNECTION LIMIT 20;

    so that rogue apps cannot end up hogging too many connections.
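
    As a quick check, you can compare the connections currently in use against the configured limit; pg_stat_activity has one row per backend. A minimal sketch:

    -- connections currently in use, per database
    SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;

    -- the configured system-wide limit
    SHOW max_connections;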

    What Needs Tweaking

    The PostgreSQL server can spawn processes to take care of housekeeping tasks, like vacuuming, replication, subscriptions (for logical replication) and so on. The number of such workers is not determined dynamically; it is set via configuration, and defaults to 8.

    The top-level configuration setting for the number of worker processes is:

    # typically specified in postgresql.conf
    max_worker_processes = 16

    Increasing this value can result in speedup of maintenance jobs, parallel queries and index creation.
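
    The setting can also be changed with ALTER SYSTEM rather than by editing postgresql.conf; either way, it only takes effect after a server restart:

    -- takes effect only after a server restart
    ALTER SYSTEM SET max_worker_processes = 16;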

    Parallel Queries

    Starting with version 9.6, Postgres can execute queries in parallel if the query planner decides it'll help. Parallel querying involves spawning workers, distributing work amongst them and then collecting (gathering) the results. Subject to the overall limit of max_worker_processes set earlier, Postgres determines how many workers can be spawned for a parallel query based on the value of two configuration settings:

    # the maximum number of workers that the system can
    # support for parallel operations
    max_parallel_workers = 8
    
    # the maximum number of workers that can be started
    # by a single Gather or Gather Merge node
    max_parallel_workers_per_gather = 8

    If you have idle CPUs and parallelizable queries, increasing these values can speed up such queries.
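
    A sketch of how to check whether a particular query benefits: raise the per-gather limit for the current session and look at the plan (big_table here is a hypothetical, suitably large table):

    SET max_parallel_workers_per_gather = 4;

    -- look for a "Gather" node and "Workers Planned: N" in the output
    EXPLAIN SELECT count(*) FROM big_table;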

    Parallel Index Creation

    In Postgres 11, support for parallel creation of B-Tree indexes was added. If you regularly create B-Tree indexes or REINDEX them, increasing this value can help:

    # the maximum number of parallel workers that can be
    # started by a single utility command
    max_parallel_maintenance_workers = 8

    This will allow Postgres to spawn that many workers (subject to the overall limit of max_worker_processes) to speed up the creation of B-Tree indexes.
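
    For example, assuming a hypothetical orders table that is large enough for the planner to consider parallelism worthwhile, an index build in a session might look like this:

    SET max_parallel_maintenance_workers = 4;

    -- the number of workers actually used also depends on the size of the table
    CREATE INDEX idx_orders_created_at ON orders (created_at);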

    Logical Replication

    Logical replication (available in Postgres 10 and above) relies on worker processes at the subscription side to fetch changes from the publisher. By asking Postgres to spawn more logical replication workers, the changes can be fetched and applied in parallel, especially when many tables are involved. This configuration setting increases the total number of replication workers:

    # maximum number of logical replication workers
    max_logical_replication_workers = 8

    In streaming replication, you can start off a sync with a base backup. For logical replication, however, changes have to be pulled in via the replication protocol itself, over the network. This can be time consuming. Allowing for more workers during the sync phase can speed up this process:

    # basically the number of tables that are synced in
    # parallel during initialization of a subscription
    max_sync_workers_per_subscription = 8
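
    For reference, a minimal subscription setup with hypothetical publication, host and role names; the initial copy of the published tables is what max_sync_workers_per_subscription parallelizes:

    CREATE SUBSCRIPTION orders_sub
        CONNECTION 'host=10.1.0.20 dbname=app user=replicator'
        PUBLICATION orders_pub;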

    Autovacuum

    Periodically, based on a number of configuration settings, Postgres will spawn workers that VACUUM the database tables. This is, of course, autovacuum, and the number of workers that the autovacuum launcher spawns each time can be set via the configuration setting:

    # the maximum number of autovacuum processes
    autovacuum_max_workers = 8
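
    Note that changing autovacuum_max_workers requires a server restart. You can see the workers currently running with a query like this (backend_type is available from Postgres 10 onwards):

    SELECT pid, query
      FROM pg_stat_activity
     WHERE backend_type = 'autovacuum worker';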

    WAL Compression

    If you have CPU to spare, you can trade CPU for disk bandwidth by compressing the pages that are written into the WAL files. This reduces the amount of data that needs to be written to disk, at the expense of more CPU cycles to compress the data. It also reduces the size of data that needs to be sent across the wire for streaming replication.

    Practically, the benefits of WAL compression are well worth the very reasonable overhead. To turn it on, use:

    # compresses full page images written to WAL
    wal_compression = on
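
    This setting does not require a restart, so it can be enabled with a reload:

    ALTER SYSTEM SET wal_compression = on;
    SELECT pg_reload_conf();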

    Memory

    What Scales Automatically

    The OS automatically manages and uses memory that is unused by any application for caching data read from and written to the disk recently. This greatly speeds up disk-intensive applications, and certainly PostgreSQL.

    In Linux, the most popular host for Postgres, the size of the OS disk cache cannot be set by the user. Its management is internal to Linux. Under memory pressure, it will yield disk cache memory to applications.

    What Needs Tweaking

    Query Planner

    The query planner has to factor the amount of disk cache provided by the OS into its estimates. If you’ve managed to increase the OS disk cache significantly (by increasing the available memory), increasing this configuration setting might help in improving the planner’s estimates:

    # the planner's assumption about the effective size
    # of the disk cache that is available to a single query.
    effective_cache_size = 64GB

    Shared Memory

    PostgreSQL uses a set of buffers that is shared between all workers and backend processes. These are called shared buffers, and the amount of memory allocated for shared buffers is set using the configuration setting:

    shared_buffers = 32GB
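
    shared_buffers takes effect only after a server restart; the PostgreSQL documentation suggests around 25% of system RAM as a reasonable starting point on a dedicated server. For example:

    -- takes effect only after a restart
    ALTER SYSTEM SET shared_buffers = '32GB';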

    Temporary Buffers

    When temporary tables are accessed by a query, buffers are allocated to cache the contents that are read in. The size of this buffer is set using the configuration setting:

    # the maximum number of temporary buffers used
    # by each database session
    temp_buffers = 100MB

    If you have memory to spare and queries that use temporary tables heavily, increasing this value can speed up such queries.
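
    Since temp_buffers can only be raised before the first use of temporary tables in a session, a sketch of setting it per session (recent_orders and orders are hypothetical):

    -- must be set before the session touches any temporary tables
    SET temp_buffers = '256MB';

    CREATE TEMP TABLE recent_orders AS
        SELECT * FROM orders WHERE created_at > now() - interval '1 day';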

    Working Memory

    Working memory is allocated locally and privately by backends. It is used to satisfy sorts and joins without having to spill to temporary disk files. Increasing this from the default of 4MB can let such queries complete faster by avoiding writes to disk:

    # the amount of memory to be used by internal sort
    # operations and hash tables before writing to temporary disk files
    work_mem = 16MB
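
    Because each sort or hash operation in each backend may use up to work_mem, it is often safer to raise it selectively rather than globally, for example for a single reporting role or for one session. A sketch, with a hypothetical role name:

    -- raise work_mem only for the reporting role
    ALTER ROLE reporting SET work_mem = '64MB';

    -- or only for the current session
    SET work_mem = '64MB';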

    Maintenance Operations

    The memory used by VACUUM, index creation and other such maintenance commands is controlled by the configuration setting maintenance_work_mem. Increasing this amount can speed up these operations, especially on indexes or tables that need to be recreated.

    The memory used by the autovacuum workers can be taken from the maintenance work memory (by setting autovacuum_work_mem = -1) or configured independently.

    # the maximum amount of memory to be used by
    # maintenance operations
    maintenance_work_mem = 128MB
    
    # maximum amount of memory to be used by each
    # autovacuum worker process
    autovacuum_work_mem = -1
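
    maintenance_work_mem can also be raised just for one session around a heavy maintenance command, for example (index name hypothetical):

    SET maintenance_work_mem = '1GB';
    REINDEX INDEX idx_orders_created_at;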

    Disk

    What Scales Automatically

    Disks can be made bigger, faster or more concurrent. The size of the disk is the only thing that PostgreSQL doesn’t have to be instructed about. By default, PostgreSQL will not constrain itself from using any available disk space. This is usually just fine.

    You can place a limit on the total size of temporary files created though, to provide some amount of protection against queries that try to sort a billion rows and the like:

    # the maximum amount of disk space that a process
    # can use for temporary files
    temp_file_limit = 500GB

    What Needs Tweaking

    Concurrency

    RAID-ed disks and file systems like ZFS can be set up to support more concurrency. That is to say, such file systems can service several disk reads/writes concurrently because of the way they store or handle data internally.

    You can let Postgres issue multiple concurrent disk I/O requests by using this configuration setting:

    # the number of concurrent disk I/O operations that
    # PostgreSQL expects can be executed simultaneously
    effective_io_concurrency = 4

    This is currently used only by bitmap heap scans though.
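
    The setting can also be applied per tablespace (tablespaces are covered below), which is useful when only part of your storage, say an SSD-backed tablespace, supports high concurrency; fast_ssd here is a hypothetical tablespace:

    ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 200);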

    Random Page Cost

    The Postgres query planner assumes that sequential reads are faster than random reads. Exactly how much faster is a value you can tweak. By default, it assumes random reads are 4 times costlier.

    Depending on your disk setup, workload and benchmarking, if you’re sure that random reads are say, only twice as costly as sequential reads, you can tell that to Postgres:

    # the planner's estimate of the cost of a disk page
    # fetch that is part of a series of sequential fetches
    seq_page_cost = 1
    
    # the planner's estimate of the cost of a
    # non-sequentially-fetched disk page
    random_page_cost = 2
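
    A cautious way to try this out is to lower the cost only for the current session and compare query plans before making the change global (orders and customer_id are hypothetical):

    SET random_page_cost = 2;
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;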

    Tablespaces

    To take advantage of multiple disks which are not mounted as one big single filesystem, you can use tablespaces. With tablespaces, you can place tables or indexes on different filesystems. This can improve concurrency and provides an easy way to handle table growth.

    CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/postgres';

    Read more about tablespaces in the PostgreSQL documentation.
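
    Once created, the tablespace can be named when creating tables or indexes; for example, placing a hypothetical index on the second disk:

    CREATE INDEX idx_events_created_at ON events (created_at) TABLESPACE disk2;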

    Network

    Network is usually the least used resource on a PostgreSQL server, and is rarely saturated. If you do need to scale, it is easy enough to add more network interfaces, each with its own IP, and have PostgreSQL listen on them all:

    listen_addresses = '10.1.0.10,10.1.0.11'

    The clients will have to be load balanced across all the IPs that Postgres listens on.

    Other

    There are a few other configuration settings that can be tweaked, most of which use up more CPU and memory.

    Partitionwise Operations

    Postgres 10 introduced table partitioning, which was improved upon in Postgres 11. Some query optimizations on partitions are not turned on by default, as they might result in higher CPU and memory consumption. These are:

    # allow a join between partitioned tables to be
    # performed by joining the matching partitions
    enable_partitionwise_join = on
    
    # allow grouping or aggregation on partitioned
    # tables to be performed separately for each partition
    enable_partitionwise_aggregate = on
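
    A sketch of trying them out for a single session, assuming hypothetical orders and order_items tables that are partitioned identically on order_id:

    SET enable_partitionwise_join = on;
    SET enable_partitionwise_aggregate = on;

    -- with partitionwise operations enabled, the join and (partially) the
    -- aggregation can be carried out partition by partition
    EXPLAIN
    SELECT o.customer_id, sum(i.amount)
      FROM orders o
      JOIN order_items i ON o.order_id = i.order_id
     GROUP BY o.customer_id;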

     
