• Better PostgreSQL backups with WAL archiving


    转:http://www.anchor.com.au/blog/documentation/better-postgresql-backups-with-wal-archiving/

    It’s pretty well known that we’re big fans of PostgreSQL. One of the great things about Postgres is its ability to perform backups while running, without interruptions to service.

    The bog-standard way of backing up MySQL or Postgres is to take a textual dump. That’s great for simple things, but it’s slow and can lock the database while it’s running. We’ve also seen strange behaviour when trying to restore from mysqldump backups when they contain triggers.

    For MySQL, tools like mylvmbackup dodge some of those issues by taking a mostly consistent point-in-time snapshot of the filesystem. It’s pretty cool if you’re stuck using MySQL, but it’s not ideal.

    That’s for another day though. Here we’ll talk about a more reliable and low-impact way of doing backups for Postgres, taking advantage of its well-architected storage engine. Postgres’ own docs describe a lot of the theory and a little application. It’s a good read but a bit thin when it comes down to implementation – that’s what we’ll cover here.

    How it works

    Postgres was designed with ACID properties in mind. This is reflected in the way it works and stores data, at the core of which is the Write-Ahead Log (WAL). Amongst other things, the WAL allows for atomic transactions and data-safety in the face of a crash.

    The details are far too deep to cover here, but for our purposes it suffices to know that a successful backup only requires:

    1. A copy of the main files on disk at some point in time (it’s okay that they were open or written-to while being copied)
    2. WAL data segments for the time the files were being copied, they’re just files on disk

    It may help to think of a backup+restore as being like a recovery after a sudden crash (eg. power loss). Postgres deals with both situations the same way because they’re effectively the same thing:

    After a crash, Postgres replays the WAL to correct inconsistencies in the main data files, then opens for business as usual. It stands to reason that we can perform a backup in the same way, by having a copy of the main data files and the current WAL segments. Postgres even makes this really easy, by archiving copies of the WAL segments for you.

    Setting up WAL archiving

    We’ll start by setting a home for the archived WAL segments, then enable archiving in the configuration. Note that some filesystem paths and ownership may differ depending on your distro.

    1. Setup a destination directory for archived WAL segments.
      WAL_DIR=/var/lib/postgresql/8.4/wal-archive
      
      mkdir $WAL_DIR
      chmod 700 $WAL_DIR
      chown postgres.postgres $WAL_DIR
      
    2. Fix up the wal_level (needed for Postgres 9.0 or later) and set the archive_command in your postgresql.conf, you’ll need to manually substitute the value of WAL_DIR.

      It’s imperative that the archive_command does not return successfully unless it actually worked, otherwise Postgres will assume everything is fine and get rid of old WAL segments.

      Also notice that we’re testing for a file named backup_in_progress – you can in theory keep every WAL segment ever written, but they’re big and aren’t needed unless you want to perform point-in-time recovery. We’ll only want hot backups for now, so we don’t need to keep any WAL segments except those used during the main data file copy.

      # Only present in Postgres 9.0 or later
      wal_level = archive
      
      archive_mode = on
      archive_command = 'test ! -f $WAL_DIR/backup_in_progress || (test ! -f $WAL_DIR/%f && cp %p $WAL_DIR/%f)'
      
    3. Restart postgres, this is necessary because WAL archiving must begin from server startup.
      invoke-rc.d postgresql-8.4 restart

    Performing a backup run

    Performing a backup dump now is just a handful of commands. Because this is something we’ll be doing regularly (typically every night), we’ll make it into a script.

    Keeping dumps on the local system isn’t the most efficient way to do things, but it’s simple and works with whatever other backup software you use because it just leaves ordinary files on the disk. If you can’t afford the diskspace-cost of keeping dumps around, you could easily rsync the files straight to the backup server instead of using tar in the script.

    #!/bin/bash
    
    PG_ROOT=/var/lib/postgresql/8.4
    DATA_DIR="${PG_ROOT}/main"
    WAL_DIR="${PG_ROOT}/wal-archive"
    BACKUP_DESTINATION=/var/lib/backup/pgsql
    DATE=$(date +%Y%m%d-%H%M%S)
    
    # Signal that we're starting a backup
    touch $PG_ROOT/backup_in_progress
    sudo -u postgres /usr/bin/psql -c "SELECT pg_start_backup('backup_${DATE}');"  # May not return immediately if system is busy
    
    # Grab the main data files
    tar -zcf "${BACKUP_DESTINATION}/${DATE}_data.tar.gz" "$DATA_DIR"
    
    # Signal that we're done with the main part of the backup
    sudo -u postgres /usr/bin/psql -c "SELECT pg_stop_backup();"
    rm -f $PG_ROOT/backup_in_progress
    
    # Grab the WAL archives that were created while we were copying the main data files
    tar -zcf "${BACKUP_DESTINATION}/${DATE}_wal.tar.gz" "$WAL_DIR"
    
    # Nuke archived WAL segments now that we're done
    rm -f "$WAL_DIR"/*
    
    # You should now remove old dumps from $BACKUP_DESTINATION if they're older than your retention period.
    # tmpreaper or tmpwatch are tools that can do this for you.
    

    We’ll leave it to you to ensure this fits into the rest of your backup schedule. We trigger backup dumps via cron.

    Recovering from backups

    Of course all of this is useless if you can’t recover in the event of a disaster, be it accidental deletion or destruction of the whole server.

    This is a classic example:

    DROP TABLE all_our_financial_data_since_forever;
    

    Oh no! Thankfully we have backups.

    This process will revert everything back to the state as it was when the backup was taken. These backups are all-or-nothing.

    • If you have more recent changes that may need to be kept, you should find a way to save them. One (rather comprehensive) suggestion is to take a copy of the entire $DATA_DIR right after you stop the Postgres service.
    • If you only need a subset of the data (eg. a single table or database), it’s best to restore to a fresh Postgres instance, and leave the original running. Then you can use a tool like pg_dump to retrieve the bits you need.

    The official docs cover the specifics and potential caveats in plenty of detail, but we’ll gloss over them for the sake of simplicity.

    1. Stop the Postgres service:
      invoke-rc.d postgresql-8.4 stop
    2. Setup some convenient shell variables:
      PG_ROOT=/var/lib/postgresql/8.4
      DATA_DIR="${PG_ROOT}/main"
      WAL_DIR="${PG_ROOT}/wal-archive"
      BACKUP_DESTINATION=/var/lib/backup/pgsql
      
    3. Wipe all existing data files:
      rm -rfv ${DATA_DIR}/*
    4. Unpack the backed-up data files and the archived WAL segments:
      # These commands will extract to the root of the FS,
      # so the full path to $DATA_DIR in the archives is correct.
      tar -zxvf $BACKUP_DESTINATION/20120509-143400_data.tar.gz -C /
      tar -zxvf $BACKUP_DESTINATION/20120509-143400_wal.tar.gz -C /
      
    5. Wipe any “live” WAL files that were present during the filesystem backup, as they’ll be stale now:
      rm -rfv ${DATA_DIR}/pg_xlog/*
    6. Now would be a good time to deny any users/apps from connecting to Postgres while the recovery takes place. Edit your pg_hba.conf file to prevent access accordingly.
    7. Create a recovery file, recovery.conf, in the $DATA_DIR. If this file is detected during startup, Postgres will switch to recovery mode and start looking for the necessary archived WAL segments.
      # You'll need to substitute your own WAL_DIR.
      # Once again, it's imperative that the command throws an error if something
      # goes wrong (such as a file not existing), otherwise Postgres will misbehave.
      restore_command = 'cp $WAL_DIR/%f "%p"'
      

      Don’t forget to ensure ownership and permissions for the file are correct. Postgres will want to rename the file to recovery.done once it’s finished.

      chown postgres:postgres $DATA_DIR/recovery.conf
      chmod 0600 $DATA_DIR/recovery.conf
      
    8. Start Postgres and let it get to work:
      invoke-rc.d postgresql-8.4 start

      If you’re interested in what’s happening, you can follow Postgres’ logs and watch the recovery process. A really simple one looks something like this:

      2012-05-09 15:25:32 EST LOG:  database system was interrupted; last known up at 2012-05-09 14:34:00 EST
      2012-05-09 15:25:32 EST LOG:  creating missing WAL directory "pg_xlog/archive_status"
      2012-05-09 15:25:32 EST LOG:  starting archive recovery
      2012-05-09 15:25:32 EST LOG:  restore_command = 'cp /var/lib/postgresql/8.4/wal-archive/%f "%p"'
      cp: cannot stat `/var/lib/postgresql/8.4/wal-archive/00000001.history': No such file or directory
      2012-05-09 15:25:32 EST LOG:  restored log file "000000010000000000000020.00000020.backup" from archive
      2012-05-09 15:25:32 EST LOG:  restored log file "000000010000000000000020" from archive
      2012-05-09 15:25:32 EST LOG:  automatic recovery in progress
      2012-05-09 15:25:32 EST LOG:  redo starts at 0/20000068, consistency will be reached at 0/21F931D8
      2012-05-09 15:25:33 EST LOG:  restored log file "000000010000000000000021" from archive
      2012-05-09 15:25:33 EST LOG:  consistent recovery state reached
      2012-05-09 15:25:33 EST LOG:  incomplete startup packet
      cp: cannot stat `/var/lib/postgresql/8.4/wal-archive/000000010000000000000022': No such file or directory
      2012-05-09 15:25:33 EST LOG:  could not open file "pg_xlog/000000010000000000000022" (log file 0, segment 34): No such file or directory
      2012-05-09 15:25:33 EST LOG:  redo done at 0/21F931B8
      2012-05-09 15:25:33 EST LOG:  last completed transaction was at log time 2012-05-09 14:34:42.173134+10
      2012-05-09 15:25:33 EST LOG:  restored log file "000000010000000000000021" from archive
      cp: cannot stat `/var/lib/postgresql/8.4/wal-archive/00000002.history': No such file or directory
      2012-05-09 15:25:33 EST LOG:  selected new timeline ID: 2
      cp: cannot stat `/var/lib/postgresql/8.4/wal-archive/00000001.history': No such file or directory
      2012-05-09 15:25:33 EST LOG:  archive recovery complete
      2012-05-09 15:25:33 EST FATAL:  the database system is starting up
      2012-05-09 15:25:34 EST FATAL:  the database system is starting up
      2012-05-09 15:25:34 EST LOG:  autovacuum launcher started
      2012-05-09 15:25:34 EST LOG:  database system is ready to accept connections
      
    9. Once you’re done, you should remove the archived WAL segments again. Once you’ve successfully recovered, you don’t need them any more, and there’s no point having them hang around for the next backup run.
      rm -f "$WAL_DIR"/*
    10. If all is well, enable access for users/apps again by reverting your changes to pg_hba.conf and reloading Postgres’ config.

    Other considerations

    WAL archive backups are pretty straightforward and easy to setup. That said, it’s definitely recommended that you have a play with them and get comfortable before putting them into production, just like any backup solution. And as GLaDOS says, you must test, test, test – you need to test the recovery process end to end to make sure everything will work in your own setup. If recovery doesn’t work, you’re wasting time and effort by doing backups.

    We’re big on monitoring here at Anchor, and you should definitely monitor your backups. One thing we highly recommend doing is setting up a monitoring point to ensure that backup dumps are happening regularly and running as expected (eg. not too much variation in size between runs). A simple script that looks for recently-created tarballs and checks the size is sufficient.

  • 相关阅读:
    Linux入门-ssh scp rsync
    Linux入门-samba
    Linux入门-vsftp
    Linux入门-nfs
    MySQL卸载(rpm)
    rust_3_所有权
    rust_1_入门指南与猜猜看游戏
    vscode首次ssh remote linux连不上
    vbox新装ubuntu虚拟机注意事项
    IBM Rational Rose和Start UML的c++逆向工程自动生成UML类图组件
  • 原文地址:https://www.cnblogs.com/shuaixf/p/2697102.html
Copyright © 2020-2023  润新知