Howto – local and remote snapshot backup using rsync with hard links

Introduction

Your Linux servers are running smoothly? Fine.
Now you also need an emergency plan in case something goes wrong.
And even when everything is going fine, a backup that is stored directly on the same server can still be useful:

  • for example when you need to see what was changed on a specific file,
  • or if you want to find the list of files that were installed or modified after the installation of an application

This article shows how to set up an open-source solution, using the famous ‘rsync’ tool and some shell scripting, to deploy a backup system without investing in expensive proprietary software.
Another advantage of a shell script is that you can easily adapt it to your specific needs, for example to your DRP (disaster recovery plan) architecture.

The proposed shell script is derived from Mike Rubel’s work, ‘Mike’s handy rotating-filesystem-snapshot utility’ (cf. http://www.mikerubel.org/computers/rsync_snapshots).
It creates backups of your full filesystem (snapshots) with the combined advantages of full and incremental backups:

  • It uses as little disk space as an incremental backup because all unchanged files are hard-linked to existing files from previous backups; only the modified files require new inodes.
  • Because hard links are transparent, all backups are directly available and always online for read access with your usual programs; there is no need to extract files from a full archive, and no complicated replay of incremental archives is necessary.
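
The hard-link mechanism can be tried out in a scratch directory. The following is a minimal sketch, not part of the backup script: the directory and file names are made up for the illustration, and it assumes GNU ‘cp’ and ‘stat’. It links a second "snapshot" to the first one, replaces one file, and shows that the unchanged file still shares a single inode while the older snapshot keeps the old content.

```shell
#!/bin/bash
# Illustration only: two "snapshots" that share unchanged files through hard links.
demo=$(mktemp -d)                        # scratch directory (hypothetical layout)
mkdir "$demo/snapshot.002"
echo "unchanged" >"$demo/snapshot.002/a.txt"
echo "version 1" >"$demo/snapshot.002/b.txt"

# New snapshot: hard-link every file of the previous one, no data is copied.
cp -al "$demo/snapshot.002" "$demo/snapshot.001"

# A modified file is replaced (unlink, then create), never edited in place,
# so the older snapshot keeps its own copy.
rm "$demo/snapshot.001/b.txt"
echo "version 2" >"$demo/snapshot.001/b.txt"

inode_a_old=$(stat -c %i "$demo/snapshot.002/a.txt")
inode_a_new=$(stat -c %i "$demo/snapshot.001/a.txt")
links_a=$(stat -c %h "$demo/snapshot.001/a.txt")
old_b=$(cat "$demo/snapshot.002/b.txt")
new_b=$(cat "$demo/snapshot.001/b.txt")

echo "a.txt: inode ${inode_a_old} (old) / ${inode_a_new} (new), ${links_a} links"
echo "b.txt: '${old_b}' (old) / '${new_b}' (new)"
rm -rf "$demo"
```

The unchanged file a.txt occupies disk space only once, although it is visible in both snapshot folders.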

It can perform local (self) backups, and it can also be run from a remote backup server to centralize all backups in a safe place and thus avoid correlated physical risks.
‘rsync’ optimizes bandwidth usage heavily: thanks to the delta-transfer algorithm created by Andrew Tridgell, it transfers only the portions of a file that have changed (cf. http://bryanpendleton.blogspot.ch/2010/05/rsync-algorithm.html).
‘rsync’ can also encrypt its network traffic by running over ‘ssh’.

The script lets you achieve:

  • local or remote backups with extremely low bandwidth requirements
  • file-level deduplication between backups using hard links (also across servers on the remote backup server)
  • a bandwidth limit to moderate the network and I/O load on production servers
  • a backup retention policy:
    • per-server disk quota restrictions: for example, never use more than 50 GB for backups and always keep 100 GB of disk space free
    • rotation of backups with a non-linear distribution, based on the idea that recent backups are more useful than older ones, but that sometimes you still need a very old backup
  • filter rules to include or exclude specific patterns of folders and files
  • integrity protection: the backups have a ‘chattr’ read-only protection, and an MD5 integrity signature can also be calculated incrementally
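
To see what the non-linear rotation produces over time, the pruning and renaming rules can be simulated without touching any files. This is a sketch under the assumption that the backup runs once per period and never fails; ‘simulate_rotation’ is a name invented for this illustration, and the ranges are the ones used by the script presented later in this article.

```shell
#!/bin/bash
# Simulation only, no file is touched: present[X]=1 means "snapshot.X exists".
simulate_rotation() {
  local runs=$1 run b a i found
  local -a present=() rotated=()
  for ((run = 1; run <= runs; run++)); do
    # 1) prune: in each range b/2+1..b keep only the oldest snapshot
    for b in 512 256 128 64 32 16 8 4; do
      a=$((b / 2 + 1))
      found=0
      for ((i = b; i >= a; i--)); do
        if [ -n "${present[$i]:-}" ]; then
          if [ "$found" -eq 0 ]; then found=1; else unset "present[$i]"; fi
        fi
      done
    done
    # 2) rotate: snapshot.512 is dropped, snapshot.X becomes snapshot.X+1,
    #    and the new backup becomes snapshot.001
    rotated=()
    for i in "${!present[@]}"; do
      if [ "$i" -lt 512 ]; then rotated[$((i + 1))]=1; fi
    done
    rotated[1]=1
    present=()
    for i in "${!rotated[@]}"; do present[$i]=1; done
  done
  echo "${!present[@]}"
}

simulate_rotation 10    # prints: 1 2 3 4 6 10
```

With one backup per day, the surviving indices stay dense for recent days and thin out for older ones. In this simulation at most 11 snapshots exist after each run, however many runs are simulated.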

 

Installation

The snapshot backups are saved into the ‘/backup’ folder.
You can also create a symbolic link to point to another partition with more disk space, for example:

ln -sv /mnt/bigdisk /backup

Then create the folders:

mkdir -pv /backup/snapshot/{$(hostname -s),rsync,md5-log}
ln -sv $(hostname -s) /backup/snapshot/localhost

Now create the shell-script ‘/backup/snapshot/rsync/rsync-snapshot.sh’ (download rsync-snapshot.sh):

#!/bin/bash
# ----------------------------------------------------------------------
# created by francois scheurer on 20070323
# derived from mike's handy rotating-filesystem-snapshot utility
# see http://www.mikerubel.org/computers/rsync_snapshots
# ----------------------------------------------------------------------
#rsync note:
#    1) rsync -avz /src/foo  /dest      => ok, creates /dest/foo, like cp -a /src/foo /dest
#    2) rsync -avz /src/foo/ /dest/foo  => ok, creates /dest/foo, like cp -a /src/foo/. /dest/foo (or like cp -a /src/foo /dest)
#    3) rsync -avz /src/foo/ /dest/foo/ => ok, same as 2)
#    4) rsync -avz /src/foo/ /dest      => dangerous!!! overwrite dest content, like cp -a /src/foo/. /dest
#      solution: remove trailing / at /src/foo/ => 1)
#      minor problem: rsync -avz /src/foo /dest/foo => creates /dest/foo/foo, like mkdir /dest/foo && cp -a /src/foo /dest/foo
#    main options:
#      -H --hard-links
#      -x --one-file-system
#      -a equals -rlptgoD (no -H,-A,-X)
#        -r --recursive
#        -l --links
#        -p --perms
#        -t --times
#        -g --group
#        -o --owner
#        -D --devices --specials
#    useful options:
#      -S --sparse
#      -n --dry-run
#      -I --ignore-times
#      -c --checksum
#      -z --compress
#      --bwlimit=X limit socket I/O bandwidth to X KiB/s
#    other options:
#      -v --verbose
#      -y --fuzzy
#      --stats
#      -h --human-readable
#      --progress
#      -i --itemize-changes
#    quickcheck options:
#      the default behavior is to skip files with same size & mtime on destination
#      mtime = last data write access
#      atime = last data read access (can be ignored with noatime mount option or with chattr +A)
#      ctime = last inode change (write access, change of permission or ownership)
#      note that a checksum is always done after a file synchronization/transfer
#      --modify-window=X ignore mtime differences less or equal to X sec
#      --size-only skip files with same size on destination (ignore mtime)
#      -c --checksum skip files with same MD5 checksum on destination (ignore size & mtime, all files are read once, then the list of files to be resynchronized is read a second time, there is a lot of disk IO but network traffic is minimal if many files are identical; log includes only different files)
#      -I --ignore-times never skip files (all files are resynchronized, all files are read once, there is more network traffic than with --checksum but less disk IO and hence is faster than --checksum if net is fast or if most files are different; log includes all files)
#      --link-dest does the quickcheck on another reference-directory and makes hardlinks if quickcheck succeeds
#        (however, if mtime is different and --perms is used, the reference file is copied in a new inode)
#    see also this link for a rsync tutorial: http://www.thegeekstuff.com/2010/09/rsync-command-examples/
#todo:
#                 'du' slow on many snapshot.X..done
#  autokill after n minutes.
#                 if disk full, its better to replace the snapshot.001 than to cancel and have a very old backup (even if it may fail to create the snapshot and ends with 0 backups)..done
#                 rsync-snapshot for oracle redo logs..old
#                 'find'-list with md5 signatures -> .gz file stored aside rsync.log.gz inside the snapshot.X folder; this file will be move to parent dir /backup/snapshot/localhost/ before deletion of a snapshot; this file will also be used to extract an incremental backup with tape-arch.sh..done (md5sum calculation with rsync-list.sh for acm14=18m58 and only 5m27 with a reference file. speedup is ~250-300%)
#  realtime freedisk display with echo $(($(stat -f -c "%f" /backup/snapshot/) * 4096 / 1024))
#  use authorized_keys with restriction of bash (command=) and set sshd_config with PermitRootLogin=forced-commands-only, see http://troy.jdmz.net/rsync/index.html http://www.snailbook.com/faq/restricted-scp.auto.html
#  note: rsync lists all files in snapshot.X disregarding inclusion patterns, this is slow.




# ------------- the help page ------------------------------------------
if [ "$1" == "-h" ] || [ "$1" == "--help" ]
then
  cat << "EOF"
Version 2.01 2013-01-16

USAGE: rsync-snapshot.sh HOST [--recheck]

PURPOSE: create a snapshot backup of the whole filesystem into the folder
  '/backup/snapshot/HOST/snapshot.001'.
  If HOST is 'localhost' it is replaced with the local hostname.
  If HOST is a remote host then rsync over ssh is used to transfer the files
  with a delta-transfer algorithm to transfer only minimal parts of the files
  and improve speed; rsync uses for this the previous backup as reference.
  This reference is also used to create hard links instead of files when
  possible and thus save disk space. If original and reference file have
  identical content but different timestamps or permissions then no hard link
  is created.
  A rotation of all backups renames snapshot.X into snapshot.X+1 and removes
  backups with X>512. About 10 backups with non-linear distribution are kept
  in rotation; for example with X=1,2,3,4,8,16,32,64,128,256,512.
  The snapshot folders are protected read-only against all users including
  root using 'chattr'.
  The --recheck option forces a sync of all files even if they have same mtime
  & size; it can verify a backup and fix corrupted files;
  --recheck also recalculates the MD5 integrity signatures without using the
  last signature-file as precalculation.
  Some features like filter rules, MD5, chattr, bwlimit and per server retention
  policy can be configured by modifying the scripts directly.

FILES:
    /backup/snapshot/rsync/rsync-snapshot.sh  the backup script
    /backup/snapshot/rsync/rsync-list.sh      the md5 signature script
    /backup/snapshot/rsync/rsync-include.txt  the filter rules

Examples:
  (nice -5 ./rsync-snapshot.sh >log &) ; tail -f log
  cd /backup/snapshot; for i in $(ls -A); do nice -10 /backup/snapshot/rsync/rsync-snapshot.sh $i; done
EOF
  exit 1
fi




# ------------- tuning options, file locations and constants -----------
SRC="$1" #name of backup source, may be a remote or local hostname
OPT="$2" #options (--recheck)
HOST_PORT=22 #port of source of backup
SCRIPT_PATH="/backup/snapshot/rsync"
SNAPSHOT_DST="/backup/snapshot" #destination folder
NAME="snapshot" #backup name
LOG="rsync.log"
MIN_MIBSIZE=5000 # older snapshots (except snapshot.001) are removed if free disk <= MIN_MIBSIZE. the script may exit without performing a backup if free disk is still short.
OVERWRITE_LAST=0 # if free disk space is too small, then this option lets us remove snapshot.001 as well and retry once
MAX_MIBSIZE=80000 # older snapshots (except snapshot.001) are removed if their size >= MAX_MIBSIZE. the script performs a backup even if their size is too big.
#old: SPEED=5 # 1 is slow, 100 is fast, 100000 faster and 0 does not use slow-down. this allows to avoid rsync consuming too much system performance
BWLIMIT=100000 # bandwidth limit in KiB/s; 0 disables the limit. this keeps rsync from consuming too much system performance
BACKUPSERVER="rembk" # this server connects to all other to download filesystems and create remote snapshot backups
MD5LIST=0 #to compute a list of md5 integrity signatures of all backed-up files, needs 'rsync-list.sh'
CHATTR=1 # to use 'chattr' command and protect the backups against modification and deletion
DU=1 # to use 'du' command and calculate the size of existing backups, disable it if you have many backups and it is getting too slow (for example on BACKUPSERVER)
SOURCE="/" #source folder to backup

HOST_LOCAL="$(hostname -s)" #local hostname
#HOST_SRC="${SRC:-${HOST_LOCAL}}" #explicit source hostname, default is local hostname
if [ -z "${SRC}" ] || [ "${SRC}" == "localhost" ]
then
  HOST_SRC="${HOST_LOCAL}" #explicit source hostname, default is local hostname
else
  HOST_SRC="${SRC}" #explicit source hostname
fi

if [ "${HOST_LOCAL}" == "${BACKUPSERVER}" ] #if we are on BACKUPSERVER then do some fine tuning
then
  MD5LIST=1
  MIN_MIBSIZE=35000 #needed free space for chunk-file tape-arch.sh
  MAX_MIBSIZE=12000
  DU=0 # NB: 'du' is currently disabled on BACKUPSERVER for performance reasons
elif [ "${HOST_LOCAL}" == "${HOST_SRC}" ] #else if we are on a generic server then do some other fine tuning
then
  if [ "${HOST_SRC}" == "ZRHSV-TST01" ]; then MIN_MIBSIZE=500; CHATTR=0; DU=0; MD5LIST=0; fi
fi




# ------------- initialization -----------------------------------------
shopt -s extglob                                            #enable extended pattern matching operators

OPTION="--stats \
  --recursive \
  --links \
  --perms \
  --times \
  --group \
  --owner \
  --devices \
  --hard-links \
  --numeric-ids \
  --delete \
  --delete-excluded \
  --bwlimit=${BWLIMIT}"
#  --progress
#  --size-only
#  --stop-at
#  --time-limit
#  --sparse

if [ "${HOST_SRC}" != "${HOST_LOCAL}" ] #option for a remote server
then
  SOURCE="${HOST_SRC}:${SOURCE}"
  OPTION="${OPTION} \
  --compress \
  --rsh=\"ssh -p ${HOST_PORT} -i /root/.ssh/rsync_rsa -l root\" \
  --rsync-path=\"/usr/bin/rsync\""
fi
if [ "${OPT}" == "--recheck" ]
then
  OPTION="${OPTION} \
  --ignore-times"
elif [ -n "${OPT}" ]
then
  echo "Try rsync-snapshot.sh --help ."
  exit 2
fi




# ------------- check conditions ---------------------------------------
echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot backup is created into ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001 ==="
STARTDATE=$(date +%s)

# make sure we're running as root
if (($(id -u) != 0))
then
  echo "Sorry, must be root. Exiting..."
  echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
  exit 2
fi

# make sure we have a correct snapshot folder
if [ ! -d "${SNAPSHOT_DST}/${HOST_SRC}" ]
then
  echo "Sorry, folder ${SNAPSHOT_DST}/${HOST_SRC} is missing. Exiting..."
  echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
  exit 2
fi

# make sure that no other rsync-snapshot.sh or rsync process (started by rsync-cp.sh or by a remote rsync-snapshot.sh) is already running in the background.
if [ "${HOST_LOCAL}" != "${BACKUPSERVER}" ] #because BACKUPSERVER sometimes needs to perform an rsync-cp.sh, it must skip the "already started" check.
then
  #RSYNCPID=$(pgrep -f "/bin/bash .*rsync-snapshot.sh")
  #if ([ -n "${RSYNCPID}" ] && [ "${RSYNCPID}" != "$$" ]) #|| pgrep -x "rsync"
  if pgrep -f "/bin/\w*sh \w*rsync-snapshot\.sh" | grep -qv "$$"
  then
    echo "Sorry, rsync is already running in the background. Exiting..."
    echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
    exit 2
  fi
fi




# ------------- remove some old backups --------------------------------
# remove certain snapshots to achieve an exponential distribution in time of the backups (1,2,4,8,...)
for b in 512 256 128 64 32 16 8 4
do
  let a=b/2+1
  let f=0 #this flag is set to 1 when we find the 1st snapshot in the range b..a
  for i in $(seq -f'%03g' "${b}" -1 "${a}")
  do
    if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ]
    then
      if [ "${f}" -eq 0 ]
      then
        let f=1
      else
        echo "$(date +%Y-%m-%d_%H:%M:%S) Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i} ..."
        [ "${CHATTR}" -eq 1 ] && chattr -R -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" &>/dev/null
        rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
      fi
    fi
  done
done

# remove additional backups if free disk space is short
remove_snapshot() {
  local MIN_MIBSIZE2=$1
  local MAX_MIBSIZE2=$2
  for i in $(seq -f'%03g' 512 -1 001)
  do
    if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ] || [ ${i} -eq 1 ]
    then
      [ ! -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ] && ln -s "${NAME}.${i}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
      let d=0 #disk space used by snapshots and free disk space are ok
      echo -n "$(date +%Y-%m-%d_%H:%M:%S) Checking free disk space... "
      FREEDISK=$(df -m ${SNAPSHOT_DST} | tail -1 | sed -e 's/  */ /g' | cut -d" " -f4 | sed -e 's/M*//g')
      echo -n "${FREEDISK} MiB free. "
      if [ ${FREEDISK} -ge ${MIN_MIBSIZE2} ]
      then
        echo "Ok, bigger than ${MIN_MIBSIZE2} MiB."
        if [ "${DU}" -eq 0 ]
        then #avoid slow 'du'
          break
        else
          echo -n "$(date +%Y-%m-%d_%H:%M:%S) Checking disk space used by ${SNAPSHOT_DST}/${HOST_SRC} ... "
          USEDDISK=$(du -ms "${SNAPSHOT_DST}/${HOST_SRC}/" | cut -f1)
          echo -n "${USEDDISK} MiB used. "
          if [ ${USEDDISK} -le ${MAX_MIBSIZE2} ]
          then
            echo "Ok, smaller than ${MAX_MIBSIZE2} MiB."
            break
          else
            let d=2 #disk space used by snapshots is too big
          fi
        fi
      else
        let d=1 #free disk space is too small
      fi
      if [ ${d} -ne 0 ] #we need to remove snapshots
      then
        if [ ${i} -ne 1 ]
        then
          echo "Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i} ..."
          [ "${CHATTR}" -eq 1 ] && chattr -R -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" &>/dev/null
          rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
          [ -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && rm -f "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
        else #all snapshots except snapshot.001 are removed
          if [ ${d} -eq 1 ] #snapshot.001 causes that free space is too small
          then
            if [ "${OVERWRITE_LAST}" -eq 1 ] #last chance: remove snapshot.001 and retry once
            then
              OVERWRITE_LAST=0
              echo "Warning, free disk space will be smaller than ${MIN_MIBSIZE} MiB."
              echo "$(date +%Y-%m-%d_%H:%M:%S) OVERWRITE_LAST enabled. Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001 ..."
              rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001"
              [ -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && rm -f "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
            else
              for j in ${LNKDST//--link-dest=/}
              do
                if [ -d "${j}" ] && [ "${CHATTR}" -eq 1 ] && [ $(lsattr -d "${j}" | cut -b5) != "i" ]
                then
                  chattr -R +i "${j}" &>/dev/null #undo unprotection that was needed to use hardlinks
                fi
              done
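              [ ! -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ] && ln -s "${NAME}.${i}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" #restore the .last symlink to snapshot.001 (i is 001 in this branch)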
              echo "Sorry, free disk space will be smaller than ${MIN_MIBSIZE} MiB. Exiting..."
              echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
              exit 2
            fi
          elif [ ${d} -eq 2 ] #snapshot.001 causes that disk space used by snapshots is too big
          then
            echo "Warning, disk space used by ${SNAPSHOT_DST}/${HOST_SRC} will be bigger than ${MAX_MIBSIZE} MiB. Continuing anyway..."
          fi
        fi
      fi
    fi
  done
}

# perform an estimation of required disk space for the new backup
while : #this loop is executed a 2nd time if OVERWRITE_LAST was ==1 and snapshot.001 got removed
do
  OOVERWRITE_LAST="${OVERWRITE_LAST}"
  echo -n "$(date +%Y-%m-%d_%H:%M:%S) Testing needed free disk space ..."
  mkdir -p "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space"
  chmod -R 775 "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space"
  cat /dev/null >"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
  LNKDST=$(find "${SNAPSHOT_DST}/" -maxdepth 2 -type d -name "${NAME}.001" -printf " --link-dest=%p")
  for i in ${LNKDST//--link-dest=/}
  do
    if [ -d "${i}" ] && [ "${CHATTR}" -eq 1 ] && [ $(lsattr -d "${i}" | cut -b5) == "i" ]
    then
      chattr -R -i "${i}" &>/dev/null #unprotect last snapshots to use hardlinks
    fi
  done
  eval rsync \
    --dry-run \
    ${OPTION} \
    --include-from="${SCRIPT_PATH}/rsync-include.txt" \
    ${LNKDST} \
    "${SOURCE}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space" >>"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
  RES=$?
  if [ "${RES}" -ne 0 ] && [ "${RES}" -ne 23 ] && [ "${RES}" -ne 24 ]
  then
    echo "Sorry, error in rsync execution (value ${RES}). Exiting..."
    echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
    exit 2
  fi
  let i=$(tail -100 "${SNAPSHOT_DST}/${HOST_SRC}/${LOG}" | grep 'Total transferred file size:' | cut -d " " -f5 | tr -d ',')/1048576 #strip the thousands separators printed by recent rsync versions
  echo " ${i} MiB needed."
  rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${LOG}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space"
  remove_snapshot $((${MIN_MIBSIZE} + ${i})) $((${MAX_MIBSIZE} - ${i}))
  if [ "${OOVERWRITE_LAST}" == "${OVERWRITE_LAST}" ] #no need to retry
  then
    break
  fi
done




# ------------- create the snapshot backup -----------------------------
# perform the filesystem backup using rsync and hard-links to the latest snapshot
# Note:
#   -rsync behaves like cp --remove-destination by default, so the destination
#    is unlinked first.  If it were not so, this would copy over the other
#    snapshot(s) too!
#   -use --link-dest to hard-link when possible with previous snapshot,
#    timestamps, permissions and ownerships are preserved
echo "$(date +%Y-%m-%d_%H:%M:%S) Creating folder ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000 ..."
mkdir -p "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
chmod 775 "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
cat /dev/null >"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
echo -n "$(date +%Y-%m-%d_%H:%M:%S) Creating backup of ${HOST_SRC} into ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
if [ -n "${LNKDST}" ]
then
  echo " hardlinked with${LNKDST//--link-dest=/} ..."
else
  echo " not hardlinked ..."
fi
eval rsync \
  -vv \
  ${OPTION} \
  --include-from="${SCRIPT_PATH}/rsync-include.txt" \
  ${LNKDST} \
  "${SOURCE}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000" >>"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
RES=$?
if [ "${RES}" -ne 0 ] && [ "${RES}" -ne 23 ] && [ "${RES}" -ne 24 ]
then
  echo "Sorry, error in rsync execution (value ${RES}). Exiting..."
  echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
  exit 2
fi
for i in ${LNKDST//--link-dest=/}
do
  if [ -d "${i}" ] && [ "${CHATTR}" -eq 1 ] && [ $(lsattr -d "${i}" | cut -b5) != "i" ]
  then
    chattr -R +i "${i}" &>/dev/null #undo unprotection that was needed to use hardlinks
  fi
done
mv "${SNAPSHOT_DST}/${HOST_SRC}/${LOG}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000/${LOG}"
gzip -f "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000/${LOG}"




# ------------- create the MD5 integrity signature ---------------------
# create a gzipped 'find'-list of all snapshot files (including md5 signatures)
if [ "${MD5LIST}" -eq 1 ]
then
  echo "$(date +%Y-%m-%d_%H:%M:%S) Computing filelist with md5 signatures of ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000 ..."
  OWD="$(pwd)"
  cd "${SNAPSHOT_DST}"
#  NOW=$(date "+%s")
#  MYTZ=$(date "+%z")
#  let NOW${MYTZ:0:1}=3600*${MYTZ:1:2}+60*${MYTZ:3:2} # convert localtime to UTC
#  DATESTR=$(date -d "1970-01-01 $((${NOW} - 1)) sec" "+%Y-%m-%d_%H:%M:%S") # 'now - 1s' to avoid missing files
  DATESTR=$(date -d "1970-01-01 UTC $(($(date +%s) - 1)) seconds" "+%Y-%m-%d_%H:%M:%S") # 'now - 1s' to avoid missing files
  REF_LIST="$(find ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001/ -maxdepth 1 -type f -name 'snapshot.*.list.gz' 2>/dev/null)"
  if [ -n "${REF_LIST}" ] && [ "${OPT}" != "--recheck" ]
  then
    REF_LIST2="/tmp/rsync-reflist.tmp"
    gzip -dc "${REF_LIST}" >"${REF_LIST2}"
    touch -r "${REF_LIST}" "${REF_LIST2}"
    ${SCRIPT_PATH}/rsync-list.sh "${HOST_SRC}/${NAME}.000" 0 "${REF_LIST2}" | sort -u | gzip -c >"${HOST_SRC}/${NAME}.${DATESTR}.list.gz"
    rm -f "${REF_LIST2}"
  else
    ${SCRIPT_PATH}/rsync-list.sh "${HOST_SRC}/${NAME}.000" 0 | sort -u | gzip -c >"${HOST_SRC}/${NAME}.${DATESTR}.list.gz"
  fi
  touch -d "${DATESTR/_/ }" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${DATESTR}.list.gz"
  cd "${OWD}"
  [ ! -d "${SNAPSHOT_DST}/${HOST_SRC}/md5-log" ] && mkdir -p "${SNAPSHOT_DST}/${HOST_SRC}/md5-log"
  cp -al "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${DATESTR}.list.gz" "${SNAPSHOT_DST}/${HOST_SRC}/md5-log/${NAME}.${DATESTR}.list.gz"
  mv "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${DATESTR}.list.gz" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000/${NAME}.${DATESTR}.list.gz"
  touch -d "${DATESTR/_/ }" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
fi




# ------------- finish and clean up ------------------------------------
# protect the backup against modification with chattr +immutable
if [ "${CHATTR}" -eq 1 ]
then
  echo "$(date +%Y-%m-%d_%H:%M:%S) Setting recursively immutable flag of ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000 ..."
  chattr -R +i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000" &>/dev/null
fi

# rotate the backups
if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512" ] #remove snapshot.512
then
  echo "Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512 ..."
  [ "${CHATTR}" -eq 1 ] && chattr -R -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512" &>/dev/null
  rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512"
fi
[ -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && rm -f "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
for i in $(seq -f'%03g' 511 -1 000)
do
  if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ]
  then
    let j=${i##+(0)}+1
    j=$(printf "%.3d" "${j}")
    echo "Renaming ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i} into ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${j} ..."
    [ "${CHATTR}" -eq 1 ] && chattr -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" &>/dev/null
    mv "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${j}"
    [ "${CHATTR}" -eq 1 ] && chattr +i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${j}" &>/dev/null
    [ ! -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && ln -s "${NAME}.${j}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
  fi
done

# remove additional backups if free disk space is short
OVERWRITE_LAST=0 #next call of remove_snapshot() will not remove snapshot.001
remove_snapshot ${MIN_MIBSIZE} ${MAX_MIBSIZE}
echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot backup successfully done in $(($(date +%s) - ${STARTDATE})) sec. ==="
exit 0
#eof
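
Because every snapshot is an ordinary directory tree, two backups can be compared with standard tools. The sketch below is one possible way to list the files that were added or changed between two snapshots: a file that is still hard-linked to the same path in the older snapshot is considered unchanged. The function name ‘snapshot_changes’ is invented for this illustration, and it assumes bash and GNU ‘find’ and ‘sort’.

```shell
#!/bin/bash
# List the files of snapshot NEW that are not hard-linked to the same path
# in snapshot OLD, i.e. files added or changed between the two backups.
snapshot_changes() {
  local old=${1%/} new=${2%/} f rel
  while IFS= read -r -d '' f; do
    rel=${f#"$new"/}
    # -ef is true when both paths point to the same device and inode
    [ "$f" -ef "$old/$rel" ] || printf '%s\n' "$rel"
  done < <(find "$new" -type f -print0 | sort -z)
}

# Example call (paths follow the layout used in this article):
#   snapshot_changes /backup/snapshot/localhost/snapshot.002 \
#                    /backup/snapshot/localhost/snapshot.001
```

Files that were deleted between the two backups are not reported by this sketch, and the per-snapshot rsync.log.gz is always listed because each backup writes its own log.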

 

Then create the file ‘/backup/snapshot/rsync/rsync-include.txt’ (download rsync-include) that contains the include and exclude patterns:

#created by francois scheurer on 20120828
#
#note:
#  -be careful with trailing spaces, '- * ' is different from '- *'
#  -rsync stops at the first matching rule and ignores the rest
#  -rsync descends the folder tree recursively, so a folder must be included before its content can match
#  -'**' matches also zero, one or several '/'
#  -get the list of all root files/folders
#     pdsh -f 1 -w server[1-22] 'ls -la / | sed -e "s/  */ /g" | cut -d" " -f9-' | cut -d" " -f2- | sort -u
#  -include all folders with '+ */' (missing this rule implies that '- *' will override all the inclusions of any subfolders)
#  -exclude all files that are not explicitly included with '- *'
#  -exclude everything except /etc/ssh: '+ /etc/ssh/** \ + */ \ - *'
#  -exclude content of /tmp but include foldername: '- /tmp/* \ + */'
#  -exclude content and also foldername /tmp: '- /tmp/ \ + */'
#  -exclude content of each .ssh but include foldername: '- /**/.ssh/* \ + */'
#
#include everything except /tmp/:
#- /tmp/
#same but include /tmp/ as an empty folder:
#- /tmp/*
#include only /var/www/:
#+ /var/
#+ /var/www/
#+ /var/www/**
#- *
#same but also include folder structure:
#+ /var/www/**
#+ */
#- *




#pattern list for / (include by default):
+ /

#+ /boot/
#+ /boot/**
#- *

- /lost+found/*
- /*.bak*
- /*.old*
#- /backup/*
#- /boot/*
#- /etc/ssh/ssh_host*
#- /home/*
- /media/*
#- /mnt/*/*
#- /opt/*
#- /opt/fedora*/data/*
#- /opt/fedora*/lucene/*
- /opt/fedora*/tomcat*/logs/*
- /opt/fedora*/tomcat*/temp/*
- /opt/fedora*/tomcat*/work/*
- /proc/*
- /root/old/*
#- /root/.bash_history
- /root/.mc/*
#- /root/.ssh/*openssh*
- /root/.viminfo
- /root/tmp/*
#- /srv/*
- /sys/*
- /tmp/*
#- /usr/local/franz/logstat/logstat.log
- /var/cache/*
- /var/lib/mysql/*
- /var/lib/postgresql/*/main/wal_archive/*
- /var/lib/postgresql/*/main/pg_log/*
#- /var/lib/postgresql/*/main/pg_xlog/*
- /var/lib/postgresql/*/main/postmaster.opts
- /var/lib/postgresql/*/main/postmaster.pid
- /var/lib/postgresql/*/main/backup_in_progress
- /var/lib/postgresql/*/main/backup_label
#- /var/lib/postgresql/*/main/*/*
- /var/log/*
#- /var/spool/*
- /var/tmp/*

#pattern list for /backup/ and /mnt/ (exclude by default):
+ /backup/
- /backup/lost+found/*
- /backup/*.bak*
- /backup/*.old*
+ /backup/snapshot/
+ /backup/snapshot/rsync/
+ /backup/snapshot/rsync/**
- /backup/snapshot/*
+ /backup/db/
- /backup/*
- /mnt/*.bak*
- /mnt/*.old*
- /mnt/old/
- /mnt/*/*.bak*
- /mnt/*/*.old*
- /mnt/*/old/
+ /mnt/sas/*
+ /mnt/ssd/*
- /mnt/*/tmp/*
#- /mnt/*/opt/*
#- /mnt/*/opt/fedora*/data/*
#- /mnt/*/opt/fedora*/lucene/*
- /mnt/*/opt/fedora*/tomcat*/logs/*
- /mnt/*/opt/fedora*/tomcat*/temp/*
- /mnt/*/opt/fedora*/tomcat*/work/*
- /mnt/*/postgresql/*/main/wal_archive/*
- /mnt/*/postgresql/*/main/pg_log/*
#- /mnt/*/postgresql/*/main/pg_xlog/*
- /mnt/*/postgresql/*/main/postmaster.opts
- /mnt/*/postgresql/*/main/postmaster.pid
- /mnt/*/postgresql/*/main/backup_in_progress
- /mnt/*/postgresql/*/main/backup_label
#- /mnt/*/postgresql/*/main/*/*
+ /mnt/*/backup/
+ /mnt/*/backup/snapshot/
+ /mnt/*/backup/snapshot/rsync/
+ /mnt/*/backup/snapshot/rsync/**
- /mnt/*/backup/snapshot/*
+ /mnt/*/backup/db/
- /mnt/*/backup/*
- /mnt/*/*
+ /c/
+ /c/backup/
+ /c/backup/snapshot/
+ /c/backup/snapshot/rsync/
+ /c/backup/snapshot/rsync/**
- /c/backup/snapshot/*
+ /c/backup/db/
- /c/backup/*
- /c/*/*
+ /home/
+ /home/backup/
+ /home/backup/snapshot/
+ /home/backup/snapshot/rsync/
+ /home/backup/snapshot/rsync/**
- /home/backup/snapshot/*
+ /home/backup/db/
- /home/backup/*
- /USB/*

#pattern list for /boot/ (include by default):
+ /boot/
- /boot/lost+found/*
- /boot/*.bak*
- /boot/*.old*
+ /boot/**

#pattern list for /home/ (include by default):
+ /home/
- /home/lost+found/*
- /home/*.bak*
- /home/*.old*
- /home/xen/*
+ /home/**

#include folder structure by default:
#+ */
#include everything by default:
+ *
#exclude everything by default:
#- *
#eof
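
Filter rules are easy to get wrong, so it can help to try a rule set on a small scratch tree before using it for a real backup. The following sketch assumes that ‘rsync’ is installed; the directory names, the rule file and the function name ‘demo_filter_rules’ are made up for the illustration. It applies the "include only /var/www/" rules from the notes above in dry-run mode and prints the names that would be transferred.

```shell
#!/bin/bash
# Try a rule set on a scratch tree; --dry-run guarantees that nothing is copied.
demo_filter_rules() {
  local tmp
  tmp=$(mktemp -d) || return 1
  mkdir -p "$tmp/src/var/www" "$tmp/src/var/log" "$tmp/src/etc"
  touch "$tmp/src/var/www/index.html" "$tmp/src/var/log/messages" "$tmp/src/etc/passwd"
  # "include only /var/www/": the parent folders must be included explicitly
  printf '%s\n' '+ /var/' '+ /var/www/' '+ /var/www/**' '- *' >"$tmp/rules.txt"
  rsync --dry-run --recursive --out-format='%n' \
    --include-from="$tmp/rules.txt" "$tmp/src/" "$tmp/dst/"
  rm -rf "$tmp"
}

if command -v rsync >/dev/null 2>&1; then
  demo_filter_rules
else
  echo "rsync is not installed, skipping the demonstration"
fi
```

The listing should contain var/www/index.html but neither etc/passwd nor var/log/messages, since both are caught by the final '- *' rule.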

 

And finally create the optional shell-script ‘/backup/snapshot/rsync/rsync-list.sh’ (download rsync-list.sh) that calculates the MD5 integrity signatures:

#!/bin/bash
# created by francois scheurer on 20081109
# this script is used by rsync-snapshot.sh,
# it recursively prints to stdout the filelist of folder $1 and computes md5 signatures
# it deals correctly with special filenames with newlines or '\'
# note1: the script assumes that a file is unchanged if its size and ctime are unchanged;
#   this assumption has a very small risk of being wrong:
#   it could be wrong if two files with different contents but same filename and size are created in the same second in two directories;
#   if the first directory is then removed and the second is renamed as the first, the file is not detected as changed.
# note2: ctime checking can be replaced by mtime checking if CTIME_CHK=0;
#   this is needed by rsync-snapshot.sh (because creating hard links does not preserve ctime).




# ------------- the help page ---------------------------------------------
if [ "$1" == "-h" ] || [ "$1" == "--help" ]
then
  cat << "EOF"
Version 1.6 2009-06-19

USAGE: rsync-list.sh PATH/DIR CTIME_CHK [REF_LIST]

PURPOSE: recursively prints to stdout the filelist of folder PATH/DIR and computes md5 integrity signatures.
  It deals correctly with special filenames with newlines or '\'.
  If a ref_list is provided, it is used to avoid the re-calculation of md5 on files
  with unchanged filename and ctime.
  A ref_list is a file containing the output of a previous execution of this shell-script.
  The script assumes that a file is unchanged if its size and ctime are unchanged.
  The ref_list_mtime is used to force a md5 re-calculation of all files with newer ctime:
  -if file_ctime > ref_list_mtime then re-calculate md5
  -if file_ctime = ref_file_ctime then use ref_list
  CTIME_CHK can be 1 to base the algorithm on ctime or 0 to base it on mtime.

NOTE: the script assumes that no process modifies any file in PATH/DIR during the script's execution;
  you should read the following remarks if this assumption cannot be guaranteed:
  -a recent ref_list_mtime (>= date_of_first_write_to_ref_list) causes the script
   to miss all files with: ref_list_mtime >= file_ctime > ref_file_ctime
   solution: 'touch' ref_file_mtime with date_of_first_write_to_ref_list - 1 second
  -an old ref_list_mtime (< date_of_last_write_to_ref_list) causes the script
   to double all files with: ref_list_mtime < file_ctime = ref_file_ctime
   solution: pipe the output to 'sort -u'

EXAMPLE:
  DATESTR=$( date -d "1970-01-01 UTC $(( $( date +%s ) - 1 )) seconds" "+%Y-%m-%d_%H:%M:%S" ) # 'now - 1s' to avoid missing files
  REF_LIST="/etc.2008-11-23_10:00:00.list.gz"
  REF_LIST2="/tmp/rsync-reflist.tmp"
  gzip -dc "${REF_LIST}" >"${REF_LIST2}"
  touch -r "${REF_LIST}" "${REF_LIST2}"
  ./rsync-list.sh "/etc/" 1 "${REF_LIST2}" | sort -u | gzip -c >"/etc.${DATESTR}.list.gz" # 'sort -u' to avoid doubling files
  rm "${REF_LIST2}"
  touch -d "${DATESTR/_/ }" "/etc.${DATESTR}.list.gz"
EOF
  exit 1
elif [ $# -ne 2 ] && [ $# -ne 3 ]
then
  echo "Sorry, you must provide 2 or 3 arguments. Exiting..."
  exit 2
fi




# ------------- file locations and constants ---------------------------
SRC="$1" #name of source of backup, remote or local hostname
CTIME_CHK=$2 #1 for ctime checking, 0 for mtime checking
if [ "$CTIME_CHK" -eq 1 ]
then
  CTIME_STAT="%z"
  CTIME_FIND="-cnewer"
else
  CTIME_STAT="%y"
  CTIME_FIND="-newer"
fi
REF="$3" #filename of optional reference list
SCRIPT_PATH="/backup/snapshot/rsync"
FINDSCRIPT="$SCRIPT_PATH/rsync-find.sh.tmp" # temporary shell-script to calculate filelist




# ------------- using reference list to reduce md5 calculation time ----
if [ -n "$REF" ] #we have a previous md5 list
then

  if ! [ -s "$REF" ] #invalid reference list
  then
     echo "Error: $REF is not a valid reference list. Exiting..."
     exit 2
  fi

  touch /tmp/testsystime.tmp
  if ! [ /tmp/testsystime.tmp -nt "$REF" ] #if system time is incorrect then exit
  then
    echo "Error: system time is older than mtime of $REF. Exiting..."
    rm /tmp/testsystime.tmp
    exit 2
  fi
  rm /tmp/testsystime.tmp

  cat "$REF" | while read -r LINE #consider all previous files that still exist now with same ctime and size and print their already calculated md5
  do
    SIZE_AND_CTIME="${LINE#* md5sum=* * * * }" #extract size and ctime from reference list
    SIZE_AND_CTIME="${SIZE_AND_CTIME% \`*}"
    LINE2="${LINE%% md5sum=*}"    #1) keep only the filename part of the line
    LINE2="${LINE2//\\\\n/
}"                                #2) replace '\n' with newline, the problem now is that '\\n' is replaced, too (following is not a solution because it removes previous char LINE2="${LINE2//[^\\\\]\\\\n/newline}")
    LINE2="${LINE2//\\\\
/\\\\n}"                          #3) replace '\'+newline with '\\n', fixing the problem of 2)
    LINE2="${LINE2//\\\\\\\\/\\}" #4) replace '\\' with '\'
    if [ -a "$LINE2" ] || [ -h "$LINE2" ] #check if file still exists
    then
      SIZE_AND_CTIME2=$( stat -c"%s $CTIME_STAT" "$LINE2" )
      SIZE_AND_CTIME2="${SIZE_AND_CTIME2#* md5sum=* * * * }" #get size and ctime from current file
      SIZE_AND_CTIME2="${SIZE_AND_CTIME2% \`*}"
      if [ "$SIZE_AND_CTIME" == "$SIZE_AND_CTIME2" ] #current file unchanged (see above note), so print the already calculated md5
      then
        echo "$LINE"
      elif [ "${SIZE_AND_CTIME#* }" == "${SIZE_AND_CTIME2#* }" ] #size is different but ctime is same: update current file's ctime to force md5's recalculation (see below)
      then
        if [ "$CTIME_CHK" -eq 1 ]
        then
          chmod --reference="$LINE2" "$LINE2" #update ctime (note: system time is assumed to be correct)
        else
          touch -m "$LINE2" #update mtime (note: system time is assumed to be correct)
        fi
      fi
    fi #else the file has been either deleted or modified (different ctime) and reference list is here useless
  done
  CNEWER_REF="$CTIME_FIND $REF" #prepare 'find' -cnewer option
else
  CNEWER_REF=""
fi




# ------------- calculation of md5 sums --------------------------------
#this 1st method is not slow but fails on filenames with newlines or '\'
find "${SRC}" $CNEWER_REF \! \( -path "*
*" -o -path "*\\\*" -o -path " *" -o -path "* " \) | while read LINE
do
  LINE2="$LINE"
  if ! [ -h "$LINE" ] && [ -f "$LINE" ]
  then
    RES=$( md5sum "$LINE" )
    LINE2="$LINE2 md5sum=${RES%% *}"
  else
    LINE2="$LINE2 md5sum=-"
  fi
  RES=$( echo $( stat -c"%A %U %G %s $CTIME_STAT \`%F'" "$LINE" ) )
  echo -E "$LINE2 $RES"
done
#this 2nd method is slow but works on filenames with newlines or '\'
( cat << "EOF"
#!/bin/bash
  LINE="$1"
#  LINE2="${LINE//\\\\/\\\\}" # replace \ with \\
  LINE2="${LINE//\\/\\\\}" # replace \ with \\
  LINE2="${LINE2//
/\n}" # replace newline with \n
  if ! [ -h "$LINE" ] && [ -f "$LINE" ]
  then
    RES=$( md5sum "$LINE" )
    LINE2="$LINE2 md5sum=${RES%% *}"
  else
    LINE2="$LINE2 md5sum=-"
  fi
  RES=$( echo $( stat -c"%A %U %G %s $CTIME_STAT \`%F'" "$LINE" ) )
  echo -E "$LINE2 $RES"
EOF
) >"$FINDSCRIPT"
chmod +x "$FINDSCRIPT"
find "${SRC}" $CNEWER_REF \( -path "*
*" -o -path "*\\\*" -o -path " *" -o -path "* " \) -print0 | xargs --replace --null "$FINDSCRIPT" "{}"
rm "$FINDSCRIPT"
#eof
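
The escaping that the second method applies to special filenames (every ‘\’ becomes ‘\\’ and every newline becomes ‘\n’, so that each list entry fits on one line) can be tried in isolation. The following is only a minimal sketch and not part of ‘rsync-list.sh’: the function names ‘encode_name’ and ‘decode_name’ are hypothetical, and the decoding relies on the ‘%b’ format of bash's ‘printf’ instead of the step-by-step substitutions used by the script.

```bash
#!/bin/bash
# Sketch only: encode_name/decode_name are hypothetical helpers, not part of rsync-list.sh.

# Make a filename fit on one line: '\' becomes '\\' and newline becomes '\n'.
encode_name() {
  local s=$1
  s=${s//\\/\\\\}      # double every backslash first
  s=${s//$'\n'/\\n}    # then turn each newline into the two characters '\' and 'n'
  printf '%s\n' "$s"
}

# Reverse the encoding and store the result in the variable named by $2
# (printf -v is used because $( ) would strip trailing newlines).
decode_name() {
  printf -v "$2" '%b' "$1"
}

name=$'we\\ird\nname'            # contains one backslash and one newline
encoded=$(encode_name "$name")
decode_name "$encoded" decoded
printf 'encoded: %s\n' "$encoded"
[ "$decoded" = "$name" ] && echo "round trip ok"
```

Doubling the backslashes before replacing the newlines is what keeps the encoding reversible: after it, a single ‘\’ followed by ‘n’ can only stand for a newline.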

 

Set the ownerships and permissions:

chown -cR root:root /backup/snapshot/rsync/
chmod 700 /backup/snapshot/rsync/rsync-*.sh
chmod 600 /backup/snapshot/rsync/rsync-include.txt

 

Usage

When you call the script ‘rsync-snapshot.sh’ without parameters or with the hostname of the server itself (or localhost), the script performs a self-snapshot of the complete filesystem ‘/’.
You can and should use filter rules to exclude things like ‘/proc/*’ and ‘/sys/*’. For this you need to edit the configuration file ‘/backup/snapshot/rsync/rsync-include.txt’.
A description of the filter rules syntax is written as comments in the file itself.
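
As an illustration of such filter rules, the sketch below writes a small rule file and tries it on a throw-away directory tree. It uses the standard ‘rsync’ rule syntax (‘- pattern’ to exclude, ‘+ pattern’ to include); the patterns and paths are only examples, and it assumes that the rule file is passed to ‘rsync’ with ‘--include-from’, which may differ from what ‘rsync-snapshot.sh’ actually does.

```bash
#!/bin/bash
# Sketch only: example rules tried on a throw-away tree, not the shipped rsync-include.txt.
WORK=$(mktemp -d)
mkdir -p "$WORK/src/etc" "$WORK/src/proc" "$WORK/src/sys" "$WORK/dst"
echo conf >"$WORK/src/etc/hosts"
echo 1    >"$WORK/src/proc/uptime"
echo 2    >"$WORK/src/sys/power"

cat >"$WORK/rules.txt" <<'EOF'
# keep the directories themselves but skip their contents
- /proc/*
- /sys/*
EOF

# The leading '/' anchors a pattern at the root of the transfer (here $WORK/src/).
if command -v rsync >/dev/null
then
  rsync -a --include-from="$WORK/rules.txt" "$WORK/src/" "$WORK/dst/"
  find "$WORK/dst" | sort
fi
```

With these rules the empty directories ‘proc’ and ‘sys’ are still created in the destination, which is convenient when a snapshot has to be restored as a bootable filesystem.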

The snapshot backup is created in ‘/backup/snapshot/HOST/snapshot.001’, where ‘HOST’ is your server’s hostname. If the folder ‘snapshot.001’ already exists, it is rotated to ‘snapshot.002’ and so on, up to ‘snapshot.512’, after which it is removed. So if you create one backup per night, for example with a cronjob, this gives you 512 days of retention. This is useful but can require too much disk space, which is why we have included a non-linear distribution policy. In short, we keep only the oldest backup in the range 257-512, and likewise in the range 129-256, and so on. This exponential distribution of the backups over time retains more backups in the short term and fewer in the long term; it keeps only 10 or 11 backups but spans a retention of 257-512 days.
In the following table, each column shows the set of snapshots that exists after one step of the rotation (limited to snapshot.1 through snapshot.16 in this example):

1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
    2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
        3   3   3   3   3   3   3   3   3   3   3   3   3   -   3   -   3   -   3   - 
            4   -   4   -   4   -   4   -   4   5   4   4   4   -   4   -   4   -   4 
                5   -   5   -   5   -   -   -   -   -   -   -   5   -   -   -   5   - 
                    6   -   -   -   5   5   -   -   -   5   -   -   6   -   -   -   6 
                        7   -   -   -   -   -   -   6   -   -   -   -   7   -   -   - 
                            8   -   -   7   -   -   -   7   8   -   -   -   8   -   - 
                                9   -   -   8   9   -   -   -   9   -   -   -   -   - 
                                    10  -   -   -   -   -   -   -   10  -   -   -   - 
                                        11  -   -   -   -   -   -   -   11  -   -   - 
                                            12  -   -   -   -   -   -   -   12  -   - 
                                                13  -   -   -   -   -   -   -   13  - 
                                                    14  -   -   -   -   -   -   -   14
                                                        15  -   -   -   -   -   -   - 
                                                            16  -   -   -   -   -   - 
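
One possible reading of the non-linear policy (keep only the oldest snapshot of each range 3-4, 5-8, 9-16 and so on, while snapshot.1 and snapshot.2 are always kept) is sketched below. This is an assumption-based illustration and not the code of ‘rsync-snapshot.sh’: the function name ‘prune_candidates’ is hypothetical, and the real script may choose the ranges or the moment of deletion differently, so the result does not necessarily reproduce the table column by column.

```bash
#!/bin/bash
# Sketch only: prune_candidates is a hypothetical helper, not part of rsync-snapshot.sh.

# usage: prune_candidates MAX N1 N2 ...
# Prints the snapshot numbers that could be deleted so that each range
# (MAX/2+1..MAX), (MAX/4+1..MAX/2), ... down to (3..4) keeps only its oldest member.
prune_candidates() {
  local hi=$1 lo n oldest
  shift
  while [ "$hi" -gt 2 ]
  do
    lo=$(( hi / 2 + 1 ))
    oldest=0
    for n in "$@"        # find the oldest (highest numbered) snapshot of the range
    do
      if [ "$n" -ge "$lo" ] && [ "$n" -le "$hi" ] && [ "$n" -gt "$oldest" ]
      then
        oldest=$n
      fi
    done
    for n in "$@"        # every other snapshot of the range is a deletion candidate
    do
      if [ "$n" -ge "$lo" ] && [ "$n" -le "$hi" ] && [ "$n" -ne "$oldest" ]
      then
        echo "$n"
      fi
    done
    hi=$(( hi / 2 ))
  done
}

# Existing snapshots 1 2 3 4 5 8 9 12 with a maximum of 16: prints 9, 5 and 3 (one per line)
prune_candidates 16 1 2 3 4 5 8 9 12
```

With a maximum of 512 this reading yields 8 ranges plus snapshot.1 and snapshot.2, which matches the figure of about 10 retained backups given above.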

 

To save more disk space, ‘rsync’ makes a hard link for each file of ‘snapshot.001’ that already existed in ‘snapshot.002’ with identical content, timestamps and ownership.
The following example creates a backup and then uses ‘du’ to show the disk space used by the 4 existing backups:

root@server05:~# /backup/snapshot/rsync/rsync-snapshot.sh
2012-09-11_19:07:43 server05: === Snapshot backup is created into /backup/snapshot/server05/snapshot.001 ===
2012-09-11_19:07:43 Testing needed free disk space ... 0 MiB needed.
2012-09-11_19:07:45 Checking free disk space... 485997 MiB free. Ok, bigger than 5000 MiB.
2012-09-11_19:07:45 Checking disk space used by /backup/snapshot/server05 ... 11011 MiB used. Ok, smaller than 20000 MiB.
2012-09-11_19:07:46 Creating folder /backup/snapshot/server05/snapshot.000 ...
2012-09-11_19:07:46 Creating backup of server05 into /backup/snapshot/server05/snapshot.000 hardlinked with  /backup/snapshot/server05/snapshot.001 ...
2012-09-11_19:07:52 Setting recursively immutable flag of /backup/snapshot/server05/snapshot.000 ...
Renaming /backup/snapshot/server05/snapshot.003 into /backup/snapshot/server05/snapshot.004 ...
Renaming /backup/snapshot/server05/snapshot.002 into /backup/snapshot/server05/snapshot.003 ...
Renaming /backup/snapshot/server05/snapshot.001 into /backup/snapshot/server05/snapshot.002 ...
Renaming /backup/snapshot/server05/snapshot.000 into /backup/snapshot/server05/snapshot.001 ...
2012-09-11_19:07:55 Checking free disk space... 485958 MiB free. Ok, bigger than 5000 MiB.
2012-09-11_19:07:55 Checking disk space used by /backup/snapshot/server05 ... 11050 MiB used. Ok, smaller than 20000 MiB.
2012-09-11_19:07:56 server05: === Snapshot backup successfully done in 13 sec. ===
-----------------------------
root@server05:~# du -chslB1M /backup/snapshot/localhost/snapshot.* | column -t
10901  /backup/snapshot/localhost/snapshot.001
10901  /backup/snapshot/localhost/snapshot.002
10901  /backup/snapshot/localhost/snapshot.003
10901  /backup/snapshot/localhost/snapshot.004
0      /backup/snapshot/localhost/snapshot.last
43602  total
-----------------------------
root@server05:~# du -chsB1M /backup/snapshot/localhost/snapshot.* | column -t
10898  /backup/snapshot/localhost/snapshot.001
40     /backup/snapshot/localhost/snapshot.002
45     /backup/snapshot/localhost/snapshot.003
45     /backup/snapshot/localhost/snapshot.004
0      /backup/snapshot/localhost/snapshot.last
11026  total
-----------------------------

 

We can see that the 4 snapshot backups use 10.9 GB each, so without hard links they would add up to 43 GB; the last command shows that the space actually used is only 11 GB, thanks to the hard links.
BTW, the following commands can be very useful to replace all duplicate files with hard links to the first file in each set of duplicates, even if they have different names or paths:
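
The effect of hard links on disk usage can be reproduced on a small scale. The sketch below does not call ‘rsync’; it uses ‘cp -al’ from GNU coreutils to create a second "snapshot" made of hard links, which is a stand-in chosen for brevity, and the directory names are made up for the example.

```bash
#!/bin/bash
# Sketch only: throw-away directories, 'cp -al' stands in for the hard-linking done by rsync.
DEMO=$(mktemp -d)
mkdir "$DEMO/snapshot.002"
head -c 1048576 /dev/zero >"$DEMO/snapshot.002/data.bin"   # 1 MiB test file

# Create snapshot.001 as a tree of hard links to snapshot.002 (no data is copied).
cp -al "$DEMO/snapshot.002" "$DEMO/snapshot.001"

# Both names point to the same inode, and the link count is now 2.
stat -c '%i %h %n' "$DEMO"/snapshot.00?/data.bin

# 'du -l' counts the file once per hard link, plain 'du' counts it only once in total.
du -cslB1K "$DEMO"/snapshot.* | tail -n 1
du -csB1K  "$DEMO"/snapshot.* | tail -n 1
```

The first total should be roughly twice the second one, which is the same difference as between the two ‘du’ listings above.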

chattr -fR -i /backup/snapshot/localhost/snapshot.*
fdupes -r1L /backup/snapshot/localhost/snapshot.*

A good tutorial on how to use the ‘rsync’ command is available here:
http://www.thegeekstuff.com/2010/09/rsync-command-examples/
 

When called with a remote hostname as parameter, the script performs a snapshot backup over the network. This can be very useful for a DRP (Disaster Recovery Plan), in order to have a server farm replicated every night to a secondary site. In addition, you could implement a continuous replication of the databases, for example. The ‘BWLIMIT’ variable can be changed inside the shell-script to limit the network bandwidth usage and the disk I/O overhead; this helps to moderate the performance impact and avoid slowing down critical production servers.
Other variables can also be modified at the beginning of the script, either as global settings or as specific tuning for some servers; a ‘BACKUPSERVER’ section is already provided for this purpose and lets you tune specific settings for the remote central backup server:

HOST_PORT=22                            #port of source of backup
SCRIPT_PATH="/backup/snapshot/rsync"
SNAPSHOT_DST="/backup/snapshot"         #destination folder
NAME="snapshot"                         #backup name
LOG="rsync.log"
MIN_MIBSIZE=5000                        #older snapshots (except snapshot.001) are removed if free disk <= MIN_MIBSIZE. the script may exit without performing a backup if free disk is still short.
OVERWRITE_LAST=0                        #if free disk space is too small, then this option lets us remove snapshot.001 as well and retry once
MAX_MIBSIZE=20000                       #older snapshots (except snapshot.001) are removed if their size >= MAX_MIBSIZE. the script performs a backup even if their size is too big.
BWLIMIT=100000                          #bandwidth limit in KiB/s. 0 disables the limit. this prevents rsync from consuming too much system performance
BACKUPSERVER="rembk"                    #this server connects to all others to download filesystems and create remote snapshot backups
MD5LIST=0                               #to compute a list of md5 integrity signatures of all backed-up files, needs 'rsync-list.sh'
CHATTR=1                                #to use 'chattr' command and protect the backups against modification and deletion
DU=1                                    #to use 'du' command and calculate the size of existing backups, disable it if you have many backups and it is getting too slow (for example on BACKUPSERVER)
SOURCE="/"                              #source folder to backup

if [ "${HOST_LOCAL}" == "${BACKUPSERVER}" ] #if we are on BACKUPSERVER then do some fine tuning
then
  MD5LIST=1
  MIN_MIBSIZE=35000 #needed free space for chunk-file tape-arch.sh
  MAX_MIBSIZE=12000
  DU=0 # NB: 'du' is currently disabled on BACKUPSERVER for performance reasons
elif [ "${HOST_LOCAL}" == "${HOST_SRC}" ] #else if we are on a generic server then do some other fine tuning
then
  if [ "${HOST_SRC}" == "ZRHSV-TST01" ]; then MIN_MIBSIZE=500; CHATTR=0; DU=0; MD5LIST=0; fi
fi
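
How such settings could end up on the ‘rsync’ command line is sketched below. This is a guess at the general idea and not an extract of ‘rsync-snapshot.sh’: the function name ‘build_rsync_opts’ and the ‘SSH_KEY’ variable are hypothetical, and only well-known ‘rsync’ and ‘ssh’ options are used (‘--bwlimit’, ‘--numeric-ids’, ‘-e’, ‘ssh -p’, ‘ssh -i’).

```bash
#!/bin/bash
# Sketch only: build_rsync_opts is a hypothetical helper, not part of rsync-snapshot.sh.
HOST_PORT=22
BWLIMIT=100000                 # KiB/s, 0 means no limit
SSH_KEY="/root/.ssh/rsync_rsa" # assumed location of the private key

# Fill the global array RSYNC_OPTS from the variables above.
build_rsync_opts() {
  RSYNC_OPTS=( -a --numeric-ids )
  if [ "$BWLIMIT" -gt 0 ]
  then
    RSYNC_OPTS+=( "--bwlimit=$BWLIMIT" )   # only added when a limit is requested
  fi
  RSYNC_OPTS+=( -e "ssh -p $HOST_PORT -i $SSH_KEY" )
}

build_rsync_opts
# Show the resulting command line with shell quoting, without running rsync.
printf '%q ' rsync "${RSYNC_OPTS[@]}"; echo
```

Keeping the options in an array avoids quoting problems with the ‘-e’ argument, which contains spaces.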

 

To let the backup server connect via ‘ssh’ to the target servers without entering a password interactively, create an ‘ssh’ key pair with an empty passphrase in ‘/root/.ssh/rsync_rsa’ and copy the public key to the target servers:

#on each targetserver:
mkdir -p ~root/.ssh/
chown root:root ~root/.ssh/
chmod 700 ~root/.ssh/
touch ~root/.ssh/authorized_keys
chown root:root ~root/.ssh/authorized_keys
chmod 600 ~root/.ssh/authorized_keys
#update manually /etc/ssh/sshd_config to have 'AllowUsers root'
service ssh reload

#on the backupserver, create the key with an empty passphrase:
ssh-keygen -f ~/.ssh/rsync_rsa
#and upload the public key to the targetserver:
MYIP=$(hostname -i) #assign here the backupserver's external IP if necessary
echo "from=\"${MYIP%% *}\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command=\"rsync \${SSH_ORIGINAL_COMMAND#* }\" $(ssh-keygen -yf ~/.ssh/rsync_rsa)" | ssh targetserver "cat - >>~/.ssh/authorized_keys"

Note that the ‘command=’ restriction (http://larstobi.blogspot.ch/2011/01/restrict-ssh-access-to-one-command-but.html) will not apply if ‘/etc/ssh/sshd_config’ already has a ‘ForceCommand’ directive.
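
The effect of the ‘command=’ entry can be checked without any network connection: sshd places the command requested by the client in ‘SSH_ORIGINAL_COMMAND’, and the expansion ‘${SSH_ORIGINAL_COMMAND#* }’ drops its first word, so whatever was requested is run as ‘rsync’ with the remaining arguments. In the sketch below the function name ‘forced_command’ and the sample commands are made up for the illustration.

```bash
#!/bin/bash
# Sketch only: prints the command line that the forced command would run.
forced_command() {
  local SSH_ORIGINAL_COMMAND=$1
  echo "rsync ${SSH_ORIGINAL_COMMAND#* }"
}

forced_command "rsync --server --sender -logDtpr . /etc/"   # the general shape of an rsync request
forced_command "rm -rf /home"                               # 'rm' is replaced by 'rsync'
forced_command "ls"                                         # no space: the word is kept as an argument
```

A client can therefore only ever start ‘rsync’ with this key, although it still chooses the arguments that ‘rsync’ receives.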
This central backup server could also be used to centralize the administration of all other servers via pdsh/ssh.
 

Because the script does not freeze the filesystem during its operation, there is no guarantee that the snapshot backup will be a strict snapshot; in other words, the files will not all be copied at the exact same moment. This is usually not an issue, except for databases. To keep a database consistent, you should follow the instructions at http://www.postgresql.org/docs/9.1/static/continuous-archiving.html and http://www.anchor.com.au/blog/documentation/better-postgresql-backups-with-wal-archiving/.

The following example applies to PostgreSQL 9.1 on Debian:

#!/bin/bash
# ----------------------------------------------------------------------
# created by francois scheurer on 20121102
# ----------------------------------------------------------------------




#http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
#http://www.anchor.com.au/blog/documentation/better-postgresql-backups-with-wal-archiving/

#sudo -u postgres mkdir /var/lib/postgresql/9.1/main/wal_archive
#sudo -u postgres chmod 700 /var/lib/postgresql/9.1/main/wal_archive

#vi /etc/postgresql/9.1/main/postgresql.conf
# wal_level = archive             # minimal, archive, or hot_standby
# # - Archiving -
# archive_mode = on               # allows archiving to be done
#                                 # (change requires restart)
# archive_command = 'test ! -f /var/lib/postgresql/9.1/main/backup_in_progress || (test ! -f /var/lib/postgresql/9.1/main/wal_archive/%f && cp %p /var/lib/postgresql/9.1/main/wal_archive/%f)'           # command to use to archive a logfile segment
# #archive_timeout = 0            # force a logfile segment switch after this
#                                 # number of seconds; 0 disables


	
if pgrep -f "/bin/\w*sh \w*rsync-snapshot\.sh" | grep -qv "$$"
then
  echo "Sorry, rsync is already running in the background. Exiting..."
  exit 1
fi

#check if postgresql is running:
if sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /var/lib/postgresql/9.1/main/ status &>/dev/null
then
  if [ ! -d /var/lib/postgresql/9.1/main/wal_archive ]
  then
    echo "ERROR: /var/lib/postgresql/9.1/main/wal_archive is missing."
    exit 1
  fi
  touch /var/lib/postgresql/9.1/main/backup_in_progress
  #put postgresql into backup mode (data files can still change during the copy; replaying the WAL files from pg_xlog makes the copy consistent), in order to make a usable backup at filesystem-level:
  sudo -u postgres psql -c "SET LOCAL synchronous_commit TO OFF; SELECT pg_start_backup('rsync-snapshot', true);" &>/dev/null
fi

#perform the backup
/backup/snapshot/rsync/rsync-snapshot.sh
RES=$?

#check if postgresql is running:
if sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /var/lib/postgresql/9.1/main/ status &>/dev/null
then
  #end postgresql backup mode:
  sudo -u postgres psql -c "SET LOCAL synchronous_commit TO OFF; SELECT pg_stop_backup();" &>/dev/null
  rm -f /var/lib/postgresql/9.1/main/backup_in_progress
  chattr -R -i /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main &>/dev/null
  rm -rf /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main/pg_xlog/* &>/dev/null
  mv /var/lib/postgresql/9.1/main/wal_archive/* /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main/pg_xlog/ &>/dev/null
  chattr -R +i /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main &>/dev/null
fi

exit $RES

#NB:
#  -If you need to re-create a standby server while transactions are waiting, make sure that the commands to run pg_start_backup() and pg_stop_backup() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear.
#  -see also 'pg_dump' and 'pg_basebackup' commands.
#  -to avoid problems, you should avoid using 'CREATE TABLESPACE' and 'CREATE DATABASE' during the backup. after a recovery you should do a 'sudo -u postgres reindexdb --all' and 'sudo -u postgres vacuumdb --all' 
#24.3.6. Caveats
#
#At this writing, there are several limitations of the continuous archiving technique. These will probably be fixed in future releases:
#
#    Operations on hash indexes are not presently WAL-logged, so replay will not update these indexes. This will mean that any new inserts will be ignored by the index, updated rows will apparently disappear and deleted rows will still retain pointers. In other words, if you modify a table with a hash index on it then you will get incorrect query results on a standby server. When recovery completes it is recommended that you manually REINDEX each such index after completing a recovery operation.
#
#    If a CREATE DATABASE command is executed while a base backup is being taken, and then the template database that the CREATE DATABASE copied is modified while the base backup is still in progress, it is possible that recovery will cause those modifications to be propagated into the created database as well. This is of course undesirable. To avoid this risk, it is best not to modify any template databases while taking a base backup.
#
#    CREATE TABLESPACE commands are WAL-logged with the literal absolute path, and will therefore be replayed as tablespace creations with the same absolute path. This might be undesirable if the log is being replayed on a different machine. It can be dangerous even if the log is being replayed on the same machine, but into a new data directory: the replay will still overwrite the contents of the original tablespace. To avoid potential gotchas of this sort, the best practice is to take a new base backup after creating or dropping tablespaces.
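
The logic of the ‘archive_command’ configured above can be replayed on plain files to see when a WAL segment is copied. This is a simulation under assumptions: the directories and the segment name are temporary stand-ins created with ‘mktemp’, and the function ‘archive_segment’ is a hypothetical wrapper around the same ‘test’ and ‘cp’ expression, with ‘%p’ and ‘%f’ replaced by shell arguments.

```bash
#!/bin/bash
# Sketch only: simulates the archive_command on temporary files, no PostgreSQL involved.
PGDATA_SIM=$(mktemp -d)
mkdir "$PGDATA_SIM/pg_xlog" "$PGDATA_SIM/wal_archive"
SEG=000000010000000000000001
echo "wal data" >"$PGDATA_SIM/pg_xlog/$SEG"

# $1 plays the role of %p (path of the segment), $2 the role of %f (its file name).
archive_segment() {
  test ! -f "$PGDATA_SIM/backup_in_progress" ||
    ( test ! -f "$PGDATA_SIM/wal_archive/$2" && cp "$1" "$PGDATA_SIM/wal_archive/$2" )
}

# 1) no backup in progress: success is reported but nothing is archived
archive_segment "$PGDATA_SIM/pg_xlog/$SEG" "$SEG"; echo "no backup:   rc=$?"
# 2) backup in progress: the segment is copied into wal_archive
touch "$PGDATA_SIM/backup_in_progress"
archive_segment "$PGDATA_SIM/pg_xlog/$SEG" "$SEG"; echo "first copy:  rc=$?"
# 3) same segment again: refused with a non-zero status, an existing file is never overwritten
archive_segment "$PGDATA_SIM/pg_xlog/$SEG" "$SEG"; echo "second copy: rc=$?"
```

Case 1 means that WAL segments are only kept while the flag file ‘backup_in_progress’ exists, so this setup supports restoring a snapshot but not point-in-time recovery between two snapshots.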

 

It is even possible to freeze an ext3/ext4 filesystem before backing up, but this is quite dangerous, because every process that tries to write to it is blocked until you unfreeze the filesystem.
You should therefore avoid using this on a production server! But for the sake of information, here are the steps to freeze the filesystem mounted on ‘/mnt/folder’ for 30 seconds on Debian:

wget ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.22/util-linux-2.22-rc2.tar.gz
tar xfz util-linux-2.22-rc2.tar.gz
cd util-linux-2.22-rc2
aptitude install ncurses-dev libncurses5-dev mkcramfs cramfsprogs zlib1g-dev libpam-dev libpam0g-dev
./configure
make
man -l sys-utils/fsfreeze.8
./fsfreeze -f /mnt/folder && sleep 30 && ./fsfreeze -u /mnt/folder
make clean
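
Because a frozen filesystem blocks every writer, it is worth making sure that the unfreeze always happens, even if the command run in between fails or is interrupted. A minimal sketch, assuming an ‘fsfreeze’ binary with the ‘-f’ and ‘-u’ options shown above; the function name ‘run_frozen’ and the variables ‘FSFREEZE’ and ‘FROZEN_MNT’ are hypothetical (‘FSFREEZE’ allows pointing to ‘./fsfreeze’ or to a stub for testing):

```bash
#!/bin/bash
# Sketch only: run_frozen is a hypothetical wrapper; it needs root and a real fsfreeze to be useful.
FSFREEZE=${FSFREEZE:-fsfreeze}

# usage: run_frozen MOUNTPOINT COMMAND [ARGS...]
run_frozen() {
  local rc
  FROZEN_MNT=$1
  shift
  "$FSFREEZE" -f "$FROZEN_MNT" || return 1     # give up if the freeze itself fails
  # unfreeze before exiting if the script receives SIGINT or SIGTERM
  trap '"$FSFREEZE" -u "$FROZEN_MNT"; exit 130' INT TERM
  "$@"                                         # the command to run while frozen
  rc=$?
  trap - INT TERM
  "$FSFREEZE" -u "$FROZEN_MNT"                 # always unfreeze, whatever the command returned
  return $rc
}

# Example (commented out on purpose, it would block all writers of /mnt/folder for 30 seconds):
# FSFREEZE=./fsfreeze run_frozen /mnt/folder sleep 30
```

Unlike a chain of ‘&&’, this form still unfreezes the filesystem when the command in the middle returns an error.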

 

I hope this ‘rsync-snapshot.sh’ script can be useful to you! ^_^
Another script will be posted on the blog soon to show you how to archive those snapshot backups on tapes using ‘tar’ with encrypted split chunks of data.

12 comments on “Howto – local and remote snapshot backup using rsync with hard links”

  1. Thanks Francois!

  2. Wim Paulussen on said:

    Thank you very much for sharing this.

    I succeeded in creating the necessary files and got it running. It creates a complete backup with a .000 extension, but rerunning the program just recreates the complete backup with the same .000 extension instead of the expected snapshot.001. Could you give me hints why I get this unexpected behaviour ?

    • Francois.Scheurer on said:

      Hi Wim!

      The ‘snapshot.000’ is created when you start the script and will be renamed to ‘snapshot.001’ at the end of the process.
      So in your case something caused the script to exit before completion.
      Could you send the output of the script with the error message?
      You can also send the output of ‘uname -a; cat /proc/mounts’.
      You may try to disable chattr (modify the script with ‘CHATTR=0’) if this command is not supported on your system.

      Best regards

      • Francois.Scheurer on said:

        I noticed that the version on the blog is not up-to-date and has a bug with the ‘seq’ command, which needs an explicit ‘-1’ increment if the series is decreasing.
        I used previously ‘for i in {512..001}’ which does not need an explicit increment but this is unsupported on older bash.
        I will update the blog this evening with the latest version of the script.

  3. Hi,

    Thanks for sharing the scripts, they are very interesting.

  4. I tried to do a proof-of-concept of the retention scheme and I got a different result than you. I took 8 snapshots as an example which gave this result :

    1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
    ..2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2
    ….3…3…3…3…3…3…3…3
    ……4…4…4…4…4…4…4..
    ……..5…5…5…5…5…5…5
    ……….6………6………6..
    …………7………7………7
    …………..8………8……..

    So we converge on the column with the set : 1 2 3 5 7, where you get 1 2 3 7 (no 5), normally your script will not delete 4 and then it will be included in the rotation process and transformed to 5. Do you agree ?

    • Francois.Scheurer on said:

      Hi Anouar,

      Yes on the 7th daily backup we will have the set 1 2 3 5 7, where 4 is renamed to 5 as you wrote.
      On the 4th day you will have 1 2 3 4 and not 1 2 4 however.
      See in article I updated the table to show it.
      Best Regards

  5. this will be nicer :
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    —2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    —–3 3 3 3 3 3 3 3
    ——-4 4 4 4 4 4 4
    ———-5 5 5 5 5 5 5
    ————6 6 6
    —————7 7 7
    —————–8 8 8

  6. Catalin on said:

    Very nice script, I use it daily however it would be nice to add exclude lists per host as not all servers are alike and an option to specify the SSH port per host as well, as not all servers listen on 22.
    I modified the script to check for rsync-include-{host-name}.txt

  7. J Wilson on said:

    What a wonderful script. I can’t thank you enough for this. I did make some small adjustments to the rsync command (less verbose, and ‘-h’ to make the results more human-readable) but apart from that it is working really well.

    Thank you for putting this online.
