Backups service

snapperd was introduced in HyperCloud 2.x as a high-performance replacement for the previous snapshot daemon.

Info

HyperCloud 2.2.x introduced support for retrying missed or failed backups.

In the event a backup window was missed due to cluster maintenance or other activities, if the next backup window has not been crossed, the backup service will immediately kick off a backup.
In the event a backup fails, the backup service will keep retrying it until the next backup window.
In either event, crossing the threshold of the next backup window will result in that backup being taken instead. That is, only one missed or failed backup will be queued at any given time.
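
The retry rules above can be read as a small decision function. The following is a minimal sketch of that reading, not snapperd's actual implementation; the function name pending_action and its arguments are hypothetical.

```python
from datetime import datetime

def pending_action(now, missed_window, next_window, last_attempt_failed):
    """Decide what a scheduler following the rules above would do.

    Hypothetical helper, not snapperd code.
    missed_window: start of a window whose backup is still outstanding, or None
    next_window:   start of the next scheduled backup window
    """
    if now >= next_window:
        # The next window has been crossed: its backup replaces any
        # outstanding missed or failed backup.
        return "take scheduled backup"
    if missed_window is not None or last_attempt_failed:
        # Still before the next window: catch up or retry right away.
        return "run catch-up backup now"
    return "wait"

print(pending_action(
    now=datetime(2024, 1, 1, 3, 30),
    missed_window=datetime(2024, 1, 1, 3, 0),
    next_window=datetime(2024, 1, 1, 4, 0),
    last_attempt_failed=False,
))
```

Because the function returns a single action, at most one missed or failed backup is ever pending, which matches the rule stated above.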

Advisement

There are several simple rules to follow when creating snapshots for recovery:

Snapshot deltas (the changes since the last snapshot) directly and linearly impact transfer times when remote and/or archive backups are configured.
Only take as many snapshots as your recovery point objectives (RPO) require. Every additional snapshot degrades performance.
Be cautious when setting snapshot frequency. Scheduling more than one snapshot per hour can negatively affect performance for the system's workload, or even for the entire cluster. Always prioritize the actual RPO needs of the system.
The total storage usage for an image is the sum of all its snapshot deltas, plus the base image size. Images that undergo significant changes quickly will have increased storage usage. This happens because of how copy-on-write snapshots work.
Multiple factors impact snapshot transfer speeds for remote and/or archive backups. Depending on the configuration, snapshot transfers may be skipped if a transfer takes too long and rolls into the next scheduled transfer. Some things to consider include:
- Local or remote cluster I/O capability, which is dramatically impacted by running workloads.
- Network connectivity between clusters, which includes latency, packet loss, bandwidth, and MTU.
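
To make the storage and transfer rules above concrete, here is a rough back-of-the-envelope sketch. The function names are hypothetical, and the constant-throughput model is an assumption that real clusters and networks only approximate.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def total_image_usage(base_bytes, delta_bytes):
    """Base image size plus the sum of all snapshot deltas."""
    return base_bytes + sum(delta_bytes)

def transfer_fits_window(delta_bytes, throughput_bytes_per_s, window_s):
    """True if a delta can be sent before the next transfer is due.

    Assumes transfer time grows linearly with delta size at a constant
    effective throughput (an idealization).
    """
    return delta_bytes / throughput_bytes_per_s <= window_s

# A 50 GiB base image with three snapshot deltas of 2, 5 and 1 GiB.
usage = total_image_usage(50 * GiB, [2 * GiB, 5 * GiB, 1 * GiB])
print(usage // GiB)  # 58

# Can a 5 GiB delta be sent within an hourly window at 100 MiB/s?
print(transfer_fits_window(5 * GiB, 100 * MiB, 3600))  # True
```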

Usage

There are three types of snapshots:

Local
Remote
Archive

Local snapshots are stored on the local HyperCloud system.

Remote snapshots are copies of local snapshots that are stored on another (remote) HyperCloud system.

Archive snapshots are copies of local snapshots that are stored on another system that supports the S3 protocol.

Snapshot schedules

The schedule is a mix of destinations:

Local
Remote
Archive

And frequencies:

Hourly
Daily
Weekly
Monthly
Yearly

Hourly

An hourly schedule for local snapshots has the format:

hourly:local=<NUM>[@MINUTE]

Where:

NUM (required) is the number of snapshots to keep on the local system.
@MINUTE (optional) is the minute of the hour to take the snapshot.
- Valid MINUTE values are 0 - 59.

Example

hourly:local=4@20

This denotes that the system will keep 4 hourly snapshots on the local system and the snapshots for this VM will occur on the 20th minute of each hour; that is, at 1:20, 2:20, 3:20, etc.
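
As an illustration of the @MINUTE setting, the sketch below computes when the next hourly snapshot would be due. It is a hypothetical helper, not part of HyperCloud, and it assumes that an omitted MINUTE means the top of the hour.

```python
from datetime import datetime, timedelta

def next_hourly_run(now, minute=0):
    """Return the next time an hourly snapshot at the given minute is due."""
    if not 0 <= minute <= 59:
        raise ValueError("MINUTE must be 0 - 59")
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so use the next hour.
        candidate += timedelta(hours=1)
    return candidate

# With hourly:local=4@20, a check at 01:05 finds the next run at 01:20.
print(next_hourly_run(datetime(2024, 5, 1, 1, 5), minute=20))
```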

Daily

A daily schedule for local snapshots has the format:

daily:local=<NUM>[@HOUR]

Where:

NUM (required) is the number of snapshots to keep on the local system.
@HOUR (optional) is the hour of the day to take the snapshot.
- Valid HOUR values are 0 - 23 (24-hour clock).

Example

daily:local=3@3

This denotes that the system will keep 3 daily snapshots on the local system and the snapshots for this VM will occur once per day at 3 AM.

Weekly

A weekly schedule for local snapshots has the format:

weekly:local=<NUM>[@DAYOFWEEK]

Where:

NUM (required) is the number of snapshots to keep on the local system.
@DAYOFWEEK (optional) is the day of the week to take the snapshot.
- Valid DAYOFWEEK values are:
  - Numbering the days of the week (starting with Sunday): 0 - 6
  - Abbreviating the days of the week: Sun, Mon, Tue, Wed, Thu, Fri, Sat
  - Writing out the full name of the days of the week: Sunday, Monday, Tuesday, etc.

Example

weekly:local=2@3
weekly:local=2@Wed

This denotes that the system will keep 2 weekly snapshots on the local system and the snapshots for this VM will occur once per week on Wednesday. The two entries are equivalent: with numbering that starts at 0 for Sunday, Wednesday is day 3.
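
The three accepted DAYOFWEEK spellings can be normalized to a single number. The following is a minimal sketch, not the parser HyperCloud uses; the function name is hypothetical and the case-insensitive matching is an assumption.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def weekday_index(value):
    """Map 0 - 6, Sun..Sat, or Sunday..Saturday to a number (Sunday = 0)."""
    text = str(value).strip()
    if text.isdigit():
        index = int(text)
        if 0 <= index <= 6:
            return index
        raise ValueError(f"day number out of range: {value!r}")
    lowered = text.lower()
    for index, name in enumerate(DAYS):
        # Accept the full name or its three-letter abbreviation.
        if lowered in (name.lower(), name[:3].lower()):
            return index
    raise ValueError(f"unknown day of week: {value!r}")

print(weekday_index("Wed"))  # 3
```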

Monthly

A monthly schedule for local snapshots has the format:

monthly:local=<NUM>[@DAYOFMONTH]

Where:

NUM (required) is the number of snapshots to keep on the local system.
@DAYOFMONTH (optional) is the day of the month to take the snapshot.
- Valid DAYOFMONTH values are the numbers of the day of the month: 1 - 31.

Example

monthly:local=5@15

This denotes that the system will keep 5 monthly snapshots on the local system and the snapshots for this VM will occur once per month on the 15th day of the month.

Yearly

A yearly schedule for local snapshots has the format:

yearly:local=<NUM>[@DAYOFYEAR]

Where:

NUM (required) is the number of snapshots to keep on the local system.
@DAYOFYEAR (optional) is the day of the year to take the snapshot.
- Valid DAYOFYEAR values are:
  - Numbered months of the year: 01 - 12
  - Days of the year: 001 - 366
  - Abbreviated names of the month: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, or Dec
  - Full names of the months: January, February, March, etc.

Example

yearly:local=2@015

This denotes that the system will keep 2 yearly snapshots on the local system and the snapshots for this VM will occur once per year on January 15th.

Or,

Additional example

yearly:local=2@May

This denotes that the system will keep 2 yearly snapshots on the local system and the snapshots for this VM will occur once per year on May 1st.
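
The DAYOFYEAR forms can be resolved to a calendar date as sketched below. This is an illustration only: the function name is hypothetical, and it assumes that a two-digit number is a month (taken on the 1st, like a month name), that a three-digit number is a day of the year, and that names match case-insensitively.

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def resolve_yearly(value, year):
    """Return the date a yearly snapshot would fall on in the given year."""
    text = value.strip()
    if text.isdigit():
        number = int(text)
        if len(text) == 3 and 1 <= number <= 366:
            # Three digits: day of the year, counted from January 1st.
            result = date(year, 1, 1) + timedelta(days=number - 1)
            if result.year != year:
                raise ValueError("day 366 only exists in leap years")
            return result
        if len(text) == 2 and 1 <= number <= 12:
            # Two digits: month number (assumed to mean the 1st).
            return date(year, number, 1)
        raise ValueError(f"invalid DAYOFYEAR: {value!r}")
    lowered = text.lower()
    for number, name in enumerate(MONTHS, start=1):
        if lowered in (name.lower(), name[:3].lower()):
            return date(year, number, 1)
    raise ValueError(f"invalid DAYOFYEAR: {value!r}")

print(resolve_yearly("015", 2024))  # 2024-01-15
print(resolve_yearly("May", 2024))  # 2024-05-01
```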

Combining Schedules

You can combine local, remote, and archive schedules.

Example

hourly:local=2,remote=4,archive=3

This denotes that the system will take hourly local snapshots at the top of each hour and keep the most recent two (2). At the same time, it will copy the local snapshot to a remote system, which keeps the most recent four (4), and to an archive system, which keeps the most recent three (3).
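
The per-destination retention counts can be pictured as keeping only the newest NUM snapshots at each destination. The sketch below is a simplified model with a hypothetical prune function; it says nothing about how snapperd actually removes snapshots.

```python
def prune(snapshots, keep):
    """Split snapshots (oldest first) into (kept, removed), keeping the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return snapshots[-keep:], snapshots[:-keep]

# hourly:local=2,remote=4,archive=3 after six hourly runs
retention = {"local": 2, "remote": 4, "archive": 3}
taken = ["01:00", "02:00", "03:00", "04:00", "05:00", "06:00"]
for destination, keep in retention.items():
    kept, removed = prune(taken, keep)
    print(destination, kept)
```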

Hourly, Daily, Weekly, Monthly, and Yearly schedules can be combined.

Example

hourly:local=2 daily:local=7 weekly:local=4

This denotes that the system will take hourly local snapshots at the top of each hour and keep the most recent two (2). It will also take a daily snapshot at midnight and keep the most recent seven (7), a full week. Additionally, it will take a weekly snapshot on Sunday at midnight and keep the most recent four (4).
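
To show how the pieces of a schedule fit together, here is a minimal parsing sketch. It is not the parser snapperd uses: the function name is hypothetical, and it assumes entries are separated by whitespace, destinations by commas, and that each destination may carry its own optional @ suffix.

```python
import re

FREQUENCIES = {"hourly", "daily", "weekly", "monthly", "yearly"}
DESTINATIONS = {"local", "remote", "archive"}
TARGET = re.compile(r"^(?P<dest>[a-z]+)=(?P<num>\d+)(?:@(?P<when>\w+))?$")

def parse_schedule(text):
    """Parse a schedule into {frequency: {destination: (keep, when)}}."""
    schedule = {}
    for entry in text.split():
        frequency, sep, targets = entry.partition(":")
        if not sep or frequency not in FREQUENCIES:
            raise ValueError(f"bad frequency in {entry!r}")
        parsed = {}
        for target in targets.split(","):
            match = TARGET.match(target)
            if not match or match["dest"] not in DESTINATIONS:
                raise ValueError(f"bad target {target!r}")
            # `when` stays None if no @ suffix was given.
            parsed[match["dest"]] = (int(match["num"]), match["when"])
        schedule[frequency] = parsed
    return schedule

print(parse_schedule("hourly:local=2 daily:local=7 weekly:local=4"))
```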

Scheduling Snapshots on a VM

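To schedule snapshots for a VM, add a SNAPSHOT_SCHEDULE attribute to the VM and set its value to the snapshot schedule.

For example, the following Attribute and Value apply the combined hourly, daily, and weekly schedule from the previous section:

Attribute: SNAPSHOT_SCHEDULE
Value: hourly:local=2 daily:local=7 weekly:local=4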

Configuring remote and archive backup targets

Remote configuration

Remote snapshots are copies of local snapshots that are stored on another (remote) HyperCloud system. HyperCloud uses Ceph RBD commands to export a local snapshot, and the exported snapshot is then transferred to the remote system via SSH.

Since SSH is used as a transport, the public key for the local system must be copied to the remote system. The public key for the user oneadmin must be used. This key can be found on the local system in:

/home/oneadmin/.ssh/id_rsa.pub

Then, on the remote system, put the above key in the file:

/var/run/cluster-control/facts/authorized_keys

Remote snapshot configuration is defined in a file named:

/var/run/cluster-control/facts/snapshot/remote.json

Here is a sample remote.json:

{
    "destination":  "remote",
    "host":         "10.1.2.3",
    "image_prefix": "sailfish",
    "pool":         "rbd",
    "ssh_options":  "",
    "compress":     "",
    "decompress":   ""
}

Parameters:

destination: Must be remote.
host: The IP address of the remote HyperCloud cluster's dashboard.
image_prefix: The label used as the prefix of the RBD image on the remote system that holds the snapshots.
pool: The name of the pool on the remote system in which the snapshots should be stored.
ssh_options: Any extra ssh options needed to log in to the remote system.
compress: Optional. Must be a valid compress command (e.g., bzip2 --compress).
decompress: Optional. Must be a valid decompress command (e.g., bzip2 --decompress).
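
Before placing remote.json on a system, it can help to check it against the parameter list above. The following is a minimal sketch of such a check, not a HyperCloud tool. The function name is hypothetical, and treating ssh_options as optional and expecting compress and decompress to be set together are assumptions.

```python
import json

REQUIRED_KEYS = ("destination", "host", "image_prefix", "pool")

def check_remote_config(text):
    """Return a list of problems found in a remote.json document."""
    config = json.loads(text)
    problems = [f"missing or empty: {key}"
                for key in REQUIRED_KEYS if not config.get(key)]
    if config.get("destination") not in (None, "", "remote"):
        problems.append('destination must be "remote"')
    # Assumption: a compress command is only useful with a matching
    # decompress command, so flag configs that set just one of them.
    if bool(config.get("compress")) != bool(config.get("decompress")):
        problems.append("compress and decompress should be set together")
    return problems

sample = """
{
    "destination":  "remote",
    "host":         "10.1.2.3",
    "image_prefix": "sailfish",
    "pool":         "rbd",
    "ssh_options":  "",
    "compress":     "",
    "decompress":   ""
}
"""
print(check_remote_config(sample))  # []
```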

Archive configuration

Archive snapshots are copies of local snapshots that are stored on another system that supports the S3 protocol.

The archive snapshot configuration is defined in a file named:

/var/run/cluster-control/facts/snapshot/archive.json

Here is a sample archive.json:

{
    "destination":  "archive",
    "bucket":       "sailfish",
    "accesskey":    "JQ37MHYZBMKZ1GKJ1Y78",
    "secretkey":    "0fYiDTZqv4nbTHIbpgo15zvpQ8qhUVOziGmhxbIB",
    "host":         "10.1.2.3",
    "port":         7480,
    "options":      "--no-ssl",
    "compress":     "cat",
    "decompress":   "cat"
}

Parameters:

destination: Must be archive.
bucket: The name of the S3 bucket to store the snapshots.
accesskey: Your S3 access key.
secretkey: Your S3 secret key.
host: The IP address of the S3 endpoint.
port: The port number of the S3 endpoint.
options: Optional. Extra flags that s3cmd needs to upload snapshots.
compress: Optional. Must be a valid compress command (e.g., bzip2 --compress).
decompress: Optional. Must be a valid decompress command (e.g., bzip2 --decompress).
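
A compress and decompress pair is only useful if the two commands undo each other. The sketch below checks that with a small test payload. It is a hypothetical helper, and it assumes the commands read from standard input and write to standard output, which may not match exactly how the backup service invokes them.

```python
import shlex
import subprocess

def round_trips(compress, decompress, sample=b"snapshot test data\n" * 64):
    """Return True if decompress(compress(sample)) gives back the sample."""
    packed = subprocess.run(
        shlex.split(compress), input=sample,
        capture_output=True, check=True,
    ).stdout
    unpacked = subprocess.run(
        shlex.split(decompress), input=packed,
        capture_output=True, check=True,
    ).stdout
    return unpacked == sample

# The pass-through pair from the sample archive.json above.
print(round_trips("cat", "cat"))  # True
```

On a system with bzip2 installed, the same check can be run with round_trips("bzip2 --compress", "bzip2 --decompress").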

snapctl commands

The syntax for the snapctl command is snapctl [argument(s)] [option(s)].

list schedules

# snapctl list schedules --help
List snapshot schedules for all VMs
Usage:
    snapctl list schedules [flags]
Flags:
    --config-dir string   path to configuration files (default "/var/run/cluster-control/facts/snapshot")
    --debug               print API calls
    --endpoint string     url for ONE API
-h, --help                help for schedules
    --log string          log level (default "warn")
    --password string     password for authentication to the ONE API
    --username string     username for authentication to the ONE API

list snapshots

# snapctl list snapshots --help
Lists existing snapshots
Usage:
    snapctl list snapshots [flags]
Flags:
    --all                 list local,archive and remote snapshots
    --archive             list archive snapshots
    --config-dir string   path to configuration files (default "/var/run/cluster-control/facts/snapshot")
    --debug               print API calls
    --endpoint string     url for ONE API
-h, --help                help for snapshots
    --local               list local snapshots (default true)
    --log string          log level (default "warn")
    --manual              list manual snapshots (default true)
    --password string     password for authentication to the ONE API
    --remote              list remote snapshots
    --username string     username for authentication to the ONE API
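
When calling snapctl from a script, it is safer to build the argument list explicitly than to concatenate strings. The sketch below builds a list snapshots invocation using only flags shown in the help output above; the function name and its parameters are hypothetical.

```python
def list_snapshots_command(all_destinations=False, remote=False,
                           archive=False, config_dir=None):
    """Build the argument list for `snapctl list snapshots`."""
    argv = ["snapctl", "list", "snapshots"]
    if all_destinations:
        argv.append("--all")
    if remote:
        argv.append("--remote")
    if archive:
        argv.append("--archive")
    if config_dir:
        argv += ["--config-dir", config_dir]
    return argv

print(list_snapshots_command(all_destinations=True))
# ['snapctl', 'list', 'snapshots', '--all']

# To run it on a host where snapctl is installed:
#     import subprocess
#     subprocess.run(list_snapshots_command(all_destinations=True), check=True)
```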

list work

# snapctl list work --help
List status of snapshot daemon
Usage:
    snapctl list work [flags]
Flags:
-h, --help          help for work
    --host string   snapperd hostname (default "localhost")
    --port int      snapperd port number (default 7627)

nuke snapshots

# snapctl nuke snapshots --help
Nuke existing snapshots
Usage:
    snapctl nuke snapshots [flags]
Flags:
    --all                 nuke local,archive and remote snapshots
    --archive             nuke archive snapshots
    --config-dir string   path to configuration files (default "/var/run/cluster-control/facts/snapshot")
    --debug               print API calls
    --endpoint string     url for ONE API
-h, --help                help for snapshots
    --local               nuke local snapshots (default true)
    --log string          log level (default "warn")
    --password string     password for authentication to the ONE API
    --remote              nuke remote snapshots
    --username string     username for authentication to the ONE API