Documentation about Backing-Up under Linux

On another page, I describe how I back up Windows machines. Under Windows, I prefer image backups because the integration between user data and the OS is tight, especially for user settings.

Under Linux, I prefer file-based backups, because user settings and data can be cleanly separated from the OS.

I have chosen to adopt tar for file backups, because it is

proven
available for all distributions
easy to understand

About `tar`

tar, whose name comes from "tape archive", is an old and proven utility. It is available for all Unix-like operating systems.

There seem to be two major tar variants:

GNU tar. This is available on most Linux systems. However, there is also a port available for Windows. For example, I got GNU tar version 1.30 from 2017 as part of Git for Windows.
BSD tar. This is available on systems like FreeBSD, but is also standard on macOS. Since Windows 10 1803, it is also part of the standard Windows command-line tools (it was introduced at the same time as curl).
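
Because the two variants differ in features, a script may want to check which one it is talking to. The following is a minimal sketch, assuming that the first line of "tar --version" contains "GNU tar" or "bsdtar" respectively; the function name tar_variant is made up for this example.

```shell
#!/bin/sh
# Report which tar implementation is first on the PATH.
# Assumption: the first line of "tar --version" names the variant.
tar_variant() {
    case "$(tar --version 2>/dev/null | head -n 1)" in
        *"GNU tar"*) echo "gnu" ;;
        *bsdtar*)    echo "bsd" ;;
        *)           echo "unknown" ;;
    esac
}

echo "tar variant: $(tar_variant)"
```

A backup script could use the result to refuse GNU-only options when it runs against BSD tar.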

Comparison of Functionality of the Two Variants

tar is mostly used for backups, originally on physical tapes, and its functionality still reflects the limitations of tapes. For example, the tar "update" mode writes the updated file(s) to the end of the archive and leaves the older copies in the archive untouched, because "seeking" on a tape was slow and costly.

Incremental backups using "update" mode

To do incremental backups, you might consider using the "update" mode, to add new or updated files to the archive. This mode is supported by both BSD tar and GNU tar.
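
The following sketch shows what "update" does to an archive. It works in a throwaway directory, and the file names are made up for illustration. After the update, the changed file is stored twice, with the newer copy at the end of the archive.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir data
echo "version 1" > data/notes.txt
tar -cf backup.tar data      # full backup, uncompressed

sleep 1                      # let the modification time move on by a full second
echo "version 2" > data/notes.txt
echo "new file" > data/todo.txt
tar -uf backup.tar data      # append files that are new or newer than the archived copy

tar -tf backup.tar           # data/notes.txt now appears twice
```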

To restore, you would use the "extract" mode together with option -k, "Keep (don't overwrite) existing files". However, this approach does not cater for files that were deleted in the source between the full backup and subsequent updates. Also, files that were renamed would appear twice after extraction:

under their old name, and
under their new name.

Similarly, files that were moved appear

at their old location, and
at their new location.
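
The rename problem can be reproduced in a few lines. This is only a sketch in a temporary directory with made-up file names; it extracts into an empty directory, so no overwrite option is needed.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir data restore
echo "report" > data/draft.txt
tar -cf backup.tar data              # full backup

mv data/draft.txt data/final.txt     # rename in the source
tar -uf backup.tar data              # update: final.txt is appended, draft.txt stays in the archive

tar -xf backup.tar -C restore
ls restore/data                      # both draft.txt and final.txt are restored
```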

Another drawback of "update" is that it works only on uncompressed archives. tar compresses the finished archive stream as a whole before writing it to tape or file, so nothing can be appended to it afterwards. The approach seems to be due to legacy constraints:

it is slow and costly to "seek" on tapes
compression used to be costly when CPU power was limited
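
A possible workaround is to decompress the archive, update it, and compress it again. The sketch below assumes gzip compression and enough disk space for the uncompressed archive; the variable update_compressed exists only to record what tar did with the direct attempt.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir data
echo "one" > data/a.txt
tar -czf backup.tgz data

echo "two" > data/b.txt

# Updating the compressed archive directly is expected to fail.
if tar -uzf backup.tgz data 2>/dev/null; then
    update_compressed="ok"
else
    update_compressed="refused"
fi
echo "direct update: $update_compressed"

# Workaround: decompress, update, compress again.
gunzip -c backup.tgz > backup.tar
tar -uf backup.tar data
gzip -c backup.tar > backup.tgz
rm backup.tar

tar -tzf backup.tgz
```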

Incremental backups using the "listed-incremental" option

GNU tar has additional functionality for incremental backups. This is not available in BSD tar. GNU tar creates and maintains an additional file with metadata alongside the archive file. The standard extension for this file is snar. Example syntax:

tar -czvg mybackup_full.snar -f mybackup_full.tgz dir-to-backup

The options

c stands for "create"
z for gzip compression
v for verbose messages
g is short hand for --listed-incremental.

An additional option p, to preserve metadata like access rights, is only relevant for extract.
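
For readers who find the bundled short options hard to parse, the same command can be written with GNU tar's long options. The sketch below runs in a temporary directory and creates a small dir-to-backup first, so that it can be tried without touching real data.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir dir-to-backup
echo "hello" > dir-to-backup/file.txt

# Same as: tar -czvg mybackup_full.snar -f mybackup_full.tgz dir-to-backup
tar --create --gzip --verbose \
    --listed-incremental=mybackup_full.snar \
    --file=mybackup_full.tgz \
    dir-to-backup

ls -l mybackup_full.tgz mybackup_full.snar
```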

The online documentation does not mention a "differential backup" strategy by name. However, the GNU tar manual (see section 5) calls a full backup a "level 0" backup and the first incremental backup "level 1". The manual explicitly states that you can make further level 1 backups by working on a copy of the snar file created at level 0, because tar updates the snapshot file it is given. I tend to call this approach a differential backup. The approach would be:

cp mybackup_full.snar mybackup_diff-1.snar
tar -czvg mybackup_diff-1.snar -f mybackup_diff-1.tgz dir-to-backup
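
The two steps can be wrapped in a small function that makes a full backup on the first run and a differential backup on every later run. This is only a sketch and requires GNU tar; the function name backup, the file names full.tgz and full.snar, and the time stamp scheme are my own assumptions, not something the manual prescribes.

```shell
#!/bin/sh
# backup SRC DEST
#   First run:  level 0 backup into DEST/full.tgz with DEST/full.snar.
#   Later runs: level 1 (differential) backup against a copy of full.snar,
#               so full.snar itself is never modified.
backup() {
    src=$1
    dest=$2
    full_snar="$dest/full.snar"

    if [ ! -f "$full_snar" ]; then
        tar -czg "$full_snar" -f "$dest/full.tgz" "$src"
    else
        stamp=$(date +%Y%m%d-%H%M%S)
        cp "$full_snar" "$dest/diff-$stamp.snar"
        tar -czg "$dest/diff-$stamp.snar" -f "$dest/diff-$stamp.tgz" "$src"
    fi
}
```

Calling "backup dir-to-backup /path/to/backups" repeatedly would then produce one full archive and a series of differential archives that each depend only on the full one.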

Extract from Incremental Backups

To restore ("extract") from incremental backups, the GNU tar manual says to pass option --listed-incremental as well, conventionally as --listed-incremental=/dev/null. The snapshot file is not read during extraction, so any argument would do. So with just a full and a level 1 backup:

tar -xzpf mybackup_full.tgz
tar -xzpg /dev/null -f mybackup_diff-1.tgz

The following alternative also works:

tar --incremental -xzpf mybackup_diff-1.tgz

To list what is going on, you might do:

tar --incremental -tvvzpf mybackup_diff-1.tgz
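
To see the difference to the "update" approach, the following sketch runs a complete cycle in a temporary directory: full backup, a deletion and an addition in the source, level 1 backup, and restore. It requires GNU tar, and the file names are made up. After restoring both archives, the file deleted in the source is gone from the restored tree as well.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir data restore
echo "keep" > data/keep.txt
echo "old"  > data/obsolete.txt
sleep 1                                      # keep file times clearly before the backup time
tar -czg full.snar -f full.tgz data          # level 0

rm data/obsolete.txt                         # deleted after the full backup
echo "new" > data/added.txt                  # created after the full backup
cp full.snar diff-1.snar
tar -czg diff-1.snar -f diff-1.tgz data      # level 1

tar -xzpf full.tgz -C restore                # restore level 0 first
tar -xzpg /dev/null -f diff-1.tgz -C restore # then level 1, in incremental mode

ls restore/data                              # keep.txt and added.txt, no obsolete.txt
```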

Caveats

tar stores "device numbers" in the snapshot file and uses them to check whether a file has changed since the last backup. If device numbers change for some reason, you might want to use the --no-check-device option to avoid an unintended full backup. There is a tar-snapshot-edit utility to deal with such snapshot file issues.
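
The option is simply added to the command that creates the incremental archive. The sketch below only shows where it goes; in a temporary directory the device number does not change, so the result is the same as without the option. It requires GNU tar, and the file names are made up.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

mkdir data
echo "one" > data/a.txt
tar -czg full.snar -f full.tgz data          # level 0

echo "two" > data/b.txt
cp full.snar diff-1.snar
# Ignore device numbers when deciding what has changed since level 0.
tar --no-check-device -czg diff-1.snar -f diff-1.tgz data

tar -tzf diff-1.tgz
```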

References

How to add or update a file in an existing tar.gz archive https://unix.stackexchange.com/questions/13093/how-to-add-update-a-file-to-an-existing-tar-gz-archive
Introduction to the tar command line syntax https://www.computernetworkingnotes.com/linux-tutorials/tar-command-examples-in-linux.html
Introduction to the syntax of incremental backups https://www.computernetworkingnotes.com/linux-tutorials/create-and-restore-incremental-backups-in-linux-with-tar.html
Explains level 1 (=differential) and multi-level backups http://paulwhippconsulting.com/blog/using-tar-for-full-and-incremental-backups/
The tar manual https://www.gnu.org/savannah-checkouts/gnu/tar/manual/tar.html#Backups
Also explains why to copy the snar file https://etutorials.org/Linux+systems/how+linux+works/Chapter+13+Backups/13.4+Using+tar+for+Backups+and+Restores/
Difference between --listed-incremental and --newer https://unix.stackexchange.com/questions/307530/what-is-the-difference-between-tars-newer-and-listed-incremental-options
More details on incremental backups https://floatingoctothorpe.uk/2018/incremental-tar-backups.html
Torture-testing Unix Backup and Archive Programs https://web.archive.org/web/20010622155048id_/http://reality.sgi.com/zwicky_neu/testdump.doc.html