Backing up one’s home directory with tar

April 20, 2008 at 2:01 pm 3 comments

This is something that should be fairly easy. However, in practice there are some problems:

  1. Tar has a proliferation of options and can be an effort to use, at least if you only use it infrequently.
  2. One’s home directory often contains large files (like perfectly legally downloaded music and videos), which you don’t really want to back up repeatedly.
  3. Some applications put rather large things in hidden directories under your home directory.

There are various solutions:

  1. For the proliferation of options:
    1. Use some convenient tool (perhaps a GUI) which
      1. makes decisions that you don’t care about for you,
      2. provides some sort of reference to the options available to you while you use them, and
      3. has less theory behind it.
    2. Get someone else to tell you the commands you need and ignore the documentation.
  2. For large files: exclude particularly large files that stay fixed for long periods of time and back them up separately, or just get a faster computer, a larger backup device or a faster internet connection.
  3. For large hidden directories:
    1. Exclude all configuration files from your backup.
    2. Exclude all configuration directories from your backup (single files can’t be that large).
    3. Find out which configuration files are larger and exclude them specifically.

As far as these solutions go: I don’t know of a good archiving GUI for doing such things (though someone might like to inform me of the existence of one), nor have I really used any other tools for this, so I’m going to try to explain some convenient commands here. Excluding files is fairly easy to do with tar, but finding the files to exclude can take quite a lot of work.

Using tar (for the impatient)
To create an archive, one types

tar -c -f "OUTPUT FILE" input files and directory

(c for create, f for file)
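
As a minimal sketch of the create command, the following builds a throwaway directory and archives it. The names used here (demo, notes.txt, demo.tar) are made up for illustration, and everything happens in a scratch directory rather than in your real home directory.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
mkdir demo
echo "hello" > demo/notes.txt

# -c creates an archive, -f names the output file.
tar -c -f demo.tar demo

# -t lists the contents of an existing archive.
listing=$(tar -t -f demo.tar)
echo "$listing"
```

Listing the archive straight after creating it is a cheap way to check that the paths stored are the ones you expected.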

If you want to exclude files you can add the options:

  1. --exclude=PATTERN to exclude filenames matching a wildcard expression (if you add this repeatedly you exclude several files)
  2. -X FILE_CONTAINING_LIST_OF_PATTERNS to exclude based on a lot of patterns
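
The two exclusion options can be combined. The sketch below assumes GNU tar and uses invented file names (film.avi, music, exclude_list) in a scratch directory. It skips one file by wildcard and one directory via a pattern file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p demo/music
echo "keep" > demo/notes.txt
echo "big" > demo/music/song.ogg
echo "big" > demo/film.avi

# One pattern per line, written the way tar sees the paths.
printf '%s\n' 'demo/music' > exclude_list

# --exclude may be repeated; -X reads further patterns from a file.
# Both are given before the names of the things to archive.
tar -c -f demo.tar --exclude='*.avi' -X exclude_list demo

listing=$(tar -t -f demo.tar)
echo "$listing"
```

Quoting the wildcard stops the shell from expanding it before tar gets to see it.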

If you want to gzip the file you create, add -z; if you want to bzip2 it, add -j. Either way, compressing means you can’t add things to your archive later. (j is quite like z if you think for long enough.)
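
To make the trade-off concrete, here is a small sketch with made-up file names: an uncompressed archive that gets a second file appended with -r, and a gzipped archive created with -z. The od call merely checks for the two bytes that every gzip file starts with.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir demo
echo "first" > demo/a.txt
echo "second" > demo/b.txt

# Uncompressed archive: -r appends another file later on.
tar -c -f demo.tar demo/a.txt
tar -r -f demo.tar demo/b.txt
plain_listing=$(tar -t -f demo.tar)

# Gzipped archive: add -z on creation (use -j instead for bzip2).
tar -c -z -f demo.tgz demo

# A gzip file starts with the bytes 1f 8b.
magic=$(od -An -tx1 -N2 demo.tgz | tr -d ' \n')
echo "$plain_listing"
echo "$magic"
```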

You probably want to run tar from the directory above your home directory. Also, if you are writing the archive into your home directory, you will want to exclude the archive itself.
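
One way to get the same effect without changing directory yourself is the -C option, which tells tar to switch directory before reading the names that follow. This is only a sketch: home_demo stands in for your home directory, and the archive is written to a separate scratch directory, which also sidesteps the need to exclude it.

```shell
parent=$(mktemp -d)      # stands in for the directory above your home
mkdir "$parent/home_demo"
echo "x" > "$parent/home_demo/file.txt"
outdir=$(mktemp -d)      # somewhere outside the directory being archived

# -C changes directory first, so the stored paths begin with home_demo/
# rather than with an absolute path.
tar -c -f "$outdir/backup.tar" -C "$parent" home_demo

listing=$(tar -t -f "$outdir/backup.tar")
echo "$listing"
```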

You can use -h if you want to include the files referenced by symlinks.
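
The difference -h makes shows up in a verbose listing. In this sketch (file names invented) the same directory is archived twice, once storing the symlink itself and once storing the file it points to.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir demo
echo "data" > target.txt
ln -s ../target.txt demo/link.txt

tar -c -f plain.tar demo       # stores the symlink itself
tar -c -h -f deref.tar demo    # stores the file the link points to

# -v with -t gives an ls -l style listing; symlinks show up as "name -> target".
plain=$(tar -t -v -f plain.tar)
deref=$(tar -t -v -f deref.tar)
echo "$plain"
echo "$deref"
```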


Finding files to exclude
Normal file exploring tools tend to be rather bad at showing the sizes of directories, or at least I don’t know how to use them. The command du (disk usage) is quite useful for this task. To find the files that are taking up most of the space you could run the following in your home directory:

du -h .* * --exclude=".." | grep -E "^[^[:blank:]]*M" | sort -r  -n | less

(Find the disk usage, in human-readable format, of everything in the directory, hidden entries included but apart from “..”; keep those entries with a size measured in megabytes, since their lines start with something containing no space followed by a capital M; then sort these in reverse numerical order before presenting them in a pager. Anything big enough to be reported in gigabytes is filtered out along with the small entries, so check for those separately.) You could then exclude the largest files.

If you just want to find those top level directories that are large you could try

du -hs .* * --exclude=".." | grep -E "^[^[:blank:]]*M" | sort -r  -n | less

Here -s stands for summarize: du prints one total per argument instead of descending into it.
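
A variation on the du pipelines above, offered as a sketch rather than a replacement: asking du for plain kilobytes with -k lets a numeric sort order everything, whatever unit a human-readable listing would have picked. The directory and file names are invented, and the glob demo/.[!.]* is my own substitute for excluding “..” by hand.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p demo/.cache demo/docs
head -c 2097152 /dev/urandom > demo/.cache/blob   # about 2 MB of filler
echo "small" > demo/docs/readme.txt

# -s gives one total per argument, -k reports kilobytes.
# .[!.]* matches hidden names but skips "." and ".."
# (it also skips names that begin with two dots).
largest=$(du -sk demo/.[!.]* demo/* | sort -rn | head -n 1 | cut -f2)
echo "$largest"
```

The cut at the end drops the size column and keeps only the path.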

Actually excluding files
To exclude files you can create an exclusion file consisting of a newline-separated list of patterns to exclude.

These need to be relative to the directory from which you are running tar, so if you are running tar from the directory above your home directory, make sure to include your home directory’s name as a prefix to all the paths.

Howto for the impatient

  1. Run

    du -hs HOME_NAME/.* HOME_NAME/* --exclude=".." | grep -E "^[^[:blank:]]*M" | sort -r  -n > HOME_NAME/exclude_list

    from the directory above your home directory.

  2. Edit exclude_list: cut the file off where the entries start getting small, remove any large files that you do want to back up, and delete the size column so that each remaining line is just a path.
  3. Run

    tar -c -z -f HOME_NAME/BACKUP_NAME.tgz -X HOME_NAME/exclude_list --exclude=HOME_NAME/BACKUP_NAME.tgz HOME_NAME

    from the directory above your home directory.

  4. List the files you have archived with

    tar -f HOME_NAME/BACKUP_NAME.tgz --list | less
  5. Save the file somewhere safe.
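
The steps above can be sketched end to end on a scratch directory. This is an illustration under assumptions, not the exact procedure: home_demo stands in for your home directory, the 1024 KB threshold is arbitrary, and an awk filter takes the place of editing the exclusion list by hand.

```shell
parent=$(mktemp -d)      # stands in for the directory above your home
cd "$parent"
mkdir -p home_demo/.cache home_demo/docs
head -c 2097152 /dev/urandom > home_demo/.cache/blob   # about 2 MB of filler
echo "important" > home_demo/docs/notes.txt

# Steps 1 and 2: list top-level entries over 1024 KB, keeping only the path column.
du -sk home_demo/.[!.]* home_demo/* | sort -rn |
  awk -F'\t' '$1 > 1024 { print $2 }' > home_demo/exclude_list

# Step 3: archive, skipping the listed paths and the archive itself.
tar -c -z -f home_demo/backup.tgz -X home_demo/exclude_list \
    --exclude=home_demo/backup.tgz home_demo

# Step 4: check what went in.
listing=$(tar -t -z -f home_demo/backup.tgz)
echo "$listing"
```

In real use you would still want to read through the exclusion list before trusting it, since a size threshold knows nothing about which large files matter to you.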


3 Comments

  • 1. artagnon  |  June 16, 2008 at 5:45 am

    I’d rather just use a versioning system like SVN or git to version my home directory. Solves all the problems you’re trying to address.

  • 2. existentiality  |  June 16, 2008 at 9:51 am

    Thanks for your reply. This is probably the case – so long as you have a server box with subversion sitting somewhere – if this isn’t the case (like it is for me at the moment) then this is a problem. If one has a (virtual) server anywhere a version control system is probably better for this.

    Local git doesn’t really deal with a machine dying – but for more mundane problems – like deleting single files – it is probably useful.

  • 3. Doug Bierer  |  January 22, 2009 at 6:20 am

    Another way to back up selectively would be to use “tar” in conjunction with the “find” command. This is how you can back up those pesky “hidden” files, for example. Here’s what I use:

    find ~ -type f -print0 | tar cvfz name.of.tar.file.gz --null -T -

    If you wish to filter what goes into the tar file, then add the “-name” filter:

    find ~ -name 'pattern' -print0 | tar cvfz name.of.tar.file.gz --null -T -

    Where “pattern” would be something like xyz*.jpg etc.


