Snapshots matter whether you are running a simple virtual machine on your home computer or an enterprise database that is constantly being updated and modified. A snapshot, that is, a copy of the entire filesystem as it was at a given point in time, is invaluable to have.
People often lose track of where things went wrong: a file was deleted and no one noticed it was gone. Several backup cycles pass, and now you realize that an important file is missing from every available backup of the last five weeks. In this tutorial, we will see how to use ZFS snapshots and touch upon snapshotting policies that work well in terms of both resource utilization and recoverability.
ZFS has both a high-level view of files and directories and an understanding of how data is physically written to disk. Data is written to disk in discrete blocks. The block size can go up to 1 MB, but the default is usually 128 KB. This means that every modification (a write or a deletion) happens at the granularity of these discrete blocks.
The copy-on-write mechanism ensures that whenever a block is modified, instead of the block being overwritten in place, a copy is made and the required modifications are applied to the new block.
This is especially helpful when, say, there is a power failure and your system crashes while new data is being written to disk. On a traditional filesystem, your files can end up corrupted or left with holes in them. With ZFS, you may lose the in-flight transaction that was happening at the time, but the last valid state of your files is left untouched.
Snapshots also rely on this functionality, and quite heavily in fact. When you take a snapshot of a given dataset (‘dataset’ is the ZFS term for a filesystem), ZFS simply records the point in time at which the snapshot was made. That is it! No data is copied and no extra storage is consumed.
Only when the filesystem changes, and its data diverges from the snapshot, does the snapshot start consuming extra storage. What happens under the hood is this: instead of freeing the old blocks for reuse, ZFS keeps them around for as long as the snapshot references them. This keeps the storage overhead low. If you snapshot a 20 GB dataset and modify only a few text files here and there, the snapshot may take only a few MB of space.
To demonstrate the use of snapshots, let’s start with a dataset that holds a lot of text files, just to keep matters simple. The virtual machine I will be using for the demo runs FreeBSD 11.1-RELEASE-p3, the latest stable release available at the time of this writing. The root filesystem lives on the zroot pool by default, and many of the familiar directories like /usr/src, /home and /etc are datasets of their own under zroot. If you don’t know what a pool (or a zpool) means in the ZFS vernacular, it is well worth reading up on before continuing.
One of the many filesystems, or datasets, that come by default on FreeBSD is: zroot/usr/src
To look at its properties, run the following command:
root@freebsd:~$ zfs list zroot/usr/src
As you can see, it uses 633 MB of storage and contains the entire source tree for the operating system.
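On my test VM the listing looked roughly like the following; the exact numbers, especially the available space, will differ on your system:

```
NAME            USED  AVAIL  REFER  MOUNTPOINT
zroot/usr/src   633M  17.8G   633M  /usr/src
```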
Let’s take a snapshot of zroot/usr/src:
root@freebsd:~$ zfs snapshot zroot/usr/src@snapshot1
The @ symbol acts as a delimiter between the dataset and the snapshot name, which in our case is snapshot1.
Now let’s look at the state of the snapshot just after it is created, by running:
root@freebsd:~$ zfs list -rt all zroot/usr/src
You can see that the snapshot uses no extra space when it is born. There is no available space either, because it is a strictly read-only dataset: the snapshot itself can’t grow, shrink or be modified. Lastly, it is not mounted anywhere, which keeps it completely isolated from the regular filesystem hierarchy.
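For reference, on my machine the listing looked something like this just after the snapshot was taken (values illustrative):

```
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zroot/usr/src             633M  17.8G   633M  /usr/src
zroot/usr/src@snapshot1      0      -   633M  -
```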
Now, let’s remove the sbin directory in /usr/src (note the -rf flags, which rm needs in order to remove a directory and its contents):
root@freebsd:~$ rm -rf /usr/src/sbin
Looking at the snapshot again, you will see that it has grown.
This is expected: the copy-on-write mechanism is at work here, and deleting (or modifying) files means more of the data is now referenced only by the snapshot and not by the dataset actually in use.
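On my VM the listing after the deletion looked roughly like this; notice how USED for the snapshot has grown while the dataset’s REFER has shrunk (numbers illustrative):

```
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zroot/usr/src             633M  17.8G   536M  /usr/src
zroot/usr/src@snapshot1  97.3M      -   633M  -
```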
Notice the REFER column in the output. It gives you the amount of accessible data on the dataset, whereas the USED column shows how much space is occupied on the physical disk.
ZFS’ copy-on-write mechanism often gives counter-intuitive results like this, where deleting a file makes it look as if more space is being used than before. However, having read this far, you know what is actually happening!
Before finishing, let’s recover sbin from snapshot1. To do that, simply run:
root@freebsd:/usr/src$ zfs rollback zroot/usr/src@snapshot1
Keep in mind that a rollback discards all changes made to the dataset after the snapshot was taken, not just the deletion we want to undo.
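To verify that the rollback worked, you can check that the directory is back in place; this is just a sanity check, not part of the recovery itself:

```shell
# Confirm that /usr/src/sbin exists again after the rollback
ls -ld /usr/src/sbin

# And confirm the dataset once again matches the snapshot
zfs list -rt all zroot/usr/src
```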
The next question to ask is: how often should you take snapshots? While the answer may vary from one enterprise to another, let’s take the example of a very dynamic database that changes frequently.
To begin with, you might take snapshots every 6 hours or so, but because the database changes so much, it would soon become infeasible to store all the numerous snapshots created. So the next step would be to purge snapshots older than, say, 48 hours.
The problem then becomes recovering something that was lost 49 hours ago. To work around it, you can keep one or two snapshots from that 48-hour window around for a week, and purge them only once they get older than that.
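One way to automate the first part of such a policy is a small script run periodically from cron. The following is only a sketch: it assumes snapshots are named auto-&lt;unix-timestamp&gt;, and the dataset name and the 48-hour retention window are placeholders you would adapt to your setup.

```shell
#!/bin/sh
# Hypothetical rotation script: run from cron, e.g. every 6 hours.
# DATASET, the "auto-" prefix and MAX_AGE are assumptions for illustration.
DATASET="zroot/usr/src"
NOW=$(date +%s)
MAX_AGE=$((48 * 3600))   # purge snapshots older than 48 hours

# Take a new snapshot tagged with the current Unix timestamp
zfs snapshot "${DATASET}@auto-${NOW}"

# Walk this dataset's auto- snapshots and destroy the stale ones
zfs list -H -t snapshot -o name | grep "^${DATASET}@auto-" | \
while read -r snap; do
    ts=${snap##*@auto-}
    if [ $((NOW - ts)) -gt "$MAX_AGE" ]; then
        zfs destroy "$snap"
    fi
done
```

Keeping one or two snapshots from each 48-hour window around for a week, as described above, would just be an extra condition in the same loop.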
And if you keep going this way, you can retain snapshots going all the way back to the very genesis of the system, just at decreasing frequency the older they get. Lastly, I would like to point out that these snapshots are read-only, which means that if you are infected by ransomware and all of your data gets encrypted (modified), these snapshots would, most likely, still be intact.