Comparing ISO Images

To help set up and maintain computing devices, Linux distributors regularly provide ISO images for their releases. Each image is a complete compilation of software that, ideally, fits together, which makes it easier to keep systems up to date.

Imagine that you have several of these ISO images stored locally. How do you know that they are authentic? This article shows how to verify the integrity and authenticity of a downloaded ISO image, and how to find the differences between the actual content of two ISO images. This helps you to verify the build process for an image and to see what has changed between two builds or releases.

Image formats

Disk image formats have a history of their own [11]. The common standard is ISO 9660 [12], which describes the contents of an optical disc as a whole. Image files (cloned copies of a disc) are identified by the file extension .iso.

The original ISO 9660 format comes with a number of limitations, such as a maximum of 8 directory levels and restricted file name lengths. Several extensions relax these limitations: Rock Ridge [13] (preservation of POSIX permissions and longer names), Joliet [14] (storage of Unicode names in UCS-2), and the Apple ISO 9660 Extensions [15], which introduced HFS support.

To get more details about an image file, use the `file` command followed by the name of the file:

.Listing 1: Displaying the details for an ISO file

$ file *.iso
debian-10.1.0-amd64-netinst.iso:   DOS/MBR boot sector;
partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63),
startsector 3808, 5664 sectors
xubuntu-18.04.3-desktop-amd64.iso: DOS/MBR boot sector;
partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63),
startsector 11688, 4928 sectors
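
As Listing 1 shows, `file` may describe a bootable image by its DOS/MBR boot sector rather than as ISO 9660. A more direct check can be sketched as follows. It relies on the fact that an ISO 9660 volume descriptor carries the identifier `CD001` at byte offset 32769. The function name `is_iso9660` is our own invention and not part of any package.

```shell
#!/bin/sh
# is_iso9660 FILE  (hypothetical helper name, for illustration only)
# ISO 9660 places its first volume descriptor in sector 16 (offset 32768).
# Bytes 1-5 of that descriptor hold the standard identifier "CD001".
is_iso9660() {
    magic=$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null)
    [ "$magic" = "CD001" ]
}

# Example call (file name taken from Listing 1):
# is_iso9660 debian-10.1.0-amd64-netinst.iso && echo "ISO 9660 volume found"
```

The function only inspects five bytes, so it says nothing about whether the rest of the image is intact.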

Verifying downloaded ISO files

Trustworthy software providers always offer two things for download: the actual ISO image and the corresponding checksum, which lets you run an integrity check on the downloaded file. The checksum allows you to confirm that your local file is an exact copy of the file on the download servers, and that nothing went wrong during the download. If an error occurs during the download, the local file is corrupted and can trigger random issues during the installation [16].

Furthermore, if the ISO image has been compromised (as happened with Linux Mint in early 2016 [17]), the two checksums will not match. You can calculate the checksums using `md5sum` (deprecated, no longer recommended) and `sha256sum` as follows:

.Listing 2: Calculating the checksum for ISO files

$ md5sum *.iso
b931ef8736c98704bcf519160b50fd83  debian-10.1.0-amd64-netinst.iso
0c268a465d5f48a30e5b12676e9f1b36  xubuntu-18.04.3-desktop-amd64.iso

$ sha256sum *.iso
7915fdb77a0c2623b4481fc5f0a8052330defe1cde1e0834ff233818dc6f301e  debian-10.1.0-amd64-netinst.iso
3c9e537ee1cf64088251e56b4ca1694944ad59126f298f24a78cd43af152b5b3  xubuntu-18.04.3-desktop-amd64.iso


You can compare the provided checksum file against the locally stored ISO image as shown in Listing 3. An OK at the end of a line indicates that both checksums are the same.

.Listing 3: Compare provided checksums

$ sha256sum --check sha256sum.txt
xubuntu-18.04.3-desktop-amd64.iso: OK
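
The same check can be spelled out step by step, which is handy when the checksum file lists many images but only one of them has been downloaded. The following is a minimal sketch, assuming the checksum file uses the usual `sha256sum` layout (hash, then file name) and that file names contain no spaces. The function name `verify_iso` is hypothetical.

```shell
#!/bin/sh
# verify_iso CHECKSUM_FILE ISO_FILE  (hypothetical helper name)
# Succeeds only if the SHA-256 of ISO_FILE equals its entry in CHECKSUM_FILE.
verify_iso() {
    sums_file=$1
    iso=$2
    name=$(basename "$iso")
    # Look up the expected hash; a leading "*" marks binary mode and is dropped
    expected=$(awk -v n="$name" \
        '{ f = $2; sub(/^\*/, "", f); if (f == n) { print $1; exit } }' \
        "$sums_file")
    if [ -z "$expected" ]; then
        echo "$name: no entry in $sums_file" >&2
        return 2
    fi
    actual=$(sha256sum "$iso" | cut -d' ' -f1)
    if [ "$expected" = "$actual" ]; then
        echo "$name: OK"
    else
        echo "$name: FAILED"
        return 1
    fi
}

# Example call (file names taken from Listing 3):
# verify_iso sha256sum.txt xubuntu-18.04.3-desktop-amd64.iso
```

A non-zero return value makes the function easy to use in scripts that should stop before installing from a damaged image.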

Comparing two locally stored ISO files

You may have downloaded two ISO files and want to find out whether they are exactly the same. The `sha256sum` command is useful here again, and we recommend wrapping this check in a shell script. Listing 4 shows a bash script that combines the four commands `sha256sum`, `cut`, `uniq`, and `wc` to extract the first column from all output lines, merge the lines if they are identical, and count the number of lines that remain. If the two (or more) ISO files are the same, their checksums are identical, only a single line remains, and the script prints the message "the files are the same":

.Listing 4: Automatically comparing checksums of ISO files using `sha256sum`

if [ "$(sha256sum *.iso | cut -d' ' -f1 | uniq | wc -l)" -eq 1 ]
then
  echo "the files are the same"
else
  echo "the files are not identical"
fi
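
For use inside larger scripts, the idea can be wrapped in a function that reports its result through the exit status instead of a message. This is a sketch under our own naming (`same_content` is hypothetical). It uses `sort -u` in place of `uniq`, which gives the same count here and also collapses identical checksums that are not adjacent.

```shell
#!/bin/sh
# same_content FILE FILE...  (hypothetical helper name)
# Returns 0 if all files share one SHA-256 checksum, 1 if they do not,
# and 2 if fewer than two files are given or a file cannot be read.
same_content() {
    [ "$#" -ge 2 ] || return 2
    for f in "$@"; do
        [ -r "$f" ] || return 2
    done
    # Keep only the checksum column, then count the distinct values
    count=$(sha256sum -- "$@" | cut -d' ' -f1 | sort -u | wc -l)
    [ "$count" -eq 1 ]
}

# Example call:
# same_content *.iso && echo "the files are the same"
```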

If the script reports that the two files are different, you may be interested in the exact position where they diverge. A byte-by-byte comparison can be done using the `cmp` command, which outputs the position of the first byte that differs between two files:

.Listing 5: Finding the first difference between two files using `cmp`

$ cmp *.iso
debian-10.1.0-amd64-netinst.iso xubuntu-18.04.3-desktop-amd64.iso differ: byte 433, line 4
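
To get a feeling for how much two files differ, rather than only where the first difference is, the `-l` switch of `cmp` can be used. It prints one line per differing byte. The sketch below counts these lines. The function name `count_diff_bytes` is our own, and the count covers only the length that both files have in common.

```shell
#!/bin/sh
# count_diff_bytes FILE1 FILE2  (hypothetical helper name)
# cmp -l lists every differing byte position, one per line; wc -l counts them.
# The end-of-file notice for files of unequal length goes to stderr and is dropped.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example call (file names taken from Listing 5):
# count_diff_bytes debian-10.1.0-amd64-netinst.iso xubuntu-18.04.3-desktop-amd64.iso
```

For images of different releases the number will be large. The count is mainly useful to tell a single flipped byte from a completely different build.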

Comparing the actual content

So far, we have compared the files byte by byte. Now we will take a closer look inside, at the actual content of the ISO files. At this point a number of tools come into play that help to compare single files, entire directory structures, compressed archives, and ISO images.
The `diff` command compares two directories using the switches `-r` (short for `--recursive`) and `-q` (short for `--brief`) followed by the two directories. As seen in Listing 6, `diff` reports which files are unique to either directory, and whether a file with the same name has changed.

.Listing 6: Comparing two directories using `diff`

$ diff -qr t1/ t2/
Only in t1/: blabla.conf
Files t1/nsswitch.conf and t2/nsswitch.conf differ
Only in t2/: pwd.conf

To compare two ISO images, mount the two image files on separate directories and compare those directories.
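
The mounting step can be sketched as follows. The mount points `/tmp/iso1` and `/tmp/iso2`, the image names, and both function names are placeholders of our own. Loop-mounting normally requires root privileges, which is why `sudo` appears in the sketch.

```shell
#!/bin/sh
# mount_iso ISO DIR  (hypothetical helper name)
# Loop-mounts an image read-only; needs root, so it is not called automatically.
mount_iso() {
    mkdir -p "$2" && sudo mount -o loop,ro "$1" "$2"
}

# diff_trees DIR1 DIR2  (hypothetical helper name)
# Brief recursive comparison; exit status 0 means no differences were found.
diff_trees() {
    diff -qr "$1" "$2"
}

# Example session with placeholder names:
# mount_iso first.iso  /tmp/iso1
# mount_iso second.iso /tmp/iso2
# diff_trees /tmp/iso1 /tmp/iso2
# sudo umount /tmp/iso1 /tmp/iso2
```

Mounting read-only ensures that the comparison cannot modify either image.
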
The tools `colordiff` [1,2] and `icdiff` [18,19] provide more colourful output on the command line. Figure 1 shows the output of `icdiff`, in which the differences between the two versions of `nsswitch.conf` are highlighted in either green or red.

Figure 1: Comparing two directories using `icdiff`

Graphical tools for comparing directories include `fldiff` [5], `xxdiff` [6] and `dirdiff` [7]. `xxdiff` was inspired by `fldiff`, which is why they look rather similar. Entries with similar content have a white or gray background, whereas entries that differ have a light-yellow background. Entries with a bright-yellow or green background are unique to a directory.

Figure 2: Comparing two directories using `fldiff`

`xxdiff` displays the file differences in a separate window when you click on an entry (see Figure 3).

Figure 3: Comparing two directories using `xxdiff`

The next candidate is `dirdiff`. It builds on top of the functionality of `xxdiff`, and can compare up to five directories. Files that exist in either directory are marked with an X. Interestingly, the colour scheme that is in use for the output window is the same one as `icdiff` uses (see Figure 4).

Figure 4: Comparing two directories using `dirdiff`

Comparing compressed archives and entire ISO images is the next step. While you might already know the `adiff` command from the `atool` package [10], we will have a look at the `diffoscope` command [8,9] instead. It describes itself as “a tool to get to the bottom of what makes files or directories different. It recursively unpacks archives of many kinds and transforms various binary formats into more human readable forms to compare them”. The tool originates from The Reproducible Builds Project [20,21], which is “a set of software development practices that create an independently-verifiable path from source to binary code”. Among others, it supports the following file formats:

* Android APK files and boot images
* Berkeley DB database files
* Coreboot CBFS filesystem images
* Debian .buildinfo and .changes files
* Debian source packages (.dsc)
* ELF binaries
* Git repositories
* ISO 9660 CD images
* MacOS binaries
* OpenSSH public keys
* OpenWRT package archives (.ipk)
* PGP signed/encrypted messages
* PDF and PostScript documents
* RPM archives

Figure 5 shows the output of `diffoscope` when comparing two different versions of a Debian package. You can see exactly which changes have been made, in both file names and contents.

Figure 5: Comparing two Debian packages using `diffoscope` (excerpt)

Listing 7 shows the output of `diffoscope` when comparing two ISO images with a size of 1.9G each. Both images belong to Linux Mint release 19.2; one was retrieved from a French server and the other from an Austrian server (hence the letters `fr` and `at`). Within seconds, `diffoscope` reports that the two files are identical.

.Listing 7: Comparing two ISO images using `diffoscope`

$ diffoscope
|####################################################|  100%    Time: 0:00:00

To look behind the scenes, call `diffoscope` with the two options `--debug` and `--text -`, which write more verbose output to the terminal. This allows you to see what the tool is doing. Listing 8 shows the output.

.Listing 8: Behind the scenes of `diffoscope`

$ diffoscope --debug --text -

2019-10-03 13:45:51 D: diffoscope.main: Starting diffoscope 78
2019-10-03 13:45:51 D: diffoscope.locale: Normalising locale, timezone, etc.
2019-10-03 11:45:51 D: diffoscope.main: Starting comparison
2019-10-03 11:45:51 D: diffoscope.progress: Registering <diffoscope.progress.ProgressBar object at 0x7f4b26310588> as a progress observer
2019-10-03 11:45:52 D: diffoscope.comparators: Loaded 50 comparator ETA:  --:--:--
2019-10-03 11:45:52 D: diffoscope.comparators.utils.specialize: Unidentified file. Magic says: DOS/MBR boot sector; partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63), startsector 652, 4672 sectors
2019-10-03 11:45:52 D: diffoscope.comparators.utils.specialize: Unidentified file. Magic says: DOS/MBR boot sector; partition 2 : ID=0xef, start-CHS (0x3ff,254,63), end-CHS (0x3ff,254,63), startsector 652, 4672 sectors
2019-10-03 11:45:52 D: Comparing (FilesystemFile) and (FilesystemFile)
2019-10-03 11:45:52 D: diffoscope.comparators.utils.file: Binary.has_same_content: <<class 'diffoscope.comparators.binary.FilesystemFile'>> <<class 'diffoscope.comparators.binary.FilesystemFile'>>
2019-10-03 11:45:53 D:  has_same_content_as returned True; skipping further comparisons
|####################################################|  100%  Time: 0:00:01
2019-10-03 11:45:53 D: diffoscope.tempfiles: Cleaning 0 temp files
2019-10-03 11:45:53 D: diffoscope.tempfiles: Cleaning 0 temporary directories

So far, so good. The next tests were done on images from different releases and with different file sizes. All of them resulted in an internal error that traces back to the `diff` command running out of memory. There appears to be a file size limit of about 50M. That is why I built two smaller images of 10M each and handed them over to `diffoscope` for comparison. Figure 6 shows the result: a tree structure containing the file `nsswitch.conf` with the differences highlighted.

Figure 6: Comparing two ISO images using `diffoscope`

An HTML version of the output is available as well, via the switch `--html output.html`. Figure 7 shows the resulting HTML file in a web browser.

Figure 7: Comparing two ISO images using `diffoscope` (HTML output)

If you do not like the output style, or would like to match it to the corporate identity of your company, you can customize the output with your own CSS file using the switch `--css style.css`, which loads the style from the referenced CSS file.
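
The report generation can be wrapped in a small function for scripted use. The sketch below assumes the exit status convention described in the `diffoscope` manual page (0 for identical inputs, 1 for differences, other values for errors). The function name `iso_report` and the file names in the example call are placeholders of our own.

```shell
#!/bin/sh
# iso_report FIRST SECOND REPORT.html  (hypothetical helper name)
# Runs diffoscope with the --html switch and translates its exit status.
# Assumed convention: 0 = no differences, 1 = differences, other = error.
iso_report() {
    status=0
    diffoscope --html "$3" "$1" "$2" || status=$?
    case "$status" in
        0) echo "identical" ;;
        1) echo "different, see $3" ;;
        *) echo "diffoscope failed with status $status" >&2 ;;
    esac
    return "$status"
}

# Example call with placeholder file names:
# iso_report first.iso second.iso output.html
```

Adding `--css style.css` to the `diffoscope` call inside the function would apply a custom style sheet to every generated report.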


Finding differences between two directories or even entire ISO images is a bit tricky. The tools shown above help you master this task. So, happy hacking!

Thank you
The author would like to thank Axel Beckert for his help while preparing the article.

Links and references

* [1] colordiff
* [2] colordiff, Debian package
* [3] diffutils
* [4] diffutils, Debian package
* [5] fldiff
* [6] xxdiff
* [7] dirdiff
* [8] diffoscope
* [9] diffoscope, Debian package
* [10] atool, Debian package
* [11] Brief introduction of some common image file formats
* [12] ISO 9660, Wikipedia
* [13] Rock Ridge, Wikipedia
* [14] Joliet, Wikipedia
* [15] Apple ISO 9660 Extensions, Wikipedia
* [16] How to verify ISO images, Linux Mint
* [17] Beware of hacked ISOs if you downloaded Linux Mint on February 20th!
* [18] icdiff
* [19] icdiff, Debian package
* [20] The Reproducible Builds Project
* [21] The Reproducible Builds Project, Debian Wiki
