The Serpent

// Cursing the Internet since 1998

The Art of tar.gz

Posted October 5, 2007 Linux

One of the very first tasks that baffled me when I started using Linux was extracting files. The reason was that I was unfortunately (at the time, anyway) introduced to source code distributions rather than more convenient ways of distributing software, such as RPMs.

This didn’t really hold much hope for me, since compiling and installing software for Linux is usually no small task, yet here I was, struggling to extract the files in the first place. That’s why this guide exists - because one of the first tasks you’ll need to do in Linux is gain access to all the goodies you download, and the chances are… they’ll come in tar.gz format.

What is tar.gz?

While the majority of Linux files do not require filename extensions, you’ll come across a lot of files with the above extension. It’s a little vague at first, but once you understand it, it makes total sense. A tar.gz file is an archived and compressed file. This means two applications were at work on our tar.gz file, and in order to extract the contents, two applications will be needed.

Your Linux machine most likely comes with the tools needed to create and extract these files, the first being ‘tar’, and the second being ‘gunzip’. If you enter each command into the shell on its own, you should get output similar to the following:

[joey@linux ~]$ tar
tar: You must specify one of the `-Acdtrux' options
Try `tar --help' for more information.
[joey@linux ~]$
[joey@linux ~]$ gunzip
gunzip: compressed data not read from a terminal. Use -f to force decompression.
For help, type: gunzip -h
[joey@linux ~]$

If, however, you receive the following output:

[joey@linux ~]$ tar
bash: tar: command not found
[joey@linux ~]$

Then your Linux machine (rather strangely) does not have the basic GNU utils. You’ll need to use your package manager to install them, or head over to http://www.gnu.org/software/tar/ and http://www.gzip.org.
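If you’d rather check for all the tools in one go, the shell’s built-in command -v prints the path of a program when it can be found. The following is only a minimal sketch, assuming a POSIX-style shell such as bash; the variable name missing is mine, purely for illustration:

```shell
# Check that each tool this guide relies on is on the PATH.
missing=0
for tool in tar gzip gunzip; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found"
        missing=$((missing + 1))
    fi
done
echo "missing tools: $missing"
```

If the last line reports anything other than 0, install the missing tools before carrying on.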

Assuming you have the software, you’re ready to see what it can do. To help you understand how a tar.gz file comes to be, we’ll create a few test ones. Afterwards, you’ll hopefully know that a tar.gz file is simply a file that has been archived (using tar), and then compressed (using gzip).

Creating A Tar File

It is possible to simply have a .tar file. Tar files date back to the older UNIX days, when multiple files needed to be archived into one simple, manageable file, usually to be stored on tape for backup purposes (the name is short for ‘tape archive’). It’s important to remember that tar does not compress files; it only archives them.

You can get help on any UNIX command by viewing its man page. Simply issue the man command, followed by the name of the program you wish to read up on:

[joey@linux ~]$ man tar

So let’s try and archive a couple of files in our home directory. For this test, we’re first going to create some files, and then archive them:

[joey@linux ~]$ touch test_file1 && echo "This is test file 1" > test_file1

The above command creates a blank file called ‘test_file1’ and then, once that has succeeded (&&), writes the string ‘This is test file 1’ to the new file. Strictly speaking, the redirection (>) would create the file on its own, so the touch is optional. We’ll do the same for the second file:

[joey@linux ~]$ touch test_file2 && echo "This is test file 2" > test_file2

Now that we have two files in our home directory, let’s have a look at the options to archive them. Tar has a lot of options, but for now, try:

[joey@linux ~]$ tar -cf archive.tar test_file1 test_file2

We’ve called tar with two options. The first (-c) tells tar that we want to create an archive, and the second (-f) tells it that the name that follows (in our case, archive.tar) is what the archive should be called. Then we list the files to include in the archive. If you look in your home directory, you’ll find the file archive.tar. Almost there.

There are many options you can use with tar, including appending files to an existing archive, or simply listing the files within one. Read the man page, and see if you can find new things to do with your experimental archive. Common options include -v, which lists exactly what is being added to an archive. If you want to archive an entire directory, simply give the directory name instead of a list of files:

[joey@linux ~]$ tar -cvf archive.tar my_dir/
my_dir/
my_dir/file1
my_dir/file2
my_dir/file3
[joey@linux ~]$
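To give you a head start on the two options mentioned above, here is a small sketch of appending (-r) and listing (-t). It assumes GNU tar and works in a throwaway directory made with mktemp, so nothing in your home directory is touched; test_file3 and the variable names are only there for the example:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "This is test file 1" > test_file1
echo "This is test file 2" > test_file2
echo "This is test file 3" > test_file3

# Create the archive with the first two files.
tar -cf archive.tar test_file1 test_file2

# -r appends another file to the existing (uncompressed) archive.
tar -rf archive.tar test_file3

# -t lists the contents without extracting anything.
listing=$(tar -tf archive.tar)
echo "$listing"
```

Note that appending only works on plain tar files, not on compressed ones.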

Extracting A Tar File

Now that we’ve got our files archived in archive.tar, let’s delete our original files and extract them back:

[joey@linux ~]$ rm test_file1 test_file2
[joey@linux ~]$ tar -xvf archive.tar
test_file1
test_file2
[joey@linux ~]$

As before, we need to include at least two switches: -x tells tar we want to extract files, and -f tells it that the following filename is the archive to extract from. I’ve also used the -v switch to display output about what is happening.

That’s all there is to creating and extracting a basic tar file. You may be wondering why you would want to wrap multiple files into a single file without any compression, but there are actually many benefits. Organisation is the main one. If you need to back up hundreds of users’ home directories onto magnetic storage, it is far simpler to have a single file to represent them than the entire directory structure. Tar retains settings such as permissions, owners and directory structure, and it’s able to extract single files from an archive, or the entire thing.
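Pulling a single file back out looks something like the sketch below. It assumes GNU tar and a scratch directory; the chmod is only there so we have a permission worth checking afterwards:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "This is test file 1" > test_file1
echo "This is test file 2" > test_file2
chmod 600 test_file2          # owner read/write only, so we can watch it survive
tar -cf archive.tar test_file1 test_file2
rm test_file1 test_file2

# Name a member after the archive to extract only that file.
tar -xf archive.tar test_file2

# Only test_file2 should be back, still with its restricted permissions.
ls -l
```

Any names you list after the archive are treated as the members to extract; leave them off and you get the entire thing.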

Of course, nowadays, compression is a natural part of file distribution, and tar works perfectly with gunzip.

To quickly recap: a tar file is many files and directories archived into a single file, with no compression. To create a tar file, you use the -c switch, and to extract, you use -x.

gunzip – The gz part

However, most files you’ll download from the Internet will be tar.gz format, so what’s the difference? They are simply compressed, archived files – or tarballs.

Tarball is a cute little term created by the UNIX community for these tar.gz files. Since there are still parts of the Internet that are rather slow, it made sense to distribute packages with a little bit of compression on top, and still preserve the precious file structure that most source packages need. Gzip, along with its counterpart gunzip, was the answer.

In short, gunzip decompresses files which have been compressed with gzip. This is the only difference between the two applications.

Let’s go straight to some examples. First off, let’s turn our archive.tar into a compressed archive using gzip:

[joey@linux ~]$ gzip -v archive.tar
archive.tar: 98.5% -- replaced with archive.tar.gz
[joey@linux ~]$

As you can see (thanks to the -v switch, which gives us more output), gzip has replaced our old archive.tar with archive.tar.gz. We now have our archive compressed. Because it wasn’t exactly a big archive to start with, the saving in bytes is small, even though the percentage looks impressive: tar pads small archives out with empty blocks, and those compress extremely well. You get the point, though.
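If you want to see the actual sizes rather than a percentage, you can count the bytes before and after with wc. This is just a rough sketch in a scratch directory; the exact numbers will depend on your versions of tar and gzip, so I haven’t quoted any:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "This is test file 1" > test_file1
echo "This is test file 2" > test_file2
tar -cf archive.tar test_file1 test_file2

# Size in bytes before compression.
tar_size=$(wc -c < archive.tar)

# gzip replaces archive.tar with archive.tar.gz.
gzip archive.tar

# Size in bytes after compression.
gz_size=$(wc -c < archive.tar.gz)

echo "archive.tar:    $tar_size bytes"
echo "archive.tar.gz: $gz_size bytes"
```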

Now that you know how these tar.gz files come to be, we can begin to learn how to extract them properly. Chances are you’ve found a method online and stuck with it, not actually knowing what the switches do, or whether you actually need to use both tar and gunzip. Hopefully the following will answer any questions you might have.

Extracting using gunzip

If you want to be old fashioned, you can uncompress the file using gunzip, and then use tar as before to extract the files. The following steps will accomplish this:

[joey@linux ~]$ gunzip -dv archive.tar.gz
archive.tar.gz: 98.5% -- replaced with archive.tar
[joey@linux ~]$

As you can see, it does the exact opposite of compressing the file, and we are left with our original tar file. We can then use tar to extract the contents as before:

[joey@linux ~]$ tar -xvf archive.tar

There’s really nothing to it. Using the two steps above, you can now access the files which have been archived, compressed and distributed to you. However, there are simpler ways.
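One thing to watch with the two-step method is that gunzip replaces the .gz file. If you’d like to keep the compressed copy as well, the -c switch writes the uncompressed data to standard output instead, and you can redirect that into a file. A small sketch, again in a scratch directory:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "This is test file 1" > test_file1
tar -cf archive.tar test_file1
gzip archive.tar              # leaves only archive.tar.gz
rm test_file1

# -c sends the decompressed data to standard output instead of
# replacing the file, so archive.tar.gz is left untouched.
gunzip -c archive.tar.gz > archive.tar

# Extract as before, then show that all three files now exist.
tar -xvf archive.tar
ls
```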

Speeding Things Up

As with most Linux tools, there is always more than one way to get things done. In recognition of this, the gunzip and tar tools can be used together in several ways.

First up, pipes! A pipe is a way of executing a command and forwarding its output to another command; in simple terms, it runs two commands as one. We’ll use one to uncompress and de-archive a tarball in a single line.

First, create the file again:

[joey@linux ~]$ touch big_file
[joey@linux ~]$ tar -cf archive.tar big_file
[joey@linux ~]$ gzip archive.tar

Then, let’s use pipes to extract:

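[joey@linux ~]$ gunzip < archive.tar.gz | tar -xvf -

This looks a little different, most noticeably the pipe symbol |. It means that the output of gunzip (the uncompressed archive) is passed straight to tar, which extracts it. The lone - after -f tells tar to read the archive from standard input rather than from a named file; without it, tar falls back to a built-in default that is not standard input on every system.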
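Pipes work in the other direction too, so you can create a tarball in one line. The sketch below assumes GNU tar and gzip and runs in a scratch directory; the file names are just the test files from earlier:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "This is test file 1" > test_file1
echo "This is test file 2" > test_file2

# "-f -" tells tar to write the archive to standard output,
# which the pipe hands to gzip for compression.
tar -cf - test_file1 test_file2 | gzip > archive.tar.gz

# Check the result by listing it through the reverse pipe.
listing=$(gunzip < archive.tar.gz | tar -tf -)
echo "$listing"
```

No intermediate archive.tar is ever written to disk, which is handy when space is tight.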

Alternatively, you can use the built-in switches in tar. Tar has an extra switch, -z, which means ‘filter the archived file through gzip’. Give the following a try:

[joey@linux ~]$ tar -xzvf archive.tar.gz
big_file
[joey@linux ~]$

The command above takes advantage of the -z flag to merge the two commands together. In reality, your computer still runs both programs, but tar seamlessly connects them for you. This is most likely the command you’ll want to remember, as it’s the simplest, most effective way of extracting tarballs.
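The -z switch isn’t limited to extracting. Combined with -c it creates a tarball in one step, and combined with -t it lists one without unpacking it. A quick sketch, assuming a tar that supports -z (GNU tar does) and using a made-up directory called my_dir:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir my_dir
echo "hello" > my_dir/file1
echo "world" > my_dir/file2

# -c create, -z compress through gzip, -f name of the tarball.
tar -czf my_dir.tar.gz my_dir/

# -t with -z lists a compressed tarball without extracting it.
tar -tzf my_dir.tar.gz

# Remove the originals and restore them from the tarball.
rm -r my_dir
tar -xzf my_dir.tar.gz
restored=$(cat my_dir/file1 my_dir/file2)
echo "$restored"
```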

Summary

Hopefully by now you know what these tar.gz files are, and the difference between a tar and a gz file. It’s possible to use one tool without the other. For example, running:

[joey@linux ~]$ gzip big_file

will produce the compressed file ‘big_file.gz’. Notice there is no ‘tar’ extension, because we never archived it with tar. Tarballs are basically the Linux answer to Windows ZIP files, with the exception that they are often more efficient, and support far more storage media.
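Here is that single-file case as a round trip, with no tar involved at all. It is only a sketch in a scratch directory, and I’ve filled big_file with a few repeated lines so there is something to compress:

```shell
# Work in a scratch directory so your real files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A repetitive file, so there is something worth compressing.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "This line repeats and compresses well"
done > big_file

gzip big_file                 # replaces big_file with big_file.gz
ls

gunzip big_file.gz            # restores big_file, removes big_file.gz
line_count=$(wc -l < big_file)
echo "lines restored: $line_count"
```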

So to summarize, the following commands can be used to manipulate tarfiles and tarballs:

[joey@linux ~]$ tar -cvf archive.tar file1 file2 file3

Create an archive (-c) named archive.tar (-f) and display all results (-v)

[joey@linux ~]$ tar -xvf archive.tar

Extract (-x) archive named archive.tar (-f) and display all results (-v)

[joey@linux ~]$ gzip archive.tar

Compress archive.tar to a gz format

[joey@linux ~]$ gzip -d archive.tar.gz

Uncompress archive.tar.gz

[joey@linux ~]$ tar -xzvf archive.tar.gz

Extract (-x), and uncompress via gzip (-z) file archive.tar.gz (-f) and display all results (-v)

Written by John Payne