UNIX Utilities


dos2unix and unix2dos

Some of you would like to work at home and bring in, or ftp in, files that you created on your IBM PC using some Microsoft product. As long as the files are basically ASCII, that is fine. However, DOS represents several things differently, most notably the end of a line: DOS ends each line with a carriage return followed by a line feed, while UNIX uses a line feed alone. If you look at a file created under DOS you will most probably see ^M (control-M, the carriage return) at the end of each line. These, and some other characters you can't see, cause problems on UNIX systems.

The cure is simple. Use dos2unix to convert these characters to standard UNIX characters and use unix2dos to change UNIX text files to something that DOS will recognize.
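To make the ^M problem concrete, here is a minimal sketch that builds a two-line DOS-style file and strips the carriage returns. It uses tr as a stand-in for dos2unix, so it only handles the end-of-line characters and nothing else dos2unix may fix up. The file names are made up for the illustration.

```shell
#!/bin/sh
# Sketch only: tr stands in for dos2unix; file names are hypothetical.
work=$(mktemp -d)

# Write two lines that end in CR LF, the way DOS would
printf 'first line\r\nsecond line\r\n' > "$work/dosfile.txt"

# od -c makes the otherwise invisible \r before each \n visible
od -c "$work/dosfile.txt"

# Delete every carriage return and write the result to a new file
tr -d '\r' < "$work/dosfile.txt" > "$work/unixfile.txt"

# The UNIX copy is two bytes shorter: one CR removed per line
dos_bytes=$(wc -c < "$work/dosfile.txt" | tr -d ' ')
unix_bytes=$(wc -c < "$work/unixfile.txt" | tr -d ' ')
echo "DOS: $dos_bytes bytes, UNIX: $unix_bytes bytes"

rm -rf "$work"
```

Note that tr removes every carriage return in the file, not just the ones at the ends of lines, so prefer dos2unix when it is installed.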

mtools

Assume that you wrote this program at home. It compiled and ran and now you want to bring it into campus and put it in your home area. Further assume that your modem just died. What will you do? Simple. Copy the file to a zip diskette and bring it in. You take your disk to the Cetus lab because you know that there are zip drives on those machines and you stick in the disk. Now what?

Now you use mtools. The mtools package allows a PC-format Zip disk to be used with the Suns. This is not a single utility but rather a set of utilities that allow you to access the drives of our machines. Running "man mtools" yields a page describing the separate utilities and their basic usage. Each individual utility has its own man page as well.

Under Solaris, you must first mount the drive by running volcheck. This checks the media in the drive and lets the OS know that the media is valid. Then you can use the other tools:

There are some more tools but they don't get used too often. What I have listed are enough to do most operations.

There is a problem with long UNIX filenames. If the name is more than 8 characters or contains characters DOS does not like, it is truncated and a tilde (~) is put in. If the extension is missing or has more than 3 characters, similar things are done. I suggest that you read the man page concerning this. It has a better explanation than I can give.

Compression

Ever get a nastygram from root, complaining that you are using too much disk space? Or perhaps work on a system with disk usage quotas? When your appetite for data exceeds your storage capacity, you have several choices: delete things, move them elsewhere, steal other people's disk space etc. -- or compress your files. Compression is the simplest and can save a lot of space. There are several compression utilities available, but we will talk about the 4 most common ones.

The first is compress. Some version of it is available with all the current UNIX releases. It does an okay job of compression. The syntax is simple:

UNIX> compress myfile

and it creates myfile.Z in place of myfile. There are some options that I never use, but they are in the man page. To uncompress the file use uncompress, and if you want to see what is in the file but NOT uncompress it, use zcat, which is like cat but for compressed files.

The second utility is gzip (GNU zip) which does the same type of thing as compress but usually does much better compression. Just type "gzip filename" and it produces a file with a .gz extension. To uncompress these files use gunzip.

Gzip has some nice options, and the one I like best lets you specify what level of compression to use, from -1 (fastest) to -9 (best compression). The greater the compression the slower the job, but if the interest is saving space, a minute or two won't matter. Gunzip has an option I use a lot: -c. This tells gunzip to write the uncompressed file to standard output. We'll see why you might want to do this in a minute. Oh, and gunzip can also uncompress files compressed by compress, and sometimes pkzip files too.
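As a rough sketch of the compression levels and of gunzip -c, the script below compresses the same file at -1 and -9 and then reads the result back without touching the compressed file. The sample file and its contents are invented for the example, and how much the two levels differ depends entirely on the data.

```shell
#!/bin/sh
# Sketch only: file names and contents are hypothetical.
work=$(mktemp -d)

# Build a repetitive text file so there is something to compress
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of a fairly repetitive sample file"
    i=$((i + 1))
done > "$work/sample.txt"

# -1 is fastest, -9 compresses hardest; -c writes to standard output,
# so sample.txt itself is left in place
gzip -1 -c "$work/sample.txt" > "$work/fast.gz"
gzip -9 -c "$work/sample.txt" > "$work/best.gz"
ls -l "$work"

# gunzip -c sends the uncompressed data to standard output;
# best.gz is not modified or removed
first_line=$(gunzip -c "$work/best.gz" | head -n 1)
echo "$first_line"

# Check that the round trip reproduces the original byte for byte
if gunzip -c "$work/best.gz" | cmp -s - "$work/sample.txt"; then
    roundtrip=same
else
    roundtrip=different
fi
echo "round trip: $roundtrip"

rm -rf "$work"
```

Compare the sizes that ls -l reports for fast.gz and best.gz to see what the extra effort buys on your own data.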

The third utility, bzip2, is still fairly new and is just coming into wide usage. It usually provides better compression than either gzip or compress but can be slow sometimes. You cannot count on finding bzip2 on every UNIX box you encounter, though. Luckily we do have it installed here at UT CS. Usage, again, is simple: "bzip2 filename" will create a compressed file with the ".bz2" extension. Use "bzip2 -d filename.bz2" to uncompress a file.

If you have files of unknown origin that are obviously compressed and none of these 3 will uncompress them, try unzip. This one claims it is compatible with the compression schemes of most every O/S. There is a zip as well. I find I use the GNU product more because it is also readily available on Linux, and works the same way there as it does here.

tar

While we're on the subject of compression, let's talk about tar. tar creates a tape archive from a list of files or directories, but it does not provide any compression. Instead, one generally tars a collection of smaller files into one big tarfile (or tarball) and then compresses the tarfile.

Tar has simple syntax but a couple of peculiarities. One of the peculiarities is that tar assumes that it is writing to a tape drive; specifically, it reads the file /etc/default/tar to find a drive. In order to create a tarfile, then, you need to use the f option:

UNIX> tar -cvf myfile.tar mydir
This creates the tarfile "myfile.tar" and writes to it the mydir directory, the files in the mydir directory, subdirectories, files, etc. The default action for tar is to recursively search the listed directories and place the files in the archive.

The options for tar can be supplied with or without a leading dash (-). Any arguments the options need are given afterwards, in the order the options appear. In the above example the "c" means create and write at the beginning (remember, tape drive). The "v" is verbose; it tells you what it's doing. The "f" says to use the filename that follows as the name of the tarfile instead of some drive. The .tar file extension is not required but helps alleviate confusion. After tar is done, the original files still exist; you have only created copies of them. Also, I recommend that you don't archive the current directory while writing the tarfile into it, because tar will try to incorporate the tarfile into itself, which is not good because the tarfile is constantly growing.

The tarfile can be extracted using "tar xf myfile.tar". Wherever you do this, the default behavior is to restore the original directory structure below the current directory, so if you are not careful things can get a little confused. And when restoring, the tarfile is not modified; it still exists as before.

A tarfile can be updated using "u"; the files specified are written to the end of the tarfile IF they are not already in it or they are newer than the copies in it. Files can be replaced using "r"; the listed files are written at the end of the tarfile regardless of their status, and the older copies stay in the tarfile ahead of them. Lastly, you can find out what is in a tarfile by using "t", table of contents.
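The options described so far can be tried out safely in a scratch directory. The following is a minimal sketch; the directory and file names are invented, and it extracts into a separate "restore" directory so the restored tree cannot collide with the original.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir mydir
echo "one" > mydir/a.txt
echo "two" > mydir/b.txt

tar cf myfile.tar mydir          # c: create the tarfile from mydir
tar tf myfile.tar                # t: table of contents

echo "three" > mydir/c.txt
tar rf myfile.tar mydir/c.txt    # r: write c.txt at the end of the tarfile
listing=$(tar tf myfile.tar)
echo "$listing"

# Extract somewhere else; tar recreates mydir/ under the current directory
mkdir restore
cd restore || exit 1
tar xf ../myfile.tar             # x: extract; myfile.tar is left unchanged
restored=$(cat mydir/c.txt)
echo "restored c.txt contains: $restored"

cd / && rm -rf "$work"
```

Adding "v" to any of these (for example "tar xvf") prints each file name as it is processed.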

The best way to handle those pesky left-over files other than deleting them from the system is to tar the files then compress or gzip the tarfile.

I talked earlier about the -c option to gunzip. Rather than first uncompressing a tarfile and then extracting the files from it, you can do this all in one line:

UNIX> gunzip -c compressed.tar.gz | tar xvf -

In this command the file is uncompressed and written to standard output without actually changing the input file. Then tar is told to extract from the file "-", which tar takes to mean standard input. This can be important because if you simply uncompress the file and then extract it, you need room for the uncompressed tarfile AND the extracted files, which is DOUBLE the number of bytes. If you have barely enough space left for the expanded tarfile you can run out of disk space and cause lots of problems.
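The same trick works in the other direction, so an uncompressed tarfile never has to exist on disk at all. This is a small sketch under made-up names ("project", "unpack"): tar writes the archive to standard output, gzip compresses it on the fly, and the extraction pipeline reverses the process.

```shell
#!/bin/sh
# Sketch only: directory and file names are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir project
echo "hello" > project/notes.txt

# "f -" makes tar write the archive to standard output; gzip compresses
# the stream, so only the compressed tarfile is ever stored
tar cf - project | gzip -c > project.tar.gz

# Extract in a separate directory, again without an intermediate file
mkdir unpack
cd unpack || exit 1
gunzip -c ../project.tar.gz | tar xf -
unpacked=$(cat project/notes.txt)
echo "unpacked notes.txt contains: $unpacked"

cd / && rm -rf "$work"
```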

Comparing Files

Have you ever wanted to see if two files were exactly the same or, if they are not the same, how they differ? Well, for text files you can use diff. This utility takes as arguments two filenames and produces as output the minimal information required to convert the first file to the second file. If the files are identical there is no output. diff's output is a little cryptic. Here is a sample:
UNIX> diff textfile1 textfile2
1a2,4
>
>
>
12a16
> perl, python, rcs, jgraph,
15c19
< The first 3 topics will be covered in the order listed to try to give a small
---
> The first 5 topics will be covered in the order listed to try to give a small
23c27
< Grading: Pass/Fail
---
> Grading: A-F
24a29,35
> Requirements:  In order to get a pass in this course, you will need to
> COMPLETE all lab assignments with an overall average of 70%.  You will
> also be required to attend class.  I will allow 3 absences before I begin
> to penalize you.  After that I will reduce your overall grade by 10% per
> additional absence.  Even if you miss a class, you will still be required
> to complete the lab associated with that class on time.
>
35c46
< otherwise each day later will incur a penalty of 10%.  This may seem harsh
---
> otherwise each day late will incur a penalty of 10%.  This may seem harsh
55,56d65
< Miscellaneous: This is just something I put in here so that diff would have
< some lines to work on.

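The < symbols mark lines from the first file and the > symbols mark lines from the second file. Each group of lines is introduced by a line of one of these forms: NaM,P means that lines M through P of the second file must be added after line N of the first; N,PdM means that lines N through P of the first file must be deleted; and N,PcM,Q means that lines N through P of the first file must be changed into lines M through Q of the second. If the starting line and ending line are the same then they are abbreviated to a single number. It is sort of confusing, but if you do it often enough it starts to make sense. More importantly, it is a fast way to tell if the files differ.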
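A much smaller case may make the notation easier to read. This sketch compares two invented files that differ by one changed line and one added line, and also shows that diff's exit status alone (0 for identical, 1 for different) answers the "do they differ?" question.

```shell
#!/bin/sh
# Sketch only: old.txt and new.txt are hypothetical files.
work=$(mktemp -d)

printf 'alpha\nbeta\ngamma\n'        > "$work/old.txt"
printf 'alpha\nBETA\ngamma\ndelta\n' > "$work/new.txt"

# Capture the output and the exit status
status=0
diff_out=$(diff "$work/old.txt" "$work/new.txt") || status=$?
echo "$diff_out"
# Prints:
#   2c2
#   < beta
#   ---
#   > BETA
#   3a4
#   > delta
echo "exit status: $status"

rm -rf "$work"
```

Here 2c2 says line 2 of the first file must be changed into line 2 of the second, and 3a4 says line 4 of the second file must be added after line 3 of the first.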

What about binary files like a.out and gifs or jpeg files? There is a similar utility that handles these and it is cmp. This does a byte-by-byte comparison and says nothing if the files are the same. If the files differ, cmp reports the first differing byte to the user.
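A quick sketch of cmp on some made-up files follows. The -s option suppresses the message so that only the exit status is used, which is handy in scripts. The exact wording of the message printed without -s varies a little between systems, so it is not reproduced here.

```shell
#!/bin/sh
# Sketch only: the three files are hypothetical stand-ins for binaries.
work=$(mktemp -d)

printf 'ABCDEF' > "$work/one.bin"
printf 'ABCDEF' > "$work/two.bin"
printf 'ABXDEF' > "$work/three.bin"   # differs from one.bin at byte 3

# cmp -s prints nothing; exit status 0 means the files are identical
if cmp -s "$work/one.bin" "$work/two.bin"; then same12=yes; else same12=no; fi
if cmp -s "$work/one.bin" "$work/three.bin"; then same13=yes; else same13=no; fi
echo "one vs two identical:   $same12"
echo "one vs three identical: $same13"

# Without -s, cmp reports where the first difference is
cmp "$work/one.bin" "$work/three.bin" || true

rm -rf "$work"
```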

Shipping Things Around

You already understand whatever email program you use, I hope. Well, they all have a peculiar problem: depending on the system, they may not handle binary data. To get around this, uuencode/uudecode was developed. These were designed to work with uucp (UNIX-to-UNIX copy), but not every version is the same, uses the same algorithm, or is even compatible with all UNIX systems. Along came MIME (Multipurpose Internet Mail Extensions) and mimencode. mimencode takes the place of uuencode/uudecode. It works well in pipes. It is compatible with any mailer that can handle MIME. It will allow you to "safely" send binary data to and fro. This data often comes across in MIME mail as "base64" encoding.

Pine uses this without your knowledge if you send a binary. MH will use it if you do some fancy stuff or use "mhn -store". One common use of either of these encoders is to send files that the recipient cannot read without knowing what was done to them, for example a tarfile that was run through compress and then mimencode prior to mailing. I don't really expect you to use this a lot, but you do need to know that it is there. Oh, and mimencode is also installed as mmencode, but don't do man on the latter.
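To see what base64 encoding does to binary data, here is a small sketch. It assumes the base64 command from GNU coreutils as a stand-in, since mimencode is not installed everywhere; the -d (decode) flag shown is the GNU spelling and may differ on other systems. The file names are invented.

```shell
#!/bin/sh
# Sketch only: uses GNU coreutils base64 as a stand-in for mimencode.
work=$(mktemp -d)

# A tiny "binary" file: three non-printing bytes followed by text
printf '\000\001\002hello' > "$work/data.bin"

# Encode to plain ASCII that any mailer can carry
base64 "$work/data.bin" > "$work/data.b64"
cat "$work/data.b64"

# Decode on the receiving side and confirm nothing was lost
base64 -d "$work/data.b64" > "$work/data.out"
if cmp -s "$work/data.bin" "$work/data.out"; then intact=yes; else intact=no; fi
echo "decoded copy identical: $intact"

# Like mimencode, it works in a pipe
encoded_hello=$(printf 'hello' | base64)
echo "$encoded_hello"    # prints aGVsbG8=

rm -rf "$work"
```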