LaTeX, latex2html, BibTeX etc



This is another of those lectures that has a little bit of everything. The first subject is LaTeX (usually pronounced 'la-tech'). LaTeX is used primarily in academia, but also anywhere documents need careful typesetting. LaTeX is similar to HTML in that you type in the text, add a few markup commands, and let the program produce the formatted output. LaTeX goes further in that it gives you specific control over many aspects of the document, and you can add packages that expand its capabilities. The current version of LaTeX is LaTeX2e and the previous version is LaTeX 2.09. It can be a little confusing figuring out what is what. The new version supports most of the previous version but has changed many things, for the most part for the better. I will try to make sure I tell you about LaTeX2e.

The man page for LaTeX is very sparse and tells you to buy a book. That is not unreasonable if you need to do any serious amount of typesetting using LaTeX. The best references are:

LaTeX: A Document Preparation System

    Leslie Lamport,
    Addison Wesley, 2nd ed, 1994. 

The LaTeX Companion

    Goossens, Mittelbach and Samarin,
    Addison Wesley, 1994. 
There are a couple of on-line documents that will get you started. One is The Not So Short Introduction to LaTeX2e, which is available in several formats. (Please DON'T print it; download it and read it in a viewer, because printing it is a SERIOUS waste of resources.) Another is LaTeX2e for authors. Additional information about LaTeX2e can be found at http://www-sal.cs.uiuc.edu/latex/. LaTeX also has a homepage. There is also the excellent tutorial on LaTeX by Andrew Roberts, which has been the source of information and inspiration for these lecture notes. Some local pages on LaTeX2e are here. There are some Hypertext Help pages available; I haven't used them but they seem pretty comprehensive.

Preamble

It always helps to see what a LaTeX document looks like. Here's an example1. The file contains both content and layout commands - just like HTML. example2 is a much more complex LaTeX document that gives you an idea about using figures, tables and much more. This is an actual document that was used to generate a conference paper.

Before we start cracking at LaTeX, here is another file, hello.tex, a very simple "hello world" document. Its lines are explained one by one below.

% hello.tex - It is generally a good idea to make the first line a comment, which is what we see here. Comments start with the percent symbol (%); when LaTeX sees it, it simply ignores the rest of the line.

\documentclass{article} - This line tells LaTeX to use the article document class. A document class file defines the formatting for a type of document, such as an article, a report or a particular journal's format. The handy thing is that if you want to change the appearance of your document, you can substitute another class for article. Make sure the class file actually exists.

\begin{document} - This command alerts LaTeX that the content of the document is about to commence. Anything above this command is said to belong to the preamble.

Hello World! - This is the only line containing real content.

\end{document} - This tells LaTeX that the document source is complete.
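
Put together, the lines described above make up the whole file. This is only the assembled version of what was just walked through, nothing new:

```latex
% hello.tex - a minimal "hello world" document
\documentclass{article}   % use the article document class
\begin{document}          % the preamble ends here, content starts
Hello World!
\end{document}            % nothing after this line is read
```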

In order to use LaTeX on a document, first type the document. LaTeX makes some assumptions and does some formatting without specific direction. So be careful, it may not always behave the way you expect it to!

Paragraphs are set off from each other by a blank line. If there are multiple blank lines, LaTeX ignores the extra ones. Words are separated from each other by a space; multiple spaces are compacted into one space.
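
A small sketch of these whitespace rules (the sentences are made up for illustration). Running it through LaTeX should give two ordinary paragraphs, however unevenly the source is spaced:

```latex
\documentclass{article}
\begin{document}
These     words   are separated by
uneven   spaces and a line break, yet they are set as one paragraph.



The three blank lines above start exactly one new paragraph.
\end{document}
```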

LaTeX reserves a number of characters for special purposes. In order to use them in text you have to do something special. For the characters $ & % # _ { }, just precede them with a \ (backslash). This does not work for the backslash, tilde and caret, because \\, \~ and \^ are commands in their own right (a line break and two accents). The easiest way to print those is to start a verbatim environment with "\begin{verbatim}" on a line by itself and end it with "\end{verbatim}". This works somewhat like <pre> in HTML. There is also a short form for when you want just a few special characters within a line: \verb(char)(text)(char), where (char) is any character that does not occur in the text, i.e.


I want the next \verb2\^~~$$%%2 set of characters to be verbatim.

results in

I want the next \^~~$$%% set of characters to be verbatim.
This concept of an environment is important to LaTeX. You can think of environments as boundaries of operation. The body of the text is in the document environment, started with "\begin{document}" and ended with "\end{document}". Within the document there can be many other environments, such as the verbatim environment started with \begin{verbatim}; the short \verb command does the same job for a few characters without a \begin and \end.
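
As a sketch of the escaping rules above, the following small file prints every special character at least once. The sample text is invented, and | is just one possible choice of \verb delimiter:

```latex
\documentclass{article}
\begin{document}
% Seven special characters, each printed with a leading backslash.
Cost: \$5 \& up, 100\% of item \#3, file\_name, \{braces\}.

% Backslash, tilde and caret typed literally with \verb,
% using | as the delimiter character.
Literal: \verb|\ ~ ^|

\begin{verbatim}
Everything here is printed as typed: \ ~ ^ $ & % # _ { }
\end{verbatim}
\end{document}
```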

LaTeX documents start with "\documentclass[(options)]{(class)}". Some standard classes are article, book, report, letter and slides. Options include landscape, final or draft, twoside, and twocolumn. This tells LaTeX what basic kind of things it will have to do to produce the type of document you want. To begin the text of the document, use "\begin{document}", and end it with "\end{document}". This is literally all you have to do to create a LaTeX document: type the text, put "\documentclass[(options)]{(class)}" and "\begin{document}" at the top, and put "\end{document}" at the end. (This is a small sample of an input file.) After you have saved your text file as a .tex file do


UNIX> latex mydoc.tex 

This will produce a large amount of output on the screen as well as several files. The main document will be in mydoc.dvi, auxiliary information (such as cross-references) in mydoc.aux, and a full transcript of the run in mydoc.log. As LaTeX is not usually included in UNIX distributions, it often resides in directories other than /usr/bin and /usr/local/bin. In our case it is in /usr/local/teTeX/bin/latex.

Here are some samples of the files produced by a run of LaTeX on a file provided with the distribution, named "mydoc.tex".


Viewing the Output

Note that the output is a file with an extension of .dvi. This is a "device independent" file. It is easily viewed using xdvi, which is similar to ghostview but works on dvi files instead of PostScript files. You can print .dvi files using lpr with the -d filter option, but I suggest converting to PostScript first. There are several programs available to do the PostScript conversion, but dvips has become the standard. To use dvips just

UNIX> dvips -o myfile.ps myfile.dvi 

The -o option causes the output to be written to a file. If you give -o without a filename, the base of the input filename is used with a .ps extension. If you do not use the -o option at all, the output will be sent to your default printer, which could be very wasteful. You do not have to type the extension on the input filename, because dvips assumes that it is .dvi. Once the output has been converted to PostScript, it is a good idea to convert it to PDF. Use ps2pdf to convert from ps to pdf:

UNIX> ps2pdf  [options...] (input.[e]ps|-) [output.pdf|-]


Fancy Stuff

Top Matter

Most documents start with information about the document itself, such as the title and date, and also information about the authors, such as name, address, email etc. In LaTeX all of this information is collectively known as the top matter. An example:

\title{How to Structure a \LaTeX{} Document}
\author{Andrew Roberts\\
  School of Computing,\\
  University of Leeds,\\
  Leeds,\\
  United Kingdom,\\
  LS2 1HE\\
  \texttt{andyr@comp.leeds.ac.uk}}
\date{\today}
\maketitle
The \title command is fairly obvious: simply put the title you want between the curly braces. \author is also easy, but note that it carries all sorts of other information along with the name. This is a common, albeit ungraceful, hack, needed because the basic article class has no separate commands for an address or email. You can see how the new line command (\\) has been used so that the address can be written on several lines. The email address is at the end, and the \texttt command formats it in a monospaced font. The \date command takes an argument giving the date the document was written. I've used a built-in command called \today which, when processed by LaTeX, is replaced with the current date. But you are free to put whatever you want as a date, in any format. Generally, dates are not added when you write a journal article; if the braces are left empty, the date is omitted. Without \maketitle, the top matter would not appear in the document, so it is needed to commit your article attributes to paper.

Abstract

As most research papers have an abstract, there is a predefined environment for telling LaTeX which part of the content makes up the abstract. It appears after the top matter, but before the main sections of the body.

\begin{abstract}
Your abstract goes here...
...
\end{abstract}
Take a look at example2 to get an idea of what the abstract might look like.

Part of the reason to use LaTeX is that it allows you to provide text enhancement easily. One of the things it does wonderfully is math. You can use \begin{math} and \end{math}, or the short forms \(...\) or $...$. For instance, if you need to write 2 raised to the power 34 you can type \(2^{34}\).

Basic math

Elementary arithmetic operations: The plus (+), minus (-), and division (/) symbols have the usual meaning. To denote multiplication explicitly (this is rarely necessary), use \cdot (producing a centered dot) or \times (producing an "x"). The "equal", "less than", and "greater than" symbols on the keyboard work as expected; to get "less than or equal", use \le; similarly, \ge gives "greater than or equal".

Square roots: Square roots are generated with the command \sqrt{...}. For example, $z=\sqrt{x^2+y^2}$.

Subscripts and superscripts: These are indicated by underscores (_) and carets (^), as in $a_1$ or $2^n$. If the sub/superscript contains more than one character, it must be enclosed in curly braces, as in $2^{x+y}$.

Fractions and binomial coefficients: Fractions are typeset with $\frac{x}{y}$, where x stands for the numerator and y for the denominator. There is a similar construct $\binom{x}{y}$ for binomial coefficients; it is provided by the amsmath package, so put \usepackage{amsmath} in the preamble.

Operators: TeX has commands for common mathematical "operators" or "functions", such as \sin, \cos, \log, \ln, \exp, \arctan, etc. You should always use these commands instead of simply typing "sin", "cos", etc., without the backslash. Using the TeX commands ensures that the operators get typeset in the proper font and takes care of the spacing surrounding them.
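
The constructs above can be tried out together in one small file. This is only a sketch; the formulas are ordinary identities chosen for illustration, and amsmath is loaded because \binom needs it:

```latex
\documentclass{article}
\usepackage{amsmath} % provides \binom
\begin{document}
The distance is $z=\sqrt{x^2+y^2}$, and $2^{x+y} = 2^x \cdot 2^y$.

A fraction and a binomial coefficient:
\[
  \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad 0 \le k \le n .
\]

Operators keep their upright font: $\sin^2 x + \cos^2 x = 1$ and
$\ln(\exp(x)) = x$.
\end{document}
```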

More math

Greek letters and other special characters: The commands for Greek letters are easy and intuitive: just type $\epsilon$, $\delta$, $\nu$, $\phi$, etc. To get upper case versions of these letters, capitalize the command; e.g., $\Delta$ gives a "cap-Delta" (which looks like a triangle). Only capitals that look different from Latin letters have their own command.

Parentheses: The symbol pairs (), [], and \{ \} (note the backslash!) generate round, square, and curly parentheses in normal size. They work fine in math mode, but mathematical expressions often look better if the parentheses are enlarged to match the size of the expression. There are ways to enlarge them manually (by preceding the symbol with a command like \big, \Big, \bigg, etc.), but one rarely has to use these, since TeX can (in most cases) size parentheses automatically. To let TeX do the sizing, precede the left delimiter by \left and the right delimiter by \right, as in this example:
\[
\left|\sum_{i=1}^n a_ib_i\right|
\le
\left(\sum_{i=1}^n a_i^2\right)^{1/2}
\left(\sum_{i=1}^n b_i^2\right)^{1/2}
\]
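
For comparison, here is a small sketch that puts Greek letters, manually sized parentheses and automatically sized braces side by side. The expressions are arbitrary and only there to show the sizes:

```latex
\documentclass{article}
\begin{document}
Greek letters: $\epsilon$, $\delta$, $\nu$, $\phi$, and capital $\Delta$.

% Manually chosen sizes: the opening parentheses grow,
% the closing ones shrink again.
Manual: $( \big( \Big( \bigg( x \bigg) \Big) \big) )$

% The same fraction with automatic and with normal-size braces.
\[
  \left\{ \frac{a}{b} \right\}
  \quad \mbox{versus} \quad
  \{ \frac{a}{b} \}
\]
\end{document}
```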

Displayed equations

Single line displays: To get a single line, displayed equation (without equation number), just use the pair "\[", "\]". If you want TeX to automatically number the equation, use the \begin{equation} ... \end{equation} environment instead. (The asterisk variant, \begin{equation*} ... \end{equation*}, turns off the equation numbering and is equivalent to typing \[ ... \]; it requires the amsmath package.)

Multi-line equation environments: Things get more complicated if you have multi-line equations that need to be lined up at suitable places. For most situations, the \begin{align} ... \end{align} environment and its variant \begin{align*} ... \end{align*}, both from the amsmath package, are sufficient. Put an & at the point in each line where the lines should be aligned, and end each line but the last with \\. As with the equation environment, the asterisk version does not number the equations.
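
A short sketch of both kinds of display, assuming the amsmath package is installed (it normally is). The algebra is a textbook expansion picked only to have two lines to align:

```latex
\documentclass{article}
\usepackage{amsmath} % provides align, align* and equation*
\begin{document}
A numbered single-line display:
\begin{equation}
  a^2 + b^2 = c^2
\end{equation}

A two-line derivation, aligned at the equals signs and numbered:
\begin{align}
  (x+y)^2 &= (x+y)(x+y) \\
          &= x^2 + 2xy + y^2
\end{align}

The same derivation without equation numbers:
\begin{align*}
  (x+y)^2 &= (x+y)(x+y) \\
          &= x^2 + 2xy + y^2
\end{align*}
\end{document}
```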

Spacing in math mode

In math mode (both ordinary and display math), TeX decides on the spacing between symbols itself, using rather sophisticated algorithms; in particular, any blank spaces inside math mode are ignored. For example, the formula "$a^2 + b^2 = c^2$" could have been typed as "$a^2+b^2=c^2$", or even spread over two lines, without any difference in the output. Letting TeX figure out the spacing almost always results in very good looking output, and you should avoid putting explicit spaces into mathematical formulas. However, there are a few situations where one does need spacing instructions. For those cases there is a standard spacing command, "\quad", which generates the right amount of horizontal space to separate two equations on the same line, or a formula from an associated range or condition (such as "n=2,3,...") that is given on the same line, usually in parentheses.
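
The two uses of \quad just described might look like this. The recurrence and the pair of equations are invented examples:

```latex
\documentclass{article}
\begin{document}
% A formula followed by its range of validity.
\[
  a_n = 2a_{n-1} + 1 \quad (n = 2, 3, \ldots)
\]
% Two equations on the same line.
\[
  x + y = 3, \quad x - y = 1
\]
\end{document}
```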

Besides math, LaTeX has standard enhancements like \emph{}, which emphasizes the text, usually by printing it in italics. \textsl{} gives slanted text, \textit{} italic, \textbf{} bold, and \texttt{} typewriter. You can change sizes for short pieces of text using \tiny, \small, \normalsize, \large, etc. Some commands are fragile, which means that if they are used within certain arguments (a section title or caption, for example) they must be preceded by \protect; you should refer to the documentation if you are having problems. Another thing LaTeX does is provide counters for automatic referencing of tables, figures, sections, etc. This allows you to put things where you want and let LaTeX keep track of the numbers. Many of these types of references are not really simple, so I will talk only about sections and subsections.
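
A quick sketch of the font and size commands mentioned above. The size commands are switches rather than commands with an argument, so in this sketch each one is wrapped in braces to keep the change local:

```latex
\documentclass{article}
\begin{document}
This is \emph{emphasized}, \textsl{slanted}, \textit{italic},
\textbf{bold} and \texttt{typewriter} text.

% Each size switch applies until the closing brace.
{\tiny tiny} {\small small} {\normalsize normal} {\large large}
\end{document}
```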

Like HTML does with <hn>, LaTeX changes the size and face of the text used for section and subsection titles. You start a section using "\section{title}" and a subsection using "\subsection{title}". LaTeX will take care of the numbering. When a new section is started, the subsection numbers restart from 1.
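
To see the numbering at work, a skeleton like the following can be used. The titles are placeholders, and the numbers in the comments are what the article class would be expected to produce:

```latex
\documentclass{article}
\begin{document}
\section{Introduction}        % numbered 1
Some text.
\subsection{Background}       % numbered 1.1
\subsection{Related Work}     % numbered 1.2
\section{Method}              % numbered 2
\subsection{Setup}            % numbered 2.1, the count has restarted
\end{document}
```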

A last thing I will mention is the use of PostScript in LaTeX. Conveniently, the eps files that jgraph produces are easily included in LaTeX documents. At the beginning of the file, after the "\documentclass{}" line and before "\begin{document}", put in the line "\usepackage{graphicx}" (graphicx is the extended version of the graphics package; it accepts the key=value options used in the figure examples in the next section). Then wherever you want an eps figure to appear in the document put in the line "\includegraphics{filename}". Pretty simple huh?


Inserting single figures into LaTeX articles

First off, you have to have some text in your file before you can include any diagrams without LaTeX complaining. Then you can insert a picture with something like:
\begin{figure}[htp]
\centering
\includegraphics{figure1}
\caption{Transverse momentum distributions}\label{fig:erptsqfit}
\end{figure}
* The example above is your basic picture.

* The [htp] option is the placement specifier; it tells LaTeX where it may position the figure: here (h), at the top of a page (t), or on a separate page of floats (p).

* The placement specifier p lets the figure take up a full page without waiting until the end of the chapter.

* \centering tells LaTeX to centre everything within the figure environment.

* \includegraphics is your basic command to include a graphics object. (The bits in [...] shown below are various options that can be omitted. In the curly brackets is the name of the file you want included.) As you can see, the extension of the graphics filename is omitted; LaTeX looks for the file with various possible extensions such as .eps and .ps. Note: the order in which the extensions are tried is set by the graphics driver, so avoid keeping both filename.ps and filename.eps in the same directory.

* The most important option in [...] is the size. There are a few choices that are useful, such as [width=xxx cm] and [height=xxx cm]. A slightly more complicated version looks like the example below. The viewport and clip options are used to select only a portion of your picture, and the part of the caption in square brackets is taken as the caption in the list of figures. This means that you can format the caption that appears with your figure differently from the way it appears in the list of figures. Typically the figure caption is longer and more detailed than the entry in the list of figures.

\begin{figure}[htp]
\centering
\includegraphics[totalheight=0.8\textheight,viewport=50 260 400 1000,clip]
{erptsqfit}
\caption[Transverse momentum distributions - E-R model.]
{Transverse momentum distributions - E-R model fit (intercept 1.2).}
\label{fig:erptsqfit}
\end{figure}
The biggest problem most people face with figures and LaTeX is having them in the right format. As mentioned above, figures for LaTeX should be of type .eps, Encapsulated PostScript. There are a lot of ways to convert JPEGs to eps on Unix and Windows. Andrew Roberts' tutorial on importing graphics steps you through one such process for Linux and Windows.
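
As a sketch of a complete file built around the basic figure, the following sets the width and refers to the figure from the text. It assumes a file figure1.eps exists next to the .tex file; the 8cm width and the label name fig:figure1 are arbitrary choices for illustration. LaTeX has to be run twice before \ref prints the figure number:

```latex
\documentclass{article}
\usepackage{graphicx} % needed for key=value options such as width
\begin{document}
Figure~\ref{fig:figure1} shows the distributions.

\begin{figure}[htp]
\centering
% Scale the picture to a width of 8cm; the height follows automatically.
\includegraphics[width=8cm]{figure1}
\caption{Transverse momentum distributions}
\label{fig:figure1}
\end{figure}
\end{document}
```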


Tables

Tables are a common document inclusion. Here is a simple example:

\begin{table}[htbp]
%\begin{center}
%\begin{tabular}{|c|c|c|c|}
\caption{Failure states detected by the leader robot and
implemented recovery actions from [15]}
\begin{tabular}{|p{110pt}|p{115pt}|}
\hline
\textbf{Failure Type}&
\textbf{Fault Recovery Action} \par \\
\hline
Can't reach waypoint \par &
Re-plan path. \\
\hline
Lost simple robot \par &
Leave lost robot in wait state and move on to next robot in chain.   \\
\hline
Leader robot camera failure \par &
Leave simple robot(s) in wait state, send camera failure feedback to human 
operator and return home.   \\
\hline
Simple robot motor failure \par &
Check if simple robot is close to goal; if so, change simple robot state
 to sensor detection and proceed; else, leave simple robot in wait state and proceed. \\
\hline
Localization drift&
Check if simple robot is close enough to goal; if so, change simple robot 
state to sensor detection and proceed;
else, leave simple robot in wait state and proceed; \\
\hline
Lost marker \par &
Leave simple robot in wait state and move on to next robot in chain. \\
\hline
Communication failure&
Return back home. \\
\hline
\end{tabular}
\label{tab1}
%\end{center}
\end{table}
The table body is opened with \begin{tabular}, followed by the column specification, which indicates how each column should be displayed. A pipe symbol ('|') designates a vertical line, so one is placed at the start to give the table a left edge. The next item describes the first column: the choices are l (left justified), r (right justified), c (centred) and p{width} (a paragraph column of fixed width in which long text is wrapped). The example uses two p columns, 110pt and 115pt wide, separated by a vertical line, with another vertical line after the last column to close off the right edge of the table. Next comes the table data itself; well, actually, first we draw a horizontal line with \hline to give the table a top. Then come the table headings: the '&' character is used as a column separator and each row is terminated with a double backslash (\\). Finally, we end the table with a horizontal line so it has a bottom, and then close the tabular environment and the enclosing table environment, which supplies the caption and label.
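
For contrast with the paragraph columns, here is a minimal sketch of a table that uses the l, c and r column types. The rows are made-up data:

```latex
\documentclass{article}
\begin{document}
\begin{table}[htbp]
\centering
\caption{A small table with left, centred and right columns}
\begin{tabular}{|l|c|r|}
\hline
\textbf{Item} & \textbf{Code} & \textbf{Count} \\
\hline
Apples  & A & 12 \\
Oranges & B & 7  \\
\hline
\end{tabular}
\end{table}
\end{document}
```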

Well, that is enough to get you started. I recommend that you give LaTeX a try, especially if you are planning to go on to grad school. Most theses are written using LaTeX.


latex2html

This one does what it says: it takes a LaTeX document and turns it into an HTML document. That is not to say it is simple. First, prepare a CLEAN directory to work in, because this thing can potentially generate dozens of files. Make sure the command is in your path - use /usr/local/latex2html/ as the path. Then copy the input file into this new directory and let it go.
UNIX> latex2html -split 0 -no_navigation -no_subdir myfile.tex
I guess I need to explain that. First, the man page says pretty much nothing; you need to refer to the PostScript manual to figure out what to do. The -split 0 option means do not split the document at any sectioning level, which translates to a single HTML document. The on-line help pages I listed above were generated with latex2html, and that one-page-per-HTML-file layout drives me nuts. -no_navigation tells it NOT to put page navigation links in the document. -no_subdir tells it to put all the files in the current directory.

You are of course free to do what you like, but I find that latex2html tends towards overkill. Do not get me wrong, I think it is a great piece of software. But I end up having to go in and fix the results. Some things don't translate well. Some choices seem to be made on the side of fancy stuff versus KISS. That sort of thing. I will say that it saved my neck when a professor asked me to put his lecture notes on-line and they were ALL in LaTeX!


ispell

How many of you have needed a spell checker for simple things you have written and did not have one? Well, ispell is there to help. It is a very straightforward implementation that is pretty efficient. It is invoked using
UNIX> ispell mydoc.txt
and will then give you an interactive environment. It scans the input file, flags what it perceives to be misspellings, and allows you to do several things with single keystrokes. A accepts the word as entered for the rest of the session, something you should do with acronyms. Space accepts the spelling this once. R allows you to type in a new word to replace the misspelled one. I accepts the word exactly as is and adds it to your private dictionary (created and maintained by ispell). If ispell thinks you might have been trying to spell a real word, it will present you with a numbered list, and by typing in the number you replace the misspelling with the offered word. A note on this: when there are more than nine (9) offerings, ALL numbers must be 2 digits, thus one (1) becomes 01, two 02, etc.

The man page on ispell is fairly easy to comprehend. Read it to find out more about the menu options, which by the way are displayed at the bottom of the screen. The man page also talks about user dictionaries and many special things you can do with ispell. One of the things I like is that if you allow it to run all the way through, it saves the original file with a .bak extension and writes the corrected file to the original filename.

A last note. ispell has a -t option that tells it the input file is in LaTeX format so it ignores things preceded by a \. It may also ignore things surrounded by {} braces. Read the man page.


ps2pdf: PostScript-to-PDF converter

ps2pdf is a work-alike for nearly all the functionality (but not the user interface) of Adobe's Acrobat(TM) Distiller(TM) product: it converts PostScript files to Portable Document Format (PDF) files.

Note: ps2pdf is implemented as a very small command script (batch file) that invokes Ghostscript, selecting a special "output device" called pdfwrite.

Usage

The usage for ps2pdf is
ps2pdf [options] input.[e]ps output.pdf 


BibTeX

BibTeX is a program and file format designed by Oren Patashnik and Leslie Lamport in 1985 for the LaTeX document preparation system. The format is entirely character based, so it can be used by any program (although the standard character set for accents is TeX). It is field (tag) based and the BibTeX program will ignore unknown fields, so it is expandable. It is probably the most common format for bibliographies on the Internet.

Software Support

The BibTeX program uses style files, a list of citations from LaTeX, and a BibTeX database to create a LaTeX file listing the cited references.

Examples

@article{Gettys90,
   author = {Jim Gettys and Phil Karlton and Scott McGregor},
   title = {The {X} Window System, Version 11},
   journal = {Software Practice and Experience},
   volume = {20},
   number = {S2},
   year = {1990},
   abstract = {A technical overview of the X11 functionality.  This is an update
of the X10 TOG paper by Scheifler \& Gettys.}
}

Format Description

Special features

The @STRING command is used to define abbreviations for use by BibTeX. The command @string{jgg1 = "Journal of Gnats and Gnus, Series~1"} defines 'jgg1' to be the abbreviation for the string "Journal of Gnats and Gnus, Series~1". The @COMMENT command lets you put any text inside it. It isn't really necessary, since BibTeX will ignore any text that isn't inside an entry. However, you cannot have an @ character outside of an item.
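
A sketch of how the abbreviation might be used. The file name gnats.bib and the Doe99 entry are invented for illustration. The filecontents* environment is used here only so the example fits in one file: it writes the database to disk when LaTeX runs. Note that the abbreviation is written without braces or quotes in the journal field:

```latex
% Writes gnats.bib (hypothetical name) next to the .tex file.
\begin{filecontents*}{gnats.bib}
@string{jgg1 = "Journal of Gnats and Gnus, Series~1"}

@comment{The entry below is made up for illustration.}

@article{Doe99,
   author  = {Jane Doe},
   title   = {A Placeholder Title},
   journal = jgg1,
   year    = {1999}
}
\end{filecontents*}
\documentclass{article}
\begin{document}
\nocite{*}                 % list every entry, even if not cited
\bibliographystyle{plain}
\bibliography{gnats}
\end{document}
```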

Standard entry types


@article
    An article from a journal or magazine. 
@book
    A book with an explicit publisher. 
@booklet
    A work that is printed and bound, but without a named publisher or sponsoring institution. 
@conference
    The same as inproceedings. 
@inbook
    A part of a book, which may be a chapter (or section or whatever) and/or a range of pages. 
@incollection
    A part of a book having its own title. 
@inproceedings
    An article in a conference proceedings. 
@manual
    Technical documentation. 
@mastersthesis
    A Master's thesis. 
@misc
    Use this type when nothing else fits. 
@phdthesis
    A PhD thesis. 
@proceedings
    The proceedings of a conference. 
@techreport
    A report published by a school or other institution, usually numbered within 
	a series. 
@unpublished
    A document having an author and title, but not formally published. 
example2.bib is a sample .bib file written in conjunction with example2.


How to generate a PDF document from LaTeX and BibTeX?

   1. Create a database (.bib) file that describes the articles that you want to
      reference.
   2. Specify the style and location of the bibliography in your LaTeX document.
   3. Process the .tex file with LaTeX, run BibTeX on it, then process it with
      LaTeX twice more so that the citations and the reference list are resolved.
      This results in a new file, example.dvi, being created.
   4. Use dvips to convert the .dvi file to a .ps file.
   5. You can then create a .pdf file using ps2pdf.
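
The steps above can be sketched in a single file, assumed here to be saved as example.tex. It reuses the Gettys90 entry from the Examples section (without the abstract) and the standard plain style. The commands in the leading comments are the usual sequence, typed at the UNIX prompt:

```latex
% Build sequence:
%   latex example      (first pass, records the citations)
%   bibtex example     (builds the reference list from example.bib)
%   latex example      (twice more, to resolve the citations)
%   latex example
%   dvips -o example.ps example.dvi
%   ps2pdf example.ps example.pdf
%
% Step 1: the database, written to example.bib when LaTeX runs.
\begin{filecontents*}{example.bib}
@article{Gettys90,
   author = {Jim Gettys and Phil Karlton and Scott McGregor},
   title = {The {X} Window System, Version 11},
   journal = {Software Practice and Experience},
   volume = {20},
   number = {S2},
   year = {1990}
}
\end{filecontents*}
\documentclass{article}
\begin{document}
The X Window System is described by Gettys et al.~\cite{Gettys90}.

% Step 2: the style and the location of the bibliography.
\bibliographystyle{plain}
\bibliography{example}
\end{document}
```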
 
example2.pdf is the pdf file generated for example2 and example2.bib

Why should you use LaTeX and BibTeX?

    * Let the style file worry about formatting the bibliography.
    * Avoid retyping the same references for your next paper 
    * It is more efficient.
    * It is not hard! 
    * If you plan to go to grad school, it is a requirement!!