A Beginner's Guide to HTML

This is a primer for producing documents in HTML, the markup language used by the World Wide Web.

Introduction

Acronym expansion

WWW
World Wide Web
SGML
Standard Generalized Markup Language - This is perhaps best thought of as a programming language for style sheets.
DTD
Document Type Definition - This is a specific implementation of document description using SGML. One way to think about this is: Fortran is to a computer program as SGML is to a DTD.
HTML
HyperText Markup Language - HTML is an SGML DTD. In practical terms, HTML is a collection of styles used to define the various components of a World Wide Web document.

What this primer doesn't cover

This primer assumes that you have:

Creating HTML documents

HTML documents are in plain text format and can be created using any text editor (e.g., Emacs or vi on Unix machines). A couple of WWW browsers (tkWWW for X Window System machines and CERN's WWW browser for the NeXT) do include rudimentary HTML editors in a WYSIWYG environment, and you may want to try one of them first before delving into the details of HTML.

You can preview documents in progress with NCSA Mosaic (and some other WWW browsers). Open the document using the Open Local option under the File menu. Use the Filters, Directories, and Files fields to locate the document, or enter the path and name of the document in the Name of Local Document to Open field. Press OK.

If you see edits you want to make, enter them in the source file. Save the changes. Return to NCSA Mosaic and press the Reload button on the bottom menu. The edits are reflected in the on-screen display.

The minimal HTML document

Here is a barebones example of HTML:

____________________________________________________________________

  <TITLE>The simplest HTML example</TITLE>

  <H1>This is a level one heading</H1>

  Welcome to the world of HTML.  
  This is one paragraph.<P>

  And this is a second.<P>
____________________________________________________________________

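HTML uses tags to tell the World Wide Web viewer how to display the text. The above example uses the <TITLE> tag (with its matching </TITLE>) to specify the document title, the <H1> tag (with its matching </H1>) to mark a level one heading, and the <P> tag to end each paragraph.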

HTML tags consist of a left angle bracket (<), known as a ``less than'' symbol to mathematicians, followed by some text (called the directive) and closed by a right angle bracket (>). Tags are usually paired, e.g., <H1> and </H1>. The ending tag looks just like the starting tag except that a slash (/) precedes the text within the brackets. In the example, <H1> tells the viewer to start formatting a top level heading; </H1> tells the viewer that the heading is complete.

The primary exception to the pairing rule is the <P> end-of-paragraph tag. There is no such thing as </P>.

Note: HTML is not case sensitive. <title> is completely equivalent to <TITLE> or <TiTlE>.

Not all tags are supported by all World Wide Web browsers. A browser that does not support a tag should simply ignore it.

Titles

Every HTML document should have a title. A title is generally displayed separately from the document and is used primarily for document identification in other contexts (e.g., a WAIS search). Choose about half a dozen words that describe the document's purpose.
In NCSA Mosaic, the Document Title field is at the top of the screen just below the pulldown menus.

The directive for the title tag is <title>. The title generally goes on the first line of the document.

Headings

HTML has six levels of headings (numbered 1 through 6), with 1 being the most prominent. Headings are displayed in larger and/or bolder fonts than the normal body text. The first heading in each document should be tagged <H1>. The syntax of the heading tag is:
  <Hy>Text of heading</Hy>
where y is a number between 1 and 6 specifying the level of the heading.

For example, the coding for the ``Headings'' section heading above is

  <H3>Headings</H3>
Title versus first heading: In many documents (including this one), the first heading is identical to the title. For multi-part documents, the text of the first heading should be suitable for a reader who is already browsing related information (e.g., a chapter title), while the title tag should identify the node in a wider context (e.g., include both the book title and the chapter title).
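
As a rough sketch of this distinction, a chapter file in a multi-part document might begin as follows. The book and chapter names are invented for illustration:

  <TITLE>Maine Field Guide: Coastal Birds</TITLE>

  <H1>Coastal Birds</H1>

A reader who arrives from elsewhere sees the full title and can tell which book the chapter belongs to, while a reader already browsing the book sees only the chapter heading.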

Paragraphs

Unlike documents in most word processors, carriage returns and white space in HTML files aren't significant. Word wrapping can occur at any point in your source file, and multiple spaces are collapsed into a single space (except in the <TITLE> field). Notice that in the barebones example, the first paragraph is coded as

  Welcome to the world of HTML.
  This is one paragraph.<P>
In the source file, there is a line break between the sentences. A Web browser ignores this line break and starts a new paragraph only when it reaches a <P> tag.

Important: You must end each paragraph with <P>. The viewer ignores any indentations or blank lines in the source text. Without the <P> tags, the document becomes one large paragraph. HTML relies almost entirely on the tags for formatting instructions. (The exception is text tagged as ``preformatted,'' explained below.) For instance, the following would produce output identical to that of the first barebones HTML example:

________________________________________________________________________

  <TITLE>The simplest HTML example</TITLE><H1>This is a level 
  one heading</H1>Welcome to the world of HTML. This is one 
  paragraph.<P>And this is a second.<P>
________________________________________________________________________

However, to preserve readability in HTML files, headings should be on separate lines, and paragraphs should be separated by blank lines.

Linking to other documents

The chief power of HTML comes from its ability to link regions of text (and also images) to another document (or an image). These regions are typically highlighted by the browser to indicate that they are hypertext links.
In NCSA Mosaic, hypertext links are in color and underlined by default. It is possible to modify this in the Options menu as well as in your .Xdefaults file.
HTML's single hypertext-related directive is A, which stands for anchor. To include anchors in your document:

  1. Start by opening the anchor with the leading angle bracket and the anchor directive followed by a space: <a
  2. Specify the document that's being pointed to by giving the parameter href="filename.html" followed by a closing angle bracket: >
  3. Enter the text that will serve as the hypertext link in the current document (i.e., the text that will be in a different color and/or underlined)
  4. Enter the ending anchor tag: </A>
Below is a sample hypertext reference:

  <a href="MaineStats.html">Maine</a>
This entry makes ``Maine'' the hyperlink to the document MaineStats.html.

Uniform Resource Locator

A Uniform Resource Locator (URL) refers to the format used by WWW documents to locate other files. A URL gives the type of resource being accessed (e.g., gopher, WAIS) and the path of the file. The format used is:

scheme://host.domain[:port]/path/filename
where scheme is one of:
file
a file on your local system, or a file on an anonymous ftp server
http
a file on a World Wide Web server
gopher
a file on a Gopher server
wais
a file on a WAIS server
The scheme can also be news or telnet, but these are used much less often than the above. The port number can generally be omitted from the URL.
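
To illustrate the format, here are a few anchors that use different schemes. This is only a sketch: the host names (all under example.org) and the paths are placeholders, not real servers.

  <A HREF="http://www.example.org:8001/docs/intro.html">a Web document on port 8001</A>
  <A HREF="file://ftp.example.org/pub/README">a file on an anonymous ftp server</A>
  <A HREF="gopher://gopher.example.org/">the top level of a Gopher server</A>

Only the first anchor gives a port explicitly; the other two rely on the default port for their scheme.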

For example, if you wanted to insert a link to this primer, you would insert

  <A HREF="http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html">
  NCSA's HTML Primer</A>
into your document. This would make the text ``NCSA's HTML Primer'' a hyperlink leading to this document.

Refer to the Addressing document prepared by CERN for additional information about URLs. A Beginner's Guide to URLs is located on the NCSA Mosaic Help menu.

Anchors to Specific Sections in Other Documents

Anchors can also be used to move to a particular section in a document. Suppose you wish to set a link from document A to a specific section in document B. First you need to set up what is called a named anchor in document B. For example, to add an anchor named ``Jabberwocky'' to document B, you would insert
  Here's <A NAME="Jabberwocky">some text</a>.
Now when you create the link in document A, you include not only the filename, but also the named anchor, separated by a hash mark (``#''):
  This is my <A HREF="documentB.html#Jabberwocky">link</a>.
Now clicking on the word ``link'' in document A would send the reader directly to the words ``some text'' in document B.

Anchors to Specific Sections within the Current Document

The technique is exactly the same except the file name is now omitted.
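
For instance, assuming the named anchor ``Jabberwocky'' from the previous section is defined later in the same file, a link earlier in that file could be sketched as follows. The wording of the link text is made up for illustration:

  See the <A HREF="#Jabberwocky">section below</A>.<P>

  Here's <A NAME="Jabberwocky">some text</A>.<P>

Because the HREF begins with the hash mark and gives no filename, the viewer looks for the named anchor in the document it is already displaying.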

Note: The NCSA Mosaic Back button does not work for an anchor within a document because the Back button is designed to move to a previous document. Move back manually within the document using the scroll bar. (The Back button will return to the start of a hyperlink effective with Version 2.0 of NCSA Mosaic.)

Additional markup tags

The above is sufficient to produce simple HTML documents. For more complex documents, HTML also has tags for several types of lists, extended quotes, character formatting and other items, all described below.

Lists

HTML supports unnumbered, numbered, and descriptive lists. For list items, no paragraph separator is required. The tags for the items in the list terminate each list item.

Unnumbered Lists

  1. Start with an opening list <ul> tag.
  2. Enter the <li> tag followed by the individual item. (Remember that no closing tag is needed.)
  3. End with a closing list </ul> tag.
Below is an example two-item list:

  <UL>
  <LI> apples
  <LI> bananas
  </UL>
The output is:

  * apples
  * bananas

Note that different viewers display an unordered list differently. A viewer might use bullets, filled circles, or dashes to show the items.

Numbered Lists

A numbered list (also called an ordered list, hence the abbreviation ol) uses the <ol> directive to start a list rather than the <ul> directive. The items are tagged using the same <li> tag as for a bulleted list. For example:

  <OL>
  <LI> oranges
  <LI> peaches
  <LI> grapes
  </OL>
The list looks like this online:

  1. oranges
  2. peaches
  3. grapes

Descriptive Lists

A description list usually consists of alternating description titles (abbreviated as dt) and description descriptions (abbreviated as dd). The description generally starts on a new line, because the viewer allows the full line width for the contents of the dt field.

Below is an example description list as included in your source file:

  <DL>
  <DT> National Center for Supercomputing Applications
  <DD> NCSA is located on the campus of the University 
  of Illinois at Urbana-Champaign. NCSA is one of 
  four member institutions in the National Metacenter for 
  Computational Science and Engineering.
  <DT> Cornell Theory Center
  <DD> CTC is located on the campus of Cornell 
  University in Ithaca, New York. CTC is another member 
  of the National Metacenter for  Computational Science 
  and Engineering.
  </DL>
The output looks like this:

National Center for Supercomputing Applications
NCSA is located on the campus of the University of Illinois at Urbana-Champaign. NCSA is one of four member institutions in the National Metacenter for Computational Science and Engineering.
Cornell Theory Center
CTC is located