TCP/IP Tutorial and Technical Overview


6.2 World Wide Web

The World Wide Web is a global hypertext system that was initially developed in 1989 by Tim Berners-Lee at the European Laboratory for Particle Physics (CERN) in Switzerland. In 1993 the Web started to grow rapidly, mainly because the NCSA (National Center for Supercomputing Applications) developed a Web browser program called Mosaic, an X Windows-based application. This application provided the first graphical user interface to the Web and made browsing more convenient.

Today, Web browsers and servers are available for nearly all platforms. You can either get them free from an FTP site or buy a licensed copy. The rapid growth in popularity of the Web is due to the flexible way people can navigate through worldwide resources on the Internet and retrieve them. To get an idea of the growth of the Web, here are some statistics:

June 1993 - only 130 Web sites available
December 1994 - over 11500 Web sites available

The number of Web servers is growing very rapidly (by between 50 and 100 daily), and the traffic over port 80, the well-known Web port, on the NSF backbone is growing at a phenomenal rate too.

Some companies are already doing business on the Web. You can look at prospectuses and product offerings and, of course, order products over the Web. Most multinational companies have a Web server in place to distribute product-specific information or their portfolio, or simply to get in contact with customers. IBM, of course, has a Web home page with a large number of interesting items. A page is just the Web term for a document, and the home page is the starting point to a collection of documents. It is, if you will, the table of contents of a Web site. From there you can easily explore and search the whole Web. The IBM home page is at http://www.ibm.com.

Presenting a document in hypertext has certain advantages for the user. For example, if you want more information about a particular subject mentioned, you can usually "just click on it" to read further details. Subjects with a link to another document are easily identified through highlighting. In fact, documents can be, and often are, linked to other documents by completely different authors. This is much like footnoting, except that the referenced document or graphic is displayed instantly. A document on the Web can include links to other documents residing on different Web sites. If you activate the link, usually with a mouse click, the other document is automatically retrieved from the corresponding server and displayed. That document can in turn include links to other resources, and so on.

The standard communication protocol between Web servers and clients is the Hypertext Transfer Protocol (HTTP), which is a draft Internet standard. HTTP is a generic, stateless, object-oriented protocol. The IETF has set up a working group to improve the performance of HTTP. Web browsers can also use many other Internet protocols, such as FTP, Gopher, WAIS and NNTP (Network News Transfer Protocol), so you don't need a separate client product to access the other resources available on the Net. How the Web browser differentiates between these protocols, and which protocols are supported, is explained later in this section.

An HTTP transaction consists basically of:

Connection: The establishment of a connection by the client to the server. TCP/IP port 80 is the well-known port, but other non-reserved ports may be specified in the URL.
Request: The sending, by the client, of a request message to the server.
Response: The sending, by the server, of a response to the client.
Close: The closing of the connection by either or both parties.
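The four steps above can be sketched with a plain TCP socket. This is only an illustrative sketch, not a complete HTTP client: the function names (build_request, parse_status_line, http_get) are hypothetical, the request uses the simple HTTP/1.0 GET form, and the Host header is an assumption added for compatibility with servers that expect it.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request message (assumed request form)."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_status_line(response):
    """Return (version, status code, reason) from the raw response bytes."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


def http_get(host, path="/", port=80, timeout=10.0):
    """Run one HTTP transaction and return the raw response bytes."""
    # Connection: the client connects to the server (port 80 by default).
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Request: the client sends its request message.
        conn.sendall(build_request(host, path))
        # Response: read until the server has sent everything and closed.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Close: leaving the "with" block closes the client side of the connection.
    return b"".join(chunks)
```

Calling http_get("www.ibm.com") would perform a real network transaction; the two helper functions can be tried without any network access.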

For a more detailed description of HTTP please refer to the draft documents of the corresponding IETF working group.

The standard markup language for Web documents is HTML (Hypertext Markup Language), which is a draft Internet standard and is presently under development by several IETF working groups. HTML is an SGML (Standard Generalized Markup Language) application. IBM's GML is very similar, as you can see in the example below. To create a Web document, you use HTML tags to build the logical structure of the document, for example headers, lists and paragraphs. Other tags define links to other documents or embed a picture in your text.

<HTML>  <!-- Begin of document -->
 <HEAD>  <!-- A sample document -->
  <TITLE>This is a Sample</TITLE>
 </HEAD>  <!-- End of the heading section -->
 <BODY>  <!-- Begin of text body -->
  <H1>First Header</H1>
   <P>The first paragraph.
   <UL>  <!-- unordered list -->
    <LI>Item one
   </UL> <!-- End of list -->
 </BODY> <!-- End of text body -->
</HTML> <!-- End of document -->

If you would like an introduction to HTML, please refer to the following document: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html.
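To see how the tags in the sample document above express its logical structure, the following sketch feeds the same markup (with the comments removed) to the HTML parser in the Python standard library and records each opening tag in document order. The class and function names are hypothetical and chosen only for this illustration.

```python
from html.parser import HTMLParser

# The sample document from the text, without its comments.
SAMPLE = """<HTML>
 <HEAD>
  <TITLE>This is a Sample</TITLE>
 </HEAD>
 <BODY>
  <H1>First Header</H1>
   <P>The first paragraph.
   <UL>
    <LI>Item one
   </UL>
 </BODY>
</HTML>"""


class TagCollector(HTMLParser):
    """Record the name of every opening tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        self.start_tags.append(tag)


def collect_start_tags(document):
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return collector.start_tags


print(collect_start_tags(SAMPLE))
```

The output lists the heading section, the body, and the header, paragraph and list elements inside it, which is the structure a browser works from when it renders the page.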

All documents, images, audio or video clips on the Web are called resources. To address these resources and identify the access method for them, the Web uses URLs (Uniform Resource Locators). The URL specification is on the Internet standards track and is documented in RFC 1738. The global framework for building new addressing schemes to encode names and addresses of objects on the Internet is described in the informational RFC 1630. This RFC introduces the term URI (Universal Resource Identifier) as a more theoretical model for building these schemes. URIs that refer to an object address (IP address and path information) mapped onto an access method using an existing network protocol, such as HTTP or FTP, are known as URLs. A URL is therefore a specific form of URI. In general, URLs are written as follows:

<scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme. The following schemes are covered by the RFC, and other schemes may follow in the future:

ftp - File Transfer protocol
http - HyperText Transfer Protocol
gopher - The Gopher protocol
mailto - Electronic mail address
news - USENET news
nntp - USENET news using NNTP access
telnet - Interactive sessions
wais - Wide Area Information Servers
file - Host-specific file names
prospero - Prospero Directory Service
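As a small illustration of the general <scheme>:<scheme-specific-part> form, the sketch below splits a URL at the first colon and checks the scheme against the list above. The function name split_scheme and the mailto address are hypothetical and used only for this example.

```python
# Schemes listed in the text as covered by RFC 1738.
KNOWN_SCHEMES = {
    "ftp", "http", "gopher", "mailto", "news",
    "nntp", "telnet", "wais", "file", "prospero",
}


def split_scheme(url):
    """Split a URL into (scheme, scheme-specific-part) at the first colon."""
    scheme, colon, rest = url.partition(":")
    if not colon or not scheme:
        raise ValueError(f"no scheme found in {url!r}")
    # Scheme names are compared in lower case.
    return scheme.lower(), rest


for url in ("http://www.ibm.com", "mailto:user@example.com"):
    scheme, rest = split_scheme(url)
    print(scheme, rest, scheme in KNOWN_SCHEMES)
```

How the second element is interpreted depends entirely on the scheme: for http it follows the common Internet syntax, while for mailto it is just a mail address.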

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:

//<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be excluded. The scheme-specific data starts with a double slash "//" to indicate that it complies with the common Internet scheme syntax.

The "url-path" at the end of the scheme-specific data supplies the details of how the specified resource can be accessed. Note that the "/" between the host (or port) and the url-path is not part of the url-path.
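The common syntax described above can be explored with urlsplit from the Python standard library. This is a sketch only: the helper name split_common_syntax, the first example URL, and its user name and password are made up for illustration. Because urlsplit keeps the leading "/" in its path, the helper strips it to match the definition of the url-path given above.

```python
from urllib.parse import urlsplit


def split_common_syntax(url):
    """Break a URL that uses the common Internet scheme syntax into its parts."""
    parts = urlsplit(url)
    # The "/" after the host (or port) is a separator, not part of the url-path.
    url_path = parts.path[1:] if parts.path.startswith("/") else parts.path
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None if omitted
        "password": parts.password,  # None if omitted
        "host": parts.hostname,
        "port": parts.port,          # None if omitted
        "url_path": url_path,
    }


# Hypothetical URL with every optional part present.
print(split_common_syntax("ftp://anonymous:guest@ftp.example.com:2121/pub/readme.txt"))
# URL with user, password and port omitted.
print(split_common_syntax("ftp://rtfm.mit.edu/pub/usenet/news.answers/www/faq"))
```

Omitted parts come back as None, which leaves it to the caller to apply whatever default the scheme defines.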

According to the definition above the HTTP URL looks like this:

http://<host>:<port>/<path>?<searchpart>

Where:

host: The fully qualified domain name of a network host or a dotted decimal IP address (for example, www.ibm.com).
port: The port number to connect to. If this parameter is omitted in an HTTP URL, it defaults to 80.
path: The path specifies the HTTP selector, a route to an HTML document for example.
searchpart: The searchpart is a query string, which is introduced by a preceding question mark.
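The HTTP URL fields described above, including the default port, can be sketched as follows. The function name parse_http_url is hypothetical, and the second example URL (with its port, path and searchpart) is invented purely to show all four fields.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when the URL gives no port


def parse_http_url(url):
    """Return the host, port, path and searchpart of an HTTP URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an HTTP URL: {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    return {
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "searchpart": parts.query,  # text after "?", empty if absent
    }


print(parse_http_url("http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html"))
print(parse_http_url("http://www.ibm.com:8080/search?tcpip"))
```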

For example, the URL of RFC 1630 looks like this:

http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html

The syntax of the other defined schemes, such as FTP and Gopher, is explained in RFC 1738.

There are three ways to access the Web:

Use a Web browser on your own machine
This is the best option, but your corporate LAN must have access to the Internet. In most cases these networks have no direct Internet access and are connected to the Internet through a firewall. In that case you have to specify either a SOCKS server or a proxy gateway with which you are registered to get Internet access. Another way to get connected is to use the SLIP protocol to set up your own modem connection to an Internet access provider. See IBM NetSP Secured Network Gateway.
Use a browser on a machine to which you have TELNET access (not as good but also possible).
Access the Web by E-mail (not very attractive but still possible).

Web browsers are available for most platforms. For a list of FTP sites providing Web browsers, and other useful information, look at ftp://rtfm.mit.edu/pub/usenet/news.answers/www/faq and get the two files located in this subdirectory by anonymous FTP.

These files include the frequently asked questions for Web users. Also included are the host names for TELNET or E-mail access to the Web. Look for the host closest to you.

6.2.1 Implementations

The OS/2 Internet Connection shipped with the Bonus Pack of OS/2 Warp includes the IBM WebExplorer.

Internet Connection V3.0 for Windows contains WebExplorer Mosaic.
