Next: URNs Up: UNIFORM RESOURCE IDENTIFIERS & Previous: Preamble

Problems with URLs

There are two major problems associated with the use of URLs as a naming scheme for on-line resources. The first relates to the transitory nature of the information encoded in the URL - a URL is essentially a recipe for accessing the resource it points to. For example, the URL

  http://ukoln.bath.ac.uk/roads/

indicates that the user's browser should use the HTTP[10] protocol to contact the World-Wide Web server running on the computer registered on the Internet as ukoln.bath.ac.uk, and request a copy of the resource known as /roads/.
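The decomposition described above can be illustrated with a minimal sketch. It assumes Python's standard urllib.parse module, which is not part of the original text; the variable names are purely illustrative.

```python
from urllib.parse import urlsplit

# Split the example URL into the parts a browser acts upon.
parts = urlsplit("http://ukoln.bath.ac.uk/roads/")

scheme = parts.scheme    # which protocol to speak
host = parts.hostname    # which computer to contact
path = parts.path        # which resource to request from that server

print(scheme, host, path)
```

Each component is an access instruction rather than a name: if the protocol, the host name or the path changes, the URL no longer leads to the resource.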

This is a very fragile basis on which to build a global information infrastructure - particularly so given that there is rarely any contact between those who create links and those responsible for the resources being linked to. This often leads to the dangling link problem: Gopher and World-Wide Web links which stop working because the resource they refer to is no longer reachable via the URLs which cite it.

Some typical causes of the dangling link problem are

The situation is not completely hopeless - tools such as www can be used by server administrators to monitor the accessibility of the resources to which their server has outgoing links, and some servers can be configured to log the HTTP Referer: header. This information enables the server administrator to determine who has made links to their server, but only if the links are followed. It can be used to generate a list of sites to inform in the event of the server being reconfigured, but still requires a degree of human intervention.
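As a rough sketch of how logged Referer: information might be turned into a list of sites to inform, the following assumes that the server writes one line per request with the referring URL in a quoted field after the status code and byte count (the layout commonly called the "combined" log format). The log layout, the function name referring_sites and the sample lines are all assumptions made for illustration; they do not come from the original text.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed log layout: ... "METHOD path protocol" status bytes "referer" ...
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def referring_sites(lines, own_host):
    """Map each external referring host to the local paths it links to."""
    sites = defaultdict(set)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed layout
        host = urlsplit(match.group("referer")).hostname
        if host is None or host == own_host:
            continue  # no Referer recorded, or an internal link
        sites[host].add(match.group("path"))
    return dict(sites)

# Hypothetical sample log lines, for illustration only.
sample = [
    '192.0.2.1 - - [20/Jun/1995:12:43:30 +0100] "GET /roads/ HTTP/1.0" '
    '200 2326 "http://www.example.org/links.html" "ExampleBrowser/1.0"',
    '192.0.2.2 - - [20/Jun/1995:12:44:02 +0100] "GET /roads/ HTTP/1.0" '
    '200 2326 "-" "ExampleBrowser/1.0"',
    '192.0.2.3 - - [20/Jun/1995:12:45:10 +0100] "GET /index.html HTTP/1.0" '
    '200 512 "http://ukoln.bath.ac.uk/roads/" "ExampleBrowser/1.0"',
]

result = referring_sites(sample, "ukoln.bath.ac.uk")
print(result)
```

The result only covers links that somebody has actually followed, and contacting the maintainers of the listed sites remains a manual step.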

The second significant problem with the current URL technology is also related to the information content of the URL. Recall that the URL provides no more than a mechanism for retrieving a given instance of the resource. It does not provide a way of associating extra information with the resource - such as file format, author, size, language and character set. In many cases the only way to find these things out is to download the object and study it manually. This state of affairs is undesirable, since it often results in unnecessary network traffic and wasted time for the user.

Some protocols provide their own mechanisms for making this sort of meta-information available, for example:

Even where these features are available, they require the use of a URL to identify a particular instance of the resource, and so are vulnerable to the dangling link problem described above.
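One protocol-level mechanism of this kind is the HTTP HEAD request, which returns a resource's headers without its body. The sketch below is an illustration under stated assumptions rather than a description of any particular server: it starts a throwaway server on the local machine (the hypothetical DemoHandler, with made-up header values) so that it can run without network access, then asks it for meta-information using Python's standard http.client module.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that answers HEAD with fixed headers."""
    BODY = b"<html><body>ROADS</body></html>"

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=iso-8859-1")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.send_header("Content-Language", "en")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

def head_metadata(host, port, path):
    """Return (status, headers) for a resource without fetching its body."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # a HEAD response carries no body
        return response.status, dict(response.getheaders())
    finally:
        conn.close()

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, headers = head_metadata("127.0.0.1", server.server_port, "/roads/")
finally:
    server.shutdown()
    server.server_close()

print(status, headers.get("Content-Type"), headers.get("Content-Length"))
```

Note that head_metadata still needs a host and a path, that is, a URL for one particular instance of the resource, so the meta-information becomes unreachable as soon as that URL stops working.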

As it happens, many resources are actually very widely distributed, e.g.

Locating the preferred copy of a widely available resource can be problematic, often requiring knowledge about the nearest available source of directory information. For example, in order to search for a file using archie, one first has to select an archie server to interrogate.

Once the user has obtained a list of URLs, their selection criteria would be likely to include such factors as:

Clearly it is desirable that there be some way of helping the user to make a choice, or of automating the selection process based on a user profile. A common approach to this problem is to make the information available via a home page. Whilst this is easily capable of providing both pointers to multiple instances of a resource and meta-information, it too is subject to the dangling link problem.
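To make the idea of profile-driven selection concrete, here is a minimal sketch. It assumes that meta-information of the kind mentioned earlier (file format, language, size) is somehow already known for each instance. The Instance and Profile classes, the choose function, the ranking rule and the example.org URLs are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instance:
    """One copy of a resource, with hypothetical meta-information."""
    url: str
    file_format: str
    language: str
    size_bytes: int

@dataclass
class Profile:
    """Hypothetical user profile; lists are in order of preference."""
    formats: List[str]
    languages: List[str]
    max_size_bytes: int

def choose(instances: List[Instance], profile: Profile) -> Optional[Instance]:
    """Pick the acceptable instance that best matches the profile."""
    acceptable = [
        inst for inst in instances
        if inst.file_format in profile.formats
        and inst.language in profile.languages
        and inst.size_bytes <= profile.max_size_bytes
    ]

    def rank(inst: Instance):
        # Lower is better: preferred format, then language, then smaller size.
        return (
            profile.formats.index(inst.file_format),
            profile.languages.index(inst.language),
            inst.size_bytes,
        )

    return min(acceptable, key=rank, default=None)

candidates = [
    Instance("http://mirror-a.example.org/doc.ps", "postscript", "en", 400_000),
    Instance("http://mirror-b.example.org/doc.txt", "text", "en", 60_000),
    Instance("http://mirror-c.example.org/doc.html", "html", "de", 80_000),
]
profile = Profile(formats=["html", "text"], languages=["en"], max_size_bytes=100_000)

chosen = choose(candidates, profile)
print(chosen.url if chosen else "no acceptable instance")
```

A selection of this kind is only as good as the meta-information supplied to it, which is precisely what a bare URL does not carry.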






Martin Hamilton
Tue Jun 20 12:43:30 BST 1995