What Is a "404" Error?
When a Web site visitor requests a nonexistent URL from a Web server, the server sends the visitor an error page. This event is recorded as a 404 Not Found error in the Web server log. Encountering an error page is a frustrating experience for a Web site visitor, and studies have indicated this is a leading reason why people leave Web sites.
There are several possible causes for a 404 Not Found error:
- Incorrect or outdated link on one or more of your pages
- Incorrect or outdated link to your site from another site
- Search engine index contains an outdated page
- Outdated user bookmark
- Visitor made an error when typing a URL by hand
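To see which missing URLs visitors request most often, you can tally the 404 entries in your server log. The Python sketch below is a minimal example that assumes an Apache-style Common Log Format access log; the function name `count_404s` and the sample log lines are hypothetical, and other servers may record requests in a different layout.

```python
import re
from collections import Counter

# Captures the requested path and status code from a Common Log Format line, e.g.
# 1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /old.html HTTP/1.0" 404 209
REQUEST = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Return a Counter of requested paths that produced 404 Not Found."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


sample = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /old.html HTTP/1.0" 404 209',
    '1.2.3.4 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 5120',
    '5.6.7.8 - - [10/Oct/2000:14:02:12 -0700] "GET /old.html HTTP/1.0" 404 209',
]
for path, hits in count_404s(sample).most_common():
    print(path, hits)  # prints: /old.html 2
```

To run it on a real log, pass an open file object in place of `sample`; any file name you use is your own.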
Identify and Fix Incorrect and Outdated Links on Your Pages
In addition to reviewing 404 error reports for your site, you can
identify broken links using a link checker or validator. Many
tools are available for checking Web links. Products range from
commercial software to freeware. Some products can be installed
on your own computer system so that you can check links before you
put a page on the Web. Others operate as online services and can
only check links on pages that are accessible from the Internet.
Free online link checking services include:
Formats vary, but in most cases you will receive a list of links
for each page that was checked. The report will show which links
produced an error. Some reports only list the bad links; some
include additional information about errors they find.
Link checkers may not validate links within scripts or non-HTTP
links such as FTP or mailto links. Non-HTTP links do not generate
404 Not Found errors, but are mentioned here because the browser
error messages and mail nondelivery messages these types of links
generate when they do not work are just as frustrating to site
visitors. Because broken non-HTTP links are more difficult to
identify, their maintenance requires special attention in site
management planning.
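For a sense of what a link checker does, here is a minimal Python sketch using only the standard library. It collects the href of every <a> tag on a page, resolves relative links, checks HTTP and HTTPS links with a HEAD request, and sets non-HTTP links such as mailto and ftp aside for manual review. The function names are hypothetical, links inside scripts are not examined, and a real checker would also need retries, rate limiting, and a GET fallback for servers that reject HEAD requests.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def split_links(html, base_url):
    """Resolve links against base_url; return (http_links, other_links)."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    http_links, other_links = [], []
    for href in collector.links:
        absolute = urljoin(base_url, href.strip())
        if urlparse(absolute).scheme in ("http", "https"):
            http_links.append(absolute)
        else:
            other_links.append(absolute)  # mailto:, ftp:, javascript:, ... need manual review
    return http_links, other_links


def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if the server cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404 here means a broken link
    except OSError:
        return None  # DNS failure, refused connection, or timeout


page = ('<p><a href="/about.html">About us</a> '
        '<a href="mailto:webmaster@mysite.com">Write to us</a></p>')
http_links, other_links = split_links(page, "http://www.mysite.com/index.html")
# http_links  == ['http://www.mysite.com/about.html']
# other_links == ['mailto:webmaster@mysite.com']
```

Passing each entry of `http_links` to `check_link` returns its status code, so a 404 marks a broken link; None means the server could not be reached at all. The entries in `other_links` are exactly the ones that still need a person to check them.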
Identify Other Web Pages with Old or Broken Links to Your Site
If your server log contains 404 Not Found errors that don't seem
to be generated from your own Web pages, they may be the result
of links on other Web sites. If you are lucky enough to have
access to referrer page information in your Web server logs or
reports, you can use this information to identify sites with
links which have generated 404 Not Found errors on your site.
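If your server writes the Apache-style Combined Log Format, which adds the referring page and user agent to each entry, a short script can list the off-site pages that send visitors to missing URLs. This Python sketch assumes that format; the function name, the `my_host` parameter, and the sample lines (including the other.example domain) are hypothetical, and some servers do not record referrers at all. Referrers from your own host are skipped because those broken links are on your own pages.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Request, status, bytes, and referrer fields of a Combined Log Format line, e.g.
# ... "GET /old.html HTTP/1.0" 404 209 "http://other.example/links.html" "Mozilla/4.0"
COMBINED = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def external_referrers_for_404s(lines, my_host):
    """Map each missing path to the set of off-site pages that linked to it."""
    result = defaultdict(set)
    for line in lines:
        match = COMBINED.search(line)
        if not match or match.group("status") != "404":
            continue
        referrer = match.group("referrer")
        host = urlparse(referrer).hostname  # None when the referrer is "-"
        if host and host != my_host:
            result[match.group("path")].add(referrer)
    return dict(result)


sample = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /old.html HTTP/1.0" 404 209 '
    '"http://other.example/links.html" "Mozilla/4.0"',
    '1.2.3.4 - - [10/Oct/2000:13:57:10 -0700] "GET /old.html HTTP/1.0" 404 209 '
    '"http://www.mysite.com/index.html" "Mozilla/4.0"',
    '5.6.7.8 - - [10/Oct/2000:14:02:12 -0700] "GET /gone.html HTTP/1.0" 404 209 "-" "Mozilla/4.0"',
]
print(external_referrers_for_404s(sample, "www.mysite.com"))
# prints: {'/old.html': {'http://other.example/links.html'}}
```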
You can also use a search engine to identify external pages which
contain these incorrect or outdated links and look for a contact
name or address to request that they be corrected.
Search engine formats for finding pages that link to a URL:
- Alta Vista (www.altavista.com): link:www.mysite.com/remainder-of-url
- Lycos (www.lycos.com): link:www.mysite.com/remainder-of-url
- Google (www.google.com): link:www.mysite.com/remainder-of-url
- Hotbot (www.hotbot.com): 1. In the "Look for:" field, choose "links to this URL". 2. In the Search field, enter the full URL, including "http://".
Search for pages you identified from your 404 error listing. The
search engine will return a list of pages from its index that
contain the specified link, if any exist. You may be surprised
how many there are. Note that search engines do not index every
page on the Web, so the list may be incomplete even if you use
more than one search engine.
When you identify a Web page with a bad link to your site, either
from referrer page information or a search, visit the page and
look for the link to your site. Check to see whether it needs to
be updated or corrected. If so, look for a contact to whom you
can provide the correct link information.
Most search engines don't try to index the entire Web anymore, nor
do they index pages as frequently. As a result, when you move or
delete a page, a considerable amount of time may elapse before the
search engine corrects its index. In the meantime, it may keep
referring people to that page. When you move or delete a page,
send the page's old URL to major search engines.
Sometimes you may need to publish Web pages that are expected to
have a very short life. For these ephemeral pages, it may be
desirable to avoid search engine indexing altogether. A meta robots
tag placed in the head section of a Web page can instruct search
engine robots not to index the page by using the noindex directive.
The same tag can also ask search engines not to follow any links
from the page by including the nofollow directive. Here is an
example of a header:
<head>
<title>My Ephemeral Page</title>
<meta name="robots" content="noindex,nofollow">
</head>
Practice Good Web Site Ecology
The obvious way to prevent your URLs from becoming outdated within
your own Web site, in links from other Web sites, and in your
visitors' bookmarks, is to never change them. Unfortunately, this
is more easily said than done.
Even if your site is not a business site, register a domain name
for it. If you later decide you want to change to another domain
name, it's OK as long as you continue to support your previous
domain name. If you create your site using an ISP's domain name,
and later wish to change ISPs, it may be impossible to direct
visitors from your old site location to your new one.
Careful planning of your information space can help reduce the
number of URL changes you need to make. Consider the life
expectancy of your information in your planning. When information
becomes out of date, will you replace it with new information at
the same URL? Will you keep it as archival information? Will you
replace it with a summary of the old information and a link to
newer information? Think of ways to reduce, reuse, and recycle to
create URLs that will live forever even if some of the
information they represent changes.
When planning ahead doesn't work, redirects can be a useful
technique to gently guide your visitors to the information they
want in its new location. Some browsers will even update their
bookmark database to use the new URL in the future if the user
had bookmarked the old URL.
There are two types of redirects, client side redirects and server
side redirects.
- Client side redirects provide a simple way to transport a
visitor to a different page. This method requires replacing each
page which has been moved or deleted with its own redirect page.
Redirect pages include meta refresh tags in the header section
of the document. Because some search engines penalize sites
which use refresh tags, it's a good idea to use them together
with meta noindex tags.
The example below shows a header that would redirect users to
www.mysite.com/otherdirectory/otherpage.html:
<head>
<title>My Redirect Page</title>
<meta name="robots" content="noindex">
<meta http-equiv="refresh" content="15; url=http://www.mysite.com/otherdirectory/otherpage.html">
</head>
Client side redirects are processed by the user's browser. The
"15" in the meta refresh tag in the example instructs the
browser to wait 15 seconds before fetching the new page. It is
possible to set this value to 0, but doing so makes it difficult
for visitors to return to previously visited pages using their
Back buttons, creating a "mouse trap." For this reason, and
because client side redirects are not supported by some older
browsers, the body of your redirect page should explain that the
requested page has been superseded or moved and provide a link to
the new page (the same one used in the refresh tag), including
its URL. Redirect pages represent your site just as much as your
content pages do. They should be friendly and helpful, and they
should conform with the rest of your site design.
- Server side redirects instruct your Web server to give visitors
a different page when they request a non-existent URL. They are
usually implemented at the directory level rather than on a page
by page basis as client side redirects are. Server side
redirects are processed by the Web server, not visitors'
browsers. They can be implemented in different ways on different
servers. For example, they may require placing information in the
configuration file, or you may need to create a file with a
particular name in the directory from which you wish to redirect
visitors. You will need to ask the folks who maintain your
server what the procedure is for your site. When possible,
redirect users to the information they were seeking in the
original directory rather than making them look for it from your
home page or via a search.
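Because each moved page needs its own client side redirect page, it can help to generate them from a list of old and new locations. The Python sketch below produces pages that pair the noindex and refresh tags with a visible explanation and a link showing the new URL, for browsers that ignore the refresh. The function names, the 15-second default, and the page wording are illustrative assumptions; adapt the markup so it matches the rest of your site design.

```python
import html
from pathlib import Path

TEMPLATE = """<html>
<head>
<title>Page Moved</title>
<meta name="robots" content="noindex">
<meta http-equiv="refresh" content="{delay}; url={url}">
</head>
<body>
<p>The page you requested has been moved or superseded. You will be taken
to its new location in {delay} seconds. If that does not happen, please
follow this link:</p>
<p><a href="{url}">{url}</a></p>
</body>
</html>
"""


def redirect_page(new_url, delay=15):
    """Return the HTML for a client side redirect page pointing at new_url."""
    safe_url = html.escape(new_url, quote=True)  # escape &, <, > and quotes for attributes
    return TEMPLATE.format(delay=delay, url=safe_url)


def write_redirect_pages(moves, site_root):
    """Write one redirect page per (old_path -> new_url) entry under site_root."""
    for old_path, new_url in moves.items():
        target = Path(site_root) / old_path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(redirect_page(new_url), encoding="utf-8")


page = redirect_page("http://www.mysite.com/otherdirectory/otherpage.html")
```

Calling `write_redirect_pages` with a dictionary of old paths and new URLs and the path of your document root (both supplied by you) replaces each moved page with its redirect page.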
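Server side redirects are normally set up in your Web server's own configuration, and the procedure differs from server to server. Purely to illustrate what the server does, here is a minimal sketch built on Python's standard http.server module. It maps moved directories to their new locations and answers requests under an old directory with a 301 (Moved Permanently) status and a Location header, keeping the rest of the path so visitors land on the page they were seeking rather than the home page. The directory names in `MOVED_DIRECTORIES` are hypothetical, everything else gets a 404 because the sketch serves no files, and it is not a substitute for your production server's redirect configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of moved directories: old prefix -> new prefix.
MOVED_DIRECTORIES = {
    "/olddirectory/": "/otherdirectory/",
    "/reports/1999/": "/archive/reports/1999/",
}


def resolve_redirect(path):
    """Return the new location for a path under a moved directory, or None."""
    for old_prefix, new_prefix in MOVED_DIRECTORIES.items():
        if path.startswith(old_prefix):
            # Keep the rest of the path: /olddirectory/page.html -> /otherdirectory/page.html
            return new_prefix + path[len(old_prefix):]
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers requests for moved directories with a permanent redirect."""

    def do_GET(self):
        new_path = resolve_redirect(self.path)
        if new_path is None:
            self.send_error(404, "Not Found")
            return
        self.send_response(301)  # Moved Permanently
        self.send_header("Location", new_path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET  # link checkers often use HEAD requests


def run(port=8000):
    """Serve redirects on the given port until interrupted (local testing only)."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling `run()` starts a local test server; a request for /olddirectory/page.html is then answered with a redirect to /otherdirectory/page.html.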
Make Your URLs Error-resistant
The best URLs are short and simple. When this is not possible,
you can still reduce the chances of typos and other URL problems
by avoiding upper-case letters and special characters in your
URLs.
Many Web servers treat the URLs "www.mysite.com/myfile.html" and
"www.mysite.com/MyFile.html" and "www.mysite.com/MYFILE.HTML" as
different documents. Using all lower-case characters for directory
and file names reduces capitalization errors when people type URLs
by hand. Similarly, URLs which contain underscores can be
problematic because underscores can look like spaces when viewed
online as links.
Other characters should be avoided in file and directory names
because they may be interpreted in a special way by the server or
the browser and produce different results in a URL than you
intended. These include colons (:), forward slashes (/),
tildes (~), percent signs (%), at symbols (@), question marks (?),
plus signs (+), equal signs (=), ampersands (&), carets (^), curly
braces ({}), square brackets ([]) and commas (,).
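If you create file and directory names by hand, a small check before publishing can catch names that break these guidelines. The Python sketch below flags upper-case letters, underscores, and the special characters listed above in a single file or directory name (not a full path). As an added assumption not in the list above, it also flags spaces. The function name and rule set are illustrative; adjust them to your server's behavior.

```python
# Characters that servers or browsers may interpret specially in a URL.
SPECIAL_CHARACTERS = set(':/~%@?+=&^{}[],')


def name_problems(name):
    """Return reasons why a single file or directory name may cause URL errors."""
    problems = []
    if name != name.lower():
        problems.append("contains upper-case letters")
    if "_" in name:
        problems.append("contains underscores")
    if " " in name:
        problems.append("contains spaces")  # added assumption, see lead-in
    found = sorted(SPECIAL_CHARACTERS.intersection(name))
    if found:
        problems.append("contains special characters: " + " ".join(found))
    return problems


print(name_problems("My_File.html"))
# prints: ['contains upper-case letters', 'contains underscores']
print(name_problems("report,2000.html"))
# prints: ['contains special characters: ,']
print(name_problems("myfile.html"))
# prints: []
```

To check a whole path, split it on "/" and test each segment separately.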
Give Your Visitors What They Came For
There are a number of techniques you can use to reduce 404 Not
Found errors and minimize the frustration that drives visitors away.
Some may be more helpful for your site than others. By using these
techniques when you organize, create, and maintain your Web pages
you can provide a better experience for the users of your site.
Marsha Glassner spent about five years as a webmaster at a federal
agency in San Francisco. She has also done "tons" of user training
and support which has had a significant effect on her Web
philosophy. Marsha can be reached at mdg@postmark.net
See also:
http://webreference.com/authoring/languages/html/validation.html
Reprinted from internet.com's WebReference.