
4. Using LinkController to Check Links

This chapter covers in reasonable detail how to use each of the programs in LinkController.

4.1 Extracting Links  Getting link information from WWW pages.
4.2 Testing Links  How to run the link testing program.
4.3 Reporting Problems  Getting information about the state of your links.
4.4 Email Reporting of Newly Broken Links  Automatic reporting of newly broken links.
4.5 Examining Individual Files  Checking individual HTML files.
4.6 Repairing Links  Replacing old URLs with new ones.
4.7 Making Suggestions  Making suggestions for other users.
4.8 CGI Interface  The LinkController web interface (primitive).


4.1 Extracting Links

This section assumes that you are using a standard HTML infostructure, either in a directory or on the World Wide Web.

The first step in using LinkController is to extract the links. During extraction, a pair of index files is built recording which URLs occur on which pages, along with a file listing all of the URLs in the infostructure.

FIXME: compare and contrast multi-user configuration with single user

The first stage of the process is done by extract-links (2).

extract-links has two modes: directory and www. The key difference between them is that www mode actually downloads pages from a server, so it is less efficient, but it will work in more circumstances and is more likely to represent your site as users see it. The www mode assumes that all of your WWW pages are linked together, so that extract-links can find them by following links.

FIXME: need to describe the modes of operation of extract-links

extract-links creates three files. The first two files (`*.cdb') are the index files for your infostructure and are stored wherever you have configured them to be; by default they are called `link_on_page.cdb' and `page_has_link.cdb'. The third file is the database file `links.db'. extract-links can also optionally create a text file listing all of the URLs in the infostructure, one per line.
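Because the optional URL list is plain text with one URL per line, ordinary shell tools can summarise it. The function below is a hypothetical helper, not part of LinkController; it is a sketch that assumes the list holds absolute URLs, and it counts how many URLs point at each scheme and host, most common first.

```shell
# summarise_urls FILE: hypothetical helper (not part of LinkController).
# FILE is a one-URL-per-line list such as the optional text file that
# extract-links can write; its name depends on your configuration.
summarise_urls() {
    # Keep only "scheme://host" from each line; lines without "://"
    # (for example mailto: links) are dropped.
    sed -n 's|^\([A-Za-z][A-Za-z0-9+.-]*://[^/]*\).*|\1|p' "$1" |
        sort | uniq -c | sort -rn
}
```

For example, running summarise_urls on your URL list and piping the result through head shows the ten sites your pages link to most often.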

There are a number of other ways of using extract-links, and it has many options. See section F.3 Invoking extract-links, for more information.


4.2 Testing Links

If you are using someone else's link information, you may be able to skip this stage and go straight on to the next section on generating reports. Otherwise, the next stage is to test your links using test-link.

Testing links takes a long time, and broken links will not be reported until several days have passed. This is a deliberate feature of LinkController. Most problems found on a well-maintained web page will be temporary configuration or system problems. By waiting before reporting problems, we give the people responsible for the other end of a problem link a chance to repair their resources. Having made this decision, we may as well check slowly, in a way that reduces the network bandwidth LinkController uses at any given time and so its impact on other people's Internet usage.

The key program to use is test-link. I run it from a shell script that directs its output to a log file:

FIXME actually I now just use a cron job.

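#!/bin/sh
# run-daily-test.sh: a small sample script showing how I run test-link.
# LOGDIR below is only an example; set it to wherever you keep logs.
LOGDIR=$HOME/linkcontroller-logs
mkdir -p "$LOGDIR"

test-link >> \
        "$LOGDIR/runlog-`date +%Y-%m-%d`.log" 2>&1
# date +%Y-%m-%d prints the full date, e.g. 2002-02-03.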

I then run this script from my `crontab' with an entry like this:

42 02 * * *     /..directories./run-daily-test.sh

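The string /..directories./ should be replaced with the directory where you keep the script. Remember to make the script executable (for example with chmod +x run-daily-test.sh), and check that test-link can be found on the PATH that cron uses, since cron runs jobs with a minimal environment.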

The script will now run to completion each night. However, you should make sure that each run actually finishes. If you have too many links to check in the time available, a backlog can build up and the program will take a long time to stop. To avoid this, either test less frequently or make checking run faster. At present, this has to be done by editing the program itself.
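One way to make sure a slow run never overlaps with the next night's run is to take a lock before starting test-link. The sketch below is a hypothetical wrapper, not part of LinkController. It relies on mkdir being atomic, so only one process can create the lock directory, and it skips the run (returning 75) if the directory already exists. The lock path is a placeholder.

```shell
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Hypothetical helper: run COMMAND only if no other run holds LOCKDIR.
run_exclusive() {
    lockdir=$1
    shift
    # mkdir fails if LOCKDIR already exists, and only one process can
    # succeed in creating it, so the directory works as a simple lock.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "run_exclusive: cannot create $lockdir (previous run still active?); skipping" >&2
        return 75   # EX_TEMPFAIL in sysexits.h: "try again later"
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}
```

In run-daily-test.sh this could wrap the test-link line, for example run_exclusive /tmp/test-link.lock test-link followed by the same log redirection. A skipped night then shows up in the log, which is a sign that testing needs to run less often. If a run is killed, the lock directory is left behind and has to be removed by hand before the next run will start.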

The test-link program has a number of options, which control the limits on checking and the speed of checking. See the section on invoking test-link for more information on these.


4.3 Reporting Problems

The easiest way to find out which links are broken is to use the command line program link-report. The simplest report you can generate is just a list of all the known broken links. Do this like so:

link-report --broken

On the system I'm testing on right now, this gives:

broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/cgi
broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/curr
broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/curr
broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/curr
broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/curr
broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/curr
Sorry, couldn't find info for url file://ftp.ncsa.uiuc.edu/Web/httpd/U
please remember to check you have put it in full format
broken:-       file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/docu

This simply tells you which links are broken. LinkController also knows which pages each broken link appears on, so you can go and look at those pages on the World Wide Web or directly as files on the server.
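If you want a plain list of the broken URLs, for example to keep a dated record, you can filter the report text. This is a hypothetical helper based only on the sample output shown above, where each broken link appears on a line starting with "broken:-"; other link-report options or versions may format their output differently.

```shell
# broken_urls: hypothetical filter (not part of LinkController).
# Reads link-report output on standard input and prints only the URLs
# from "broken:-" lines, sorted with duplicates removed; other messages
# such as "Sorry, couldn't find info for url ..." are dropped.
broken_urls() {
    sed -n 's/^broken:-[[:space:]]*//p' | sort -u
}
```

Piping the report into broken_urls and redirecting the result to a file named after the current date gives a list you can compare from day to day.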

There are many different options which control the output of link-report. These include options which select which kinds of problems to report, options which select which pages to report from and options which allow other output formats such as HTML. See section F.1 Invoking link-report, for more information about these.

For more advanced reporting and editing of documents with broken links you may want to use the Emacs interface (see section 6. The Emacs Interface).



4.4 Email Reporting of Newly Broken Links

It's possible to arrange automatic email reporting of links that have become newly broken. This is done by having test-link record the links that become broken in the file named by the `$::link_stat_log' variable (see section 2.3 LinkController Configuration Variables), and then calling link-report to report on those links.

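Typically you won't want a report every time test-link runs; once a day is probably enough. In that case, run a script like the following from your crontab. In it, STAT_LOG is the status change log file named by `$::link_stat_log', WORK is a scratch file name to which that log is moved, and EMAIL is the address that should receive the reports; set these to suit your system.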

mv "$STAT_LOG" "$WORK" 2>/dev/null || exit 0
if [ -s "$WORK" ]; then
   link-report --broken --url-file="$WORK" |
      mail -s "link-report for `date`" "$EMAIL"
fi

Every time that this script is run, it will rename the status change log file and then mail a report with all of the new broken links to the specified email address.


4.5 Examining Individual Files

When you have just written an HTML page, you often want to check it before you make it available. You can do this immediately with the check-page program. Simply run something like:

check-page filename.html

It will list all of the links that it is unsure about, along with the line number on which each problem occurs. This program works particularly well when you are editing with Emacs (see section 6. The Emacs Interface).
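When you have several new pages to check at once, a small loop saves typing. This is a hypothetical helper, not part of LinkController; it only assumes that check-page takes a single file name, as in the example above, and it prints a header before each file's results.

```shell
# check_pages FILE...: hypothetical helper that runs check-page on each
# file in turn, labelling the output so you can tell the files apart.
check_pages() {
    for f in "$@"; do
        echo "== $f"
        check-page "$f"
    done
}
```

For example, check_pages *.html checks every HTML file in the current directory.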


4.6 Repairing Links

The program responsible for repairing links is fix-link. It simply accepts two URLs and changes every occurrence of the first URL in your documents into the second. It assumes that you have permission to edit all of the problem files and that there is a replacement link. For example,

fix-link http://www.ed.ac.uk/~mikedlr/climbing/ \

Typed at the shell prompt would have updated the location of my Climbing pages when they moved some while ago and

fix-link http://www.tardis.ed.ac.uk/~mikedlr/climbing/ \
fix-link http://www.tardis.ed.ac.uk/climb/ \

Will change them to the very latest location. For more information about fix-link, see section F.4 Invoking fix-link.

At present, there is no facility for automatically updating the databases when you do this. Instead, you have to run extract-links again later so that the new links are noticed. In practice this doesn't matter, because you shouldn't be creating new pages with broken links, and you can use check-page to make sure you don't. A later version of LinkController may change this.
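Because fix-link assumes that you have permission to edit every affected file, it can help to see beforehand which files mention the old URL. The function below is a hypothetical preview helper, not part of LinkController. It only finds the URL as literal text, so relative links or URLs written differently will not show up, and fix-link's own matching may differ.

```shell
# preview_fix OLD_URL DIR: hypothetical helper that lists files under DIR
# containing OLD_URL as literal text, marking any you cannot write to.
preview_fix() {
    # -r: search recursively; -l: print file names only;
    # -F: treat OLD_URL as a fixed string, not a regular expression.
    grep -rlF -- "$1" "$2" | while IFS= read -r f; do
        if [ -w "$f" ]; then
            echo "writable  $f"
        else
            echo "READ-ONLY $f"
        fi
    done
}
```

Running preview_fix with the old climbing URL and your web directory before calling fix-link shows which pages would be touched and whether any need their permissions changed first.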

The other way to fix links is to edit the files by hand. This is the only solution when a link has disappeared for good and the text of the web site has to be changed. This can be made more convenient by using the `link-report-dired' Emacs module included in the distribution, which is covered elsewhere in this manual (see section 6. The Emacs Interface).


4.7 Making Suggestions

A link in the database can have suggestions associated with it. These are normally alternative URLs that somebody or something has decided would make a good replacement for the link's URL. People can add suggestions to the database with the suggest program. For example:

suggest file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_linux.Z \
Link suggestion accepted.  Thank you

If you try the same thing again, you get:

suggest file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_linux.Z \
Already knew about that suggestion.  Thanks though.

These suggestions will make it easier for others to repair links, especially if they are using the CGI interface.


4.8 CGI Interface

The CGI interface is not fully developed and raises a number of security issues that need to be considered. I have, however, used it and shown that it can work, so you could try it too if you want. The two programs fix-link.cgi and link-report.cgi replace the normal programs fix-link and link-report. They should be used through an HTML page which feeds the needed information to link-report.cgi.

The main security question is how to authenticate users. This has to be set up using the features of your web server. You should not leave these programs available to unauthenticated users, since that would give them the ability to edit your web pages directly and probably do worse.
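After setting up authentication, it is worth confirming that an anonymous request really is refused. The sketch below is a hypothetical helper, not part of LinkController; it assumes curl is installed, and the URL you pass it (for example, wherever you installed link-report.cgi) is a placeholder for your own setup.

```shell
# check_requires_auth URL: hypothetical helper that succeeds only if an
# anonymous request for URL is refused with HTTP 401 or 403.
check_requires_auth() {
    # -s: no progress output; -o /dev/null: discard the page body;
    # -w '%{http_code}': print only the HTTP status code (000 if no reply).
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    case $code in
        401|403)
            echo "ok: $1 refused anonymous access ($code)"
            return 0 ;;
        *)
            echo "WARNING: $1 answered $code without credentials" >&2
            return 1 ;;
    esac
}
```

Run it against both fix-link.cgi and link-report.cgi; a warning means the program can be reached without logging in.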


This document was generated by Michael De La Rue on February 3, 2002 using texi2html