This chapter covers in reasonable detail how to use each of the programs in LinkController.
4.1 Extracting Links                        Getting link information from WWW pages.
4.2 Testing Links                           How to run the link testing program.
4.3 Reporting Problems                      Getting information about the state of your links.
4.4 Email Reporting of Newly Broken Links   Automatic reporting of newly broken links.
4.5 Examining Individual Files              Checking individual HTML files.
4.6 Repairing Links                         Replacing old URLs with new ones.
4.7 Making Suggestions                      Making suggestions for other users.
4.8 CGI Interface                           The LinkController web interface (primitive).
4.1 Extracting Links
This section is written assuming that you are using a standard HTML infostructure in a directory or on the World Wide Web.
The first part of using LinkController is to extract the links. When doing this, a pair of index files is built which record which URLs appear on which pages, along with a file listing all of the URLs in the infostructure.
FIXME: compare and contrast multi-user configuration with single user
The first stage of the process is done by extract-links.
There are two modes for extract-links: `directory' and `www'.
The key difference between them is that the latter actually downloads
pages from a server, so it is less efficient, but it will work in more
circumstances and is more likely to represent your site as seen by
users. This assumes that all of your WWW pages are interconnected
so that it can find them.
FIXME: need to describe modes of operation of extract-links
extract-links creates three files. The first two files
(`*.cdb') are the index files for your infostructure and are
located wherever you have configured them to be; by default they are called
`link_on_page.cdb' and `page_has_link.cdb'. The third file is
the database file `links.db'. extract-links can also
optionally create a text file which lists all of the URLs in the
infostructure, one per line.
There are a number of other ways of using extract-links and
it has many options. See section F.3 Invoking extract-links, for more information
about using extract-links.
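As a rough sketch of a first run, assuming the default file names and a data directory of `~/link-data' (your configuration decides where the files really go, and whether extract-links needs extra arguments for its mode; see section F.3 for the real options):

#!/bin/sh
# Run extract-links and then confirm that the index and database files
# appeared.  The bare invocation and the data directory below are only
# assumptions; adjust them to match your own configuration.
DATA=$HOME/link-data
extract-links
ls -l "$DATA"/link_on_page.cdb "$DATA"/page_has_link.cdb "$DATA"/links.db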
4.2 Testing Links
If you are using someone else's link information then you may be able to
skip this part and go straight on to the next one on generating reports.
If not, then the next stage is to test your links using test-link.
Testing links takes a long time, and reporting of broken links will not begin until after several days. This is a deliberate feature of LinkController. Most problems found in a well-maintained web page will be temporary configuration or system problems. By waiting to report problems, we give the people responsible for the other end of the broken link a chance to repair their resources. Once we have made this decision, we may as well check slowly and in a way which reduces the amount of network bandwidth LinkController uses at any given time, and so its impact on other people's Internet usage.
The key program which you want to use is test-link. I run
this from a shell script which directs its output to a log file.
FIXME: actually I now just use a cron job.
#!/bin/sh
#this is just a little sample script of how I run the program.
LOGDIR=$HOME/log
test-link >> \
  $LOGDIR/runlog-`/bin/date +%Y-%m-%d`.log 2>&1
#assumes the use of a gnu style date command which can print
#out full dates.
And I run this shell script from my `crontab' with a command like this
42 02 * * * /..directories./run-daily-test.sh
The string /..directories./
should be replaced with the directory
where you have the script. Remember to make the script executable.
This will now run until completion each night. However, you should make sure that it does actually finish. If you have too many links to check in the given time, then you can end up with a backlog and the system will take a long time to stop. To avoid this, either make testing less frequent or make checking run faster. This will have to be done by editing the program itself at present.
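For example, to make testing less frequent you could simply restrict the days on which the cron entry fires; this is plain cron syntax and needs no changes to LinkController itself:

# run the nightly test script on only two nights a week (Sunday and Wednesday)
42 02 * * 0,3 /..directories./run-daily-test.sh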
The test-link
program has a number of options. These control
the limits on checking and the speed of checking. See section F.3 Invoking extract-links, for more information on these.
4.3 Reporting Problems
The easiest way to find out which links are broken is to use the command line interface. The simplest report you can generate is just a list of all the known broken links. Do this like so:
link-report
On the system I'm testing on right now, this gives:
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/cgi
         http://www.ippt.gov.pl/docs-1.4/cgi/examples.html
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_irix5.2.Z
         http://www.ippt.gov.pl/docs-1.4/setup/PreExec.html
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_linux.Z
         http://www.ippt.gov.pl/docs-1.4/setup/PreExec.html
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_osf3.0.Z
         http://www.ippt.gov.pl/docs-1.4/setup/PreExec.html
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_solaris2.4.Z
         http://www.ippt.gov.pl/docs-1.4/setup/PreExec.html
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_solaris2.4.tar.Z
         http://www.ippt.gov.pl/docs-1.4/setup/PreCompiled.html
Sorry, couldn't find info for url
file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_source.tar.Z
please remember to check you have put it in full format
broken:- file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/documents/usage.ps
         http://www.ippt.gov.pl/docs-1.4/postscript-docs/Overview.html
..etc...
This just tells you which links are broken. We also know which page each one is broken on, and can go and look at that page on the World Wide Web or directly as a file on the server.
There are many different options which control the output of
link-report. These include options which select which kinds
of problems to report, options which select which pages to report from
and options which allow other output formats such as HTML.
See section F.1 Invoking link-report, for more information about these.
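If you want to watch how the situation changes over time, one minimal approach is to keep a dated copy of the plain report each time you generate it; the report directory here is only an example:

# save today's broken-link report for later comparison
mkdir -p $HOME/link-reports
link-report > $HOME/link-reports/broken-`date +%Y-%m-%d`.txt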
For more advanced reporting and editing of documents with broken links you may want to use the Emacs interface (see section 6. The Emacs Interface).
4.4 Email Reporting of Newly Broken Links
It's possible to arrange automatic reporting by email of links which
have become newly broken. This is done by getting test-link
to make a list of links that become broken using the `$::link_stat_log'
variable (see section 2.3 LinkController Configuration Variables) and calling link-report
to
report on those links.
Typically, you won't want to have a report every time that
test-link runs, but probably once a day instead. In this
case, run a script like the following from your crontab.
#!/bin/sh
STAT_LOG=$HOME/link-data/stat-log
WORK=$STAT_LOG.work
EMAIL=me@example.com
# move the status change log aside so the next test-link run starts a
# fresh one, then report on the entries we have just collected
mv $STAT_LOG $WORK
if [ -s $WORK ]
then
    link-report --broken --url-file=$WORK |
        mail -s "link-report for `date`" $EMAIL
fi
Each time this script is run, it renames the status change log file and then, if any links have newly become broken, mails a report on them to the specified email address.
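The script itself can be run from cron, a little while after the nightly test-link run has had time to finish; the script name and path below are placeholders in the same style as earlier in this chapter:

# mail the newly broken links every morning at 07:15
15 07 * * * /..directories./mail-new-broken-links.sh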
4.5 Examining Individual Files
When you have just written an HTML page, you often want to check it
before you put it up for use. You can do this immediately using the
check-page
program. Simply run something like
check-page filename.html
And it will list all of the links that it is unsure about, along with the line number each problem occurred on. This program works particularly well when you are editing with Emacs (see section 6. The Emacs Interface).
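If you want to check several pages at once, one simple sketch is to loop over the files from the shell, since check-page is only described here as taking a single file name:

#!/bin/sh
# check every HTML file below the current directory before uploading
find . -name '*.html' | while read f
do
    check-page "$f"
done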
4.6 Repairing Links
The program responsible for repairing links is fix-link. It
simply accepts two URLs and changes all of the occurrences of the first
link in your documents into the second link. It assumes that you have
permission to edit all of the problem files and that there is a
replacement link. For example
fix-link http://www.ed.ac.uk/~mikedlr/climbing/ \
         http://www.tardis.ed.ac.uk/~mikedlr/climbing/
Typed at the shell prompt, this would have updated the location of my climbing pages when they moved a while ago, and
fix-link http://www.tardis.ed.ac.uk/~mikedlr/climbing/ \
         http://scotclmb.org.uk/
fix-link http://www.tardis.ed.ac.uk/climb/ \
         http://scotclmb.org.uk/
will change them to the very latest location. More information about
fix-link can be found in section F.4 Invoking fix-link.
At present, there's no facility for automatically updating the databases
when you do this. Instead, you have to run extract-links again
after some time so that new links are noticed. In practice this doesn't
matter much, because you shouldn't be creating new pages with broken links,
and you can check that you don't with check-page. A later version of
LinkController may change this.
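One easy way to make sure this happens regularly is yet another crontab entry, for example once a week; as before, the bare extract-links invocation assumes your configuration supplies the mode and directories (see section F.3 Invoking extract-links):

# re-extract links early on Monday mornings so that repairs and new
# pages are picked up by the database
30 03 * * 1 extract-links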
The other way to fix links is to edit the files by hand. This is the only solution where a link has disappeared forever and so changes to the text of the web site have to be made. This can be made more convenient by using the `link-report-dired' Emacs module included in the distribution. This is covered elsewhere in this manual (see section 6. The Emacs Interface).
4.7 Making Suggestions
A link in the database can have suggestions associated with it. These
are normally alternative URLs which somebody or something has decided
would make a good replacement for the URL of the link. Humans can add
suggestions to the database with the suggest program. For example:
suggest file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_linux.Z \
        http://delete.me.org/
Link suggestion accepted. Thank you
If you try the same thing again you get
suggest file://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/current/httpd_1.4_linux.Z \
        http://delete.me.org/
Already knew about that suggestion. Thanks though.
These suggestions will make it easier for others to repair links, especially if they are using the CGI interface.
4.8 CGI Interface
The CGI interface is not fully developed and has a number of security
issues which still need to be considered. I have, however, used it and shown
that it can work, so if you want to you could try the same. The two
programs fix-link.cgi and link-report.cgi replace the
normal fix-link and link-report. They should be
interfaced through an HTML page which feeds the needed information to
link-report.cgi.
The main security question is how to authenticate the user. This will have to be set up using the features of the web server. You should not leave these programs available to non-authenticated users, since that would give them the ability to edit your web pages directly, and probably to do worse.
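As an illustration only, with an Apache-style server you might protect the directory holding the CGI programs with basic authentication; the paths below are assumptions, and basic authentication over plain HTTP is itself weak, so treat this as a minimal sketch rather than a recommendation:

# httpd.conf / .htaccess style fragment; adjust the paths to your server
<Directory "/usr/local/apache/cgi-bin/linkcontroller">
    AuthType Basic
    AuthName "LinkController"
    AuthUserFile /usr/local/apache/conf/linkcontroller.passwd
    Require valid-user
</Directory>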