NAME

test-link - test links and update the link database


SYNOPSIS

 test-link [arguments]
  -V --version            Give version information for this program
  -h --help --usage       Describe usage of this program.
     --help-opt=OPTION    Give help information for a given option
  -v --verbose[=VERBOSITY] Give information about what the program is doing.  
                           Set value to control what information is given.
  -q --quiet --silent     Program should generate no output except in case of
                          error.
     --config-file=FILENAME Load in an additional configuration file
  -u --user-address=STRING Email address for user running link testing.
  -H --halt-time=MINUTES  stop after given number of minutes
     --never-stop         keep running without stopping
     --no-robot           Don't follow robot rules.  Dangerous!!!
  -w --no-waitre=NETLOC-REGEX Regex for home hosts: no robot rules. Dangerous!!!
     --test-now           Test links now, not when scheduled (testing only)
     --untested           Test all links which have not been tested.
     --sequential         Put links into schedule in order tested (for testing)
  -L --latest-time=MINUTES  latest time from schedule to stop
  -m --max-links=INTEGER  Maximum number of links to test (-1=no limit)


DESCRIPTION

This program tests links and stores what it finds in the link database.

Needs:

  * link database
  * schedule database


CONFIGURATION

Configuration is done using the WWW::Link_Controller::ReadConf(3) module.

You may want to explicitly set the user name.


ROBOT BEHAVIOR

This program is designed to be a well-behaved netizen. That means that it will try not to put a lot of load on a single site. However, the program also attempts to work efficiently through all of the links it has to check.

To achieve these goals, test-link waits for a delay period between checks to the same site, but it tries to re-order its work so that it always has some link to check. It looks ahead up to 100 links.

Making this queue longer will probably not help with efficiency, since an overload is probably a sign that you have many links from the same site. If that site is your own, or you can come to an arrangement with its owners, you could use a regular expression (see the --no-waitre option) to allow faster checking.
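The look-ahead behaviour described above can be illustrated with a short Python sketch. This is not test-link's own code: the function name pick_next, the 60 second HOST_DELAY and the data structures are assumptions made for illustration; only the look-ahead limit of 100 links comes from the text.

```python
import re
from urllib.parse import urlsplit

LOOKAHEAD = 100      # the text says test-link looks ahead up to 100 links
HOST_DELAY = 60.0    # hypothetical delay (seconds) between checks to one host


def pick_next(queue, last_checked, now, no_wait_pattern=None):
    """Remove and return the first link in the look-ahead window whose host is ready.

    queue           -- list of URLs in scheduled order
    last_checked    -- dict mapping host -> time of the most recent check
    no_wait_pattern -- optional regex string; matching hosts skip the delay
    Returns None when every host in the window is still inside its delay.
    """
    for index, url in enumerate(queue[:LOOKAHEAD]):
        host = urlsplit(url).netloc
        exempt = no_wait_pattern is not None and re.search(no_wait_pattern, host)
        ready = host not in last_checked or now - last_checked[host] >= HOST_DELAY
        if exempt or ready:
            queue.pop(index)
            last_checked[host] = now
            return url
    return None
```

With a queue holding two links to one host followed by a link to another host, the second call skips the busy host and returns the link to the other one instead of waiting.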


SCHEDULING

Most of the scheduling is handled by Schedule::Softtime, which provides an `I'll get round to you when I can be bothered' scheduler. We guarantee that we will never schedule a link earlier than min-time (defaults to a day) from now.

The suggested time is created by the link (see WWW::Link for details). We then check that it's at least a certain amount (hard-wired to be one day at present) into the future.
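The minimum-time rule amounts to clamping the suggested time, as in the following Python sketch. The names schedule_time and MIN_DELAY are hypothetical and times are assumed to be seconds since the epoch; only the one-day minimum comes from the text.

```python
import time

MIN_DELAY = 24 * 60 * 60  # one day in seconds, the minimum stated in the text


def schedule_time(suggested, now=None, min_delay=MIN_DELAY):
    """Return the time to schedule a link, never earlier than now + min_delay."""
    if now is None:
        now = time.time()
    # A suggestion that falls too soon is pushed out to the minimum.
    return max(suggested, now + min_delay)
```

A suggestion an hour from now would be moved to a day from now, while a suggestion a week away is left unchanged.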


STATUS LOG HANDLING

During its operation, test-link can write a log file (to a file given in the $::link_stat_log configuration variable). This can be used to send alerts to the webmaster about newly broken links.


LOCKING

test-link uses a very simple application-level lock to protect the links database. If you bypass this locking, you could corrupt the database. Only other runs of test-link will honor this lock.

During a run you can run link-report, but there is in principle no guarantee that it works properly at all. However, it shouldn't normally do any damage, since it has read-only access to the database.

Note that the lock is done on the links database filename.
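An application-level lock keyed on a filename can be sketched in Python as below. This only illustrates the general technique: the text does not say how test-link implements its lock, and the ".lock" suffix and the name database_lock are assumptions.

```python
import os
from contextlib import contextmanager


@contextmanager
def database_lock(db_filename):
    """Hold an exclusive lock file derived from the database filename."""
    lock_path = db_filename + ".lock"  # hypothetical naming scheme
    # O_EXCL makes creation fail with FileExistsError if the lock already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owner's process id to help diagnose stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A lock of this kind is purely advisory: it protects the database only from programs that check for the lock file before opening it.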

Other programs, such as build-schedule and link creation programs, should not be run while test-link is running.


BUGS

The locking used in the current design could be considered a bug.

There should be a mechanism for detecting that the computer is not connected to the network at all and aborting the run completely. This would avoid false positive broken links.

There is a problem with redirects. The second request has to wait for the robot rules to permit it after the first. We should allow a number of levels of redirects without waiting. This may be best fixed with a parallel agent.


SEE ALSO

the verify-link-control manpage; the extract-links manpage; the build-schedule manpage; the link-report manpage; the fix-link manpage; the link-report.cgi manpage; the fix-link.cgi manpage; the suggest manpage; the configure-link-control manpage

The LinkController manual, included in the distribution in HTML, info, or PostScript formats.

http://scotclimb.org.uk/software/linkcont/ - the LinkController homepage.