There are various advanced ways to configure LinkController. These are mostly not needed for simple checking of a small collection of web pages. For larger sites and special situations, however, they may well make life much easier.
3.1 Advanced Infostructure Configuration    Advanced control of checking
3.2 Authorisation Configuration             Checking pages which require basic authentication
3.3 Configuring CGI Programs                Setting up LinkController's web interface
3.1 Advanced Infostructure Configuration
Using more advanced configuration, it is possible to skip over certain resources during link extraction and to ignore some of the links. You may want to skip this section initially and come back to it only when you find that links or pages are being checked that you would rather avoid.
For this section, we assume that you already know how to write basic Perl code. If not, then please read through the Perl manual pages `perl', `perlsyn' and `perldata'. Alternatively, you may find that the examples given below are sufficient to get you started.
To get extract-links to extract links using an advanced infostructure, you must use the advanced keyword in the `infostrucs' file. Infostructures not listed there will be ignored, but won't cause any harm.
Advanced configuration is done in the `.link-controller.pl' configuration file by adding definitions to the %::infostrucs hash. These look like the following:
    $::infostrucs{'http://www.mypages.org/'} = {
        mode                => "directory",
        file_base           => "/home/myself/www",
        prune_re            => "^(/home/myself/www/statistics)"  #ignore referrals
                             . "|(cgi-bin)",                     #do CGIs separately
        #single quotes keep the backslashes and the $ intact for the regex
        resource_exclude_re => '\.secret$',      #skip resources ending in .secret
        link_exclude_re     => '^http://([a-z]+\.)+example\.com',
    };

    $::infostrucs{'http://www.mypages.org/cgi-bin/'} = {
        mode                => "www",
        resource_exclude_re => "query",          #query space is infinite!!
    };
There are a number of keywords that can be used.
N.B. the exclude and include regular expressions can be used together. For a match, the include regular expression must match and the exclude must not match. In other words excludes override includes.
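The combined rule can be sketched in a few lines of Perl. This is only an illustration of the logic described above, not LinkController's own code; the function name `wanted', the sample expressions and the treatment of a missing expression as "no restriction" are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true when $string passes both filters.
# An undefined include expression is treated as "include everything"
# and an undefined exclude expression as "exclude nothing"
# (assumptions made for this sketch).
sub wanted {
    my ($string, $include_re, $exclude_re) = @_;
    return 0 if defined $include_re and $string !~ /$include_re/;
    return 0 if defined $exclude_re and $string =~ /$exclude_re/;
    return 1;
}

my $include = '\.html$';      # made-up include expression
my $exclude = '/private/';    # made-up exclude expression

for my $file ('/www/index.html', '/www/private/notes.html', '/www/logo.png') {
    printf "%-26s %s\n", $file,
        wanted($file, $include, $exclude) ? "checked" : "skipped";
}
```

The second file matches the include expression but is still skipped, because the exclude expression has the last word.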
In order for the infostructure to be used by extract-links, an entry must still be made in the `infostrucs' file. For this, use the advanced keyword. The second argument is a URL used to look up the definition in the %::infostrucs hash.
    advanced http://www.mypages.org/
    advanced http://www.mypages.org/cgi-bin/
The URL used here must match exactly the one used in the hash. It is important to note that `directory' and `www' definitions in the `infostrucs' file will override any advanced configuration given.
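Because the definition is found by looking the URL up as a hash key, even a missing trailing slash is enough to make the lookup fail. The following sketch imitates that lookup with an ordinary lexical hash; `%infostrucs' and `lookup_advanced' here are stand-ins invented for illustration, and the real parsing done by extract-links may well differ.

```perl
use strict;
use warnings;

# Stand-in for the definitions made in `.link-controller.pl'.
my %infostrucs = (
    'http://www.mypages.org/' => {
        mode      => "directory",
        file_base => "/home/myself/www",
    },
);

# Hypothetical helper: take one line of the `infostrucs' file and return
# the matching advanced definition, or nothing if there is none.
sub lookup_advanced {
    my ($line) = @_;
    my ($keyword, $url) = split ' ', $line;
    return unless defined $url and $keyword eq 'advanced';
    return $infostrucs{$url};    # plain string equality, no URL normalisation
}

for my $line ('advanced http://www.mypages.org/',
              'advanced http://www.mypages.org') {
    print $line, ": ",
        (lookup_advanced($line) ? "definition found" : "no definition"), "\n";
}
```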
3.2 Authorisation Configuration
One problem when checking links, especially within an intranet, is that some pages may be protected with basic authentication. In order to extract links from those pages, or simply to know that they are there, we have to get through that authentication. By using the advanced Authorisation Configuration we can give LinkController authority to access these pages and allow link checking to work as normal.
Using this method to allow LinkController to work in an environment with authentication is inherently a security issue, since authentication tokens must be stored, effectively in plaintext, in files. This risk may, however, not be much higher than the one that you currently accept, so the mechanism can still be useful.
We can store the authentication tokens in the %::credentials hash, which we can create in the `.link-controller.pl' configuration file. The keys in the hash are the exact realm strings which will be sent by the web server. Each value of this hash is itself a hash with a pair of keys. The `credential' key should be associated with the authentication token. The `uri_re' key should be a regular expression which matches the web pages you want to visit. For security reasons it shouldn't match any others.
    $::credentials = {
        my_realm => {
            uri_re     => "https://myhost.example.com",
            credential => "my_secret",
        },
    };
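To make the role of the two keys concrete, here is a rough sketch of how a token might be chosen for a request. It follows the layout of the example above but is not taken from LinkController; the helper `credential_for', the lexical `$credentials' and the sample URLs are invented for illustration.

```perl
use strict;
use warnings;

# Same layout as the example above, held in a lexical for this sketch.
my $credentials = {
    my_realm => {
        uri_re     => "https://myhost.example.com",
        credential => "my_secret",
    },
};

# Hypothetical helper: hand out the token only when the realm sent by the
# server is a key of the hash AND the requested URI matches its uri_re.
sub credential_for {
    my ($realm, $uri) = @_;
    my $entry = $credentials->{$realm} or return;
    my $re = $entry->{uri_re};
    return unless $uri =~ /$re/;
    return $entry->{credential};
}

for my $try (['my_realm',    'https://myhost.example.com/private/'],
             ['my_realm',    'http://elsewhere.example.org/'],
             ['other_realm', 'https://myhost.example.com/private/']) {
    my ($realm, $uri) = @$try;
    my $token = credential_for($realm, $uri);
    print "$realm $uri ", (defined $token ? "token sent" : "no token"), "\n";
}
```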
As a sanity check, every `uri_re' will be tried on `http://3133t3hax0rs.rhere.com' and `http://3133t3hax0rs.rhere.com/secretstuff/www.goodplace.com/'. If the expression matches, the credentials will be ignored. If you know enough to do this safely then you should definitely know how to get past this check. The owners of the domain `3133t3hax0rs.rhere.com' will just have to hack the code.
For more discussion about the security risks and how to mitigate them, see the file `authorisation.pod' included with the LinkController distribution. If you didn't understand the security risk from the above description, you should probably avoid using this mechanism.
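The check can be imitated as follows, which may be a handy way to try out a `uri_re' before putting it in the configuration file. This is a sketch of the behaviour described above rather than the code LinkController actually runs; the helper name `looks_safe' and the sample expressions are made up, and the sketch assumes that a match on either test URL is enough to reject the expression.

```perl
use strict;
use warnings;

# The two test URLs named in the text above.
my @hostile = (
    'http://3133t3hax0rs.rhere.com',
    'http://3133t3hax0rs.rhere.com/secretstuff/www.goodplace.com/',
);

# Hypothetical helper: true when the expression matches neither test URL.
sub looks_safe {
    my ($uri_re) = @_;
    my $hits = grep { /$uri_re/ } @hostile;    # number of test URLs matched
    return $hits == 0;
}

for my $re ('^https://myhost\.example\.com/', 'goodplace\.com', '.') {
    print $re, ": ",
        (looks_safe($re) ? "accepted" : "credentials ignored"), "\n";
}
```

Passing this check does not prove that an expression is tight. Anchoring it with `^' and escaping the dots, as in the first expression, narrows what it can match.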
3.3 Configuring CGI Programs
The CGI programs use the same configuration variables as the other programs. However, to avoid any confusion and related security problems, a Perl script should be written which has the configuration variables hard-wired in and then runs the appropriate CGI program. configure-link-cgi is a program designed to set up such a script.
FIXME: this section needs to be rewritten.