ht://Dig

The version of ht://Dig included in iTools has been extended with a CGI interface that supports the administrative tasks of creating and maintaining searchable databases in a fully integrated, multiple virtual host iTools package.

 

ht://Dig is a very customizable utility. The iTools indexing CGI is designed as an easy-to-use front end to htdig. It provides a quick way to get a basic set of htdig's search capabilities working for each virtual host in an iTools system. To further exploit the power of htdig, refer to the ht://Dig documentation (http://host.domain.com/htdig/doc/index.html). Note that the indexing CGI stores the htdig configuration it creates for each virtual host in the /htdig/conf/<virtualhostname>.conf file.

 

You will probably want to customize the HTML search page and the results page from the defaults that are provided. The ht://Dig documentation (http://host.domain.com/htdig/doc/index.html) describes the files that it uses for each page. Also look in the WebServer/tenon/apache/conf/itools.conf file for the extra htdig configuration lines that were added by the iTools Search Engine Installer. You might want to change these directives if, for example, you want to change the URLs that users use to access the search engine for a particular virtual host or for your entire Web server.

 

Once a searchable database has been built, it may be necessary to rebuild it periodically to include pages that have been added to the site or changed. To facilitate periodic updates, the indexing CGI can also be run from cron. The /usr/local/apache/htdig/conf/crontab.tmpl file contains example crontab entries for invoking the indexing CGI.

 

The indexing process can create large database files. Almost every word found while examining a document is stored in a sorted database file for later searching. This means that a lot of disk space may be required to successfully complete an indexing operation. A large site might require as much as 300 megabytes of available disk space.
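Because an indexing run can fail when the disk fills up, it may be worth checking free space before starting one. The following is a minimal sketch, not part of iTools: the function name has_room_for_index and the choice of path are assumptions, and the 300 megabyte default is simply the upper estimate quoted above.

```python
import shutil

# Upper estimate quoted above for a large site; adjust for your own sites.
REQUIRED_MB = 300


def has_room_for_index(path, required_mb=REQUIRED_MB):
    """Return True if the filesystem holding `path` has at least
    `required_mb` megabytes free (hypothetical helper, not an iTools API)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * 1024 * 1024


# "." is a placeholder; point this at the volume that holds the
# htdig database files on your server.
if not has_room_for_index("."):
    print("Warning: less than", REQUIRED_MB, "MB free; indexing may fail.")
```

A check like this could be run by hand before clicking Run!, or placed ahead of a scheduled rebuild.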

 

Build the iTools Search Engine Index File

The iTools Search Engine Index files are built and maintained using a special indexing CGI. This CGI is intended only for iTools Administrators and it is protected within the iTools Admin realm (username and password are required). Use the following URL to open the indexing CGI.

 

Substitute your iTools server's name for hostname in: http://hostname/index.cgi

 

The indexing CGI displays a form with fields for entering the URLs to be indexed, excluded, and limited, and an optional email address.

 

Default Indexing Options

 

The indexing form contains fields for specifying which URLs should be indexed. The Start URLs are the starting points for the indexing engine. The Exclude URLs are URLs that should not be indexed. The Limit URLs are sets of patterns that a URL must match in order to be indexed.

 

The default Start URLs list contains a single URL matching the virtual host name used in the request. This default instructs the indexing process to visit all of the documents on this virtual host that are reachable (following any number of links) from the home page. The default Limit URLs list exactly matches the Start URLs list. In most cases, this is all that is needed to build a complete index of an entire virtual host. Additional URLs can be added to these lists.
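The way the three lists interact can be sketched in a few lines. This is an illustration of the rule described above, not the htdig implementation: it assumes that patterns are matched as plain substrings, and the function name should_index and the example Exclude pattern are hypothetical.

```python
def should_index(url, limit_patterns, exclude_patterns):
    """Decide whether a URL reached while following links is indexed.

    Assumption: patterns are plain substrings, not regular expressions.
    """
    # The URL must match at least one Limit pattern...
    if not any(pattern in url for pattern in limit_patterns):
        return False
    # ...and must not match any Exclude pattern.
    return not any(pattern in url for pattern in exclude_patterns)


start_urls = ["http://www.domain1.com/"]
limit_urls = list(start_urls)   # default: Limit URLs mirror Start URLs
exclude_urls = ["/cgi-bin/"]    # hypothetical exclusion, for illustration

for candidate in (
    "http://www.domain1.com/docs/intro.html",
    "http://www.domain1.com/cgi-bin/form.cgi",
    "http://www.elsewhere.com/page.html",
):
    print(candidate, should_index(candidate, limit_urls, exclude_urls))
```

With the defaults, a link that leads off the virtual host fails the Limit test, which is why the index stays confined to the site being indexed.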

 

The form also provides a field for an email address. If an email address is provided, the results of the indexing process will be emailed to that address.

 

Additional options may be displayed by clicking on the Options button. In this case, the form is displayed again with the default options shown (below). These defaults can then be modified. (The default options are used if the form is submitted without displaying the options.) The default settings are sufficient to create a search engine index (or database) file for the specified URLs.

 

All Indexing Options

 

To begin the indexing process, click on the Run! button. If the batch option is specified, the CGI will start a batch indexing process that continues to run after the CGI has completed. A link to a file which will contain the detailed results of the indexing process is provided. Note that it may take some time for the batch indexing process to complete. (For example, a default iTools installation takes about 10 minutes.) If the results are referenced before the indexing process is complete, only the completed parts of the indexing process will be shown. Providing an email address is the best way to be notified when the entire indexing operation is complete.

 

To continually monitor the progress of the indexing process, uncheck the batch option before clicking on the Run! button. In this case, the output from the indexing process is continually displayed in the CGI's output and the CGI does not complete until the indexing process completes.

 

Test the iTools Search Engine Database

The best way to test the searchable database is to perform some actual searches. Use the following URL to search for a particular topic on the indexed site:

 

Substitute your iTools server's name for host.domain.com in:

 

http://host.domain.com/search.html

 

Multiple Virtual Hosts

 

The iTools Search Engine supports indexing and searching for multiple virtual hosts. By default, a separate searchable database is built for each virtual host. For example, to build the index files for virtual hosts www.domain1.com and www.domain2.com, use the following URLs:

 

http://www.domain1.com/index.cgi

http://www.domain2.com/index.cgi

 

To search the databases for these virtual hosts, use the following corresponding URLs:

 

http://www.domain1.com/search.html

http://www.domain2.com/search.html
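On a server with many virtual hosts, the same two URLs can be generated for each host rather than typed by hand. The sketch below only builds the URL strings following the pattern shown above; the function name search_engine_urls is hypothetical, and the host names are the examples from this chapter.

```python
def search_engine_urls(virtual_hosts):
    """Map each virtual host name to its indexing and search URLs,
    following the http://<host>/index.cgi and /search.html pattern."""
    return {
        host: {
            "index": f"http://{host}/index.cgi",
            "search": f"http://{host}/search.html",
        }
        for host in virtual_hosts
    }


# Example host names taken from the text above.
for host, urls in search_engine_urls(
    ["www.domain1.com", "www.domain2.com"]
).items():
    print(host, urls["index"], urls["search"])
```

Remember that the indexing URL is protected by the iTools Admin realm, so opening it still requires the administrator username and password.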

 

 






Copyright 1999. Tenon Intersystems. All Rights Reserved.