Perlfect Solutions
 

[Perlfect-search] Problems with http indexing.

Alex Collins perlfect-search@perlfect.com
Mon, 30 Sep 2002 16:20:42 +0100
Hi there,

I have just tried to index our shiny new web site over HTTP. I have been
using Perlfect Search for a while now, and upgraded to 3.30.
I'm on a Cobalt RaQ server.

I get piles of output, but it only seems to actually index the
index.html page, and everything else just links back to the index.html
page.

Am I being stupid?
I had a brief look in the archives but couldn't see anything.

TIA

Alex Collins.

The text output I get is:
[root search]# ./indexer.pl
[Mon Sep 30 16:17:17 2002] indexer.pl: Name "main::VERSION" used only once: possible typo at ./indexer.pl line 68.
Using DB_File...
Checking for old temp files...
Building string of special characters...
Loading 'no index' regular expressions:
        - /home/sites/site1/web/cgi-bin/*
        - /home/sites/site1/web/test/*
        - /home/sites/site1/web/cgi/*
        - /home/sites/site1/web/tour/*
        - /home/sites/site1/web/robots.txt
        - /home/sites/site1/web/sitemap/*
        - /home/sites/site1/web/stats/*
        - /home/sites/site1/web/journals/project*
        - /home/sites/site1/web/site/*
Loading stopwords...389 stopwords loaded.
Starting crawler...
Note: I will not visit more than $HTTP_MAX_PAGES=10000 pages.
Fetched  'http://libweb.apu.ac.uk/', 12469 bytes
         1: http://libweb.apu.ac.uk/index.html
Ignoring 'http://libweb.apu.ac.uk/newcss.css': content-type 'text/css'
Ignoring 'http://libweb.apu.ac.uk/index.html': already visited
Ignoring 'http://www.apu.ac.uk': not below $HTTP_LIMIT_URL or non-http protocol
Fetched  'http://libweb.apu.ac.uk/access/access.htm', 10872 bytes
         2: http://libweb.apu.ac.uk/index.html
Ignoring 'http://libweb.apu.ac.uk/newcss.css': content-type 'text/css'
Ignoring 'http://libweb.apu.ac.uk/index.html': already visited
Ignoring 'http://www.apu.ac.uk': not below $HTTP_LIMIT_URL or non-http protocol
Fetched  'http://libweb.apu.ac.uk/access/access.htm', 10872 bytes
Ignoring 'http://libweb.apu.ac.uk/index.html': content identical to 'http://libweb.apu.ac.uk/index.html'
Fetched  'http://libweb.apu.ac.uk/search.htm', 10703 bytes
         3: http://libweb.apu.ac.uk/index.html
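
One way to make the pattern in the output above easier to see is to pair
each "Fetched" line with the numbered line that follows it and flag the
pairs whose URLs differ. The following is only a rough sketch in Python,
assuming the log format shown above; the function name find_mismatches and
the two regular expressions are hypothetical helpers, not part of Perlfect
Search, and the rule that a URL ending in "/" maps to its index.html is an
assumption as well.

```python
import re

# Patterns inferred from the log excerpt above (assumptions, not a spec).
FETCHED = re.compile(r"^Fetched\s+'([^']+)'")
INDEXED = re.compile(r"^\s*(\d+):\s+(\S+)")


def find_mismatches(log_text):
    """Return (number, fetched_url, indexed_url) for entries whose URLs differ."""
    mismatches = []
    last_fetched = None
    for line in log_text.splitlines():
        m = FETCHED.match(line)
        if m:
            # Remember the most recent fetch; a later fetch overrides it.
            last_fetched = m.group(1)
            continue
        m = INDEXED.match(line)
        if m and last_fetched is not None:
            number, indexed = int(m.group(1)), m.group(2)
            # Assumption: a directory URL is served as its index.html.
            if last_fetched.endswith("/"):
                expected = last_fetched + "index.html"
            else:
                expected = last_fetched
            if indexed != expected:
                mismatches.append((number, last_fetched, indexed))
            last_fetched = None
    return mismatches
```

Saving the indexer output to a file and passing its contents to
find_mismatches should, for the excerpt above, flag entries 2 and 3: the
fetched pages are access/access.htm and search.htm, but both are recorded
as index.html. This only confirms the symptom; it does not say why the
indexer records the wrong URL.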

-- 
Alex Collins.     Rivermead Library IT Support Technician.
Tel:01245 493131 X3722  Fax:X3145  VideoConf: 194.66.160.110
a.collins@apu.ac.uk        http://libweb.apu.ac.uk
This message has been ROT-13 Encrypted twice for Extra Security !