[Perlfect-search] Problems with http indexing.
Alex Collins perlfect-search@perlfect.com
Mon, 30 Sep 2002 16:20:42 +0100
Hi There
I've just tried to HTTP-index our shiny new web site. I have been
using Perlfect for a while now, and have upgraded to 3.30.
I'm on a Cobalt RaQ server.
I get piles of output, but it only actually seems to index the
index.html page, and everything else just links back to the index.html
page.
Am I being stupid?
I had a brief look in the archives but couldn't see anything.
TIA
Alex Collins.
The text output I get is:
[root search]# ./indexer.pl
[Mon Sep 30 16:17:17 2002] indexer.pl: Name "main::VERSION" used only once: possible typo at ./indexer.pl line 68.
Using DB_File...
Checking for old temp files...
Building string of special characters...
Loading 'no index' regular expressions:
- /home/sites/site1/web/cgi-bin/*
- /home/sites/site1/web/test/*
- /home/sites/site1/web/cgi/*
- /home/sites/site1/web/tour/*
- /home/sites/site1/web/robots.txt
- /home/sites/site1/web/sitemap/*
- /home/sites/site1/web/stats/*
- /home/sites/site1/web/journals/project*
- /home/sites/site1/web/site/*
Loading stopwords...389 stopwords loaded.
Starting crawler...
Note: I will not visit more than $HTTP_MAX_PAGES=10000 pages.
Fetched 'http://libweb.apu.ac.uk/', 12469 bytes
1: http://libweb.apu.ac.uk/index.html
Ignoring 'http://libweb.apu.ac.uk/newcss.css': content-type 'text/css'
Ignoring 'http://libweb.apu.ac.uk/index.html': already visited
Ignoring 'http://www.apu.ac.uk': not below $HTTP_LIMIT_URL or non-http protocol
Fetched 'http://libweb.apu.ac.uk/access/access.htm', 10872 bytes
2: http://libweb.apu.ac.uk/index.html
Ignoring 'http://libweb.apu.ac.uk/newcss.css': content-type 'text/css'
Ignoring 'http://libweb.apu.ac.uk/index.html': already visited
Ignoring 'http://www.apu.ac.uk': not below $HTTP_LIMIT_URL or non-http protocol
Fetched 'http://libweb.apu.ac.uk/access/access.htm', 10872 bytes
Ignoring 'http://libweb.apu.ac.uk/index.html': content identical to 'http://libweb.apu.ac.uk/index.html'
Fetched 'http://libweb.apu.ac.uk/search.htm', 10703 bytes
3: http://libweb.apu.ac.uk/index.html
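In the log above, each fetched page (access.htm, search.htm) is recorded as index.html, and the crawler reports "content identical to" index.html. One way to check this independently of the indexer is the minimal Python sketch below. It is not Perlfect code. The helper names below_limit, find_duplicates and fetch are hypothetical, and below_limit only approximates the prefix rule implied by the "$HTTP_LIMIT_URL" message; Perlfect's actual logic may differ.

```python
import hashlib
from urllib.parse import urlsplit
from urllib.request import urlopen


def below_limit(url: str, limit_url: str) -> bool:
    """Rough stand-in (assumption, not Perlfect's code) for the
    '$HTTP_LIMIT_URL' rule in the log: follow only http:// URLs
    that start with the limit prefix."""
    return urlsplit(url).scheme == "http" and url.startswith(limit_url)


def find_duplicates(pages: dict[str, bytes]) -> dict[str, str]:
    """Return {url: earlier_url} for every page whose body is
    byte-for-byte identical to a page seen earlier in `pages`."""
    first_seen: dict[str, str] = {}  # body digest -> first URL with that body
    dupes: dict[str, str] = {}
    for url, body in pages.items():
        digest = hashlib.md5(body).hexdigest()
        if digest in first_seen:
            dupes[url] = first_seen[digest]
        else:
            first_seen[digest] = url
    return dupes


def fetch(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Fetch `url`, following any redirects; return (final URL, body).
    A final URL that differs from `url` means the server redirected."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.geturl(), resp.read()
```

For example, calling fetch on http://libweb.apu.ac.uk/access/access.htm and comparing the returned URL with the requested one would show whether the server redirects to index.html. Passing the fetched bodies to find_duplicates would show whether it serves the same content for every URL.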
--
Alex Collins. Rivermead Library IT Support Technician.
Tel:01245 493131 X3722 Fax:X3145 VideoConf: 194.66.160.110
a.collins@apu.ac.uk http://libweb.apu.ac.uk
This message has been ROT-13 Encrypted twice for Extra Security !