[Perlfect-search] Indexer doesn't follow links with an "&" in it

Jens Gutzeit perlfect-search@perlfect.com
Fri, 23 Aug 2002 17:31:44 +0200
Hi

This is the relevant output of indexer.pl:

Checking for old temp files...
Building string of special characters...
Loading 'no index' regular expressions:
        - http://lfsforum.org/links/*
        - http://lfsforum.org/ghostbb/*
Loading stopwords...371 stopwords loaded.
Starting crawler...
Note: I will not visit more than $HTTP_MAX_PAGES=100 pages.
         1: http://lfsforum.org/
         2: http://lfsforum.org/index.php
         3: http://lfsforum.org/search.php
         4: http://lfsforum.org/index.php?sid=87cccf0532eecc90ef210f957eb337fa
         5: http://lfsforum.org/news_archiv/?sid=87cccf0532eecc90ef210f957eb337fa
         6: http://lfsforum.org/news_archiv/index.php
         7: http://lfsforum.org/news_archiv/index.php?sid=22f41d149b2a9783d9d422ea4abf451d
         8: http://lfsforum.org/index.php?sid=22f41d149b2a9783d9d422ea4abf451d
         9: http://lfsforum.org/howto_schreiben/?sid=22f41d149b2a9783d9d422ea4abf451d
         10: http://lfsforum.org/howto_schreiben/index.php
         11: http://lfsforum.org/howto_schreiben/index.php?sid=7b1adb279b8a7b0256f6e6c5527d8148
         12: http://lfsforum.org/index.php?sid=7b1adb279b8a7b0256f6e6c5527d8148
         13: http://lfsforum.org/news_archiv/?sid=7b1adb279b8a7b0256f6e6c5527d8148
         14: http://lfsforum.org/howtos/?sid=7b1adb279b8a7b0256f6e6c5527d8148
         15: http://lfsforum.org/howtos/index.php
         16: http://lfsforum.org/howtos/index.php?sid=ca2c8abb1c22f2095de98885ce97050e
         17: http://lfsforum.org/index.php?sid=ca2c8abb1c22f2095de98885ce97050e
         18: http://lfsforum.org/news_archiv/?sid=ca2c8abb1c22f2095de98885ce97050e
         19: http://lfsforum.org/howto_schreiben/?sid=ca2c8abb1c22f2095de98885ce97050e
         20: http://lfsforum.org/kurztipps/?sid=ca2c8abb1c22f2095de98885ce97050e
         21: http://lfsforum.org/kurztipps/index.php
         22: http://lfsforum.org/kurztipps/index.php?sid=0a4f3de400dd32cabbfabc5e3716e581
         23: http://lfsforum.org/index.php?sid=0a4f3de400dd32cabbfabc5e3716e581
         24: http://lfsforum.org/news_archiv/?sid=0a4f3de400dd32cabbfabc5e3716e581
         25: http://lfsforum.org/howto_schreiben/?sid=0a4f3de400dd32cabbfabc5e3716e581
         26: http://lfsforum.org/howtos/?sid=0a4f3de400dd32cabbfabc5e3716e581
         27: http://lfsforum.org/lfs-buch/?sid=0a4f3de400dd32cabbfabc5e3716e581
         28: http://lfsforum.org/lfs-buch/index.php
         29: http://lfsforum.org/lfs-buch/index.php?sid=3bb0a52ed819b0df4c8d4ef63610024f
         30: http://lfsforum.org/index.php?sid=3bb0a52ed819b0df4c8d4ef63610024f
Crawler finished: indexed 30 files, 23851 terms (1627 different terms), ignored 519 files because of conf/no_index.txt
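
The log above visits the same pages repeatedly under different `sid` values; index.php alone appears as entries 2, 4, 8, 12, 17, 23 and 30. As a rough illustration of how such duplicates could be collapsed when reading the log, here is a minimal Python sketch. It is not part of Perlfect Search; `strip_session_id` is a hypothetical helper, and treating `sid` as a disposable session parameter is an assumption based only on the URLs shown.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="sid"):
    """Return the URL with the given query parameter removed.

    Hypothetical helper for illustration; 'sid' is assumed to be a
    session ID that does not change the page content.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# A few URLs copied from the crawl log above.
crawled = [
    "http://lfsforum.org/index.php",
    "http://lfsforum.org/index.php?sid=87cccf0532eecc90ef210f957eb337fa",
    "http://lfsforum.org/index.php?sid=22f41d149b2a9783d9d422ea4abf451d",
    "http://lfsforum.org/news_archiv/?sid=87cccf0532eecc90ef210f957eb337fa",
    "http://lfsforum.org/news_archiv/?sid=7b1adb279b8a7b0256f6e6c5527d8148",
]

# Five crawled URLs reduce to two distinct pages once the sid is dropped.
unique = sorted({strip_session_id(u) for u in crawled})
for u in unique:
    print(u)
```

Seen this way, a large share of the 30 indexed entries are the same few pages fetched under fresh session IDs.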

[...]

Here are the relevant parts of my conf.pl:
$HTTP_START_URL = 'http://lfsforum.org/';
$HTTP_MAX_PAGES = 100;
$HTTP_SERVER_ROOT = $DOCUMENT_ROOT;
@HTTP_LIMIT_URLS = ('http://lfsforum.org/howtos/',
'http://lfsforum.org/');
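
One thing that may be worth checking, although the HTML source of the pages is not shown here: links whose query strings contain "&" are usually written as `&amp;` inside `href` attributes, and a crawler has to decode that entity before following or filtering the link. The Python sketch below illustrates the idea only. It is not the indexer's code; `normalize_href` and `is_allowed` are hypothetical names, the example link is made up, and plain prefix matching against the limit list is an assumption about how @HTTP_LIMIT_URLS behaves.

```python
from html import unescape

# Mirrors the @HTTP_LIMIT_URLS setting quoted above.
LIMIT_URLS = ("http://lfsforum.org/howtos/", "http://lfsforum.org/")


def normalize_href(raw_href):
    """Decode HTML entities (e.g. &amp;) in an href value taken from page source."""
    return unescape(raw_href)


def is_allowed(url, prefixes=LIMIT_URLS):
    """Assumed rule: a URL may be crawled if it starts with any listed prefix."""
    return url.startswith(prefixes)


# Hypothetical link as it might appear in the HTML source.
raw = "http://lfsforum.org/howtos/index.php?sid=abc&amp;id=7"

url = normalize_href(raw)
print(url)
print(is_allowed(url))
```

If prefix matching is indeed how the limit list works, the second entry already covers the first, so the list itself would not be what excludes these links.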

Thank you very much for your help.
I hope this makes clear what my problem is.

best regards
Jens

-- 
Help with LFS problems: http://www.lfsforum.de
Public Key: http://lfsforum.org/jens-gutzeit.asc