Perlfect Solutions
 

[Perlfect-search] too long indexing process

gape perlfect-search@perlfect.com
Thu, 7 Aug 2003 21:59:27 +0200

I have a little problem.
I have had Perlfect 3.30 installed for a year now. At first I used it to
index from disk (document_root and related settings),
then I changed the conf to index over the web (base_url and related
settings) because of a forum.
As I remember, it took Perlfect about 30 minutes to index the site that way.
Of course the site has grown since then: it now runs Geeklog, a PHP script,
where I publish a 'newspaper'.
It has about 1000 articles, some with comments.
The forum has 50000 posts.

When that index run finished (1 year ago) I told TheMasterOfTheWebMachine
to put Perlfect in cron.
He didn't ...
So now I wanted to finally put together a real search engine, one that
will give a searcher the answer to the question.

So I asked TheMasterOfTheWebMachine again, this time to index my site
over ssh and to watch processor time and so on ...

So he did.

The processor was at 100% the whole time. The sites on the machine kept
working, but ... 100%???
The sites were slower, of course, but that is all ... the machine is not
experimental; it runs a lot of commercial sites.
After a few hours TheMasterOfTheWebMachine called to tell me that he
would stop the script, because it ate 250+ of RAM and all of the processor ...

I have
$HTTP_MAX_PAGES = 10000;

The first question:
if I have $HTTP_START_URL enabled (uncommented), should I comment out
$DOCUMENT_ROOT? Until now it was not commented out.

The second: is it normal for the indexer to run so long?

I mean, the site has well over 10,000 pages (html, php, pl ...), so
shouldn't the script stop when it reaches that limit?

Or what ...
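
To spell out what I expect from the limit, here is a minimal sketch in Python. It is not Perlfect's code, and I do not know how the indexer actually applies $HTTP_MAX_PAGES; the function crawl, the in-memory link table site and the page names are all made up for illustration.

```python
from collections import deque


def crawl(start_url, links, max_pages):
    """Breadth-first walk over an in-memory link table, capped at max_pages."""
    seen = {start_url}
    queue = deque([start_url])
    indexed = []
    # Stop as soon as the cap is reached, even if the queue still has URLs.
    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        indexed.append(url)  # stand-in for fetching and indexing the page
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return indexed


# Hypothetical site: 50 pages, each linking to the next two.
site = {f"/p{i}": [f"/p{i + 1}", f"/p{i + 2}"] for i in range(50)}
pages = crawl("/p0", site, max_pages=10)
print(len(pages))  # 10
```

With a cap like this, the run time should depend on the limit and not on how big the site has become, which is why the hours-long run surprises me.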


tnx ...

With Love and Light
gape
www.gape.org

