Perlfect Solutions
 

[Perlfect-search] Problems with 22000 plus file crawl with large file sizes

James Garrett perlfect-search@perlfect.com
Fri, 7 Nov 2003 23:49:35 -1000

A couple of things,

With a gig of RAM and $LOW_MEMORY_INDEX = 0; the crawl itself succeeded after about a day and a half, but the machine locked up at the point of copying the hash to the database and had to be rebooted. Is there any way to pick up from that point and finish copying the hash files without having to rerun ./indexer.pl?
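
For reference, here is a rough sketch (not Perlfect's actual code; the file name and hash layout are just illustrative) of what I understand that copy step to involve: an in-memory index hash being written, word by word, into an on-disk DB_File database.

    #!/usr/bin/perl -w
    # Rough sketch only: the file name and hash layout below are made up.
    # It just illustrates an in-memory index being copied into a DB_File
    # database on disk.
    use strict;
    use DB_File;
    use Fcntl;

    my %index = (
        'perl'   => '1 4 9',    # word => space-separated document IDs
        'search' => '2 4',
    );

    my %db;
    tie %db, 'DB_File', 'inv_index.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot open inv_index.db: $!";

    # The copy itself: at this stage both the full in-memory hash and the
    # growing on-disk file are live at the same time.
    while (my ($word, $docs) = each %index) {
        $db{$word} = $docs;
    }

    untie %db;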

I have read through every archive searching for how to deal with large or numerous files. I would like to use Perlfect to crawl 80,000-plus files with sizes of up to 6 MB each. I have tried numerous changes to conf.pl and have yet to get a successful crawl past about 12,000 files; the kernel runs out of memory. Tests on smaller file counts show no problems. What could be happening that takes out the whole machine?

Setting $LOW_MEMORY_INDEX = 1; is just painfully slow. I found a reference to $FLUSH_FREQUENCY = 100; but couldn't figure out how to patch ./indexer.pl with it, and I don't really know whether it would help at all. I am also afraid the temporary files could grow huge. Is there such a thing as a 2 GB maximum file size in Linux? In Perlfect?
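
Here is the kind of thing I imagined a $FLUSH_FREQUENCY patch would do; all of the variable and subroutine names below are my guesses, not Perlfect's actual internals.

    #!/usr/bin/perl -w
    # Guesswork sketch of a flush-every-N-files indexing loop; the names
    # here are assumptions, not patched Perlfect code.
    use strict;
    use DB_File;
    use Fcntl;

    my $FLUSH_FREQUENCY = 100;   # write buffered postings out every 100 files

    my %db;
    tie %db, 'DB_File', 'inv_index.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot open inv_index.db: $!";

    my %buffer;                  # postings accumulated since the last flush
    my $files_done = 0;

    for my $file (@ARGV) {                 # stand-in for the crawl loop
        index_one_file($file, \%buffer);
        flush_buffer(\%buffer, \%db) if ++$files_done % $FLUSH_FREQUENCY == 0;
    }
    flush_buffer(\%buffer, \%db);          # write out whatever is left over
    untie %db;

    # Append the buffered word => document-list postings to the on-disk DB
    # and empty the buffer, so memory stays roughly bounded instead of
    # growing with every file indexed.
    sub flush_buffer {
        my ($buf, $db) = @_;
        while (my ($word, $docs) = each %$buf) {
            $db->{$word} = defined $db->{$word} ? "$db->{$word} $docs" : $docs;
        }
        %$buf = ();
    }

    # Trivial stand-in for the real per-file indexing: record the file name
    # against every word found in the file.
    sub index_one_file {
        my ($file, $buf) = @_;
        open my $fh, '<', $file or return;
        while (my $line = <$fh>) {
            for my $word ($line =~ /(\w+)/g) {
                $buf->{lc $word} .= "$file ";
            }
        }
        close $fh;
    }

If that is roughly what it does, it would trade RAM for a lot of small DB writes, which is partly why I worry about the temp files growing so large.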

Any light you can shed, or any advice, help, or suggestions, would be appreciated.

James Garrett
worldebooklibrary.com