From perlfect-search@perlfect.com Sat Nov 8 09:49:35 2003
From: perlfect-search@perlfect.com (James Garrett)
Date: Fri, 7 Nov 2003 23:49:35 -1000
Subject: [Perlfect-search] Problems with 22000 plus file crawl with large file sizes
Message-ID: <003901c3a5dd$9e6572f0$8f474140@WorkStation>

A couple of things,

Set up with a gig of RAM and $LOW_MEMORY_INDEX =0; a successful crawl took a day and a half, but the machine melted down at the point of copying the hash to DBase and I had to reboot. Is there any way to pick up from that point and finish copying the hash files without having to rerun ./indexer.pl?

I have read through every archive searching for how to deal with large/numerous files and would like to use Perlfect to crawl 80,000-plus files with a maximum file size of 6 MB. I have tried numerous changes to conf.pl and have yet to have a successful crawl exceeding 12,000 files. The kernel runs out of memory. Tests on smaller file counts show no problems. What could be happening that takes out a machine?

Setting $LOW_MEMORY_INDEX =1; is just so painfully slow. I found a reference to $FLUSH_FREQUENCY =100; but couldn't figure out how to patch ./indexer.pl and don't really know if that will help at all. I am also afraid the temp files could grow huge. Is there such a thing as a 2 GB maximum file size in Linux? In Perlfect?

Any light/advice/help/support/suggestions you can shed would be appreciated.

James Garrett
worldebooklibrary.com

From perlfect-search@perlfect.com Sat Nov 8 12:41:24 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Sat, 8 Nov 2003 13:41:24 +0100
Subject: [Perlfect-search] Problems with 22000 plus file crawl with large file sizes
In-Reply-To: <003901c3a5dd$9e6572f0$8f474140@WorkStation>
References: <003901c3a5dd$9e6572f0$8f474140@WorkStation>
Message-ID: <200311081341.24273@danielnaber.de>

On Saturday 08 November 2003 10:49, James Garrett wrote:

> Setting $LOW_MEMORY_INDEX =1; is just so painfully slow. I found a
> reference to $FLUSH_FREQUENCY =100; but couldn't figure out how to patch
> ./indexer.pl and don't really know if that will help at all. I am also
> afraid the temp files could grow huge. Is there such a thing as a 2 GB
> maximum file size in Linux? In Perlfect?

In a nutshell, you will probably want to use a different search engine that scales better. Try htdig or Lucene (Lucene is only a backend and requires Java programming). The $FLUSH_FREQUENCY patch probably won't help enough; the original posting talks about a 33% increase in indexing speed.

Regards
 Daniel

--
http://www.danielnaber.de

From perlfect-search@perlfect.com Mon Nov 10 15:09:46 2003
From: perlfect-search@perlfect.com (orestes)
Date: Mon, 10 Nov 2003 10:09:46 -0500
Subject: [Perlfect-search] Index made locally not funtion in remote webserver
Message-ID: <000b01c3a79d$278d94c0$699d37c8@orestes>

Dear Naber:

First of all, and although a little late, I should thank you for your correct answer to my problem with accented characters. The character set was the key. Our site had set "charset=utf-8" in the HTML headers, and when we ran indexer.pl, strange characters appeared in all the texts. We changed to "charset=ISO-8859-1" and everything was solved.

But now I have another problem. I usually index my website locally and then send the contents of the data folder to the web server by FTP. It had been working perfectly this way for several years. But suddenly difficulties appeared that we have not been able to solve.

Now, if I run indexer.pl leaving the no_index file empty, the search works, but if I exclude some folder or file (as we have always done), then when we try to run a search from the corresponding form we get the following error message:

> CGI Error
> The specified CGI application misbehaved by not returning a complete set of HTTP
> headers. The headers it did return are:
> Cannot open C:/website/mysitename/html/cgi-bin/old_news/search/data/inv_index:
> at C:\website\mysitename\html\cgi-bin\old_news\search\search.pl line 76.

I stress that locally, on my PC (http://localhost/...), where I also have a web server installed to preview my site, I don't have any problems. They only happen on the remote server.

Please, I appeal again to your good services, and I thank you in advance.

Orestes
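
A "Cannot open .../inv_index" error like the one above can be narrowed down by checking whether the uploaded file exists, is non-empty, and is readable by the account the web server runs under. The following is only a minimal sketch of such a check, written in Python rather than in Perlfect Search's own Perl; the function name `diagnose_index_file` is hypothetical and not part of Perlfect Search, and the path in the usage section is simply the one from the error message.

```python
import os


def diagnose_index_file(path):
    """Return a list of human-readable problems found with an index file.

    Hypothetical helper for illustration; not part of Perlfect Search.
    An empty list means no obvious file-level problem was detected.
    """
    problems = []
    if not os.path.exists(path):
        problems.append(f"missing: {path}")
    elif not os.path.isfile(path):
        problems.append(f"not a regular file: {path}")
    else:
        # os.access checks permissions for the user running this script,
        # so run it as the same account the web server uses.
        if not os.access(path, os.R_OK):
            problems.append(f"not readable by this user: {path}")
        if os.path.getsize(path) == 0:
            problems.append(f"empty file: {path}")
    return problems


if __name__ == "__main__":
    # Path taken from the error message quoted in the post.
    index_path = "C:/website/mysitename/html/cgi-bin/old_news/search/data/inv_index"
    for problem in diagnose_index_file(index_path):
        print(problem)
```

A clean result does not prove the index is usable (the file could still have been built by an incompatible Perl or module version), but it rules out the simplest causes: a missing file, wrong permissions, or an upload that produced an empty file.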

From perlfect-search@perlfect.com Mon Nov 10 19:03:06 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Mon, 10 Nov 2003 20:03:06 +0100
Subject: [Perlfect-search] Index made locally not funtion in remote webserver
In-Reply-To: <000b01c3a79d$278d94c0$699d37c8@orestes>
References: <000b01c3a79d$278d94c0$699d37c8@orestes>
Message-ID: <200311102003.06375@danielnaber.de>

On Monday 10 November 2003 16:09, orestes wrote:

> Now, if I run indexer.pl leaving the no_index file empty, the search
> works, but if I exclude some folder or file (as we have always done),
> then when we try to run a search from the corresponding form we get
> the following error message:

There simply is no guarantee that a locally built index works on a remote server. It may have stopped working because Perl (or some module etc.) was upgraded on the server. But maybe just the permissions of the index files got messed up (or the path to the files changed).

Regards
 Daniel

--
http://www.danielnaber.de

From perlfect-search@perlfect.com Tue Nov 11 14:34:27 2003
From: perlfect-search@perlfect.com (Brian Beal)
Date: Tue, 11 Nov 2003 06:34:27 -0800
Subject: [Perlfect-search] Why does this happen?
Message-ID: <8AB5E62C1544D41190A1009027DC5CA705B40FFA@ntas300.matrixcos.com>

I just installed Perlfect Search 3.31 on our intranet. The install went fine, the index was fine, and everything seemed OK.

My initial search for a word returned results OK. The problem occurred when I attempted additional searches from the original search screen (not from the results screen).

If I go back to my search screen and type in a different word to search, the result screen returns results from my previous search. I have to "refresh" the screen manually to see my new search results. It does this consistently, even if I close my browser, clear my history, and start over with a new browser instance.

Any ideas why this occurs?

Thanks,

Brian
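
Results that only update after a manual refresh are often a sign that a browser or proxy is serving a cached copy of the results page, although nothing in the post confirms that this is the cause here. As a rough illustration only, and in Python rather than Perlfect Search's Perl, this is the kind of response header set a CGI script could send to ask clients and proxies not to cache its output. The function names `no_cache_headers` and `cgi_response` are hypothetical; the header names themselves are standard HTTP.

```python
def no_cache_headers():
    """Standard HTTP headers asking browsers and proxies not to reuse a cached page."""
    return [
        ("Cache-Control", "no-cache, no-store, must-revalidate"),
        ("Pragma", "no-cache"),  # for old HTTP/1.0 clients and proxies
        ("Expires", "0"),        # an invalid date is treated as already expired
    ]


def cgi_response(body, content_type="text/html"):
    """Build a complete CGI response: header lines, a blank line, then the body.

    Hypothetical helper for illustration; not part of Perlfect Search.
    """
    lines = [f"Content-Type: {content_type}"]
    lines += [f"{name}: {value}" for name, value in no_cache_headers()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(cgi_response("<html><body>search results</body></html>"), end="")
```

If the stale page persists with headers like these in place, the cause is more likely elsewhere, for example in how the search form submits the query.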

From perlfect-search@perlfect.com Tue Nov 11 20:54:50 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Tue, 11 Nov 2003 21:54:50 +0100
Subject: [Perlfect-search] Why does this happen?
In-Reply-To: <8AB5E62C1544D41190A1009027DC5CA705B40FFA@ntas300.matrixcos.com>
References: <8AB5E62C1544D41190A1009027DC5CA705B40FFA@ntas300.matrixcos.com>
Message-ID: <200311112154.50168@danielnaber.de>

On Tuesday 11 November 2003 15:34, Brian Beal wrote:

> If I go back to my search screen and type in a different word to search,
> the result screen returns results from my previous search.

Can you please post a URL where people can test this?

Regards
 Daniel

--
http://www.danielnaber.de

From perlfect-search@perlfect.com Tue Nov 11 20:59:48 2003
From: perlfect-search@perlfect.com (Brian Beal)
Date: Tue, 11 Nov 2003 12:59:48 -0800
Subject: [Perlfect-search] Why does this happen?
Message-ID: <8AB5E62C1544D41190A1009027DC5CA705B40FFF@ntas300.matrixcos.com>

I would if I could. Unfortunately, it's on our intranet, which is behind our firewall. Any other suggestions?

Thanks,
Brian

-----Original Message-----
From: Daniel Naber [mailto:daniel.naber@t-online.de]
Sent: Tuesday, November 11, 2003 12:55 PM
To: 'perlfect-search@perlfect.com'
Subject: Re: [Perlfect-search] Why does this happen?

On Tuesday 11 November 2003 15:34, Brian Beal wrote:

> If I go back to my search screen and type in a different word to search,
> the result screen returns results from my previous search.

Can you please post a URL where people can test this?

Regards
 Daniel

--
http://www.danielnaber.de
_______________________________________________
perlfect-search mailing list
perlfect-search@perlfect.com
To unsubscribe, set other personal options or view the list archives please visit:
http://perlfect.com/mailman/listinfo/perlfect-search


From perlfect-search@perlfect.com Wed Nov 12 19:34:14 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Wed, 12 Nov 2003 20:34:14 +0100
Subject: [Perlfect-search] Why does this happen?
In-Reply-To: <8AB5E62C1544D41190A1009027DC5CA705B40FFF@ntas300.matrixcos.com>
References: <8AB5E62C1544D41190A1009027DC5CA705B40FFF@ntas300.matrixcos.com>
Message-ID: <200311122034.14880@danielnaber.de>

On Tuesday 11 November 2003 21:59, Brian Beal wrote:

> I would if I could. Unfortunately, it's on our Intranet, which is behind
> our firewall. Any other suggestions?

Do you use a proxy? Did you check with other browsers? Maybe you accidentally added a meta header that prevents refreshing (either in the page, or an HTTP header in the server configuration).

Regards
 Daniel

--
http://www.danielnaber.de

From perlfect-search@perlfect.com Wed Nov 12 21:07:11 2003
From: perlfect-search@perlfect.com (Brian Joergensen)
Date: Wed, 12 Nov 2003 22:07:11 +0100
Subject: [Perlfect-search] Have any of you been lucky to get this Perlfect program running?
Message-ID: <000e01c3a960$f12b30c0$0200000a@n2e5n2>

I have spent the last two days trying to install this with setup.pl and manually, but without any results...

Whatever...

From perlfect-search@perlfect.com Thu Nov 13 07:08:11 2003
From: perlfect-search@perlfect.com (Morten Wulff)
Date: Thu, 13 Nov 2003 08:08:11 +0100
Subject: [Perlfect-search] Have any of you been lucky to get this Perlfect program running?
In-Reply-To: <000e01c3a960$f12b30c0$0200000a@n2e5n2>
References: <000e01c3a960$f12b30c0$0200000a@n2e5n2>
Message-ID:

On Wed, 12 Nov 2003 22:07:11 +0100, Brian Joergensen wrote:

> I have used the last two days to try install this by setup.pl and
> manually, but without any results...

On what platform, web server etc.? Two days sounds a bit excessive ;-)

I've got Perlfect Search running on my Linux/Apache and W2K/IIS servers; it installed without a glitch in both cases (manual install).

Med venlig hilsen / Kind regards
Morten Wulff

--
Self Injury Information and Support: www.psyke.org

From perlfect-search@perlfect.com Mon Nov 24 20:35:57 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Mon, 24 Nov 2003 21:35:57 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
Message-ID: <200311242135.57028@danielnaber.de>

Hi,

I've written a small Perl tool that lets you analyze your log files to see what people's top search terms are on your homepage (assuming you use Perlfect Search, of course):

http://www.danielnaber.de/searchstats/

You can either upload your logs (note there's a size limit) or download the script and use it locally. Please send me feedback and tell me if it works okay. I'm also interested in the results of people who have high-traffic websites, so feel free to send me the tool's output pages (the files you upload to the server are not saved, neither is the output).

Making the script work with other search engines should be trivial.

Regards
 Daniel

--
http://www.danielnaber.de

From perlfect-search@perlfect.com Tue Nov 25 12:04:06 2003
From: perlfect-search@perlfect.com (Jochen Luig)
Date: Tue, 25 Nov 2003 13:04:06 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
References: <200311242135.57028@danielnaber.de>
Message-ID: <3FC34536.5020005@c-lab.de>

Hi Daniel!

Which is the first Perlfect Search version that provides the log feature? I'm running 3.10 and there is no $LOG variable in my conf.pl.

I think I'll have to upgrade instead of merely setting $LOG. Will newer versions mess with my index?

Regards
Jochen

From perlfect-search@perlfect.com Tue Nov 25 18:17:08 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Tue, 25 Nov 2003 19:17:08 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
In-Reply-To: <3FC34536.5020005@c-lab.de>
References: <200311242135.57028@danielnaber.de> <3FC34536.5020005@c-lab.de>
Message-ID: <200311251917.08589@danielnaber.de>

On Tuesday 25 November 2003 13:04, Jochen Luig wrote:

> Which is the first Perlfect Search version that provides the log
> feature? Im running 3.10 and there is no $LOG variable in my conf.pl.

It was added in 3.20beta.

> I think I'll have to upgrade instead of merely setting $LOG.
> Will newer versions mess with my index?

You'll have to re-index everything. On the other hand, why not just upload the web server log file?

Regards
 Daniel

--
http://www.danielnaber.de

From perlfect-search@perlfect.com Tue Nov 25 21:43:20 2003
From: perlfect-search@perlfect.com (Morten Wulff)
Date: Tue, 25 Nov 2003 22:43:20 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
In-Reply-To: <200311242135.57028@danielnaber.de>
References: <200311242135.57028@danielnaber.de>
Message-ID:

On Mon, 24 Nov 2003 21:35:57 +0100, Daniel Naber wrote:

> Please send me feedback and tell me if it works okay.

Works great (both online and offline).

I have one small feature idea: it would be nice to be able to see search phrases as well as search terms. I have written a script which does that:

http://www.psyke.org/about/tech/searchlog.pl

Usage:

searchlog.pl [...]

For example, to run the script and exclude the IP numbers of my development machines at home and at work (addresses have been obscured):

searchlog.pl log.txt 62.61.n.n 192.38.n.n

Sample output:

IP               Hostname                                           Hits
------------------------------------------------------------------------
67.41.31.181     67.41.31.181                                         60
199.98.20.223    conjunction.ee.cooper.edu                            44
68.84.77.59      pcp02432332pcs.trnrsv01.nj.comcast.net               42

Phrase                                                             Count
------------------------------------------------------------------------
cutting                                                               94
self burning                                                          60
how not to commit suicide                                             38

Term                                                               Count
------------------------------------------------------------------------
self                                                                 330
cutting                                                              190
bulimia                                                              110

The hostname stuff is mainly for fun, but I find the phrase stuff useful (although it might be an idea to only include phrases with two or more words).

Kind regards,
Morten Wulff

--
Self Injury Information and Support: www.psyke.org

"I have a school book with my name on it."
"Your parents must be so proud."
(http://www.actsofgord.com/)
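
The difference between counting phrases and counting terms, as in the sample output above, can be illustrated with a few lines of code. This is not Morten's searchlog.pl (which is written in Perl and whose source is not reproduced here); it is only a minimal Python sketch under the assumption that the search queries have already been extracted from the log as plain strings. The function name `count_phrases_and_terms` and the `min_phrase_words` parameter are hypothetical; the latter mirrors the idea of only listing phrases with two or more words.

```python
from collections import Counter


def count_phrases_and_terms(queries, min_phrase_words=1):
    """Count whole search phrases and individual search terms.

    queries: iterable of raw query strings (assumed already extracted
    from the log file). Phrases with fewer than min_phrase_words words
    are left out of the phrase count but still contribute their terms.
    """
    phrases = Counter()
    terms = Counter()
    for query in queries:
        words = query.lower().split()  # normalizes case and whitespace
        if not words:
            continue  # skip empty queries
        terms.update(words)
        if len(words) >= min_phrase_words:
            phrases[" ".join(words)] += 1
    return phrases, terms


if __name__ == "__main__":
    sample = ["self burning", "Self  Burning", "cutting", "self harm"]
    phrases, terms = count_phrases_and_terms(sample, min_phrase_words=2)
    for phrase, count in phrases.most_common():
        print(f"{phrase:<60}{count:>5}")
    for term, count in terms.most_common():
        print(f"{term:<60}{count:>5}")
```

With `min_phrase_words=2`, single-word searches such as "cutting" appear only in the term table, which keeps the phrase table from merely repeating the most common terms.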