From perlfect-search@perlfect.com Sat Nov 8 09:49:35 2003
From: perlfect-search@perlfect.com (James Garrett)
Date: Fri, 7 Nov 2003 23:49:35 -1000
Subject: [Perlfect-search] Problems with 22000 plus file crawl with large file sizes
Message-ID: <003901c3a5dd$9e6572f0$8f474140@WorkStation>
A couple of things,
Set up with a gig of RAM and $LOW_MEMORY_INDEX = 0; a successful crawl took a
day and a half, but the machine melted at the point of copying the hash to
DBase and I had to reboot. Any way to pick up from that point and finish
copying the hash files without having to rerun ./indexer.pl?
I have read through every archive searching for how to deal with
large/numerous files and would like to use Perlfect to crawl 80,000-plus
files with a maximum file size of 6 MB. I have tried numerous changes to
conf.pl and have yet to have a successful crawl exceeding 12,000 files. The
kernel runs out of memory. Tests on smaller file counts show no problems.
What could be happening that takes out a machine?
Setting $LOW_MEMORY_INDEX = 1; is just so painfully slow. I found a
reference to $FLUSH_FREQUENCY = 100; but couldn't figure out how to patch
./indexer.pl and don't really know if that will help at all. I am also
afraid the temp files could grow huge. Is there such a thing as a 2 GB
maximum file size in Linux? In Perlfect?
Any light/advice/help/support/suggestions you can shed would be appreciated.
James Garrett
worldebooklibrary.com
From perlfect-search@perlfect.com Sat Nov 8 12:41:24 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Sat, 8 Nov 2003 13:41:24 +0100
Subject: [Perlfect-search] Problems with 22000 plus file crawl with large file sizes
In-Reply-To: <003901c3a5dd$9e6572f0$8f474140@WorkStation>
References: <003901c3a5dd$9e6572f0$8f474140@WorkStation>
Message-ID: <200311081341.24273@danielnaber.de>
On Saturday 08 November 2003 10:49, James Garrett wrote:
> Setting $LOW_MEMORY_INDEX = 1; is just so painfully slow. I found a
> reference to $FLUSH_FREQUENCY = 100; but couldn't figure out how to patch
> ./indexer.pl and don't really know if that will help at all. I am also
> afraid the temp files could grow huge. Is there such a thing as a 2 GB
> maximum file size in Linux? In Perlfect?
In a nutshell, you will probably want to use a different search engine that
scales better. Try htdig or Lucene (Lucene is only a backend and requires
Java programming). The $FLUSH_FREQUENCY patch probably won't help enough;
the original posting mentions only a 33% increase in indexing speed.
Regards
Daniel
--
http://www.danielnaber.de
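
The thread refers to a $FLUSH_FREQUENCY patch without showing what periodic flushing involves. The Python sketch below only illustrates the general idea. It is not Perlfect Search's code and not the patch itself, and the names `build_index`, `flush`, `lookup`, and `flush_frequency` are hypothetical. It assumes postings are buffered in memory and merged into an on-disk DBM file every `flush_frequency` documents, so memory use is bounded by one batch rather than by the whole crawl.

```python
import dbm
from collections import defaultdict


def flush(pending, db):
    """Append buffered postings to the on-disk lists, then clear the buffer."""
    for term, doc_ids in pending.items():
        key = term.encode("utf-8")
        existing = db[key].decode("utf-8").split(",") if key in db else []
        db[key] = ",".join(existing + doc_ids).encode("utf-8")
    pending.clear()


def build_index(docs, db_path, flush_frequency=100):
    """Index (doc_id, text) pairs, writing postings to disk in batches."""
    pending = defaultdict(list)  # term -> doc ids seen since the last flush
    with dbm.open(db_path, "c") as db:
        for count, (doc_id, text) in enumerate(docs, start=1):
            # Record each distinct term of the document once.
            for term in set(text.lower().split()):
                pending[term].append(str(doc_id))
            if count % flush_frequency == 0:
                flush(pending, db)
        flush(pending, db)  # write whatever is left after the last batch


def lookup(db_path, term):
    """Return the list of document ids stored for a term (empty if none)."""
    with dbm.open(db_path, "r") as db:
        key = term.lower().encode("utf-8")
        return db[key].decode("utf-8").split(",") if key in db else []
```

A smaller `flush_frequency` lowers peak memory at the cost of more disk writes, which is the same trade-off the thread describes between the two $LOW_MEMORY_INDEX settings.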
From perlfect-search@perlfect.com Mon Nov 10 15:09:46 2003
From: perlfect-search@perlfect.com (orestes)
Date: Mon, 10 Nov 2003 10:09:46 -0500
Subject: [Perlfect-search] Index made locally not funtion in remote webserver
Message-ID: <000b01c3a79d$278d94c0$699d37c8@orestes>
Dear Naber:
First of all, and although a little late, I should thank you for your correct
answer to my problem with accented characters. The character set was the
key. Our site had "charset=utf-8" set in the HTML headers, and when we ran
indexer.pl, strange characters appeared in all the texts. We changed it to
"charset=ISO-8859-1" and everything was solved.
But now I have a different problem. I usually index my website locally and
then send the contents of the data folder to the web server by FTP. This
worked perfectly for several years. But suddenly, difficulties appeared
that we have not been able to solve.
Now, if I run indexer.pl leaving the no_index file empty, the search works,
but if I exclude some folder or file (as we always have), then when we try
to run a search from the corresponding form we get the following error
message:
> CGI Error
> The specified CGI application misbehaved by not returning a complete set of HTTP
> headers. The headers it did return are:
> Cannot open C:/website/mysitename/html/cgi-bin/old_news/search/data/inv_index:
> at C:\website\mysitename\html\cgi-bin\old_news\search\search.pl line 76.
I stress that locally, on my PC (http://localhost/...), where I also have a
web server installed to preview my site, I have no problems. This only
happens on the remote server.
Please, I appeal once again to your good services, and I offer you my
infinite gratitude in advance.
Orestes
From perlfect-search@perlfect.com Mon Nov 10 19:03:06 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Mon, 10 Nov 2003 20:03:06 +0100
Subject: [Perlfect-search] Index made locally not funtion in remote webserver
In-Reply-To: <000b01c3a79d$278d94c0$699d37c8@orestes>
References: <000b01c3a79d$278d94c0$699d37c8@orestes>
Message-ID: <200311102003.06375@danielnaber.de>
On Monday 10 November 2003 16:09, orestes wrote:
> Now, if I run indexer.pl leaving the no_index file empty, the search
> works, but if I exclude some folder or file (as we always have), then
> when we try to run a search from the corresponding form we get the
> following error message:
There simply is no guarantee that a locally built index works on a remote
server. It may have stopped working because Perl (or some module etc.) was
upgraded on the server. But maybe just the permissions of the index files
got messed up (or the path to the files changed).
Regards
Daniel
--
http://www.danielnaber.de
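
The two simple causes suggested above (a changed path, or wrong permissions) can be checked directly on the server. The following Python sketch is a generic diagnostic and not part of Perlfect Search. The function name `check_index_file` is hypothetical, and it checks only whether the file exists and whether the current user can read it.

```python
import os


def check_index_file(path):
    """Return a list of human-readable problems found with an index file."""
    problems = []
    if not os.path.exists(path):
        # A wrong or changed path shows up here.
        problems.append(f"missing: {path}")
        return problems
    if not os.access(path, os.R_OK):
        # Permissions that do not allow the current user to read the file.
        problems.append(f"not readable by this user: {path}")
    return problems
```

Run on the server as the same user the web server uses, this could be pointed at the file named in the error message, for example `check_index_file("C:/website/mysitename/html/cgi-bin/old_news/search/data/inv_index")`. An empty list means neither of these two causes applies, which leaves the Perl or module upgrade as the remaining suspect.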
From perlfect-search@perlfect.com Tue Nov 11 14:34:27 2003
From: perlfect-search@perlfect.com (Brian Beal)
Date: Tue, 11 Nov 2003 06:34:27 -0800
Subject: [Perlfect-search] Why does this happen?
Message-ID: <8AB5E62C1544D41190A1009027DC5CA705B40FFA@ntas300.matrixcos.com>
I just installed perlfect search 3.31 on our intranet. Install went fine,
index was fine, everything seemed ok.
And my initial search for a word returned results OK. The problem occurred
when I attempted additional searches from the original search screen (not
from the results screen).
If I go back to my search screen and type in a different word to search, the
result screen returns results from my previous search. I have to "refresh"
the screen manually to see my new search results. It does this consistently,
even if I close my browser, clear my history, and start over with a new
browser instance.
Any ideas why this occurs?
Thanks,
Brian
From perlfect-search@perlfect.com Tue Nov 11 20:54:50 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Tue, 11 Nov 2003 21:54:50 +0100
Subject: [Perlfect-search] Why does this happen?
In-Reply-To: <8AB5E62C1544D41190A1009027DC5CA705B40FFA@ntas300.matrixcos.com>
References: <8AB5E62C1544D41190A1009027DC5CA705B40FFA@ntas300.matrixcos.com>
Message-ID: <200311112154.50168@danielnaber.de>
On Tuesday 11 November 2003 15:34, Brian Beal wrote:
> If I go back to my search screen and type in a different word to search,
> the result screen returns results from my previous search.
Can you please post a URL where people can test this?
Regards
Daniel
--
http://www.danielnaber.de
From perlfect-search@perlfect.com Tue Nov 11 20:59:48 2003
From: perlfect-search@perlfect.com (Brian Beal)
Date: Tue, 11 Nov 2003 12:59:48 -0800
Subject: [Perlfect-search] Why does this happen?
Message-ID: <8AB5E62C1544D41190A1009027DC5CA705B40FFF@ntas300.matrixcos.com>
I would if I could. Unfortunately, it's on our Intranet, which is behind our
firewall. Any other suggestions?
Thanks,
Brian
_______________________________________________
perlfect-search mailing list
perlfect-search@perlfect.com
To unsubscribe, set other personal options or view the list archives please
visit:
http://perlfect.com/mailman/listinfo/perlfect-search
From perlfect-search@perlfect.com Wed Nov 12 19:34:14 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Wed, 12 Nov 2003 20:34:14 +0100
Subject: [Perlfect-search] Why does this happen?
In-Reply-To: <8AB5E62C1544D41190A1009027DC5CA705B40FFF@ntas300.matrixcos.com>
References: <8AB5E62C1544D41190A1009027DC5CA705B40FFF@ntas300.matrixcos.com>
Message-ID: <200311122034.14880@danielnaber.de>
On Tuesday 11 November 2003 21:59, Brian Beal wrote:
> I would if I could. Unfortunately, it's on our Intranet, which is behind
> our firewall. Any other suggestions?
Do you use a proxy? Did you check with other browsers? Maybe you
accidentally added a meta header that prevents refreshing (either in the
page, or an HTTP header in the server configuration).
Regards
Daniel
--
http://www.danielnaber.de
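
One way to test the caching theory is to have the search script send response headers that forbid caching. If new searches then show fresh results, a browser or proxy cache was serving the stale page. The Python sketch below builds such a CGI header block. The function name `no_cache_headers` is hypothetical, and the code is a generic illustration rather than anything taken from Perlfect Search.

```python
def no_cache_headers(content_type="text/html"):
    """Build a CGI header block that asks browsers and proxies not to cache."""
    lines = [
        f"Content-Type: {content_type}",
        # HTTP/1.1 directive understood by current browsers and proxies.
        "Cache-Control: no-cache, no-store, must-revalidate",
        # Older HTTP/1.0 equivalents, kept for legacy proxies.
        "Pragma: no-cache",
        "Expires: 0",
    ]
    # A blank line terminates the header block in a CGI response.
    return "\r\n".join(lines) + "\r\n\r\n"
```

A CGI script would write this string to standard output before the page body.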
From perlfect-search@perlfect.com Wed Nov 12 21:07:11 2003
From: perlfect-search@perlfect.com (Brian Joergensen)
Date: Wed, 12 Nov 2003 22:07:11 +0100
Subject: [Perlfect-search] Have any of you been lucky to get this Perlfect program running?
Message-ID: <000e01c3a960$f12b30c0$0200000a@n2e5n2>
I have spent the last two days trying to install this with setup.pl and
manually, but without any results...
Whatever...
From perlfect-search@perlfect.com Thu Nov 13 07:08:11 2003
From: perlfect-search@perlfect.com (Morten Wulff)
Date: Thu, 13 Nov 2003 08:08:11 +0100
Subject: [Perlfect-search] Have any of you been lucky to get this Perlfect program running?
In-Reply-To: <000e01c3a960$f12b30c0$0200000a@n2e5n2>
References: <000e01c3a960$f12b30c0$0200000a@n2e5n2>
Message-ID:
On Wed, 12 Nov 2003 22:07:11 +0100, Brian Joergensen wrote:
> I have spent the last two days trying to install this with setup.pl and
> manually, but without any results...
On what platform, webserver etc? Two days sounds a bit excessive ;-)
I've got Perlfect Search running on my Linux/Apache and W2K/IIS servers --
installed without a glitch in both cases (manual install).
Med venlig hilsen / Kind regards
Morten Wulff
--
Self Injury Information and Support: www.psyke.org
From perlfect-search@perlfect.com Mon Nov 24 20:35:57 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Mon, 24 Nov 2003 21:35:57 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
Message-ID: <200311242135.57028@danielnaber.de>
Hi,
I've written a small Perl tool that lets you analyze your log files to see
what people's top search terms are on your homepage (assuming you use
Perlfect Search, of course):
http://www.danielnaber.de/searchstats/
You can either upload your logs (note there's a size limit) or download the
script and use it locally.
Please send me feedback and tell me if it works okay. I'm also interested
in the results of people who have high-traffic websites, so feel free to
send me the tool's output pages (the files you upload to the server are
not saved, neither is the output). Making the script work with other
search engines should be trivial.
Regards
Daniel
--
http://www.danielnaber.de
From perlfect-search@perlfect.com Tue Nov 25 12:04:06 2003
From: perlfect-search@perlfect.com (Jochen Luig)
Date: Tue, 25 Nov 2003 13:04:06 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
References: <200311242135.57028@danielnaber.de>
Message-ID: <3FC34536.5020005@c-lab.de>
Hi Daniel!
Which is the first Perlfect Search version that provides the log
feature? I'm running 3.10 and there is no $LOG variable in my conf.pl.
I think I'll have to upgrade instead of merely setting $LOG.
Will newer versions mess with my index?
Regards
Jochen
From perlfect-search@perlfect.com Tue Nov 25 18:17:08 2003
From: perlfect-search@perlfect.com (Daniel Naber)
Date: Tue, 25 Nov 2003 19:17:08 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
In-Reply-To: <3FC34536.5020005@c-lab.de>
References: <200311242135.57028@danielnaber.de> <3FC34536.5020005@c-lab.de>
Message-ID: <200311251917.08589@danielnaber.de>
On Tuesday 25 November 2003 13:04, Jochen Luig wrote:
> Which is the first Perlfect Search version that provides the log
> feature? I'm running 3.10 and there is no $LOG variable in my conf.pl.
It was added in 3.20beta.
> I think I'll have to upgrade instead of merely setting $LOG.
> Will newer versions mess with my index?
You'll have to re-index everything. On the other hand, why not just upload
the web server log file?
Regards
Daniel
--
http://www.danielnaber.de
From perlfect-search@perlfect.com Tue Nov 25 21:43:20 2003
From: perlfect-search@perlfect.com (Morten Wulff)
Date: Tue, 25 Nov 2003 22:43:20 +0100
Subject: [Perlfect-search] ANN: search logfile analysis tool
In-Reply-To: <200311242135.57028@danielnaber.de>
References: <200311242135.57028@danielnaber.de>
Message-ID:
On Mon, 24 Nov 2003 21:35:57 +0100, Daniel Naber wrote:
> Please send me feedback and tell me if it works okay.
Works great (both online and offline). I have one small feature idea: It
would be nice to be able to see search phrases as well as search terms.
I have written a script which does that:
http://www.psyke.org/about/tech/searchlog.pl
Usage: searchlog.pl [...]
For example, to run the script and exclude the IP addresses of my development
machines at home and at work (addresses have been obscured):
searchlog.pl log.txt 62.61.n.n 192.38.n.n
Sample output:
IP              Hostname                                  Hits
--------------------------------------------------------------
67.41.31.181    67.41.31.181                                60
199.98.20.223   conjunction.ee.cooper.edu                   44
68.84.77.59     pcp02432332pcs.trnrsv01.nj.comcast.net      42

Phrase                                                   Count
--------------------------------------------------------------
cutting                                                     94
self burning                                                60
how not to commit suicide                                   38

Term                                                     Count
--------------------------------------------------------------
self                                                       330
cutting                                                    190
bulimia                                                    110
The hostname stuff is mainly for fun, but I find the phrase stuff useful
(although it might be an idea to only include phrases with two or more
words).
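
The difference between counting phrases and counting terms can be sketched in a few lines of Python. This is not the searchlog.pl script linked above. It assumes the search queries have already been extracted from the log as plain strings, and the names `count_phrases_and_terms` and `min_phrase_words` are hypothetical. Setting `min_phrase_words=2` gives the "two or more words" variant mentioned above.

```python
from collections import Counter


def count_phrases_and_terms(queries, min_phrase_words=1):
    """Count whole queries (phrases) and individual words (terms)."""
    phrases = Counter()
    terms = Counter()
    for query in queries:
        # Normalise case and collapse repeated whitespace.
        words = query.lower().split()
        if not words:
            continue
        terms.update(words)
        if len(words) >= min_phrase_words:
            phrases[" ".join(words)] += 1
    return phrases, terms
```

Calling `phrases.most_common(3)` and `terms.most_common(3)` on the returned counters would give top-three lists of the same shape as the sample output.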
Kind regards,
Morten Wulff
--
Self Injury Information and Support: www.psyke.org
"I have a school book with my name on it."
"Your parents must be so proud." (http://www.actsofgord.com/)