From JMilks at auburn.wednet.edu Tue Nov 2 16:41:43 2004
From: JMilks at auburn.wednet.edu (Milks, Jennifer)
Date: Tue Nov 2 16:42:13 2004
Subject: [Perlfect-search] Access denied
Message-ID: <5C2203CBEDCCC14C86629ABD13321CBF07C4961F@mx.email.auburn.wednet.edu>

I am trying to use this search engine, but I am getting "access denied"
when I try to search on anything. I ran through the installation process
and it all seemed to work well, except it won't present me with any
search results. I am running IIS 6.0 with ActiveState's Perl. This web
site requires authentication. What permissions need to be where in order
for the search results page to be presented to me?

Thanks so much.

Jennifer Milks, CCNP, CWNA, MCP
Technical Services Manager
Department of Information Technology
Auburn School District #408
915 4th Street Northeast
Auburn, WA 98002
253-931-4940 (phone)
253-931-8006
Email: jmilks@auburn.wednet.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://hottub.perlfect.com/pipermail/perlfect-search/attachments/20041102/db09b397/attachment.html

From daniel.naber at t-online.de Tue Nov 2 19:52:11 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Tue Nov 2 19:52:19 2004
Subject: [Perlfect-search] Indexing Documents with SSI (Server Side Includes)
In-Reply-To: <417A699A.8060103@FiduciaCollection.com>
References: <417A699A.8060103@FiduciaCollection.com>
Message-ID: <200411022052.11250@danielnaber.de>

On Saturday 23 October 2004 16:24, Frank wrote:

> 1) How can I use SSI in the search.html and no_match.html templates?
> Adding
> lines are ignored.
> How do I get the results to parse the SSI tags when I select "hilighted
> matches"

Perlfect Search loads the files from the hard disk, so it doesn't see the
include's content, only the "include" command. If you want to work around
that, set $HTTP_START_URL in conf.pl to crawl your pages via http.
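
[A minimal sketch of the conf.pl change described above. Only the
variable name $HTTP_START_URL comes from the message; the host name is a
placeholder and an assumption, so substitute the site's real start page.]

```perl
# conf.pl (fragment)
# The host name below is a hypothetical placeholder, not taken from this thread.
# With a start URL set, the indexer fetches pages over http, so the web
# server has already expanded any SSI directives before the text is indexed.
$HTTP_START_URL = 'http://www.example.com/';
```

[The index has to be rebuilt after this change so that the pages are
actually fetched via http rather than read from disk.]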

Regards
Daniel

--
http://www.danielnaber.de

From daniel.naber at t-online.de Tue Nov 2 19:54:10 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Tue Nov 2 19:54:11 2004
Subject: [Perlfect-search] Access denied
In-Reply-To: <5C2203CBEDCCC14C86629ABD13321CBF07C4961F@mx.email.auburn.wednet.edu>
References: <5C2203CBEDCCC14C86629ABD13321CBF07C4961F@mx.email.auburn.wednet.edu>
Message-ID: <200411022054.10105@danielnaber.de>

On Tuesday 02 November 2004 17:41, Milks, Jennifer wrote:

> I am trying to use this search engine, but I am getting "access denied"
> when I try to search on anything.

Do other CGI scripts work? search.pl must be accessible and executable, of
course. However, I don't know exactly how that is done in IIS.

Regards
Daniel

--
http://www.danielnaber.de

From daniel.naber at t-online.de Tue Nov 2 19:59:15 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Tue Nov 2 19:59:26 2004
Subject: [Perlfect-search] Two questions - indexing dynamic pages and time display
In-Reply-To: <618EC81F242C734AAB9F6F7B1AD3DA383B5283@fw1-ex02.c-b.net>
References: <618EC81F242C734AAB9F6F7B1AD3DA383B5283@fw1-ex02.c-b.net>
Message-ID: <200411022059.15662@danielnaber.de>

On Tuesday 12 October 2004 16:26, Barker, Robert R. wrote:

> able to get these specific pages to come up in any search. Should I
> be indexing via http rather than simply running indexer.pl?

You need to set $HTTP_START_URL in conf.pl to a start URL so the script
crawls your pages via http (instead of loading them from disk).

> If so, how
> do I set that up when the scripts directory is a couple of levels above
> the webroot?

You can set @HTTP_LIMIT_URLS to a list of URLs that are allowed, i.e. these
URLs and everything below them can be indexed.

> update. I'm guessing that it is generating the date in GMT as it's
> several hours off from what our local time is on the server. How can I
> change that date to represent the local time?

Not sure, the code in search.pl looks like this:

localtime((stat($UPDATE_FILE))[9])

This looks okay to me, but you could try changing it (maybe to gmtime or
time). The time is just the modification time of the "update" file in the
"data" directory.

Regards
Daniel

--
http://www.danielnaber.de

From dave at allpar.com Thu Nov 4 13:54:35 2004
From: dave at allpar.com (David Zatz)
Date: Thu Nov 4 13:54:40 2004
Subject: [Perlfect-search] Two questions
Message-ID:

Two questions - in the log, I keep getting:

script not found or unable to stat: /cgi-bin/perlfect/search/1

Also, I'm getting this message -

Use of uninitialized value in pattern match (m//) at line 191.

- repeated four times every now and then. I'm not sure what the problem is
even after looking at the code (seems to be looking for meta tags). Any
ideas?

From mail at elokron.de Fri Nov 5 11:25:23 2004
From: mail at elokron.de (Stefan A. Hoyer)
Date: Fri Nov 5 11:26:35 2004
Subject: [Perlfect-search] new files are indexed but not searchable
Message-ID: <418B7133.16232.ECDCEF@localhost>

Hello Perlfect users,

I have used 'perlfect search' for more than a year now. It is a very good
tool! Now suddenly there is a problem: new files are being indexed but are
not shown in the results.

When I reindex the pages, 'perlfect search' says it indexes 57 HTML files.
But when I perform a query, 'perlfect search' says it is searching in 53
HTML files.

What can I do to make 'perlfect search' search in all my HTML files? It
seems that the files in the directory 'data' are not updated. I use
perlfect search 3.31b.

Thank you for your help.

Stefan

From daniel.naber at t-online.de Fri Nov 5 17:50:07 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Fri Nov 5 17:50:15 2004
Subject: [Perlfect-search] new files are indexed but not searchable
In-Reply-To: <418B7133.16232.ECDCEF@localhost>
References: <418B7133.16232.ECDCEF@localhost>
Message-ID: <200411051850.08076@danielnaber.de>

On Friday 05 November 2004 12:25, Stefan A.
Hoyer wrote:

> When I reindex the pages, 'perlfect
> search' says it indexes 57 HTML files

What does the output look like? Are there any errors (maybe in the web
server log)? You could try to delete all files in "data" manually and then
re-index.

Regards
Daniel

--
http://www.danielnaber.de

From mail at elokron.de Fri Nov 5 18:56:16 2004
From: mail at elokron.de (Stefan A. Hoyer)
Date: Fri Nov 5 18:56:59 2004
Subject: [Perlfect-search] new files are indexed but not searchable
In-Reply-To: <200411051850.08076@danielnaber.de>
References: <418B7133.16232.ECDCEF@localhost>
Message-ID: <418BDAE0.8096.289B840@localhost>

Hello Daniel,

On 5 Nov 2004 at 18:50, Daniel Naber wrote:

> What does the output look like? Are there any errors (maybe in the web
> server log)?

No, not many errors. No errors seem to come from 'perlfect search'.

> You could try to delete all files in "data" manually and then
> re-index.

I did it ... and the next search ended with a 500 message (Internal Server
Error). A look in the directory showed me that only the file 'content_tmp'
was newly created. It was good to have a backup!

Any other ideas?

Stefan

From daniel.naber at t-online.de Fri Nov 5 21:34:00 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Fri Nov 5 21:34:01 2004
Subject: [Perlfect-search] new files are indexed but not searchable
In-Reply-To: <418BDAE0.8096.289B840@localhost>
References: <418B7133.16232.ECDCEF@localhost> <418BDAE0.8096.289B840@localhost>
Message-ID: <200411052234.00973@danielnaber.de>

On Friday 05 November 2004 19:56, Stefan A. Hoyer wrote:

> I did it ... and the next search ended with a 500 Message (Internal
> Server Error).

You need to look in the server's error log to see what the cause of this
error is.

Regards
Daniel

--
http://www.danielnaber.de

From stefan.glaesser at gmx.de Tue Nov 16 17:53:54 2004
From: stefan.glaesser at gmx.de (Stefan Gläßer)
Date: Tue Nov 16 17:54:08 2004
Subject: [Perlfect-search] How to index PDF-Files?
Message-ID: <200411161754.iAGHs66b021961@hottub.perlfect.com>

Hi,

I would like to index some PDF documents via http. Is there a chance to do
it? What options should I use?

Thanks for your support.

Kind regards,
Stefan

From daniel.naber at t-online.de Tue Nov 16 22:48:48 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Tue Nov 16 22:48:22 2004
Subject: [Perlfect-search] How to index PDF-Files?
In-Reply-To: <200411161754.iAGHs66b021961@hottub.perlfect.com>
References: <200411161754.iAGHs66b021961@hottub.perlfect.com>
Message-ID: <200411162348.48998@danielnaber.de>

On Tuesday 16 November 2004 18:53, Stefan Gläßer wrote:

> i would like to index some pdf-documents via http.
> Is there a chance to do it? What Options should i use?

Install pdftotext (part of xpdf), and set %EXT_FILTER and
@HTTP_CONTENT_TYPES accordingly.

Regards
Daniel

--
http://www.danielnaber.de

From mail at elokron.de Wed Nov 17 15:49:26 2004
From: mail at elokron.de (Stefan A. Hoyer)
Date: Wed Nov 17 15:50:07 2004
Subject: [Perlfect-search] new files are indexed but not searchable
In-Reply-To: <200411052234.00973@danielnaber.de>
References: <418BDAE0.8096.289B840@localhost>
Message-ID: <419B8116.5926.15B3FF9@localhost>

Hello Daniel,

On 5 Nov 2004 at 22:34, Daniel Naber wrote:

> > I did it ... and the next search ended with a 500 Message (Internal
> > Server Error).
>
> You need to look in the server's error log to see what's the cause of this
> error.

I had to wait for the log file. Now I have read it. There was no specific
error message. This is the complete line:

pd9052657.dip.t-dialin.net - - [05/Nov/2004:12:06:14 +0100] "GET /suche/search.pl?p=1&lang=de&include=&exclude=&penalty=&mode=any&q=dicke HTTP/1.1" 500 - "http://www.wertvoll-medien.de/suche.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Alexa Toolbar)"

As far as I understand the problem, perlfect search does not write new
files in the 'data' directory anymore.
Therefore there was a 500 message, because no new files were generated.

What can I do to make perlfect search generate new files in the 'data'
directory during the reindex?

Greetings
Stefan

From daniel.naber at t-online.de Wed Nov 17 20:21:54 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Wed Nov 17 20:21:38 2004
Subject: [Perlfect-search] new files are indexed but not searchable
In-Reply-To: <419B8116.5926.15B3FF9@localhost>
References: <418BDAE0.8096.289B840@localhost> <419B8116.5926.15B3FF9@localhost>
Message-ID: <200411172121.55353@danielnaber.de>

On Wednesday 17 November 2004 16:49, Stefan A. Hoyer wrote:

> pd9052657.dip.t-dialin.net - - [05/Nov/2004:12:06:14 +0100] "GET
> /suche/search.pl?p=1&lang=de&include=&exclude=&penalty=&mode=a
> ny&q=dicke HTTP/1.1" 500 - "http://www.wertvoll-
> medien.de/suche.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows
> 98; Alexa Toolbar)"

That looks like the access log, not like the error log.

> What can I do to make perlfect search generate new files in the 'data'
> directory during the reindex?

Delete all the files in "data" and make sure everybody has write access to
that directory.

Regards
Daniel

--
http://www.danielnaber.de

From mail at elokron.de Wed Nov 17 21:07:15 2004
From: mail at elokron.de (Stefan A. Hoyer)
Date: Wed Nov 17 21:07:57 2004
Subject: [Perlfect-search] new files are indexed but not searchable
In-Reply-To: <200411172121.55353@danielnaber.de>
References: <419B8116.5926.15B3FF9@localhost>
Message-ID: <419BCB93.6931.DEE1AC@localhost>

On 17 Nov 2004 at 21:21, Daniel Naber wrote:

> That looks like the access log, not like the error log.

Ouch, yes, you are right. Sorry, at this time I have no access to the error
log.

> > What can I do to make perlfect search generate new files in the 'data'
> > directory during the reindex?
>
> Delete all the files in "data" and make sure everybody has write access to
> that directory.

I did these two things ...
and the next search after reindexing ended again with an internal server
error. After that I reinstalled 'perlfect search' => same error. And after
that I installed it in a new directory => same error again.

Regards
Stefan

From mail at elokron.de Fri Nov 19 18:38:06 2004
From: mail at elokron.de (Stefan A. Hoyer)
Date: Fri Nov 19 18:38:44 2004
Subject: [Perlfect-search] new files are indexed but not searchable
Message-ID: <419E4B9E.21591.30A6F3@localhost>

On 17 Nov 2004 at 22:07, I wrote:

> (...) same error again.

Now my hosting provider has found an incompatibility of 'Perlfect Search'
with the new server configuration. They made a change ... and 'perlfect
search' does a good job again.

Daniel, thank you for your help!

Greetings,
Stefan

--
Stefan A. Hoyer
Internet-Dienstleistungen
Website-Optimierung für Menschen und Suchmaschinen
http://elokron.de/

From mailinglists at futureware.at Sun Nov 21 23:57:37 2004
From: mailinglists at futureware.at (Philipp Gühring)
Date: Sun Nov 21 23:57:55 2004
Subject: [Perlfect-search] Scalability
Message-ID: <200411220057.44714.mailinglists@futureware.at>

Hi,

Perlfect Search is really nice. But for only 2000 documents, I am using
grep. Can we get Perlfect Search to handle 200.000.000 documents?

Nov 21 16:06:46 linux3 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Nov 21 16:06:47 linux3 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Nov 21 16:06:47 linux3 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Nov 21 16:06:47 linux3 kernel: VM: killing process perl

:-(

What is causing the memory consumption here? Are the database-tied hashes
using so much memory? Or is the problem in the indexer itself?
(I already have $LOW_MEMORY_INDEX = 1;)

By the way, we will soon finally have it integrated on
http://www.quintessenz.at/ , which needs about 50.000 documents.

Many greetings,
Philipp Gühring
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://hottub.perlfect.com/pipermail/perlfect-search/attachments/20041122/df2dc611/attachment.bin

From daniel.naber at t-online.de Mon Nov 22 08:36:55 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Mon Nov 22 08:36:30 2004
Subject: [Perlfect-search] Scalability
In-Reply-To: <200411220057.44714.mailinglists@futureware.at>
References: <200411220057.44714.mailinglists@futureware.at>
Message-ID: <200411220936.56024@danielnaber.de>

On Monday 22 November 2004 00:57, Philipp Gühring wrote:

> Can we get Perlfect Search to handle 200.000.000 documents?

No, it won't scale well enough. Besides that, Perlfect Search doesn't
support incremental indexing, i.e. you would need to re-index everything
even if only a single document changes. I suggest you try Lucene, which
scales much better. However, you cannot search 200 million documents on a
single machine with acceptable speed; you'll need to distribute the index
across several machines (unless your documents are *very* small, e.g.
< 1KB).

> What is causing the memory-consumption here?
> Are the database-tied hashes using so much memory?

Yes, they are not optimized for fulltext indexing.

> By the way, we will soon have it finally integrated on
> http://www.quintessenz.at/ , which is needing about 50.000 documents.

Also note that there's a limit of about 64,000 documents in Perlfect Search
(but that can be removed).

Regards
Daniel

--
http://www.danielnaber.de

From mailinglists at futureware.at Tue Nov 23 11:15:48 2004
From: mailinglists at futureware.at (Philipp Gühring)
Date: Tue Nov 23 11:15:59 2004
Subject: [Perlfect-search] MemoryLeak, DB_Symlink
Message-ID: <200411231215.53909.mailinglists@futureware.at>

Skipped content of type multipart/mixed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://hottub.perlfect.com/pipermail/perlfect-search/attachments/20041123/c40b371c/attachment.bin

From stefan.glaesser at gmx.de Wed Nov 24 17:53:38 2004
From: stefan.glaesser at gmx.de (Stefan Gläßer)
Date: Wed Nov 24 17:53:53 2004
Subject: [Perlfect-search] Limitations?
Message-ID: <200411241753.iAOHroM6022368@hottub.perlfect.com>

Hi,

I would like to index approx. 3500 pages with an average size of 70KB. Is
it possible for perlfect-search to do it, or should I look for some other
software?

Greets,
Stefan

From daniel.naber at t-online.de Wed Nov 24 19:14:24 2004
From: daniel.naber at t-online.de (Daniel Naber)
Date: Wed Nov 24 19:13:57 2004
Subject: [Perlfect-search] Limitations?
In-Reply-To: <200411241753.iAOHroM6022368@hottub.perlfect.com>
References: <200411241753.iAOHroM6022368@hottub.perlfect.com>
Message-ID: <200411242014.25026@danielnaber.de>

On Wednesday 24 November 2004 18:53, Stefan Gläßer wrote:

> I would like to index appr. 3500 pages with an average
> size of 70KB.

That should be possible.

Regards
Daniel

--
http://www.danielnaber.de