
[Perlfect-search] Dynamic Page (PHP pages) Indexing Problems

Michael Borck perlfect-search@perlfect.com
Thu, 26 Feb 2004 12:14:33 +0800
Hi Daniel,

Thanks for the quick response. I have tried your suggestions but still
have problems with search.cgi opening the data files. I have provided
more detail below.


> that page does not only use https (of which I'm not sure if the module we
> use supports it), but also asks for a password here. So it can't be
> indexed automatically. About the permission error when indexing on the

I forgot to mention that machines within the domain are not prompted for a
password, but I guess the protocol is sufficiently different that it is
causing the error. I tried to install Crypt::SSLeay (which is apparently
the SSL glue that LWP loads automatically), but it didn't work for me (and
I have reached the limit of my knowledge). So I am back to solving the
file system "cannot open" problem.



> command line: 755 is okay, but who is the owner of the files? Probably not
> you, because the owner was set to someone else when you indexed via
> index_form.html. So I suggest to delete all files in the "data" directory
> on the command line and try again (from the command line).


Before indexing via the filesystem, I deleted all the files in data. I
then ran the indexer and manually chmod'ed all the files to 755. I have
even tried chmod 777. Still no luck: I get the "Cannot open" error. I
should also mention that when I index via the filesystem it does index
the site (a huge 293 files).

The funny thing is that when I index via index_form.html, the file
permissions are 755 and the files are owned by "nobody". search.cgi then
proceeds but finds nothing, because the indexer indexed 0 pages (because
we use https:).

As mentioned in previous posts, after indexing via the filesystem I
manually chowned the files to nobody and chmod'ed them all to 755 (I even
tried 777), but still no luck.
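One way to narrow down the "Cannot open" error is to check exactly what the CGI user sees in the data directory. Below is a minimal sketch in Python (not part of Perlfect Search; the function name describe_data_files and the command-line usage are hypothetical). It lists every entry in the data directory with its owner, permission bits and whether open() actually succeeds. It is only meaningful when run as the same user the web server runs CGI scripts as, which on this server appears to be "nobody", judging by the file ownership after indexing via index_form.html.

```python
import os
import pwd
import stat
import sys


def describe_data_files(data_dir):
    """Return (name, owner, mode, readable) for every entry in data_dir.

    Assumption: this only mirrors basic checks a CGI script is subject to
    (existence, ownership, permission bits, whether open() succeeds);
    it is a hypothetical helper, not part of Perlfect Search itself.
    """
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    report = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        st = os.stat(path)
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)  # uid with no passwd entry
        mode = oct(stat.S_IMODE(st.st_mode))  # e.g. '0o755'
        try:
            # Actually opening the file reflects the effective permissions
            # of the user running this script, not just the mode bits.
            with open(path, "rb"):
                readable = True
        except OSError:
            readable = False
        report.append((name, owner, mode, readable))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python check_data.py /path/to/perlfect/data
    for name, owner, mode, readable in describe_data_files(sys.argv[1]):
        status = "ok" if readable else "CANNOT OPEN"
        print(f"{name:20} {owner:10} {mode:7} {status}")
```

If every file shows as readable for the CGI user and search.cgi still reports "Cannot open", the permissions are probably not the culprit, which would point back at the path search.cgi is trying to open.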




> BTW, depending on how you use PHP it might make sense to index the
> filesystem and "comment out" the PHP code with the snippets configured in
> $IGNORE_TEXT_START and $IGNORE_TEXT_END in conf.pl.

I would prefer to index via the file system. The dynamic PHP content is
not critical to search. I have "commented out" potentially sensitive
information and added entries to conf/no_index.txt. I was only looking
at doing this via CGI because of the problems I was having with opening
the data/inv_index file.

I have spoken to the techos, who have assured me that the path
exists on the server. I have given them the details of the script
and the error, and they have promised to investigate.

I can't help but feel that it is a configuration error on my part, as I
had Perlfect working out of the box before I moved most of my pages over
to PHP. I cannot see how changing to PHP would cause any errors. But to
be safe/cautious I deleted the Perlfect install and did a clean
install. Still no luck.

If you can think of anything that might help or need more info please 
let me know.  I will continue to investigate my end.

Thanks again for your help.

Michael.
--