Perlfect Solutions
 

[Perlfect-search] Highlighting pages with no extension

Tom Sherman perlfect-search@perlfect.com
Sun, 01 Jun 2003 0:59:34 CDT
Hello,

Hopefully I'm not re-treading old ground here.  I read through the FAQ
and about a year's worth of the archive and didn't see this issue covered.

Here is my situation:

1. I have a dynamic (SSI-driven) site, so it has to be spidered from the
Web, not the file system.
2. I want highlighting.
3. I want to be able to highlight index pages whose URLs end in a
trailing slash, e.g. http://site.com/subdirectory/

Highlighting works fine for http://site.com/file.shtml, because Perlfect
sees the .shtml extension, which I've defined as an HTML file type.  But
it doesn't know what to do with http://site.com/subdirectory/, because
the URL ends in a trailing slash.  The *true* file name is
http://site.com/subdirectory/index.shtml, but it's not seeing that.
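
To illustrate the failure, here is a minimal sketch in Python (not
Perlfect's actual Perl code) of what an extension-based check looks
like.  The function name is_html_by_extension and the HTML_EXTENSIONS
set are hypothetical; I have not copied them from search.pl.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of extensions configured as "HTML" (hypothetical values).
HTML_EXTENSIONS = {".html", ".htm", ".shtml"}


def is_html_by_extension(url):
    """Guess whether a URL is HTML purely from its file extension."""
    path = urlparse(url).path
    # splitext returns an empty extension for a path ending in "/".
    extension = posixpath.splitext(path)[1].lower()
    return extension in HTML_EXTENSIONS


if __name__ == "__main__":
    for url in ("http://site.com/file.shtml", "http://site.com/subdirectory/"):
        print(url, is_html_by_extension(url))
```

The trailing-slash URL has no extension at all, so a check like this
has nothing to match against, even though the server answers with an
HTML page.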

My hack workaround:

To get around this limitation, I deleted all of the isHTML() checks in
search.pl.  Since I'm only indexing HTML and SHTML files anyway, the
checks are superfluous: every indexed file is HTML and can therefore be
highlighted.  I'm not bothering to index TXT files.

However, this is just a hack.  Instead of examining a file's extension,
it seems better to read the Content-Type from the HTTP response header
when the page is indexed and store that type in the database.  This is
just an idea, but it would eliminate the problem of trailing slashes and
the other cute things people can do with Apache.
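
A rough sketch of the Content-Type idea, again in Python rather than
Perl and under stated assumptions: a plain dict stands in for the
database, and the names media_type, record_content_type, is_html and
HTML_MEDIA_TYPES are all made up for illustration.  The indexer would
call record_content_type() with whatever Content-Type header the server
returned for each URL, and the highlighter would call is_html() instead
of looking at the extension.

```python
# Media types treated as highlightable HTML (an assumption, adjust to taste).
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header):
    """Strip parameters such as '; charset=...' from a Content-Type value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def record_content_type(index, url, content_type_header):
    """At index time, remember the media type the server reported for url."""
    index[url] = media_type(content_type_header)


def is_html(index, url):
    """At search time, decide highlightability from the stored media type."""
    return index.get(url) in HTML_MEDIA_TYPES


if __name__ == "__main__":
    index = {}  # stand-in for the real index database
    record_content_type(index, "http://site.com/subdirectory/",
                        "text/html; charset=ISO-8859-1")
    record_content_type(index, "http://site.com/notes.txt", "text/plain")
    print(is_html(index, "http://site.com/subdirectory/"))
    print(is_html(index, "http://site.com/notes.txt"))
```

With this approach the URL's shape no longer matters; only what the
server says it is serving does.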

Anyway, thoughts are appreciated.  Perlfect is great.

--tom

[tom sherman | tsherman@northwestern.edu]