Perlfect Solutions
 

[Perlfect-search] ranking long vs. short documents

Deniz Sarikaya deniz@zeroknowledge.com
Fri, 18 Aug 2000 12:15:17 -0400
giorgos wrote:
> 
> the bottom line is that: although a shorter document may be more concise
> in respect to the keyword searched for, it may in the end contain
> exactly the same information since we are assuming that both the long
> and the short document have exactly the same number of occurrences of the
> keyword(s).
> 
> in my opinion, the easiest solution to this problem is, not to tamper
> with the weight calculation but instead simply display the document size
> next to each result and then the user may pick the one they prefer.

In my opinion, the bottom line is that context is king. My ideal
solution would be to quote the lines in which the keyword(s) appear
along with each search hit. I do not know how much of a performance/size
hit this would entail, nor how much extra programming it would take. I
guess we'd store line numbers along with documents in the hashes?
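
To make the idea concrete, here is a rough sketch of what storing line
numbers in the index could look like. It is written in Python purely for
illustration and is not how Perlfect Search stores its index; the
function names build_index and context_lines, and the in-memory dict
layout, are assumptions made up for this example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to {doc_name: [line numbers containing it]}.

    `docs` is a hypothetical {name: full text} mapping standing in for
    the indexed files.
    """
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            # set() so a word repeated on one line is recorded once per line
            for word in set(re.findall(r"\w+", line.lower())):
                index[word][name].append(lineno)
    return index


def context_lines(index, docs, keyword, max_lines=2):
    """Return {doc_name: [(lineno, line text), ...]} for one keyword."""
    hits = {}
    for name, linenos in index.get(keyword.lower(), {}).items():
        lines = docs[name].splitlines()
        # Cap the number of quoted lines so the result page stays short
        hits[name] = [(n, lines[n - 1].strip()) for n in linenos[:max_lines]]
    return hits
```

The size cost in this sketch is one integer per (word, document, line)
triple, which gives a rough feel for how much the index would grow. The
quoted lines themselves are re-read from the document at search time
rather than stored.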

Maybe it's an option which could be enabled in the conf.pl, and turned
off by default.

Also, is there any way we could let the user specify boolean AND
and OR? I know the engine defaults to OR, but shouldn't the person be
able to stick an AND in to manually override it? I know we can achieve
the same effect using +, so AND shouldn't be that hard to stick in.
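
As a sketch of how small that change might be, the query could be
rewritten into the existing + syntax before it reaches the engine. The
Python below is only an illustration: the function name
rewrite_boolean is made up, it assumes that + marks a term as required,
and it treats only uppercase AND and OR as operators.

```python
def rewrite_boolean(query):
    """Rewrite 'foo AND bar' into '+foo +bar'; OR is dropped (the default)."""
    terms = []           # list of [term, required] pairs
    pending_and = False  # True when the previous token was AND
    for tok in query.split():
        if tok == "AND":
            if terms:
                terms[-1][1] = True  # the term before AND becomes required
            pending_and = True
        elif tok == "OR":
            pending_and = False      # OR is already the default behaviour
        else:
            required = pending_and or tok.startswith("+")
            terms.append([tok.lstrip("+"), required])
            pending_and = False
    return " ".join("+" + t if req else t for t, req in terms)
```

With this rewrite, "foo AND bar" becomes "+foo +bar", while "foo OR bar"
and plain "foo bar" both stay as an OR search. There is no operator
precedence or grouping here, so a mixed query such as "a AND b OR c"
simply becomes "+a +b c".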

-- 
Deniz Sarikaya, DevDoctor.
deniz@zeroknowledge.com

Zero-Knowledge Systems, Inc.
http://jobs.zeroknowledge.com/