[Perlfect-search] ranking long vs. short documents
Daniel Naber dnaber@mini.gt.owl.de
Sun, 13 Aug 2000 18:00:59 +0200
Hi,
Currently the score of a match is influenced only by the position and the
number of occurrences of a term. Shouldn't the length of the document also
play a role? If a word occurs twice in a short document, isn't that more
relevant than twice in a very long document?
Something like this:
$faktor = ($tdf{$doc_id}/$size+0.5);
$weight = $tdf{$doc_id} * $faktor * log($DN / $df);
(0.5 is just a trial-and-error value)
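A minimal sketch of how the formula would behave, using made-up numbers. It
assumes that $tdf{$doc_id} holds the number of occurrences of the term in the
document and that $size is the document length in words; the weight() helper
and all values below are hypothetical and only serve as illustration:

```perl
use strict;
use warnings;

# Hypothetical collection statistics, for illustration only.
my $DN = 1000;   # total number of indexed documents
my $df = 10;     # number of documents that contain the term

# Proposed weight: occurrence count, scaled by a length-dependent
# factor and by the usual log($DN / $df) term.
sub weight {
    my ($tf, $size) = @_;   # $tf: occurrences in the document (assumed meaning of $tdf{$doc_id})
                            # $size: document length in words (assumed unit)
    my $faktor = $tf / $size + 0.5;
    return $tf * $faktor * log($DN / $df);
}

my $short = weight(2, 50);     # term occurs twice in a 50-word document
my $long  = weight(2, 5000);   # term occurs twice in a 5000-word document

printf "short: %.3f  long: %.3f\n", $short, $long;
```

With these numbers the short document gets the higher weight, but since
$tdf{$doc_id}/$size is always between 0 and 1, the constant 0.5 keeps $faktor
between 0.5 and 1.5, so the length effect stays fairly mild.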
Regards
Daniel