ranking long vs. short documents

Daniel Naber
Sun, 13 Aug 2000 18:00:59 +0200

Currently the score of a match is influenced only by the position and the 
number of occurrences of a term. Shouldn't the length of the document also 
play a role? If a word occurs twice in a short document, isn't that more 
relevant than twice in a very long document?

Something like this:
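      # length factor: term count relative to document size, plus a constant
      $faktor = ($tdf{$doc_id}/$size+0.5);
      # term count, scaled by the length factor and by log(N/df)
      $weight = $tdf{$doc_id} * $faktor * log($DN / $df);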

(0.5 is just a trial-and-error value)
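
To see what the factor does, here is a minimal, self-contained sketch of the same formula. The helper name length_weight and all the numbers are made up for illustration, and I am assuming $size is the document length in words; none of this is taken from the actual code.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the formula above.
#   $tf   - occurrences of the term in the document ($tdf{$doc_id})
#   $size - document length (assumed here to be a word count)
#   $n    - total number of documents ($DN)
#   $df   - number of documents containing the term
sub length_weight {
    my ($tf, $size, $n, $df) = @_;
    my $faktor = $tf / $size + 0.5;    # 0.5 is the trial-and-error constant
    return $tf * $faktor * log($n / $df);
}

# Same term count (2) in a short and in a very long document;
# collection of 1000 documents, 10 of which contain the term.
my $short = length_weight(2, 100,    1000, 10);
my $long  = length_weight(2, 10_000, 1000, 10);

printf "short document: %.4f\n", $short;
printf "long document:  %.4f\n", $long;
```

With these invented numbers the short document gets the higher weight, but only slightly, because the constant 0.5 dominates $tdf{$doc_id}/$size for anything but tiny documents. A smaller constant would make the length matter more.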