Perlfect Solutions
 

AW: [Perlfect-search] Maximum number of files that can be indexed

Mayer Richard Richard.Mayer@micronas.com
Fri, 12 Apr 2002 17:33:49 +0200
Hmm, sorry, but I don't think I really get this. I don't understand any
Perl, so could you please point me a little more in the right
direction?

__________
I think you can easily fix this just by changing the template used by
un/pack
in search.pl and indexer.pl, i.e. the 'S'. Please correct me if I am wrong.
----------

What do you mean by "change the template"? Change it into what?
I guess some kind of data type has to be changed (my guess is it
would be from int to long in C),
but I can't even find where the data types are defined in the Perl code.
Sorry for posting such stupid questions. I should spend some time learning
Perl first.
I have also already read in the forums that it doesn't really make sense to
index more than 65535 files with "Search", but the number of
htm-files is only slightly higher than that (about 75000), so it would
really be nice to get this working.


As far as I can tell from indexer.pl, there are 2 parts which seem important
to me:

$weight = 65535 if ( $weight > 65535 );      # we're limited to 16 bit
$weights .= pack("SS", $doc_id, $weight);

and

if( $DN >= 65535 ) {
    die "Error: Indexing more than 65534 documents is not supported";
}

I guess I'll have to change these, too, right?
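
A minimal sketch of how those two fragments might look with a 32-bit
template, assuming nothing else in indexer.pl or search.pl depends on the
16-bit width (untested against Search 3.30). $MAX_ID and the sample values
are hypothetical and only stand in for the indexer's real state:

```perl
use strict;
use warnings;

# Assumed new limit: the largest value an unsigned 32-bit 'L' can hold.
my $MAX_ID = 0xFFFFFFFF;

# Hypothetical sample values standing in for the indexer's real state.
my ($DN, $doc_id, $weight) = (75000, 74999, 100000);
my $weights = '';

# Document-count check, raised from the 16-bit limit to the 32-bit one.
die "Error: Indexing more than $MAX_ID documents is not supported"
    if $DN >= $MAX_ID;

# Weight clamp and pack, using 'L' (32 bit) instead of 'S' (16 bit).
$weight = $MAX_ID if $weight > $MAX_ID;
$weights .= pack("LL", $doc_id, $weight);

# Whatever reads the data back must use the matching template.
my %v = unpack('L*', $weights);
print "$_ => $v{$_}\n" for keys %v;    # prints "74999 => 100000"
```

Each entry now takes 8 bytes instead of 4, and an index written with the
old 'S' template would have to be rebuilt before search.pl could read it
with 'L*'.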


Thanks for your time


Rick


-----Original Message-----
From: John [mailto:jotov@start.no]
Sent: Friday, 12 April 2002 16:02
To: Richard.Mayer@micronas.com
Subject: Re: [Perlfect-search] Maximum number of files that can be
indexed


Richard,

I think the limitation comes from the structure of the lookup hash (the
inverted index). Each term points to a list of the documents where that term
occurs. Only 16 bits are set aside for each document number in the list,
which gives a maximum of 65536 documents.

Example from search.pl:
my %v = unpack('S*', $inv_index_db{$term_id});

I think you can easily fix this just by changing the template used by
un/pack
in search.pl and indexer.pl, i.e. the 'S'. Please correct me if I am wrong.
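
To illustrate what the template letter does, here is a small standalone
sketch (not taken from search.pl or indexer.pl). 'S' is an unsigned 16-bit
value and 'L' an unsigned 32-bit value in Perl's pack/unpack; the document
id and weight are made-up sample values:

```perl
use strict;
use warnings;

# Hypothetical document id above the 16-bit range; the weight is arbitrary.
my ($doc_id, $weight) = (70000, 12);

# With 'S' the id does not fit into 16 bits and wraps modulo 65536.
my %short = do {
    no warnings;    # silence the "wrapped in pack" warning for this demo
    unpack('S*', pack('SS', $doc_id, $weight));
};

# With 'L' the same id survives the round trip unchanged.
my %long = unpack('L*', pack('LL', $doc_id, $weight));

print "S: doc id = ", join(',', keys %short), "\n";    # S: doc id = 4464
print "L: doc id = ", join(',', keys %long),  "\n";    # L: doc id = 70000
```

The pack and unpack templates have to be changed together, otherwise the
reader would split the stored bytes at the wrong boundaries.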

John


Fri, 12 Apr 2002 10:46:38 +0200 Mayer Richard <Richard.Mayer@micronas.com>
wrote:

>I use Search 3.30 and would like to index a site with more than 65535
>htm-files.
>In the FAQ it is mentioned that the maximum number of files is 65535, but
>could be increased using a "workaround".
>Unfortunately this "workaround" does not seem to be explained in the FAQ.
>Can anybody tell me what I have to do, if I want to index sites with more
>than 65535 files?


