[Perlfect-search] advanced exclude?!
will will@spanner.org
Fri, 8 Jun 2001 19:32:55 +0100
>On Thursday 07 June 2001 09:56, you wrote:
>
>> Is there a way to do indexing once including this dir in data base to be
>> searched and then just reindexing the rest of the site every night?
>
>No. But there's a certain chance that version 3.21 will have faster
>indexing for large collections.

is that because you have to operate on the whole document collection
to calculate the weights?

it occurs to me that if you didn't unlink $TF_DB_FILE at the end of
the run, you could keep it as a state file. then when someone
requests an incremental index (a new directory, or the reindexing of
a frequently-updated part of their site) you could start by
preloading %tf_db with the values from the last time the indexer ran.
the difference would be invisible to crawl_whatever(), i think: it
would just append the data from the new files, and then you could
recalculate the weightings for everything.
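
A rough sketch of that preload-and-append idea, under some loud assumptions: %tf_db is simplified here to a hash of hashes (term => doc_id => count) rather than the indexer's real packed format, the state file is written with the core Storable module, and tf * log(N/df) only stands in for whatever weighting the indexer really uses. load_state(), add_document() and compute_weights() are hypothetical names, not functions from indexer.pl.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Load the term frequencies kept from a previous run, or start empty.
sub load_state {
    my ($state_file) = @_;
    return -e $state_file ? retrieve($state_file) : {};
}

# Add one crawled document's term counts to the (possibly preloaded) table.
sub add_document {
    my ($tf_db, $doc_id, $counts) = @_;
    $tf_db->{$_}{$doc_id} = $counts->{$_} for keys %$counts;
}

# Recompute a weight for every (term, doc) pair from the whole table.
# Assumption: plain tf * log(N / df), not the indexer's actual formula.
sub compute_weights {
    my ($tf_db, $n_docs) = @_;
    my %weight;
    for my $term (keys %$tf_db) {
        my $df = keys %{ $tf_db->{$term} };    # number of docs containing the term
        for my $doc_id (keys %{ $tf_db->{$term} }) {
            $weight{$term}{$doc_id} =
                $tf_db->{$term}{$doc_id} * log($n_docs / $df);
        }
    }
    return \%weight;
}

# Two runs against the same state file (a stand-in for $TF_DB_FILE).
my $state_file = tempdir(CLEANUP => 1) . '/tf_state';

my $tf_db = load_state($state_file);                 # first run: nothing saved yet
add_document($tf_db, 1, { perl => 2, search => 1 });
store($tf_db, $state_file);                          # keep it instead of unlinking

$tf_db = load_state($state_file);                    # incremental run: preload
add_document($tf_db, 2, { perl => 1 });              # only the new file is crawled
my $weights = compute_weights($tf_db, 2);            # every weight is recomputed
store($tf_db, $state_file);
```

The second run never looks at document 1 again, yet its weights still change, because the document frequencies are recalculated over the whole table.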

i'm trying to think of disadvantages. writing the tf data file would
be a pain if the low memory flag wasn't set, and i guess it means
indexer.pl would have to take parameters on the command line. you'd
probably also need a threshold beyond which a full reindex was
strongly suggested, the way fsck forces a check after a set number of
mounts. and i guess the state file could get rather large?
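
For what the command line and the threshold might look like, here is a small sketch using the core Getopt::Long module. The --full and --only options, the plan_run() function and the run counter are all made up for illustration; indexer.pl has none of them as far as this note knows.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Decide what kind of run was asked for.
#   $argv            - array ref of command-line arguments
#   $runs_since_full - incremental runs since the last full index
#   $max_incremental - threshold after which a full reindex is suggested
sub plan_run {
    my ($argv, $runs_since_full, $max_incremental) = @_;
    my ($full, @only);
    GetOptionsFromArray($argv, 'full' => \$full, 'only=s' => \@only)
        or die "usage: indexer.pl [--full] [--only DIR]...\n";

    # No directories named (or --full given) means the usual full index.
    my $mode = ($full || !@only) ? 'full' : 'incremental';

    # Like fsck's mount count: nag once enough incremental runs pile up.
    my $suggest_full =
        ($mode eq 'incremental' && $runs_since_full >= $max_incremental) ? 1 : 0;

    return { mode => $mode, dirs => [@only], suggest_full => $suggest_full };
}

my $plan = plan_run([ '--only', 'news' ], 20, 20);
warn "a full reindex is strongly suggested\n" if $plan->{suggest_full};
```

Running with no arguments keeps today's behaviour, so existing cron jobs would not need to change.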

but i think it would work transparently for files that are already in
the index: there would be two (doc_id => incidence) pairs for such a
file in %tf_db, but the older one would be overwritten before
weighting, at the point where you unpack $tf_db{$term_id}. that
doesn't help for the tied file, of course.
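
To make the overwriting concrete, a minimal sketch. It assumes each term's record is a string of (doc_id, count) pairs packed as 32-bit integers with the 'N' template; the indexer's actual pack format may well differ, and append_pair() and unpack_pairs() are hypothetical helpers.

```perl
use strict;
use warnings;

# Append one (doc_id, count) pair to a term's packed record.
sub append_pair {
    my ($tf_db, $term_id, $doc_id, $count) = @_;
    $tf_db->{$term_id} = '' unless defined $tf_db->{$term_id};
    $tf_db->{$term_id} .= pack('NN', $doc_id, $count);
}

# Unpack a term's record into a hash. The pairs are assigned in order,
# so a later pair for the same doc_id replaces the earlier one.
sub unpack_pairs {
    my ($tf_db, $term_id) = @_;
    my $record = defined $tf_db->{$term_id} ? $tf_db->{$term_id} : '';
    my %count_for = unpack('N*', $record);
    return \%count_for;
}

my %tf_db;
append_pair(\%tf_db, 7, 42, 5);    # doc 42 as indexed last time
append_pair(\%tf_db, 7, 43, 1);
append_pair(\%tf_db, 7, 42, 2);    # doc 42 re-crawled in an incremental run

my $counts = unpack_pairs(\%tf_db, 7);
printf "doc 42: %d, doc 43: %d\n", $counts->{42}, $counts->{43};
```

One catch in this model: if a term has vanished from the new version of a file, no new pair is appended for it, so the stale pair survives the unpack.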

i'm just avoiding work here. I hope you don't mind the impertinent suggestions.

will

ps. is this the right place for this sort of note?

--
pgpkey: http://www.spanner.org/keys/will.txt