[Perlfect-search] Similar content
Daniel Naber daniel.naber@t-online.de
Wed, 11 Apr 2001 22:53:35 +0200
On Wednesday 11 April 2001 17:25, you wrote:
> I'm interested in being able to "find similar pages" as is mentioned
> on the perlfect development page - not necessarily only in the
> search results but as an integrated part of the web page for each
> document
Giorgos Zervas has written both a paper and an implementation of a fast
clustering algorithm. I don't know if that paper was published, but if you
are interested in it, I think he might send it to you.
The algorithm only does the clustering; everything else would need to be
done by a wrapper script.
It's important to understand the limits of this: the computer can only
look at the words of a document, so it will just suggest documents that
share many words. In most cases this is worse than a manually edited
"list of links".
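
To make the "many similar words" idea concrete, here is a minimal sketch in
Python. It is not Giorgos Zervas's clustering algorithm (I have not shown
that here); it swaps in plain cosine similarity over word counts, and the
function names and sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, docs, top_n=3):
    """Rank all other documents by word overlap with docs[target]."""
    counts = {name: word_counts(text) for name, text in docs.items()}
    scores = [
        (cosine_similarity(counts[target], counts[name]), name)
        for name in docs
        if name != target
    ]
    return [name for score, name in sorted(scores, reverse=True)[:top_n]]


# Hypothetical sample documents, keyed by file name.
docs = {
    "install.html": "how to install the search script on your web server",
    "config.html": "how to configure the search script for your web server",
    "recipes.html": "a recipe for apple pie with cinnamon",
}
print(most_similar("install.html", docs))
```

The ranking only reflects shared vocabulary: two pages about unrelated
topics that happen to use the same words would score high as well.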
> Would also be interested to use a RDBMS backend (like MySQL)
> rather than Berkeley
That's rather simple. The easy way is to look for uses of the tied hashes
(%inv_index_db etc.). Where the values of these hashes are read, use
SELECT; where they are written, use INSERT or UPDATE. This alone would not
make use of the relational database features that go beyond Berkeley DB.
To improve the data structure, one can look for uses of pack() whose
packed values translate into relations more complex than just key/value
pairs.
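
A rough sketch of both steps, written in Python with SQLite only so that it
runs without a server (the real code is Perl and the target would be
MySQL). The table names, column names and the packed layout of
(doc_id, score) pairs are assumptions for illustration, not the actual
Perlfect Search index format.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")

# Step 1: a tied hash becomes a two-column key/value table.
conn.execute("CREATE TABLE inv_index (term TEXT PRIMARY KEY, postings BLOB)")


def store(term, postings):
    """Writing a hash value becomes UPDATE, or INSERT if the key is new."""
    cur = conn.execute(
        "UPDATE inv_index SET postings = ? WHERE term = ?", (postings, term)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO inv_index (term, postings) VALUES (?, ?)",
            (term, postings),
        )


def fetch(term):
    """Reading a hash value becomes SELECT."""
    row = conn.execute(
        "SELECT postings FROM inv_index WHERE term = ?", (term,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical packed value: (doc_id, score) pairs as 16-bit integers.
packed = struct.pack(">4H", 1, 7, 2, 3)
store("perl", packed)
store("perl", packed)  # the second write takes the UPDATE path

# Step 2: unpack the value into a real relation, one row per pair.
conn.execute(
    "CREATE TABLE posting (term TEXT, doc_id INTEGER, score INTEGER, "
    "PRIMARY KEY (term, doc_id))"
)
values = struct.unpack(">%dH" % (len(packed) // 2), fetch("perl"))
pairs = list(zip(values[0::2], values[1::2]))
conn.executemany(
    "INSERT INTO posting VALUES (?, ?, ?)",
    [("perl", doc_id, score) for doc_id, score in pairs],
)

rows = conn.execute(
    "SELECT doc_id, score FROM posting WHERE term = ? ORDER BY score DESC",
    ("perl",),
).fetchall()
print(rows)
```

With the second layout the database can sort and filter by score itself,
which the packed key/value form cannot do.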
Let me know if you are interested in more information.
Regards
Daniel
--
Daniel Naber, Paul-Gerhardt-Str. 2, 33332 Guetersloh, Germany
Tel. 05241-59371, Mobil 0170-4819674