Perlfect Solutions

[Perlfect-search] Similar content

Daniel Naber
Wed, 11 Apr 2001 22:53:35 +0200
On Wednesday 11 April 2001 17:25, you wrote:

>  I'm interested in being able to "find similar pages" as is mentioned
> on the perlfect development page - not necessarily only in the
> search results but as an integrated part of the web page for each
> document

Giorgos Zervas has written both a paper and an implementation of a fast 
clustering algorithm. I don't know if that paper was published, but if you 
are interested in it, I think he might send it to you.

The algorithm only does the clustering; everything else would need to be 
done by some kind of wrapper.

It's important to understand how limited this is: the computer can only 
look at the words of a document, so it will just suggest documents that 
share many words. In most cases this is worse than a manually edited 
"list of links".
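
To make "documents with many similar words" concrete, here is a minimal 
sketch of word-overlap similarity. It is not Giorgos Zervas's clustering 
algorithm, which is not described here. It only illustrates the general idea 
using cosine similarity over word counts, written in Python for brevity. All 
function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase alphanumeric words in a document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, docs, top_n=3):
    """Return the names of the top_n documents most similar to docs[target].

    docs maps a document name to its text (an assumption for this sketch).
    """
    target_counts = word_counts(docs[target])
    scores = [
        (cosine_similarity(target_counts, word_counts(text)), name)
        for name, text in docs.items()
        if name != target
    ]
    scores.sort(reverse=True)
    return [name for _, name in scores[:top_n]]
```

Such a measure knows nothing about meaning: two pages that use the same 
vocabulary for unrelated topics will still score as similar.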

> Would also be interested to use a RDBMS backend (like MySQL)
> rather than Berkeley

That's rather simple. The easy way is to look for uses of the tied hashes 
(%inv_index_db etc). Where the values of these hashes are read, use SELECT; 
where they are written, use INSERT or UPDATE. This would not make use of the 
relational database features that go beyond Berkeley DB. To improve the 
data structure, one can look for uses of pack(): the packed values could be 
translated into relations more complex than just key/value pairs.
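
As a rough sketch of both steps, the example below first maps hash reads and 
writes to SELECT and INSERT, then unpacks a packed value into rows of a 
proper relation. It uses Python with an in-memory SQLite database so that it 
is self-contained. The table names, the column names and the packed layout 
(pairs of 16-bit document id and score) are assumptions made for 
illustration, not the actual Perlfect Search format. With MySQL, 
"INSERT OR REPLACE" would become REPLACE or INSERT ... ON DUPLICATE KEY 
UPDATE.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")

# Step 1: a plain key/value table standing in for a tied hash.
conn.execute("CREATE TABLE inv_index_kv (term TEXT PRIMARY KEY, packed BLOB)")

# Step 2: a relation that replaces the packed value (hypothetical layout).
conn.execute(
    "CREATE TABLE postings ("
    " term TEXT, doc_id INTEGER, score INTEGER,"
    " PRIMARY KEY (term, doc_id))"
)


def kv_write(term, packed):
    """Equivalent of a hash assignment: written values become INSERT/UPDATE."""
    conn.execute(
        "INSERT OR REPLACE INTO inv_index_kv (term, packed) VALUES (?, ?)",
        (term, packed),
    )


def kv_read(term):
    """Equivalent of a hash lookup: read values become SELECT."""
    row = conn.execute(
        "SELECT packed FROM inv_index_kv WHERE term = ?", (term,)
    ).fetchone()
    return row[0] if row else None


def unpack_postings(packed):
    """Split an assumed packed value into (doc_id, score) pairs."""
    return list(struct.iter_unpack(">HH", packed))


def store_postings(term, packed):
    """Store each (doc_id, score) pair as its own row instead of one blob."""
    conn.executemany(
        "INSERT OR REPLACE INTO postings (term, doc_id, score) VALUES (?, ?, ?)",
        [(term, doc_id, score) for doc_id, score in unpack_postings(packed)],
    )


def docs_for_term(term):
    """Let the database do the sorting that the script would otherwise do."""
    return conn.execute(
        "SELECT doc_id, score FROM postings WHERE term = ?"
        " ORDER BY score DESC, doc_id",
        (term,),
    ).fetchall()
```

The second table is where a relational backend starts to pay off, because 
sorting, filtering and joining can then be expressed in SQL instead of being 
done on unpacked values in the script.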

Let me know if you are interested in more information.


Daniel Naber, Paul-Gerhardt-Str. 2, 33332 Guetersloh, Germany
Tel. 05241-59371, Mobil 0170-4819674