Perlfect Solutions
 

[Perlfect-search] data dump script (and xml)

Rob Stevenson rstevenson@accesscable.net
Mon, 7 May 2001 08:45:14 -0300
On Sun, May 6, 2001 will said...

>>On Friday 27 April 2001 01:12, you wrote:
>>
>>>  the way perlfect works is really very good for indexing xml files in
>>>   a simple way: all you have to do is change the parsing for title and
>>>   description and body to the relevant xml fields and you get a nice,
>>>   if limited, low-overhead free-text search of your xml data. no expat,
>>>   no nasty anything. very impressed. is that something that should be
>>>   developed? might be able to help, if so.
>>
>>That sounds very useful. It looks like all we need is two configuration
>>options "TITLE" and "BODY" (maybe "DESCRIPTION", too), which default to
>>"title" and "body". If you have a patch already, please send it. If not I
>>will add this to my version 3.21 TODO list.
>
>i don't have a patch but i'd be happy to (learn how to) make one.
>
>it's an interesting possibility, i agree: the strength of your 
>indexing would combine well with the added structure of xml, and it's 
>very simple to teach the indexer to read in data according to xml 
>rather than html conventions. the best, and most distinctive, thing 
>about it for me is that it allows html and xml (and pdf) to sit side 
>by side in the same index and come under the same weighting and 
>searching system without ever really caring about the data format.
>
>there are some issues to consider, though:

I believe what you're aiming for is a search engine which could be
considered to be a part of "The Semantic Web". Have a look at...

http://www.sciam.com/2001/0501issue/0501berners-lee.html

No need to reinvent this particular wheel. Just look up Dublin Core and
RDF in the many sites that cover this field. You could start at the site
I maintain, below. (It uses perlfect search of course.)
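
To make the idea quoted above a bit more concrete, here is a rough sketch of
configurable field extraction pointed at Dublin Core element names. It is
written in Python purely for illustration and is not Perlfect code. The option
names TITLE, DESCRIPTION and BODY follow the suggestion in the thread;
the function name extract_fields, the FIELD_TAGS mapping, the sample record
and the regex approach are all assumptions made for this example.

```python
import re

# Hypothetical configuration: the option names follow the TITLE / DESCRIPTION /
# BODY suggestion in the thread; the element names on the right are example
# values (dc:title and dc:description are Dublin Core elements).
FIELD_TAGS = {
    "TITLE": "dc:title",
    "DESCRIPTION": "dc:description",
    "BODY": "body",
}


def extract_fields(xml_text, field_tags=FIELD_TAGS):
    """Return the text of the first element found for each configured tag."""
    fields = {}
    for option, tag in field_tags.items():
        name = re.escape(tag)
        # Match <tag ...>content</tag>, allowing attributes and line breaks.
        pattern = r"<%s(?:\s[^>]*)?>(.*?)</%s\s*>" % (name, name)
        match = re.search(pattern, xml_text, re.DOTALL | re.IGNORECASE)
        if match:
            # Drop any nested markup and collapse runs of whitespace.
            text = re.sub(r"<[^>]+>", " ", match.group(1))
            fields[option] = " ".join(text.split())
        else:
            # A missing element yields an empty field rather than an error.
            fields[option] = ""
    return fields


# Made-up sample record, for illustration only.
SAMPLE = """<record>
  <dc:title>Museum catalogue</dc:title>
  <dc:description>A short summary.</dc:description>
  <body>Full <em>text</em> of the record.</body>
</record>"""

if __name__ == "__main__":
    for option, value in extract_fields(SAMPLE).items():
        print(option, "=", value)
```

This keeps to the "no expat" spirit by using plain pattern matching, which
also means it is not a real XML parser: CDATA sections, entities and
same-named nested elements are not handled.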

Rob

---------------------------------
Rob Stevenson - CIMI web manager
http://www.cimi.org
email: web@cimi.org
---------------------------------