[Perlfect-search] Can Perlfect support Chinese/Big5 ??

Daniel Naber
Thu, 14 Sep 2000 23:37:15 +0200
On Thu, 14 Sep 2000, you wrote:

> Yes, the Chinese/Big5 character set uses two bytes (16 bits) per character.
> Its encoding range runs from A140 to F9FE and 8180 to FEA0, 23,940
> characters in total.

You can try to remove this line in both and

$buffer =~ tr/a-zA-Z0-9_/ /cs;

Even if that works, it's not a good solution. I also have no idea how to
test pages with these characters: my browser doesn't even display them
(although it correctly displays pages with charset=gb2312).
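
To illustrate why that line gets in the way, here is a minimal sketch of what the transliteration does to Big5 bytes. The helper name strip_non_word_bytes and the sample strings are made up for this example and are not part of Perlfect Search; the only thing taken from the scripts is the tr line itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perlfect Search): applies the same
# transliteration as the line quoted above and returns the result.
sub strip_non_word_bytes {
    my ($buffer) = @_;
    # /c complements the set, so every byte that is NOT a-z, A-Z, 0-9 or _
    # is replaced by a space; /s squeezes each run of replaced bytes
    # into a single space.
    $buffer =~ tr/a-zA-Z0-9_/ /cs;
    return $buffer;
}

# "\xA4\xA4\xA4\xE5" is a pair of two-byte Big5 characters. All four bytes
# are above 0x7F, so they vanish together with the surrounding spaces.
print strip_non_word_bytes("Perl \xA4\xA4\xA4\xE5 search"), "\n";   # prints "Perl search"

# A Big5 trail byte may also fall in the ASCII range. Here the trail byte
# 0x41 is the letter "A", so half of the character survives as a stray letter.
print strip_non_word_bytes("\xA4\x41"), "\n";                       # prints " A"
```

With the line in place, Big5 text is either dropped entirely or reduced to stray ASCII letters, so nothing useful is left to index. Removing the line keeps the bytes intact, but the text is then still treated as single bytes rather than as two-byte characters, which is one reason this is only a workaround.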