Perlfect Solutions
 

[Perlfect-search] [PATCH] Web indexing of framed sites

Vlad Romanenko perlfect-search@perlfect.com
Fri, 27 Sep 2002 17:05:51 +0300
This is a multi-part message in MIME format.

------=_NextPart_000_0033_01C26648.22086850
Content-Type: text/plain;
        charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Hello!

First of all, I'd like to thank the authors of this great product!
I found it very useful for indexing my site.

However, I had a lot of trouble figuring out why the web indexer doesn't
index my frame-based site.
Somewhere in the FAQ it is mentioned that the tool isn't capable
of indexing framed sites and that you need to write a <noframes> section
for them.
I found this very inconvenient and thought it would be good if the web
indexing script simply parsed <frame src=""> tags as well as regular
links.
So I fixed that (it is a two-line fix; see the attachment).
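
To illustrate the idea, here is a minimal standalone sketch of the same alternation with the frame branch included. The extract_links helper and the sample page are made up for this example and are not part of indexer_web.pl; only the three pattern branches are taken from the attached diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perlfect Search): collect URLs from
# meta refresh targets, href attributes, and frame src attributes.
sub extract_links {
    my ($html) = @_;
    my @urls;
    while ( $html =~ m/
            content\s*=\s*["'][0-9]+;\s*URL\s*=\s*(.*?)['"]
            |
            href\s*=\s*["'](.*?)['"]
            |
            frame[^>]+src\s*=\s*["'](.*?)['"]
            /gisx ) {
        # $+ is the last capture group that matched, so it holds the URL
        # no matter which of the three branches was taken.
        push @urls, $+;
    }
    return @urls;
}

# Made-up sample page: a frameset plus one regular link.
my $page = <<'HTML';
<frameset cols="20%,80%">
  <frame name="menu" src="menu.html">
  <frame name="main" src='main.html'>
</frameset>
<a href="about.html">About</a>
HTML

print "$_\n" for extract_links($page);
```

Because the frame branch is just one more alternative in the existing pattern, the frame targets go through the same handling as ordinary links afterwards.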

When I wanted to contribute to the project, I found that there is no
way to download the latest source as a whole, only via HTTP CVS
access. Is that so? Unfortunately, it is very inconvenient.
I also had problems accessing
http://cvs.perlfect.com/cgi-bin/cvsweb.cgi/search/ because
cvs.perlfect.com could not be found. So I tried replacing cvs in the server
name with www, and that worked for me.

I'd like to make some more contributions in the near future and would be
glad to hear whether the CVS problems can be addressed.

Regards,
Vlad Romanenko.

------=_NextPart_000_0033_01C26648.22086850
Content-Type: application/octet-stream;
        name="indexer_web.pl.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
        filename="indexer_web.pl.diff"

--- /tmp/search-3.30/indexer_web.pl     Thu Mar  7 00:13:59 2002
+++ indexer_web.pl      Thu Sep 26 17:33:26 2002
@@ -97,6 +97,8 @@
                        content\s*=\s*["'][0-9]+;\s*URL\s*=\s*(.*?)['"]
                        |
                        href\s*=\s*["'](.*?)['"]
+                       |
+                       frame[^>]+src\s*=\s*["'](.*?)['"]
                        /gisx ) {
                my $new_url = $+;
                # &amp; in a link to distinguish arguments is actually correct, but we have to

------=_NextPart_000_0033_01C26648.22086850--