Perlfect Solutions
 

[Perlfect-search] Indexing with http

Cesareo, Craig perlfect-search@perlfect.com
Mon, 21 Oct 2002 17:47:10 -0500

Hi all,
 
Another question about indexing my web site:
 
I run the indexer locally on the server (for the static HTML pages), but I am
testing whether I can index some of the dynamic content on my site via the
HTTP option. I have a content area whose HTML pages are produced by a custom
CGI application. For example, the URL used to pull up any of the dynamic
pages looks something like this:
 
http://www.mysite.com/cgi-bin/dynamic.exe?func=1123doc=88
(this is just a fake example)
 
But there are no direct links to these dynamic page URLs anywhere on my
site. So I created a basic HTML page that contains nothing but hyperlinks to
the exact URLs of the dynamic pages, and I set the HTTP start_url to the
location of this page on the server. I thought the indexing process would
fetch this page, follow the URLs in it, and index each of them. But it is
not doing this.
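
The behavior I expected is roughly: fetch the start page, collect every href in it, and queue each link for indexing. Below is a minimal Python sketch of just the link-collection step, using the standard library's html.parser. It is an illustration of the general idea, not Perlfect's own code; the class and function names, the seed page address (links.html) and the second link (doc=89) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in a seed page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical seed page: one absolute link and one relative link.
seed_page = """
<html><body>
<a href="http://www.mysite.com/cgi-bin/dynamic.exe?func=1123doc=88">doc 88</a>
<a href="/cgi-bin/dynamic.exe?func=1123doc=89">doc 89</a>
</body></html>
"""

for url in extract_links(seed_page, "http://www.mysite.com/links.html"):
    print(url)
```

If a crawler gets this far but still indexes nothing, the links are usually being dropped by a later filtering step rather than missed during parsing.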
 
Am I misunderstanding how HTTP indexing works? Is the problem that I can't
set, in the conf.pl file, a filename extension for the documents that should
be indexed, because the URLs to the dynamic pages on my site do not end in
something like ".php" or ".asp"? Or does anyone have any tips on what I may
be doing wrong?
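
If the indexer decides what to fetch by file extension, the query string does not help: the path of the example URL is /cgi-bin/dynamic.exe, so its extension is ".exe", not ".php" or ".asp". Below is a minimal Python sketch of such a filter, assuming a generic crawler that checks the path extension against an allow-list. The allow-list and function names are hypothetical and are not taken from Perlfect's conf.pl; a prefix-based rule is shown as one possible alternative.

```python
import os
from urllib.parse import urlparse

# Hypothetical allow-list; the real setting in conf.pl may differ.
ALLOWED_EXTENSIONS = {".html", ".htm", ".php", ".asp"}


def extension_of(url):
    """Return the lowercased extension of the URL's path, ignoring the query string."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower()


def passes_extension_filter(url, allowed=ALLOWED_EXTENSIONS):
    """True if a crawler that filters on path extension would index this URL."""
    return extension_of(url) in allowed


def passes_prefix_filter(url, prefixes):
    """Alternative rule: accept any URL that starts with one of the given prefixes."""
    return any(url.startswith(p) for p in prefixes)


dynamic = "http://www.mysite.com/cgi-bin/dynamic.exe?func=1123doc=88"
print(extension_of(dynamic))             # .exe
print(passes_extension_filter(dynamic))  # False
print(passes_prefix_filter(dynamic, ["http://www.mysite.com/cgi-bin/"]))  # True
```

Under this assumption the dynamic pages would be skipped unless ".exe" were allowed or the crawler used a different rule, such as a URL prefix or the Content-Type of the response.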
 
Thanks very much!!
 
Craig 
