Perlfect Solutions
 

[Perlfect-search] IGNORE TEXT not working

Jon Ohmann perlfect-search@perlfect.com
Thu, 30 Jan 2003 10:04:22 -0500

My site consists of dynamically created pages (using JSP).  These pages
all come through a central shell page that embeds the content
appropriately.  We have used the
<!--ignore_perlfect_search--><!--/ignore_perlfect_search--> tags
extensively to keep the same URL links from being indexed, but they do
not seem to work.  Links within these tags (as well as links inside
ordinary HTML comment tags) are still crawled by the indexer.  (We are
initiating the indexer via HTTP.)  The issue is that the crawl ends up
nesting so deep that the indexer runs into the HTTP_MAX_PAGES limit
(currently set to 800) before it has covered all pages.
 
To make it clear: the initial page (index2.1.jsp) has links to dynamic
pages (JSPID=12345 and JSPID=54321).  To display these pages, the URL is
index2.1.jsp?JSPID=12345 or index2.1.jsp?JSPID=54321.  These pages in
turn have dynamic links as well.  The indexer crawls to index2.1.jsp,
then to the first dynamic document, which is really index2.1.jsp with
new links.  It then crawls to these 'sub links', and so on.  Soon we are
many levels deep, when in reality I don't want it to keep following the
same initial page links (these are carried through to every subsequent
page).
 
We have these repeating links in the ignore text, but the indexer seems
to disregard the tag and crawl anyway.
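
To illustrate the behavior we expected, here is a minimal sketch in
Python.  It is not Perlfect Search's actual code: the function names
(extract_links, crawl), the regular expressions, and the fetch callback
are all assumptions made up for this illustration.  The idea is that
anything between the ignore markers, and anything inside an ordinary
HTML comment, is dropped before links are extracted, and that each URL
is queued at most once.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical patterns, written for this sketch only.
IGNORE_RE = re.compile(
    r"<!--\s*ignore_perlfect_search\s*-->.*?<!--\s*/ignore_perlfect_search\s*-->",
    re.DOTALL | re.IGNORECASE,
)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute URLs of links that lie outside ignored regions."""
    # Ignore regions must go first: their markers are themselves comments,
    # so stripping plain comments first would leave the enclosed links behind.
    visible = IGNORE_RE.sub(" ", html)
    visible = COMMENT_RE.sub(" ", visible)
    return [urljoin(base_url, href) for href in HREF_RE.findall(visible)]


def crawl(start_url, fetch, max_pages=800):
    """Breadth-first crawl; fetch(url) is a caller-supplied function returning HTML."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(fetch(url), url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited
```

With behavior like this, the shell page's repeated navigation links
would contribute nothing to the crawl, and only the links in the
embedded content would count toward the page limit.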
 
What are we doing wrong?
