Perlfect Solutions
 

[Perlfect-search] Can't index PDFs

Finnis, John A perlfect-search@perlfect.com
Fri, 30 Jan 2004 16:18:15 -0000
I am testing Perlfect Search locally. The indexer works fine with HTML files but will not index PDFs, and gives the errors listed below.

Conf.pl as follows:

$DOCUMENT_ROOT = 'c:\index';

@EXT = ("htm","html","shtml","asp","txt","pdf");

"pdf" => "c:\xpdf\pdftotext FILENAME -",

Error messages:

    1: c:\index/cintro.html (0.52 KB)
Ignoring 'c:\index/cintro.pdf': illegal characters in filename
    2: c:\index/cintro.pdf (7.75 KB)
    3: c:\index/cintros.html (4.83 KB)
    4: c:\index/cintro_ind.html (0.25 KB)
    5: c:\index/Course Introduction.htm (11.37 KB)
    6: c:\index/index.html (1.54 KB)
Ignoring 'c:\index/mod02.pdf': illegal characters in filename
    7: c:\index/mod02.pdf (142.70 KB)
Ignoring 'c:\index/mod03.pdf': illegal characters in filename
    8: c:\index/mod03.pdf (77.88 KB)
Ignoring 'c:\index/mod04.pdf': illegal characters in filename
    9: c:\index/mod04.pdf (62.75 KB)
Ignoring 'c:\index/mod05.pdf': illegal characters in filename
    10: c:\index/mod05.pdf (108.41 KB)
Ignoring 'c:\index/mod06.pdf': illegal characters in filename
    11: c:\index/mod06.pdf (100.29 KB)
Ignoring 'c:\index/mod07.pdf': illegal characters in filename
    12: c:\index/mod07.pdf (110.09 KB)
Ignoring 'c:\index/mod08.pdf': illegal characters in filename
    13: c:\index/mod08.pdf (88.88 KB)
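
Separately from the errors above, one detail in the conf.pl settings may be worth checking. In Perl, a double-quoted string treats the backslash as an escape character, so "c:\xpdf\pdftotext" is unlikely to reach the shell with its backslashes intact, whereas a single-quoted string keeps them literal. The fragment below is only a sketch of how the three settings might be written with that in mind. It assumes the "pdf" line sits inside a hash named %EXT_FILTER (the enclosing hash is not shown in this message), and it is not confirmed to be the cause of the "illegal characters in filename" message.

```perl
# conf.pl (fragment): a sketch under stated assumptions, not a verified fix.

# Single quotes keep the backslash literal, as in the original setting.
$DOCUMENT_ROOT = 'c:\index';

@EXT = ("htm", "html", "shtml", "asp", "txt", "pdf");

# ASSUMPTION: the hash name %EXT_FILTER is not shown in the message above.
# Single quotes are used here because, inside double quotes, "\x" begins a
# hex escape and "\p" is not a recognized escape, so neither backslash in
# "c:\xpdf\pdftotext" would survive as written.
%EXT_FILTER = (
    "pdf" => 'c:\xpdf\pdftotext FILENAME -',
);
```

Writing the path with forward slashes ('c:/xpdf/pdftotext FILENAME -') would sidestep the quoting question as well, though whether the indexer and the shell accept that form on this setup is untested here.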

I would be grateful for any assistance.

