lucene-solr-user mailing list archives

From Roy Liu <liuchua...@gmail.com>
Subject Re: How to index PDF file stored in SQL Server 2008
Date Mon, 11 Apr 2011 07:29:12 GMT
I changed data-config-sql.xml to
<dataConfig>
  <dataSource type="JdbcDataSource"
              name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username"
              password="pw"
              convertType="true"
              />

  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,filename,attachment from attachment where
ext='pdf' and id=3632" >
            <field column="id" name="id" />
            <field column="filename" name="title" />
            <field column="attachment" name="bs_attachment" />
    </entity>
  </document>
</dataConfig>


There are no errors, but the indexed PDF is converted to numbers:
200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255
-- 
Best Regards,
Roy Liu
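
A possible direction, offered as an untested sketch rather than a confirmed fix: the numbers above look like the raw bytes of the BLOB column being indexed as-is, and the "[B@ae1393" in the earlier error is Java's default string form of a byte array, which suggests the column value was substituted where a URL string was expected. The DataImportHandler in Solr 3.1 includes a FieldStreamDataSource that hands a binary column of the parent row to a nested entity as a stream, so that TikaEntityProcessor can parse it. The config below assumes that approach. The data source name "blobds", the entity name "tika", the placeholder url value, dropping convertType, and mapping Tika's "text" column to bs_attachment are all assumptions, not taken from this thread.

```xml
<dataConfig>
  <!-- JDBC source for the rows; same connection settings as in the thread -->
  <dataSource type="JdbcDataSource"
              name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username"
              password="pw" />
  <!-- Assumed name "blobds": streams a binary column from the parent entity's row -->
  <dataSource name="blobds" type="FieldStreamDataSource" />

  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,filename,attachment from attachment where ext='pdf'">
      <field column="id" name="id" />
      <field column="filename" name="title" />
      <!-- dataField points at the parent row's BLOB column (entity.column).
           The url value is only a placeholder here (assumption). -->
      <entity name="tika" dataSource="blobds" processor="TikaEntityProcessor"
              url="attachment" dataField="doc.attachment" format="text">
        <!-- TikaEntityProcessor exposes the extracted body as column "text" -->
        <field column="text" name="bs_attachment" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a layout like this the nested entity reads the bytes from the row itself, so no file path or URL for the PDF is needed.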


On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu <liuchuanbo@gmail.com> wrote:

> Hi, all
> Thank you very much for your kind help.
>
> 1. I have upgraded from Solr 1.4 to Solr 3.1.
> 2. I changed data-config-sql.xml:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>               name="bsds"
>               driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>               url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
>               user="username"
>               password="pw"/>
>   <datasource name="docds" type="BinURLDataSource" />
>
>   <document name="docs">
>     <entity name="doc" dataSource="bsds"
>             query="select id,attachment,filename from attachment where
> ext='pdf' and id>30001030" >
>
>             <field column="id" name="id" />
>             <entity dataSource="docds" processor="TikaEntityProcessor"
> url="${doc.attachment}" format="text" >
>                 <field column="attachment" name="bs_attachment" />
>             </entity>
>             <field column="filename" name="title" />
>     </entity>
>   </document>
> </dataConfig>
>
> 3. solrconfig.xml and schema.xml are NOT changed.
>
> However, when I access
>
> http://localhost:8080/solr/dataimport?command=full-import
>
> the import still fails:
> Full Import
> failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query:[B@ae1393 Processing Document # 1
>
> Could you give me some advice? This problem is really frustrating me.
> Thanks.
>
> --
> Best Regards,
> Roy Liu
>
>
>
> On Mon, Apr 11, 2011 at 5:16 AM, Lance Norskog <goksron@gmail.com> wrote:
>
>> You have to upgrade completely to the Apache Solr 3.1 release. It is
>> worth the effort. You cannot copy any jars between Solr releases.
>> Also, you cannot copy over jars from newer Tika releases.
>>
>> On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman <darxoman@gmail.com> wrote:
>> > Hi again
>> > what you are missing is field mapping
>> > <field column="id" name="id" />
>> > ....
>> >
>> >
>> > no need for TikaEntityProcessor  since you are not accessing pdf files
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>
>
