lucene-solr-user mailing list archives

From Ahmet Arslan <>
Subject Re: PDF Indexing
Date Wed, 02 Apr 2014 19:35:05 GMT
Hi Sujatha,

There is no built-in mechanism for this. Prepare the per-page documents outside of Solr and index those.
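A minimal sketch of what "prepare page documents outside of Solr" could look like. It assumes the text of each page has already been extracted by some other tool. With PDFBox, for example, you can set `PDFTextStripper.setStartPage`/`setEndPage` to the same page number for each page. The helper name `build_page_docs` and the field names `source_id_s`, `page_i` and `content_txt` are hypothetical: they rely on the `*_s`, `*_i` and `*_txt` dynamic fields of Solr's default configset, so adjust them to your own schema.

```python
import json
from typing import Dict, List


def build_page_docs(source_id: str, page_texts: List[str]) -> List[Dict[str, object]]:
    """Turn already-extracted page texts into one Solr document per page.

    Field names are hypothetical and assume the default configset's
    *_s, *_i and *_txt dynamic fields.
    """
    docs = []
    for page_no, text in enumerate(page_texts, start=1):
        docs.append({
            # Unique key per page: the source id plus the 1-based page number.
            "id": f"{source_id}_page_{page_no}",
            "source_id_s": source_id,  # lets you group or filter pages by PDF
            "page_i": page_no,
            "content_txt": text,
        })
    return docs


if __name__ == "__main__":
    pages = ["Introduction ...", "Results ...", "Conclusion ..."]
    docs = build_page_docs("report.pdf", pages)
    # A JSON array like this can be POSTed to /solr/<core>/update
    # with Content-Type: application/json.
    print(json.dumps(docs, indent=2))
```

Keeping the parent file's id in every page document lets you delete or re-add all pages of one PDF with a single query on that field.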

You may also want to save the extracted text somewhere. If you change anything in the index analysis or schema,
you need to reindex; if you have saved the text, you can at least skip the extraction phase.
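One way to keep the extracted text around is sketched below. It stores the page texts in a local JSON Lines file, which is an assumed format rather than anything Solr provides. A later reindex can then read them back without running Tika or PDFBox again. The helper names `save_pages` and `load_pages` and the cache file location are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_pages(path: str, source_id: str, page_texts: List[str]) -> None:
    """Append one JSON line per page so extraction never has to be repeated."""
    with open(path, "a", encoding="utf-8") as f:
        for page_no, text in enumerate(page_texts, start=1):
            record = {"source": source_id, "page": page_no, "text": text}
            f.write(json.dumps(record) + "\n")


def load_pages(path: str) -> Dict[str, List[str]]:
    """Rebuild {source_id: [page texts in page order]} from the cache file."""
    by_source: Dict[str, Dict[int, str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # A later save of the same page overwrites the earlier text.
            by_source.setdefault(rec["source"], {})[rec["page"]] = rec["text"]
    return {src: [pages[n] for n in sorted(pages)] for src, pages in by_source.items()}


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "pages.jsonl")
    save_pages(cache, "report.pdf", ["Introduction ...", "Results ..."])
    # After a schema change, feed these texts straight back into the indexer.
    print(load_pages(cache))
```

JSON Lines is used here only because it can be appended to one PDF at a time and read back line by line. Any store that keeps the (file, page, text) triples would serve the same purpose.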


On Wednesday, April 2, 2014 10:05 PM, Sujatha Arun <> wrote:

I am able to use Tika and DIH to index a PDF as a single document. However,
I need each page to be a separate document. Is there any built-in mechanism to
achieve this, or do I have to use PDFBox or some other tool?

