lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: Get page number of searchresult of a pdf in solr
Date Fri, 01 Mar 2013 15:21:40 GMT
Is there an easy (enough) way to do this by storing the page number as a payload on each term?
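
To make the payload idea concrete, here is a rough sketch of the index-side preparation only. It assumes the per-page text has already been extracted, and that the target field is analyzed with a whitespace tokenizer followed by Solr's DelimitedPayloadTokenFilterFactory (delimiter "|", encoder "integer"). The function name and sample input are hypothetical. Reading the payload back at query time would likely need custom code, which is not shown.

```python
def with_page_payloads(page_texts, delimiter="|"):
    """Return one string in which every token carries its page number.

    page_texts: list of strings, one per page, in page order (assumed input).
    Output looks like "term|1 term|1 term|2", the form a delimited-payload
    token filter expects.
    """
    parts = []
    for page_number, text in enumerate(page_texts, start=1):
        for token in text.split():
            # Drop any literal delimiter so it cannot be mistaken for a payload.
            token = token.replace(delimiter, "")
            if token:
                parts.append(f"{token}{delimiter}{page_number}")
    return " ".join(parts)


# Hypothetical two-page document.
field_value = with_page_payloads(["hello world", "solr"])
print(field_value)
```

The resulting string would be sent as the value of the payload-enabled field; whether this is "easy enough" depends mostly on the query-side work.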

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Michael Della Bitta [mailto:michael.della.bitta@appinions.com] 
Sent: Thursday, February 28, 2013 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Get page number of searchresult of a pdf in solr

My guess is the best way to do this is to index each page separately
and to store a link to the PDF/page in each document.

That would probably require you to preprocess the PDFs to turn each
one into a single page per PDF, or to extract the text per page
another way.
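
A minimal sketch of the one-document-per-page approach, assuming the text of each page has already been extracted by some other tool. The field names (id, pdf_id, page, link, text), the function name, and the sample paths are hypothetical and would need to match your schema. The "#page=N" fragment is an assumption about how the viewer opens a given page.

```python
import json


def build_page_docs(pdf_id, pdf_url, page_texts):
    """Build one Solr document (as a dict) per PDF page.

    page_texts: list of strings, one per page, in page order (assumed input).
    """
    docs = []
    for page_number, text in enumerate(page_texts, start=1):
        docs.append({
            "id": f"{pdf_id}_p{page_number}",            # unique per page
            "pdf_id": pdf_id,                            # groups pages of one PDF
            "page": page_number,
            "link": f"{pdf_url}#page={page_number}",     # link to the PDF/page
            "text": text,
        })
    return docs


# Hypothetical two-page PDF.
docs = build_page_docs("manual", "/files/manual.pdf",
                       ["Intro text", "Install steps"])

# JSON body that could be posted to Solr's JSON update handler.
body = json.dumps(docs)
print(body)
```

Each hit then carries its own page number and link, and highlighting on the text field can supply the snippet.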

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn't a Game


On Thu, Feb 28, 2013 at 3:26 PM,  <dev@geschan.de> wrote:
> Hello,
>
> I'm building a web application where users can search for pdf documents and
> view them with pdf.js. I would like to display the search results with a
> short snippet of the paragraph where the search term was found and a link
> to open the document at the right page.
>
> So what I need is the page number and a short text snippet of every search
> result.
>
> I'm using Solr 4.1 for indexing pdf documents. The indexing itself works
> fine but I don't know how to get the page number and paragraph of a search
> result. I only get the document in which the search term was found.
>
> -Gesh
>


