lucene-solr-user mailing list archives

From Aloke Ghoshal <alghos...@gmail.com>
Subject Re: Get page number of searchresult of a pdf in solr
Date Sat, 02 Mar 2013 06:46:27 GMT
Hi,

We are solving this problem by splitting an N-page document into N
separate documents (one per page, type=Page) plus 1 additional combined
document (that has all the pages, type=Combined). All N+1 documents
have the same doc_id.
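
A minimal sketch of that indexing layout, in Python, might look like the
following. Only the type and doc_id fields come from our setup as described
above; the id, page_number and text field names and the helper name
build_page_documents are assumptions made for illustration, and the per-page
text is assumed to have been extracted already.

```python
from typing import Dict, List


def build_page_documents(doc_id: str, pages: List[str]) -> List[Dict[str, object]]:
    """Return N page documents plus one combined document for an N-page PDF.

    `pages` holds the already-extracted text of each page, in order.
    """
    docs: List[Dict[str, object]] = []
    for number, text in enumerate(pages, start=1):
        docs.append({
            "id": f"{doc_id}_page_{number}",  # assumed unique key, not from the thread
            "doc_id": doc_id,                 # shared by all N+1 documents
            "type": "Page",
            "page_number": number,            # assumed field name
            "text": text,                     # assumed field name
        })
    # The extra document that carries the text of every page.
    docs.append({
        "id": f"{doc_id}_combined",           # assumed unique key
        "doc_id": doc_id,
        "type": "Combined",
        "text": "\n".join(pages),
    })
    return docs
```

The resulting list can be sent to Solr with whatever client you already use
for indexing.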

The search is initially performed against the combined document
(type=Combined) to identify documents that match. For each search result, a
second search is performed against the separate pages (type=Page AND
doc_id) to identify the pages within that document that match.
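
The two searches could be expressed as request parameters roughly like this.
It is only a sketch: q, fq and fl are standard Solr request parameters, but
the page_number field and the function names are assumptions made for
illustration, and sending the request to /select is left to your client.

```python
from typing import Dict


def combined_query(user_query: str) -> Dict[str, str]:
    """First pass: find matching documents via the type=Combined entries."""
    return {"q": user_query, "fq": "type:Combined", "fl": "doc_id"}


def page_query(user_query: str, doc_id: str) -> Dict[str, str]:
    """Second pass: find the matching pages inside one document."""
    # Escape backslashes and quotes so doc_id is safe inside a quoted phrase.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": user_query,
        "fq": f'type:Page AND doc_id:"{escaped}"',
        "fl": "doc_id,page_number",  # page_number is an assumed field name
    }
```

The second pass costs one extra request per document shown, so it is
probably best run only for the results on the current page of hits.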

Keen to know how others have solved this.

Regards,
Aloke

On Fri, Mar 1, 2013 at 8:51 PM, Dyer, James <James.Dyer@ingramcontent.com> wrote:

> Is there an easy (enough) way to do this, storing the page number as a
> payload on each term?
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Michael Della Bitta [mailto:michael.della.bitta@appinions.com]
> Sent: Thursday, February 28, 2013 3:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Get page number of searchresult of a pdf in solr
>
> My guess is the best way to do this is to index each page separately
> and to store a link to the PDF/page in each document.
>
> That would probably require you to preprocess the PDFs to turn each
> one into a single page per PDF, or to extract the text per page
> another way.
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn't a Game
>
>
> On Thu, Feb 28, 2013 at 3:26 PM,  <dev@geschan.de> wrote:
> > Hello,
> >
> > I'm building a web application where users can search for PDF documents
> > and view them with pdf.js. I would like to display the search results
> > with a short snippet of the paragraph where the search term was found
> > and a link to open the document at the right page.
> >
> > So what I need is the page number and a short text snippet of every
> > search result.
> >
> > I'm using Solr 4.1 for indexing PDF documents. The indexing itself works
> > fine, but I don't know how to get the page number and paragraph of a
> > search result. I only get the document in which the search term was found.
> >
> > -Gesh
> >
>
>
>
