lucene-solr-user mailing list archives

From Ahson Iqbal <mianah...@yahoo.com>
Subject Re: Not storing, but highlighting from document sentences
Date Tue, 18 Jan 2011 10:46:55 GMT
Hi

A simple solution could be this: for all such searches (foo AND bar), send the
query unchanged to the first (primary) index, and when sending the same query
to the secondary index, replace AND with OR.


However, in this particular scenario you could also have problems with
proximity and phrase queries, which are much more difficult to tackle.
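
A minimal sketch of the idea above, in Python purely for illustration. The
helper names relax_and_to_or and matching_sentences, the (parent_id, sentence)
tuples standing in for the secondary index, and the simple regex tokenizer are
all assumptions made for this example, not Solr or Lucene APIs. Quoted phrases
are passed through unchanged, which is the part that remains hard to handle.

```python
import re

# Matches an uppercase boolean AND (or &&) surrounded by whitespace.
_AND = re.compile(r"\s+(?:AND|&&)\s+")


def relax_and_to_or(query: str) -> str:
    """Replace AND with OR outside double-quoted phrases (hypothetical helper)."""
    # Splitting on a capturing group keeps the quoted phrases in the result,
    # so operators that appear inside a phrase are left alone.
    parts = re.split(r'("[^"]*")', query)
    return "".join(
        part if part.startswith('"') else _AND.sub(" OR ", part)
        for part in parts
    )


def matching_sentences(sentences, parent_id, terms):
    """Return stored sentences of one parent doc that contain any of the terms.

    `sentences` is a list of (parent_id, sentence) tuples, a stand-in for the
    secondary index; a real setup would query that index instead.
    """
    wanted = {t.lower() for t in terms}
    return [
        text
        for pid, text in sentences
        if pid == parent_id and wanted & set(re.findall(r"\w+", text.lower()))
    ]


if __name__ == "__main__":
    print(relax_and_to_or("foo AND bar"))
    stored = [(1, "Foo is here."), (1, "Nothing."), (1, "A bar there.")]
    print(matching_sentences(stored, 1, ["foo", "bar"]))
```

In a real setup, the relaxed query would presumably be sent to the secondary
index once per hit from the primary index, restricted to that hit's parent ID,
so that sentences matching either term can be used for snippets.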

Regards
Ahsan





________________________________
From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Tue, January 18, 2011 12:25:12 PM
Subject: Re: Not storing, but highlighting from document sentences

Hi Tarjei,

:)
Yeah, that is the solution we are going with, actually.


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Tarjei Huse <tarjei@scanmine.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, January 18, 2011 1:33:44 AM
> Subject: Re: Not storing, but highlighting from document sentences
> 
> On 01/12/2011 12:02 PM, Otis Gospodnetic wrote:
> > Hello,
> >
> > I'm indexing some content (articles) whose text I cannot store in its
> > original form for copyright reason.  So I can index the content, but cannot
> > store it.  However, I need snippets and search term highlighting.
> >
> >
> > Any way to accomplish this elegantly?  Or even not so elegantly?
> >
> > Here is one idea:
> >
> > * Create 2 indices: main index for indexing (but not storing) the original
> > content, the secondary index for storing individual sentences from the
> > original article.
> How about storing the sentences in the same index in a separate field
> but with random ordering, would that be ok?
> 
> Tarjei
> > * That is, before indexing an article, split it into sentences.  Then index
> > the article in the main index, and index+store each sentence in the
> > secondary index.  So for each doc in the main index there will be multiple
> > docs in the secondary index with individual sentences.  Each sentence doc
> > includes an ID of the "parent" document.
> >
> > * Then run queries against the main index, and pull individual sentences
> > from the secondary index for snippet+highlight purposes.
> >
> >
> > The problem I see with this approach (and there may be other ones that I am
> > not seeing yet) is with queries like foo AND bar.  In this case "foo" may be
> > a match from sentence #1, and "bar" may be a match from sentence #7.  Or
> > maybe "foo" is a match in sentence #1, and "bar" is a match in multiple
> > sentences: #7 and #10 and #23.
> >
> > Regardless, when a query is run against the main index, you don't know where
> > the match was, so you don't know which sentences to go get from the
> > secondary index.
> >
> > Does anyone have any suggestions for how to handle this?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr -  Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> 
> 
> -- 
> Regards / Med vennlig  hilsen
> Tarjei Huse
> Mobil: 920 63 413
> 
> 



      