lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to do ? Articles and Its Associated Comments Indexing , One to Many relationship
Date Thu, 26 Aug 2010 17:00:50 GMT
See below...

On Thu, Aug 26, 2010 at 4:31 AM, Sumit Arora <sumit1234@gmail.com> wrote:

> Thanks, Ephraim, for your response.
>
> If I define the comments field as multiValued, should I use the following
> logic when preparing the data for Solr?
>
> /*  Sample PseudoCode */
>
> Get Rows from Article and Article-Comments Table;
> /* It will retrieve 1 Article and 20 Comments */
>
> Begin;
>
> Include 'Article Fields Value' in 'Solr Fields Value' Defined in Schema.Xml
> /* One Article in this case, so it will generate one document id for
>    Solr */
>
> Comments = 0;
>
> While (Comments != 20)
>
> {
>   Include this Comment;
>
>   ++Comments;
> }
>
> End;
>
> Result: one article with multiple comments indexed in Solr as a multiValued
> field. In the end, will Solr have only one document or multiple documents?
>
>
A multi-valued field is just what it says: a field that holds several
values within a single document. So you'd have one document with 20
values for your comment field.
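
A minimal sketch of what that single document could look like when sent
to Solr as an XML update message. The field names (id, title, comment)
and the sample values are assumptions for illustration; they would have
to match whatever is declared in your schema.xml, with the comment field
marked multiValued="true".

```python
import xml.etree.ElementTree as ET

def build_add_xml(article_id, title, comments):
    """Build one Solr <add> message holding a single article document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Single-valued fields: one <field> element each.
    # "id" and "title" are assumed field names, not taken from the thread.
    ET.SubElement(doc, "field", name="id").text = str(article_id)
    ET.SubElement(doc, "field", name="title").text = title
    # Multi-valued field: repeat the same field name once per comment.
    for comment in comments:
        ET.SubElement(doc, "field", name="comment").text = comment
    return ET.tostring(add, encoding="unicode")

# Hypothetical data: one article with 20 comments.
comments = [f"comment number {i}" for i in range(1, 21)]
xml_message = build_add_xml(42, "Sample article", comments)

print(xml_message.count("<doc>"))            # number of documents
print(xml_message.count('name="comment"'))   # number of comment values
```

The loop in the pseudocode above maps to the `for` loop here: every
comment becomes another value of the same field inside the one document.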

However, note that Solr doesn't have partial updates of a document;
it deletes and re-adds a document when you update. This is handled
automatically for you if you have a uniqueKey defined. That is, if
you add a new document with the SAME unique key as a previous
document, the previous one will be removed and the new one
will replace it (with a new internal document id). So adding one
comment means re-sending the whole article with all of its comments.
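
To make the replace-on-same-key behaviour concrete, here is a toy
in-memory model of it. This is not Solr code; the ToyIndex class and
the sample documents are made up purely to illustrate the
delete-then-add semantics described above.

```python
import itertools

class ToyIndex:
    """Toy in-memory model of uniqueKey replace semantics (not Solr code)."""

    def __init__(self):
        self._next_internal_id = itertools.count()
        self.docs = {}  # unique key -> (internal id, document)

    def add(self, doc):
        # An existing document with the same unique key is dropped first...
        self.docs.pop(doc["id"], None)
        # ...then the full new document is stored under a fresh internal id.
        internal_id = next(self._next_internal_id)
        self.docs[doc["id"]] = (internal_id, doc)
        return internal_id

index = ToyIndex()
first = index.add({"id": "article-1", "comment": ["first comment"]})
# To add a comment, resend the whole document with every comment included.
second = index.add(
    {"id": "article-1", "comment": ["first comment", "second comment"]}
)
print(len(index.docs), first, second)
```

There is still only one document for article-1 after the second add, but
it carries a different internal id than the first version did.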


> Suppose I want to use highlighting in this case, and the search keyword
> exists in more than one comment. How can I achieve the result below, where
> the keyword 'web' was found in two comments?
>
> ... 1.The *web* portal will connect a lot of people for some specific
> domain, and then people can post their interesting story, upload files
>
>  ... 2.1 accessing multiple sites will slow down the user experience - try
> not to do it. *web* hosting is not too expensive as compared to the other
> components ...
>
>
>
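I believe this is controlled by hl.fragsize (the size of each fragment),
together with hl.snippets (the maximum number of snippets returned per
field, which defaults to 1), see: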
http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize
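
A rough sketch of a query that asks for highlighted snippets from the
comment field. The host, port, handler path and the field name "comment"
are assumptions. Besides hl.fragsize, it also sets hl.snippets, which
caps the number of snippets returned per field and defaults to 1, so it
has to be raised to get a snippet from more than one comment.

```python
from urllib.parse import urlencode

params = {
    "q": "comment:web",   # assumed field name
    "hl": "true",         # turn highlighting on
    "hl.fl": "comment",   # field(s) to highlight
    "hl.snippets": 2,     # allow up to two snippets for the field
    "hl.fragsize": 100,   # approximate snippet size in characters
}

# Assumed local Solr instance; adjust host and path to your setup.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```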

The other thing you should be aware of is the position increment gap.
This is useful if you want, say, phrase queries to NOT match across
two comments. E.g.

comment 1: comments are very nice
comment 2: day in and day out

If you don't want the phrase query "nice day" to match the
enclosing document, you probably want to work with
positionIncrementGap. See:
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html
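
The effect of the gap can be illustrated with a simplified model of token
positions. This is only a sketch: it assumes whitespace tokenization and
exact phrase matching with no slop, and both helper functions are made up
for the illustration. In Solr itself the gap is set with the
positionIncrementGap attribute on the field type in schema.xml.

```python
def index_positions(values, gap):
    """Assign a token position to each word across all values of one field."""
    positions = []  # list of (position, token)
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance inserted between two values
        for token in value.lower().split():
            pos += 1
            positions.append((pos, token))
    return positions

def phrase_matches(positions, phrase):
    """True if the phrase's words occur at consecutive positions."""
    words = phrase.lower().split()
    by_pos = dict(positions)
    return any(
        all(by_pos.get(start + k) == w for k, w in enumerate(words))
        for start in by_pos
    )

comments = ["comments are very nice", "day in and day out"]
print(phrase_matches(index_positions(comments, gap=0), "nice day"))    # True
print(phrase_matches(index_positions(comments, gap=100), "nice day"))  # False
print(phrase_matches(index_positions(comments, gap=100), "day in"))    # True
```

With no gap, "nice" and "day" sit at adjacent positions even though they
come from different comments. With a gap of 100 they are far apart, while
phrases inside a single comment still match.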

Best
Erick


>
>
> On Thu, Aug 26, 2010 at 4:32 PM, Ephraim Ofir <EphraimO@icq.com> wrote:
>
> > Why not define the comment field as multiValued? That way you only index
> > each document once and you don't need to collapse anything...
> >
> > Ephraim Ofir
> >
> >
> > -----Original Message-----
> > From: Sumit Arora [mailto:sumit1234@gmail.com]
> > Sent: Thursday, August 26, 2010 12:54 PM
> > To: solr-user@lucene.apache.org
> > Subject: How to do ? Articles and Its Associated Comments Indexing , One
> > to Many relationship
> >
> > I have a set of articles and comments on them, so in the database I have
> > two major tables, one for articles and one for comments. Each article
> > can have many comments (one to many).
> >
> >
> > If one article has 20 comments, then on the DB-to-Solr index sync,
> > Solr will index 20 similar documents that differ only in the comment.
> >
> >
> > Use Case :
> >
> > On search: if the keyword matches more than one comment, the search
> > will return duplicate documents.
> >
> >
> > One possible solution I thought of applying:
> >
> > ******************************************
> >
> > Index 20 similar documents that differ only in the comment.
> >
> >
> > When retrieving query results, I could use: collapse.field = Article Id
> >
> >
> > Am I following the right approach?
> >
>
