lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Faceting on multivalued field
Date Mon, 04 Apr 2011 11:44:42 GMT
I hadn't thought that far ahead. But if you can change your SQL query
to count the comments for each post, that'd be easiest.

Mostly, I was thinking that since that information is known up
front, storing it with the document makes sense and would
avoid costly Solr work.

I don't know of any transformers that would do this for you;
it's almost an introspection transformation you'd want to do.

You could also consider using SolrJ/JDBC to query your database,
but I'd try for the SQL query first.
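
A rough sketch of what that SQL change might look like in data-config.xml.
The `comments` table and its `post_id` column come from the config quoted
below; the root table name `posts` is an assumption (the original outer
query isn't shown), and `comment_count` is a made-up column alias:

```xml
<document>
  <!-- Assumption: the root table is named "posts"; the real outer query is not shown in this thread. -->
  <!-- "comment_count" is a hypothetical alias computed by a correlated subquery. -->
  <entity name="posts" transformer="TemplateTransformer" dataSource="jdbc"
          query="SELECT p.*,
                        (SELECT COUNT(*) FROM comments c
                          WHERE c.post_id = p.post_id) AS comment_count
                   FROM posts p">
    <field column="post_id" />
    <field column="post_text" />
    <field column="person_id" />
    <!-- Single-valued count stored alongside each post document. -->
    <field column="comment_count" />
    <!-- The nested "comments" entity would stay as it is. -->
  </entity>
</document>
```

The schema would presumably also need a matching single-valued field
(say, an int named `comment_count`), so the count comes back with each
post and no faceting is needed.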

Best
Erick

On Mon, Apr 4, 2011 at 4:18 AM, Kaushik Chakraborty <kaychaks@gmail.com> wrote:

> Are you suggesting that I change the DB query of the nested entity that
> fetches the comments (the query is in my post), or can something be done
> at index time, for example with Transformers?
>
> Thanks,
> Kaushik
>
>
> On Mon, Apr 4, 2011 at 8:07 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > Why not count them on the way in and just store that number along
> > with the original post?
> >
> > Best
> > Erick
> >
> > On Sun, Apr 3, 2011 at 10:10 PM, Kaushik Chakraborty <kaychaks@gmail.com> wrote:
> >
> > > OK. My expectation was that since "comment_post_id" is a multiValued
> > > field, it would appear multiple times (i.e. once for each comment), and
> > > that faceting on that field would therefore count each of those
> > > occurrences of comment_post_id.
> > >
> > > My requirement is to get the total for every document, i.e. the number
> > > of comments per post across the whole corpus. To explain it more clearly,
> > > I'm getting result XML something like this:
> > >
> > > <str name="post_id">46</str>
> > > <str name="post_text">Hello World</str>
> > > <str name="person_id">20</str>
> > > <arr name="comment_id">
> > >    <str>9</str>
> > >    <str>10</str>
> > > </arr>
> > > <arr name="comment_person_id">
> > >   <str>19</str>
> > >   <str>2</str>
> > > </arr>
> > > <arr name="comment_post_id">
> > >  <str>46</str>
> > >  <str>46</str>
> > > </arr>
> > > <arr name="comment_text">
> > >   <str>Hello - from World</str>
> > >   <str>Hi</str>
> > > </arr>
> > >
> > > <lst name="facet_fields">
> > >  <lst name="comment_post_id">
> > >     *<int name="46">1</int>*
> > >
> > > I need the count to be 2, as post 46 has 2 comments.
> > >
> > > What other approach can I take?
> > >
> > > Thanks,
> > > Kaushik
> > >
> > >
> > > On Mon, Apr 4, 2011 at 4:29 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > > Hmmm, I think you're misunderstanding faceting. It's counting the
> > > > number of documents that have a particular value. So if you're
> > > > faceting on "comment_post_id", there is one and only one document
> > > > with that value (assuming that the comment_post_ids are unique).
> > > > Which is what's being reported.... This will be quite expensive on a
> > > > large corpus, BTW.
> > > >
> > > > Is your task to show the totals for *every* document in your corpus
> > > > or just the ones in a display page? Because if the latter, your app
> > > > could just count up the number of elements in the XML returned for
> > > > the multiValued comments field.
> > > >
> > > > If that's not relevant, could you explain a bit more why you need
> > > > this count?
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On Sun, Apr 3, 2011 at 2:31 PM, Kaushik Chakraborty <kaychaks@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > My index contains a root entity "Post" and a child entity "Comments".
> > > > > Each post can have multiple comments. data-config.xml:
> > > > >
> > > > > <document>
> > > > >   <entity name="posts" transformer="TemplateTransformer"
> > > > >           dataSource="jdbc" query="">
> > > > >     <field column="post_id" />
> > > > >     <field column="post_text" />
> > > > >     <field column="person_id" />
> > > > >     <entity name="comments" dataSource="jdbc"
> > > > >             query="select * from comments where post_id = ${posts.post_id}">
> > > > >       <field column="comment_id" />
> > > > >       <field column="comment_text" />
> > > > >       <field column="comment_person_id" />
> > > > >       <field column="comment_post_id" />
> > > > >     </entity>
> > > > >   </entity>
> > > > > </document>
> > > > >
> > > > > The schema defines all columns of the "comments" entity as multiValued
> > > > > fields, and all fields are indexed and stored. My requirement is to
> > > > > count the number of comments for each post. My approach is to query on
> > > > > "*:*" and facet the result on "comment_post_id" so that it gives the
> > > > > number of comments for each post.
> > > > >
> > > > > But I'm getting an incorrect result: if a post has 2 comments, the
> > > > > multivalued fields are populated correctly, but the facet count comes
> > > > > back as 1 (for that post_id). What else do I need to do?
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Kaushik
> > > > >
> > > >
> > >
> >
>
