lucene-java-user mailing list archives

From Chitra R <chithu.r...@gmail.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Fri, 18 Nov 2016 06:43:56 GMT
case 1:
        With taxonomy facets, each indexed document's facet labels are
examined at index time; their ordinals and mappings are computed and stored
in the sidecar index.

case 2:
        With doc values facets, the ordinals are computed at search time, so
I expect a time and memory trade-off between the two cases.
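
A minimal plain-Java sketch of the two cases, not Lucene code: ToyTaxonomy
and buildGlobalOrdinalMap are hypothetical names invented for illustration.
It assumes case 1 assigns each label a global ordinal once, at index time,
while case 2 keeps per-segment sorted labels and merges them into global
ordinals when a reader is opened, which would be where the extra refresh
latency and heap go.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class OrdinalSketch {

    /** Case 1 (hypothetical): labels get a global ordinal once, at index time. */
    static final class ToyTaxonomy {
        private final Map<String, Integer> ordinals = new LinkedHashMap<>();

        /** Returns the existing ordinal for a label, or assigns the next free one. */
        int ordinalFor(String label) {
            Integer existing = ordinals.get(label);
            if (existing != null) {
                return existing;
            }
            int ord = ordinals.size();
            ordinals.put(label, ord);
            return ord;
        }
    }

    /**
     * Case 2 (hypothetical): each segment holds its own sorted label list, so a
     * segment-local ordinal is just a position in that list. When a reader is
     * opened, the lists are merged and every local ordinal is mapped to a global
     * one. Result: map[segment][localOrdinal] == globalOrdinal.
     */
    static int[][] buildGlobalOrdinalMap(List<List<String>> segmentLabels) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> segment : segmentLabels) {
            merged.addAll(segment);
        }
        List<String> global = new ArrayList<>(merged); // sorted, no duplicates

        int[][] map = new int[segmentLabels.size()][];
        for (int s = 0; s < segmentLabels.size(); s++) {
            List<String> segment = segmentLabels.get(s);
            map[s] = new int[segment.size()];
            for (int local = 0; local < segment.size(); local++) {
                map[s][local] = Collections.binarySearch(global, segment.get(local));
            }
        }
        return map;
    }
}
```

In this sketch the map in case 2 has to be rebuilt whenever the set of
segments changes, whereas the taxonomy ordinals in case 1 are simply looked
up.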


With taxonomy, building hierarchical facets at index time should make
faceting cheaper at search time than flat facets in doc values.
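
To make the index-time work concrete, here is a rough plain-Java sketch (not
the Lucene API) using the Publish date example quoted further down. The
helper names expand and childCounts are hypothetical, and each document is
assumed to carry a single "/"-separated path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HierarchySketch {

    /** Expands "2010/10/15" into ["2010", "2010/10", "2010/10/15"]. */
    static List<String> expand(String path) {
        List<String> prefixes = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (current.length() > 0) {
                current.append('/');
            }
            current.append(part);
            prefixes.add(current.toString());
        }
        return prefixes;
    }

    /**
     * Counts documents per direct child of the given parent path.
     * An empty parent means "count the top-level values".
     */
    static Map<String, Integer> childCounts(List<String> docPaths, String parent) {
        int childDepth = parent.isEmpty() ? 1 : parent.split("/").length + 1;
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : docPaths) {
            for (String prefix : expand(path)) {
                boolean underParent = parent.isEmpty() || prefix.startsWith(parent + "/");
                if (underParent && prefix.split("/").length == childDepth) {
                    counts.merge(prefix, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For documents dated 2010/10/15, 2010/10/20, 2010/11/01 and 2011/01/05,
childCounts(docs, "") gives 2010=3 and 2011=1, and childCounts(docs, "2010")
gives 2010/10=2 and 2010/11=1.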

Apart from memory, time and NRT latency, is there any other difference
between hierarchical and flat facets at search time?


Kindly post your suggestions...


Regards,
Chitra

On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:

> Okay. I agree with you: Taxonomy maintains and supports hierarchical
> facets during indexing. By hierarchical, I understand that we might index
> the field Publish date: 2010/10/15 as Publish date: 2010, Publish date:
> 2010/10 and Publish date: 2010/10/15; their facet ordinals are maintained
> in the sidecar index and mapped to the main index.
>
> For example:
>
>                 On search-lucene.com, I enter a term (say "facet"), and the
> top documents and their categories are displayed after the search runs.
> Say I drill down into Publish date/2010 to collect its child counts, and
> then into Publish date/2010/10 to collect its child counts. For each
> drill down, a new search is performed to collect its top docs and
> categories.
>
>
>                *I can achieve even this with flat facets by changing the
> drill down query.*
>
> Am I right, or have I missed anything?
>
> So what is the need for hierarchical facets? Could you please explain
> hierarchical facets with a real-world use case?
>
>
> Regards,
> Chitra
>
> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> You store dimension + string (a single value path, since it's not
>> hierarchical) into SSDVFF so that you can compute facet counts, either
>> ordinary drill down counts or the drill sideways counts.
>>
>> You can see examples of drill sideways at
>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>> fields on the left and you don't lose the previous facet counts for
>> that field.
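
A rough plain-Java illustration of the difference between the two kinds of
counts mentioned above. It does not use Lucene's DrillSideways class; the
counts helper and the document model (one flat value per dimension) are
assumptions made for this sketch. Drill-down counts apply every selected
filter, while sideways counts for a dimension skip that dimension's own
filter, so its other values keep their counts.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DrillSidewaysSketch {

    /**
     * Counts the values of one dimension over the documents that match the
     * drill-down filters. With sideways == true, the filter on the counted
     * dimension itself is ignored.
     */
    static Map<String, Integer> counts(List<Map<String, String>> docs,
                                       Map<String, String> drillDowns,
                                       String dim,
                                       boolean sideways) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> filter : drillDowns.entrySet()) {
                if (sideways && filter.getKey().equals(dim)) {
                    continue; // sideways: do not restrict by the dimension being counted
                }
                if (!filter.getValue().equals(doc.get(filter.getKey()))) {
                    matches = false;
                    break;
                }
            }
            String value = doc.get(dim);
            if (matches && value != null) {
                result.merge(value, 1, Integer::sum);
            }
        }
        return result;
    }
}
```

With three issues (Open/Bug, Closed/Bug, Open/Feature) and a drill down on
status=Open, the plain drill-down counts for status are Open=2 only, while
the sideways counts still show Closed=1 next to Open=2.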
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com> wrote:
>> > Hi,
>> >
>> > Lucene-Drill sideways
>> >
>> > jira_issue:LUCENE-4748
>> >
>> > Is this the reason (i.e. drill sideways makes
>> > a very nice faceted search UI because we
>> > don't "lose" the facet counts after drilling in) for storing the path and
>> > dimension for the given SSDVF field? Or is there anything else?
>> >
>> > Regards,
>> > Chitra
>> >
>> >
>> >      Hey, thank you so much for the fast response. I agree that NRT
>> > refresh is a somewhat costly operation, and that this is the major
>> > pitfall if we use doc value faceting.
>> >
>> >
>> >                  While indexing a SortedSetDocValuesFacetField, it stores
>> > the path and dimension of the given field internally. So can we achieve
>> > hierarchical facets using DrillDownQuery? I assume the purpose of storing
>> > path and dimension is to achieve hierarchical facets. If yes (i.e. we can
>> > achieve hierarchy in SSDVFF), what is the need to move over to taxonomy?
>> > Or have I missed anything?
>> >
>> >
>> >                  What is the real purpose of storing path and dimension in
>> > an SSDVF field?
>> >
>> >
>> > Kindly post your suggestions.
>> >
>> > Regards,
>> > Chitra
>> >
>> >
>> >
>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>> > <lucene@mikemccandless.com> wrote:
>> >>
>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com> wrote:
>> >>
>> >> >         i) I assume that when opening SortedSetDocValuesReaderState, we
>> >> > are calculating ordinals (these will be used to calculate facet
>> >> > counts) for the doc values field, and that this alone makes the state
>> >> > instance somewhat costly.
>> >> >                       Am I right, or is there any other reason behind that?
>> >>
>> >> That's correct.  It adds some latency to an NRT refresh, and some heap
>> >> used to hold the ordinal mappings.
>> >>
>> >> >          ii) During indexing, we are providing facet ordinals in each
>> >> > doc, and I think this will be useful on the search side, to calculate
>> >> > facet counts only for matching docs. Does it carry any other benefits?
>> >>
>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
>> >> separate index.
>> >>
>> >> But they add latency/heap usage, and they cannot do hierarchical
>> >> facets yet (though this could be fixed if someone just built it).
>> >>
>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe, i.e. can
>> >> > multiple threads use it concurrently?
>> >>
>> >> Yes.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >
>> >
>>
>
>
