lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Fri, 18 Nov 2016 13:45:42 GMT
I think you've summed up exactly the differences!

And, yes, it would be possible to emulate hierarchical facets on top
of flat facets, if the hierarchy is fixed depth like year/month/day.
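
As an illustration of the fixed-depth case (a minimal sketch under stated
assumptions, not code from this thread): each level of the path can be
written as its own flat dimension whose label is the path prefix up to that
level. The dimension names (PublishDate.year and so on) and both helper
methods are hypothetical. In a real index, each resulting (dimension, label)
pair would presumably be added to the document as a
SortedSetDocValuesFacetField(dim, label); that step is left out so the
example runs without Lucene on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatDateFacets {
    // Hypothetical flat dimension names, one per level of the fixed-depth path.
    static final String[] LEVEL_DIMS = {
        "PublishDate.year", "PublishDate.month", "PublishDate.day"
    };

    /** Expands "2010/10/15" into one (dimension, label) pair per level. */
    public static Map<String, String> flatLabels(String path) {
        String[] parts = path.split("/");
        if (parts.length != LEVEL_DIMS.length) {
            throw new IllegalArgumentException("expected year/month/day, got: " + path);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                prefix.append('/');
            }
            prefix.append(parts[i]);
            // The label keeps the full prefix so "10" under 2010 and under 2011 stay distinct.
            labels.put(LEVEL_DIMS[i], prefix.toString());
        }
        return labels;
    }

    /** Flat dimension to count for the children of a drill-down path ("" means top level). */
    public static String childDimension(String drillPath) {
        int depth = drillPath.isEmpty() ? 0 : drillPath.split("/").length;
        if (depth >= LEVEL_DIMS.length) {
            throw new IllegalArgumentException("no level below: " + drillPath);
        }
        return LEVEL_DIMS[depth];
    }

    public static void main(String[] args) {
        System.out.println(flatLabels("2010/10/15"));
        System.out.println(childDimension("2010"));
    }
}
```

With this layout, drilling down on 2010 would mean filtering on
PublishDate.year = "2010" and reading the counts of PublishDate.month, so
the hierarchy lives in the drill-down logic rather than in the index.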

But if it's variable depth, it's trickier (but I think still
possible).  See e.g. the Committed Paths drill-down on the left, on
our dog-food server
http://jirasearch.mikemccandless.com/search.py?index=jira

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r111@gmail.com> wrote:
> case 1:
>         In taxonomy, the facet labels of each indexed document are
> examined, and their ordinals and mappings are computed and stored in the
> sidecar index at index time.
>
> case 2:
>         In doc values, these ordinals are computed at search time, so I
> assume there is a time and memory trade-off between the two cases.
>
>
> In taxonomy, building hierarchical facets at index time makes faceting
> cheaper at search time than flat facets in doc values.
>
> Apart from memory, time and NRT latency, is there any other difference
> between hierarchical and flat facets at search time?
>
>
> Kindly post your suggestions...
>
>
> Regards,
> Chitra
>
> On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:
>>
>> Okay. I agree that taxonomy maintains and supports hierarchical
>> facets during indexing. By hierarchical I understand that we might index
>> the field Publish date: 2010/10/15 as Publish date: 2010, Publish date:
>> 2010/10 and Publish date: 2010/10/15; their facet ordinals are maintained
>> in the sidecar index and mapped to the main index.
>>
>> For example:
>>
>> On search-lucene.com, I enter a term (say "facet"), and the top
>> documents and their categories are displayed after the search runs.
>> Say I drill down through Publish date/2010 to collect its child counts, and
>> then through Publish date/2010/10 to collect its child counts.
>> Each drill-down performs a new search to collect its top
>> docs and categories.
>>
>>
>> I can achieve the same with flat facets by changing the
>> drill-down query.
>>
>> Am I right, or have I missed anything?
>>
>> So what is the need for hierarchical facets? Could you please explain
>> them with a real-world use case?
>>
>>
>> Regards,
>> Chitra
>>
>> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>>
>>> You store dimension + string (a single value path, since it's not
>>> hierarchical) into SSDVFF so that you can compute facet counts, either
>>> ordinary drill down counts or the drill sideways counts.
>>>
>>> You can see examples of drill sideways at
>>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>>> fields on the left and you don't lose the previous facet counts for
>>> that field.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Lucene drill sideways
>>> >
>>> > jira_issue: LUCENE-4748
>>> >
>>> > Is this the reason for storing path and dimension for a given
>>> > SSDVFF (i.e. drill sideways makes a very nice faceted search UI
>>> > because we don't "lose" the facet counts after drilling in)? Or is
>>> > there another reason?
>>> >
>>> > Regards,
>>> > Chitra
>>> >
>>> >
>>> > Thank you so much for the fast response. I agree that NRT refresh
>>> > becomes a somewhat costly operation, and that this is the major
>>> > pitfall if we use doc values faceting.
>>> >
>>> >
>>> > When indexing a SortedSetDocValuesFacetField, it stores the
>>> > path and dimension of the given field internally. So can we achieve
>>> > hierarchical facets using DrillDownQuery? I assume the purpose of
>>> > storing path and dimension is to achieve hierarchical facets. If so
>>> > (i.e. we can achieve hierarchy in SSDVFF), what is the need to move
>>> > to taxonomy? Or have I missed anything?
>>> >
>>> >
>>> > What is the real purpose of storing path and dimension in an
>>> > SSDVFF?
>>> >
>>> >
>>> > Kindly post your suggestions.
>>> >
>>> > Regards,
>>> > Chitra
>>> >
>>> >
>>> >
>>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>>> > <lucene@mikemccandless.com> wrote:
>>> >>
>>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com>
>>> >> wrote:
>>> >>
>>> >> >         i) I assume that when opening SortedSetDocValuesReaderState
>>> >> > we are calculating ordinals (which will be used to calculate facet
>>> >> > counts) for the doc values field, and that this is what makes the
>>> >> > state instance somewhat costly.
>>> >> > Am I right, or is there another reason behind that?
>>> >>
>>> >> That's correct.  It adds some latency to an NRT refresh, and some heap
>>> >> used to hold the ordinal mappings.
>>> >>
>>> >> >          ii) During indexing, we provide facet ordinals in each
>>> >> > doc, and I think they are useful on the search side to calculate
>>> >> > facet counts only for matching docs. Does this carry any other
>>> >> > benefits?
>>> >>
>>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
>>> >> separate index.
>>> >>
>>> >> But they add latency/heap usage, and they cannot do hierarchical
>>> >> facets yet (though this could be fixed if someone just built it).
>>> >>
>>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe, i.e.
>>> >> > can multiple threads use it concurrently?
>>> >>
>>> >> Yes.
>>> >>
>>> >> Mike McCandless
>>> >>
>>> >> http://blog.mikemccandless.com
>>> >
>>> >
>>
>>
>


