lucene-java-user mailing list archives

From Chitra R <chithu.r...@gmail.com>
Subject Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?
Date Tue, 22 Nov 2016 12:17:09 GMT
Kindly post your suggestions.

Regards,
Chitra

On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r111@gmail.com> wrote:

> Hey, I got it clearly. Thank you so much. Could you please help us to
> implement it in our use case?
>
>
> In our case, we have a dynamic index, and it is variable depth too. So
> flat facets are enough; there is no need for hierarchical facets.
>
> What I think is,
>
>
>    1. Index my facet field as a normal doc values field, so that no special
>    operation (like taxonomy or SortedSetDocValuesFacetField) is done at
>    index time, and only the doc values field stores its ordinals, in its
>    respective field.
>    2. At search time, pass the query (user search query) and the filter
>    (path traversed list), and collect the matching documents in
>    FacetsCollector.
>
>    3. To compute the facet count for a specific field, gather those
>    resulting docs, then move through each segment, collecting the matching
>    ordinals using AtomicReader.
>
>
> And I know that when I use this approach, I can't calculate facet counts
> for more than one field (facet) in a search.
>
> Instead of loading all the dimensions in DocValuesReaderState at search
> time (which will take more time and memory), loading only specific fields
> should take less time and memory, I hope. Kindly help me solve this.
>
>
> I think this will work with minimal index and search cost. I hope it
> won't add overhead at index time, and that it will also be better at
> search time.
>
>
> Kindly post your suggestions.
>
>
> Regards,
> Chitra
>
>
>
>
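The three-step plan quoted above (plain doc values field, collect the hits, then count ordinals segment by segment) could look roughly like the sketch below. It is plain Java with no Lucene dependency: `Segment`, `ordToTerm`, `docToOrd` and `count` are hypothetical stand-ins for one segment's sorted doc values and for the hits a collector would gather, not Lucene APIs. What it tries to show is that ordinals are local to each segment, so per-segment counts have to be merged by term. As far as I understand, avoiding that merge is what the cross-segment ordinal mapping in the reader state is for.

```java
import java.util.Map;
import java.util.TreeMap;

final class PerSegmentFacetCounts {

    /** Hypothetical stand-in for one segment's sorted doc values field. */
    static final class Segment {
        final String[] ordToTerm; // segment-local dictionary, sorted by term
        final int[] docToOrd;     // ordinal per doc id, -1 if the doc has no value

        Segment(String[] ordToTerm, int[] docToOrd) {
            this.ordToTerm = ordToTerm;
            this.docToOrd = docToOrd;
        }
    }

    /**
     * Counts facet values for one field over the matching docs.
     * hits[s] holds the matching segment-local doc ids for segments[s].
     */
    static Map<String, Integer> count(Segment[] segments, int[][] hits) {
        Map<String, Integer> total = new TreeMap<>();
        for (int s = 0; s < segments.length; s++) {
            Segment seg = segments[s];
            // Cheap part: count by ordinal inside the segment.
            int[] counts = new int[seg.ordToTerm.length];
            for (int doc : hits[s]) {
                int ord = seg.docToOrd[doc];
                if (ord >= 0) {
                    counts[ord]++;
                }
            }
            // Ordinals mean different terms in different segments,
            // so the merge across segments has to go through the term.
            for (int ord = 0; ord < counts.length; ord++) {
                if (counts[ord] > 0) {
                    total.merge(seg.ordToTerm[ord], counts[ord], Integer::sum);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment(new String[] {"bug", "task"}, new int[] {0, 1, 0, -1}),
            new Segment(new String[] {"bug", "feature"}, new int[] {1, 0}),
        };
        int[][] hits = {{0, 2, 3}, {0, 1}};
        System.out.println(count(segments, hits)); // prints {bug=3, feature=1}
    }
}
```

Nothing in this sketch limits a search to one field: the same collected hits could be passed to `count` once per field, each with its own per-segment arrays.
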
> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> I think you've summed up exactly the differences!
>>
>> And, yes, it would be possible to emulate hierarchical facets on top
>> of flat facets, if the hierarchy is fixed depth like year/month/day.
>>
>> But if it's variable depth, it's trickier (but I think still
>> possible).  See e.g. the Committed Paths drill-down on the left, on
>> our dog-food server
>> http://jirasearch.mikemccandless.com/search.py?index=jira
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r111@gmail.com> wrote:
>> > case 1:
>> >         In taxonomy, each indexed document's facet labels are examined,
>> > and their ordinals and mappings are computed and stored in the sidecar
>> > index at index time.
>> >
>> > case 2:
>> >         In doc values, these (ordinals) are computed at search time, so
>> > there will be a time and memory trade-off between the two cases, I hope.
>> >
>> >
>> > In taxonomy, building hierarchical facets at index time makes the
>> > faceting cost at search time lower than for flat facets in doc values.
>> >
>> > Apart from memory, time and NRT latency, is there any other difference
>> > between hierarchical and flat facets at search time?
>> >
>> >
>> > Kindly post your suggestions...
>> >
>> >
>> > Regards,
>> > Chitra
>> >
>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r111@gmail.com> wrote:
>> >>
>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
>> >> facets during indexing. I take hierarchical to mean that we might index
>> >> the field Publish date: 2010/10/15 as Publish date: 2010, Publish date:
>> >> 2010/10 and Publish date: 2010/10/15; their facet ordinals are
>> >> maintained in the sidecar index and mapped to the main index.
>> >>
>> >> For example:
>> >>
>> >>                 In search-lucene.com, I enter a term (say facet), and the
>> >> top documents and their categories are displayed after performing the
>> >> search. Say I drill down through Publish date/2010 to collect its child
>> >> counts, and then through Publish date/2010/10 to collect its child
>> >> counts. For each drill-down, a search is performed to collect its top
>> >> docs and categories.
>> >>
>> >>
>> >>                I can achieve even this in flat facets by changing the
>> >> drill down query.
>> >>
>> >> Am I right, or have I missed anything?
>> >>
>> >> So what is the need for hierarchical facets? Could you please explain
>> >> them (hierarchical facets) with a real-world use case?
>> >>
>> >>
>> >> Regards,
>> >> Chitra
>> >>
>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
>> >> <lucene@mikemccandless.com> wrote:
>> >>>
>> >>> You store dimension + string (a single value path, since it's not
>> >>> hierarchical) into SSDVFF so that you can compute facet counts, either
>> >>> ordinary drill down counts or the drill sideways counts.
>> >>>
>> >>> You can see examples of drill sideways at
>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>> >>> fields on the left and you don't lose the previous facet counts for
>> >>> that field.
>> >>>
>> >>> Mike McCandless
>> >>>
>> >>> http://blog.mikemccandless.com
>> >>>
>> >>>
>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r111@gmail.com> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > Lucene-Drill sideways
>> >>> >
>> >>> > jira_issue:LUCENE-4748
>> >>> >
>> >>> >                                  Is this the reason (i.e. drill sideways
>> >>> > makes a very nice faceted search UI because we don't "lose" the facet
>> >>> > counts after drilling in) behind storing path and dimension for the
>> >>> > given SSDVF field? Or is there anything else?
>> >>> >
>> >>> > Regards,
>> >>> > Chitra
>> >>> >
>> >>> >
>> >>> >      Hey, thank you so much for the fast response. I agree that NRT
>> >>> > refresh is a somewhat costly operation, and this is the major pitfall
>> >>> > if we use doc value faceting.
>> >>> >
>> >>> >
>> >>> >                  While indexing SortedSetDocValuesFacetField, it
>> >>> > stores the path and dimension of the given field internally. So can we
>> >>> > achieve hierarchical facets using DrillDownQuery? I assume the purpose
>> >>> > of storing path and dimension is to achieve hierarchical facets. If yes
>> >>> > (i.e. we can achieve hierarchy in SSDVFF), what is the need to move
>> >>> > over to taxonomy? Or have I missed anything?
>> >>> >
>> >>> >
>> >>> >                  What is the real purpose of storing path and
>> >>> > dimension in the SSDVF field?
>> >>> >
>> >>> >
>> >>> > Kindly post your suggestions.
>> >>> >
>> >>> > Regards,
>> >>> > Chitra
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>> >>> > <lucene@mikemccandless.com> wrote:
>> >>> >>
>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r111@gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> >         i) I assume that when opening SortedSetDocValuesReaderState,
>> >>> >> > we are calculating ordinals (these will be used to calculate facet
>> >>> >> > counts) for the doc values field, and this alone makes the state
>> >>> >> > instance somewhat costly.
>> >>> >> >                       Am I right, or is there any other reason
>> >>> >> > behind that?
>> >>> >>
>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some heap
>> >>> >> used to hold the ordinal mappings.
>> >>> >>
>> >>> >> >          ii) During indexing, we are providing facet ordinals in each
>> >>> >> > doc, and I think this will be useful on the search side, to calculate
>> >>> >> > facet counts only for matching docs. Otherwise, does it carry any
>> >>> >> > other benefits?
>> >>> >>
>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
>> >>> >> separate index.
>> >>> >>
>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
>> >>> >> facets yet (though this could be fixed if someone just built it).
>> >>> >>
>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe (i.e. can
>> >>> >> > multiple threads call this method concurrently)?
>> >>> >>
>> >>> >> Yes.
>> >>> >>
>> >>> >> Mike McCandless
>> >>> >>
>> >>> >> http://blog.mikemccandless.com
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>>
>
>
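
The idea raised earlier in the thread, emulating a hierarchy such as Publish date 2010/10/15 on top of flat facets, can be sketched as follows. This is a minimal plain-Java illustration, not Lucene code: `FlatPathFacets`, `expand` and `childCounts` are hypothetical names, and the assumption is that each expanded prefix would be indexed as its own flat facet label, which the thread does not confirm. Because prefixes are generated per path, the same approach should also cope with variable depth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FlatPathFacets {

    /** Expands "2010/10/15" into ["2010", "2010/10", "2010/10/15"]. */
    static List<String> expand(String path) {
        List<String> prefixes = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            prefixes.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        prefixes.add(path);
        return prefixes;
    }

    /**
     * Counts the direct children of parent ("" for the root) over the
     * paths of the matching documents, one path per document.
     */
    static Map<String, Integer> childCounts(List<String> docPaths, String parent) {
        String prefix = parent.isEmpty() ? "" : parent + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : docPaths) {
            for (String label : expand(path)) {
                // A direct child starts with the parent prefix and has
                // no further separator after it.
                boolean directChild = label.startsWith(prefix)
                        && label.indexOf('/', prefix.length()) < 0;
                if (directChild) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "2010/10/15", "2010/10/20", "2010/11/01", "2011/01/05");
        System.out.println(childCounts(docs, ""));     // prints {2010=3, 2011=1}
        System.out.println(childCounts(docs, "2010")); // prints {2010/10=2, 2010/11=1}
    }
}
```

Drilling down one level then amounts to filtering on the chosen label and asking for the children of that label, which matches the "change the drill down query" approach described in the thread.
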
