lucene-solr-user mailing list archives

From gwk <g...@eyefi.nl>
Subject Re: Date Faceting and Double Counting
Date Tue, 01 Sep 2009 08:50:15 GMT
Hi Stephen,

When I added numerical faceting to my checkout of Solr (SOLR-1240), I 
basically copied date faceting and modified it to work with numbers 
instead of dates. With numbers I got a lot of double-counted values as 
well. To fix that, I added an extra parameter to number faceting that 
specifies whether either end of each range should be inclusive or 
exclusive. I have just ported it back to date faceting (disclaimer: 
completely untested), and the patch should be attached to this post.

The following parameter is added: facet.date.exclusive
Valid values for the parameter are: start, end, both and neither

To maintain compatibility with Solr without the patch, the default is 
neither. I hope the meaning of the values is self-explanatory.
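
For illustration, here is a rough Python model of how the four values could affect the counts. This is only a sketch of the semantics as described above, not the patch itself and not Solr code: the function name facet_counts and the sample documents are made up, and it assumes every range is a plain [start, start + gap] interval whose endpoints are dropped according to the chosen value.

```python
from datetime import datetime, timedelta, timezone

MODES = ("start", "end", "both", "neither")

def facet_counts(docs, start, end, gap, exclusive="neither"):
    """Count docs per range; `exclusive` names the endpoint(s) left out.

    Hypothetical model of the facet.date.exclusive semantics, not Solr code.
    """
    if exclusive not in MODES:
        raise ValueError("exclusive must be one of %s" % (MODES,))
    counts = {}
    lo = start
    while lo < end:
        hi = lo + gap
        n = 0
        for d in docs:
            # Drop the lower bound for "start"/"both", the upper for "end"/"both".
            above = d > lo if exclusive in ("start", "both") else d >= lo
            below = d < hi if exclusive in ("end", "both") else d <= hi
            if above and below:
                n += 1
        counts[lo.strftime("%Y-%m-%dT%H:%M:%SZ")] = n
        lo = hi
    return counts

def day(n):
    """Midnight UTC on the n-th of January 2009."""
    return datetime(2009, 1, n, tzinfo=timezone.utc)

# One document on Jan 1 and two on Jan 2, all exactly at midnight.
docs = [day(1), day(2), day(2)]
start, end, gap = day(1), day(3), timedelta(days=1)

for mode in MODES:
    print(mode, facet_counts(docs, start, end, gap, mode))
```

With these three sample documents, "neither" reports 3 and 2 (the two Jan 2 documents are counted in both ranges), while "end" reports 1 and 2, so each document lands in exactly one range.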

Regards,

gwk

Stephen Duncan Jr wrote:
> If we do date faceting and start at 2009-01-01T00:00:00Z, end at
> 2009-01-03T00:00:00Z, with a gap of +1DAY, then documents that occur at
> exactly 2009-01-02T00:00:00Z will be included in both the returned counts
> (2009-01-01T00:00:00Z and 2009-01-02T00:00:00Z).  At the moment, this is
> quite bad for us, as we only index the day-level, so all of our documents
> are exactly on the line between each facet-range.
>
> Because we know our data is indexed as being exactly at midnight each day, I
> think we can simply always start from 1 second prior and get the results we
> want (start=2008-12-31T23:59:59Z, end=2009-01-02T23:59:59Z), but I think
> this problem would affect everyone, even if usually more subtly (instead of
> all documents being counted twice, only a few on the fencepost between
> ranges).
>
> Is this a known behavior people are happy with, or should I file an issue
> asking for ranges in date-facets to be constructed to subtract one second
> from the end of each range (so that the effective range queries for my case
> would be: [2009-01-01T00:00:00Z TO 2009-01-01T23:59:59Z] &
> [2009-01-02T00:00:00Z TO 2009-01-02T23:59:59Z])?
>
> Alternatively, is there some other suggested way of using the date faceting
> to avoid this problem?
>
>   
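
The one-second shift proposed in the quoted message can be checked with a small Python model. This is a sketch under the assumption that every range is inclusive at both ends and that all documents sit exactly at midnight; the helper names inclusive_counts and midnight are made up for the example and are not part of Solr.

```python
from datetime import datetime, timedelta, timezone

def inclusive_counts(docs, start, end, gap):
    """Count docs in consecutive ranges that include both endpoints."""
    counts = []
    lo = start
    while lo < end:
        hi = lo + gap
        counts.append(sum(1 for d in docs if lo <= d <= hi))
        lo = hi
    return counts

def midnight(n):
    """Midnight UTC on the n-th of January 2009."""
    return datetime(2009, 1, n, tzinfo=timezone.utc)

# Day-level data: one document on Jan 1, two on Jan 2.
docs = [midnight(1), midnight(2), midnight(2)]
gap = timedelta(days=1)
one_second = timedelta(seconds=1)

# start=2009-01-01T00:00:00Z, end=2009-01-03T00:00:00Z
original = inclusive_counts(docs, midnight(1), midnight(3), gap)

# start=2008-12-31T23:59:59Z, end=2009-01-02T23:59:59Z
shifted = inclusive_counts(docs, midnight(1) - one_second,
                           midnight(3) - one_second, gap)

print(original)  # [3, 2]
print(shifted)   # [1, 2]
```

The shifted ranges count each midnight document once, but only because no document can fall exactly on a shifted boundary. Data with second-level timestamps could still be counted twice.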

