lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-5648) Index/search multi-valued time durations
Date Mon, 12 May 2014 23:00:16 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-5648:
---------------------------------

    Attachment: LUCENE-5648.patch

Here it is; including tests.
* Works with all main predicates: Intersects, IsWithin, Contains, IsDisjointTo
* The implementation is split into a core, NumberRangePrefixTree, which knows nothing about
calendars, and a DateRangePrefixTree subclass that holds just the calendaring specifics.
* Is able to work with any java.util.Calendar passed to it, including those initialized with
Long.MIN_VALUE or MAX_VALUE.  Care is taken to check & avoid Calendar/Long underflow.
* Calculation is optimized for dates after the "Gregorian Change Date", October 15th 1582, for
which I basically only need to check for leap years.  Earlier dates use Calendar directly, which
has more overhead but will likely make this work with a variety of Calendars.
* toString() for a cell will use ISO-8601 output, including a leading "-" if the year is
2BC or before.  1BC is actually the year "0000".  "*" means the universe / all-time.  There
is no date parsing in this patch; that is going to happen in a subsequent Solr FieldType.
I might end up moving that parsing code down here for the convenience of non-Solr users though.
* The year range is divided into intermediate levels: there are 1-million-year intervals
and 1-thousand-year intervals.  They are aligned at year 0000 (1BC, the year before 1AD).
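
To make the year handling in the last three bullets concrete, here is a minimal sketch in
plain Java.  It is not code from the patch: the class YearSketch and its method names are
hypothetical, and it assumes astronomical year numbering (year 0 is 1BC, year -1 is 2BC)
and the proleptic Gregorian leap-year rule.

```java
import java.util.Locale;

// Hypothetical helper, not part of LUCENE-5648.patch.
final class YearSketch {

    // Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400.
    // floorMod keeps the result non-negative for negative (BC) years.
    static boolean isLeapYear(long year) {
        return Math.floorMod(year, 4L) == 0
            && (Math.floorMod(year, 100L) != 0 || Math.floorMod(year, 400L) == 0);
    }

    // ISO-8601 style year: at least four digits, with a leading "-" for 2BC and earlier.
    // Assumes year != Long.MIN_VALUE, for which Math.abs overflows.
    static String formatYear(long year) {
        String digits = String.format(Locale.ROOT, "%04d", Math.abs(year));
        return year < 0 ? "-" + digits : digits;
    }

    // First year of the interval of the given size that contains `year`,
    // with intervals aligned at year 0000.  floorDiv rounds toward negative
    // infinity, so BC years land in the interval below zero.
    static long intervalStart(long year, long size) {
        return Math.floorDiv(year, size) * size;
    }

    public static void main(String[] args) {
        System.out.println(formatYear(0));               // 0000  (1BC)
        System.out.println(formatYear(-1));              // -0001 (2BC)
        System.out.println(isLeapYear(1900));            // false
        System.out.println(isLeapYear(2000));            // true
        System.out.println(intervalStart(2014, 1_000));  // 2000
        System.out.println(intervalStart(-1, 1_000));    // -1000
        System.out.println(intervalStart(2014, 1_000_000)); // 0
    }
}
```

Using floorDiv and floorMod rather than / and % is what keeps the alignment at year 0000
consistent on both sides of it; plain integer division would round BC years toward zero.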

It uses the changes to the SpatialPrefixTree API in LUCENE-5608, so it's limited to trunk
for now. I still want to make some more changes to that API before eventually back-porting
it all to 4x.

The patch includes some changes to the various filters, which theoretically wouldn't have
to be modified for new SPTs.  It's pretty much just comments, plus relaxing an over-aggressive
assertion that couldn't universally hold.

> Index/search multi-valued time durations
> ----------------------------------------
>
>                 Key: LUCENE-5648
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5648
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-5648.patch
>
>
> If you need to index a date/time duration, then the way to do that is to have a pair
of date fields: one for the start and one for the end, which is pretty straightforward. But if you
need to index a variable number of durations per document, then the options aren't pretty,
ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations].
Ideally, indexing durations would be easier and would work in a more optimal way.
> This issue implements the aforementioned feature using Lucene-spatial with a new single-dimensional
SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based
on floating point numbers. It will have a Date-based customization that indexes levels at
meaningful quantities like seconds, minutes, hours, etc.  The point of that alignment is to
make it faster to query across meaningful ranges (e.g. [2000 TO 2014]) and to enable a follow-on
issue to facet on the data in a really fast way.
> I expect to have a working patch up this week.
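
For readers unfamiliar with the predicate names, the following is a minimal sketch of what
Intersects, IsWithin, Contains and IsDisjointTo mean for closed one-dimensional ranges such
as [2000 TO 2014].  It only illustrates the semantics by direct comparison of endpoints; it
is not how the prefix tree evaluates them, and the class TimeSpan is hypothetical.  The
receiver plays the role of the indexed duration and the argument the role of the query.

```java
// Hypothetical value class, not part of LUCENE-5648.patch.
final class TimeSpan {
    final long start; // inclusive; could be a year number or epoch milliseconds
    final long end;   // inclusive

    TimeSpan(long start, long end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        this.start = start;
        this.end = end;
    }

    // The two ranges share at least one point.
    boolean intersects(TimeSpan query) {
        return start <= query.end && query.start <= end;
    }

    // This range lies entirely inside the query range.
    boolean isWithin(TimeSpan query) {
        return query.start <= start && end <= query.end;
    }

    // This range entirely covers the query range.
    boolean contains(TimeSpan query) {
        return query.isWithin(this);
    }

    // The two ranges share no point.
    boolean isDisjointTo(TimeSpan query) {
        return !intersects(query);
    }

    public static void main(String[] args) {
        TimeSpan query = new TimeSpan(2000, 2014);
        System.out.println(new TimeSpan(2005, 2010).isWithin(query));     // true
        System.out.println(new TimeSpan(1990, 2020).contains(query));     // true
        System.out.println(new TimeSpan(2010, 2020).intersects(query));   // true
        System.out.println(new TimeSpan(1980, 1990).isDisjointTo(query)); // true
    }
}
```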



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

