lucene-solr-user mailing list archives

From Tricia Williams <williams.tri...@gmail.com>
Subject Why isn't the DateField implementation of ISO 8601 broader?
Date Thu, 01 Oct 2009 07:53:31 GMT
Hi All,

    I'm working with data that has multiple date precisions.  Most of 
the dates have no time associated with them; instead they are centuries 
(like the 1800s), years (like 1867), or year/month pairs (like 1918-11).  
I'm able to sort and search using a workaround where we store the date 
as a string CCYYMM, where YY and MM are optional.

    I was hoping to tie this into the DateField type so that it becomes 
possible to facet on these dates without much work or duplication of 
data.  Unfortunately, DateField requires the "canonical representation 
of dateTime", which means the time part of the string is mandatory.

    My question is: why isn't the DateField implementation of ISO 8601 
broader, so that it accepts YYYY and YYYYMM as date strings?  What would 
it take to make it so?  Are there any workarounds for faceting by 
century, year, or month without creating new fields in my schema?  
Creating these new fields would be the last resort, but I'm hoping to 
leverage the power of the DateField and the trie to replace our current 
range handling.
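
One possible direction, sketched here only as an illustration: expand each 
CCYYMM-style string into a pair of canonical timestamps that bound it, so 
the bounds could feed a date range or facet.  The class PartialDateRange 
and the helper toRange are hypothetical names, not part of Solr, and the 
sketch assumes the century is stored as the two digits CC (for example 
"18" for the 1800s).

```java
import java.time.LocalDate;

class PartialDateRange {
    // Hypothetical helper: expands "CC", "CCYY", or "CCYYMM" into a
    // half-open [start, end) pair of canonical UTC timestamps.
    static String[] toRange(String partial) {
        LocalDate start;
        LocalDate end;
        switch (partial.length()) {
            case 2: // century only, e.g. "18" covers 1800 up to 1900
                start = LocalDate.of(Integer.parseInt(partial) * 100, 1, 1);
                end = start.plusYears(100);
                break;
            case 4: // year only, e.g. "1867"
                start = LocalDate.of(Integer.parseInt(partial), 1, 1);
                end = start.plusYears(1);
                break;
            case 6: // year and month, e.g. "191811"
                start = LocalDate.of(Integer.parseInt(partial.substring(0, 4)),
                                     Integer.parseInt(partial.substring(4)), 1);
                end = start.plusMonths(1);
                break;
            default:
                throw new IllegalArgumentException(
                        "Expected CC, CCYY, or CCYYMM: " + partial);
        }
        // LocalDate.toString() yields yyyy-MM-dd for four-digit years.
        return new String[] { start + "T00:00:00Z", end + "T00:00:00Z" };
    }

    public static void main(String[] args) {
        for (String p : new String[] { "18", "1867", "191811" }) {
            String[] r = toRange(p);
            System.out.println(p + " -> " + r[0] + " .. " + r[1]);
        }
    }
}
```

This keeps the original precision recoverable from the width of the range, 
at the cost of storing or computing two values per date.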

Thanks,
Tricia

Some interesting observations from tinkering with the DateFieldTest:

    * 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z
    * 2008-03-00T00:00:00Z becomes 2008-02-29T00:00:00Z
    * 2003-00-00T00:00:00Z becomes 2002-11-30T00:00:00Z
    * 2000-00-00T00:00:00Z becomes 1999-11-30T00:00:00Z
    * 1979-00-31T00:00:00Z becomes 1978-12-31T00:00:00Z
    * 2005-04-00T00:00:00Z becomes 2005-03-31T00:00:00Z
    * 1850-10-00T00:00:00Z becomes 1850-09-30T00:00:00Z

Rounding with /YEAR, /MONTH, etc. artificially imposes extra precision 
that the original data doesn't have.  And whenever the month or day is 
zero, the value silently rolls back to the previous month or day, as the 
examples above show.
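
The roll-backs listed above are consistent with lenient calendar parsing 
in Java.  The sketch below assumes the field is parsed with something 
like a lenient java.text.SimpleDateFormat (I have not confirmed that this 
is exactly what DateField does); LenientDateDemo and roundTrip are 
made-up names for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

class LenientDateDemo {
    // Parses the input in UTC and formats it back with the same pattern.
    // Returns null if the parser rejects the input.
    static String roundTrip(String input, boolean lenient) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(lenient); // lenient is the default for SimpleDateFormat
        try {
            return fmt.format(fmt.parse(input));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Lenient: day 0 rolls back to the last day of the previous month.
        System.out.println(roundTrip("2003-03-00T00:00:00Z", true));  // 2003-02-28T00:00:00Z
        // Lenient: month 0 rolls back to December of the previous year.
        System.out.println(roundTrip("1979-00-31T00:00:00Z", true));  // 1978-12-31T00:00:00Z
        // Strict: out-of-range fields are rejected instead of adjusted.
        System.out.println(roundTrip("2003-03-00T00:00:00Z", false)); // null
    }
}
```

With setLenient(false) the zero month or day is rejected outright, which 
at least makes the bad input visible rather than shifting the date.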
