lucene-solr-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Faceted Dates
Date Tue, 09 Jan 2007 13:31:20 GMT
On 1/9/07, Ryan McKinley <ryantxu@gmail.com> wrote:
> I would like to use faceted browsing to group documents by year,
> month, and day.

I don't know what your particular use-case is, but this might be a job
for facet hierarchies.
http://www.nabble.com/Hierarchical-Facets--tf2560327.html#a7135353
No, there isn't anything implemented in Solr yet... but then again,
built-in faceting didn't even exist in Solr until 4 months ago :-)
I'm pretty sure we could handle the computational requirements; it's
more a matter of defining useful generic semantics and the interface.

> Option 1:
> Add three fields, one for year, month, day.  Something like:
>
>  <field name="addedTime" type="date" indexed="true" stored="true" />
>  <field name="addedTimeYEAR" type="string" ... />
>  <field name="addedTimeMONTH" type="string" ... />
>  <field name="addedTimeDAY" type="string" ... />
>
> then use copyField to generate the various versions:
>  <copyField source="addedTime" dest="addedTimeYEAR"/>
>  <copyField source="addedTime" dest="addedTimeMONTH"/>
>  <copyField source="addedTime" dest="addedTimeDAY"/>
>
> this would somehow convert the original date format for each copy:
>  addedTime      = "2007-01-08T21:36:15.635Z"
>  addedTimeYEAR  = "2007"
>  addedTimeMONTH = "2007-01"
>  addedTimeDAY   = "2007-01-08"
>
> Perhaps this requires a custom FieldType for Y/M/D to convert the
> larger string to the smaller one.
>
> pros:
> * Can use SimpleFacets directly
> cons:
> * seems messy.  particularly since i have multiple fields i'd like to
> have the same behavior.

There's also the question of whether you would really want a breakdown
by each day (if you had 10 years of data, say) returned to the client.
It starts to be a lot of data.  That's what made me think of a hierarchy
where you could start out at a higher level and drill down.  Of
course, that's possible with simple facets too, I guess (via filtering).

> Option 2:
> Add an analyzer to the date field that adds multiple Tokens with
> various resolutions, then write a custom faceter that knows a string
> length 4=year, 7=month, 10=day.  Or, perhaps it could look at the
> token name.

I don't think adding to the same field buys you much (anything?) over
adding to a different field.  In any case, you could also do simple
faceting on this field as-is if your client has knowledge of the
different lengths of strings.
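
A minimal sketch of what that client-side knowledge could look like,
assuming the facet counts for the single multi-resolution field have
already been parsed into a map of value to count.  The class name, the
method names, and the sample counts are invented for illustration; only
the 4/7/10 length convention comes from the proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: splits mixed-resolution facet values
// ("2007", "2007-01", "2007-01-08") into groups by string length.
final class FacetResolutionSplitter {

    // Maps a facet value to its resolution using the length convention.
    static String resolutionOf(String value) {
        switch (value.length()) {
            case 4:  return "year";
            case 7:  return "month";
            case 10: return "day";
            default: return "other"; // e.g. the untruncated original timestamp
        }
    }

    // Groups the facet counts by resolution, preserving the input order.
    static Map<String, Map<String, Integer>> split(Map<String, Integer> facetCounts) {
        Map<String, Map<String, Integer>> byResolution = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : facetCounts.entrySet()) {
            byResolution
                .computeIfAbsent(resolutionOf(e.getKey()), k -> new LinkedHashMap<>())
                .put(e.getKey(), e.getValue());
        }
        return byResolution;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("2007", 12);
        counts.put("2007-01", 12);
        counts.put("2007-01-08", 5);
        counts.put("2007-01-09", 7);
        System.out.println(split(counts));
    }
}
```

Note that the client still receives every day-level value in one response,
so this does nothing for the data-volume concern above.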

> schema.xml:
>
>   <fieldtype name="fdate" class="solr.DateField">
>     <analyzer type="index" class="...DateFacetAnalyzer"/>
>   </fieldtype>
>
> DateFacetAnalyzer:
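>  // Full date, then truncated copies stacked at the same position.
>  Token t = new Token( date, 0, date.length(), "original" );
>  t.setPositionIncrement( 0 );
>  tokens.add( t );
>
>  // The term text must be the prefix itself, not the full date string.
>  t = new Token( date.substring( 0, 4 ), 0, 4, "year" );
>  t.setPositionIncrement( 0 );
>  tokens.add( t );
>
>  t = new Token( date.substring( 0, 7 ), 0, 7, "month" );
>  t.setPositionIncrement( 0 );
>  tokens.add( t );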
>
>  ...
>
> pros:
> * simple / reusable
> cons:
> * I don't fully understand how it would affect search & sorting
>
> Any thoughts / pointers / advice?
>
> thanks
> ryan
>

-Yonik
