lucene-solr-user mailing list archives

From Susan Rust <>
Subject remove from list
Date Wed, 23 Jun 2010 16:22:44 GMT
Hey SOLR folks -- There's too much info for me to digest, so please  
remove me from the email threads.

However, if we can build you a forum, bulletin board or other web-based
tool, please let us know. For that matter, we would be happy to build
you a new website.

Bill O'Connor is our CTO and the SOLR Redesign Lead. So we  
love SOLR! Let us know how we can support your efforts.

Susan Rust
VP of Client Services

If you wish to travel quickly, go alone
If you wish to travel far, go together
Achieve Internet
1767 Grand Avenue, Suite 2
San Diego, CA 92109

800-618-8777 x106
858-453-5760 x106

Susan-Rust (skype)
@Susan_Rust (twitter)
@Achieveinternet (twitter)
@drupalsandiego (San Diego Drupal Users' Group Twitter)

This message contains confidential information and is intended only  
for the individual named. If you are not the named addressee you  
should not disseminate, distribute or copy this e-mail. Please notify  
the sender immediately by e-mail if you have received this e-mail by  
mistake and delete this e-mail from your system. E-mail transmission  
cannot be guaranteed to be secure or error-free as information could  
be intercepted, corrupted, lost, destroyed, arrive late or incomplete,  
or contain viruses. The sender therefore does not accept liability for  
any errors or omissions in the contents of this message, which arise  
as a result of e-mail transmission. If verification is required please  
request a hard-copy version.

On Jun 23, 2010, at 1:52 AM, Mark Allan wrote:

> Cheers, Geert-Jan, that's very helpful.
> We won't always be searching with dates and we wouldn't want  
> duplicates to show up in the results, so your second suggestion  
> looks like a good workaround if I can't solve the actual problem.  I  
> didn't know about FieldCollapsing, so I'll definitely keep it in mind.
> Thanks
> Mark
> On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote:
>> Perhaps my answer is useless, bc I don't have an answer to your direct
>> question, but:
>> You *might* want to consider if your concept of a solr-document is on the
>> correct granular level, i.e:
>> your problem posted could be tackled (afaik) by defining a document being a
>> 'sub-event' with only 1 daterange.
>> So for each event-doc you have now, this is replaced by several sub-event
>> docs in this proposed situation.
>> Additionally each sub-event doc gets an additional field 'parent-eventid'
>> which maps to something like an event-id (which you're probably using).
>> So several sub-event docs can point to the same event-id.
>> Lastly, all sub-event docs belonging to a particular event implement all the
>> other fields that you may have stored in that particular event-doc.
>> Now you can query for events based on date-ranges like you envisioned, but
>> instead of returning events you return sub-event-docs. However since all
>> data of the original event (except the multiple dateranges) is available in
>> the subevent-doc this shouldn't really bother the client. If you need to
>> display all dates of an event (the only info missing from the returned
>> solr-doc) you could easily store it in a RDB and fetch it using the defined
>> parent-eventid.
>> The only caveat I see, is that possibly multiple sub-events with the same
>> 'parent-eventid' might get returned for a particular query.
>> This however depends on the type of queries you envision. i.e:
>> 1)  If you always issue queries with date-filters, and *assuming* that
>> sub-events of a particular event don't temporally overlap, you will never
>> get multiple sub-events returned.
>> 2)  if 1)  doesn't hold and assuming you *do* mind multiple sub-events of
>> the same actual event, you could try to use Field Collapsing on
>> 'parent-eventid' to only return the first sub-event per parent-eventid that
>> matches the rest of your query. (Note however, that Field Collapsing is a
>> patch at the moment.)
>> Not sure if this helped you at all, but at the very least it was a nice
>> conceptual exercise ;-)
>> Cheers,
>> Geert-Jan
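Geert-Jan's sub-event idea can be sketched outside Solr. The following is a minimal Python illustration, not Solr code: the field names `id`, `title`, `start`, `end` and `parent_eventid` and the helper functions are hypothetical stand-ins, and `collapse` only approximates what Field Collapsing on 'parent-eventid' might return.

```python
def to_sub_events(event):
    """Split one event with several date ranges into one doc per range."""
    # Every field except the id and the ranges is copied onto each sub-event.
    shared = {k: v for k, v in event.items() if k not in ("id", "dateranges")}
    return [
        {"id": f"{event['id']}-{i}", "parent_eventid": event["id"],
         "start": start, "end": end, **shared}
        for i, (start, end) in enumerate(event["dateranges"])
    ]


def search(docs, query_date):
    """Return the sub-event docs whose single range contains query_date."""
    return [d for d in docs if d["start"] < query_date < d["end"]]


def collapse(docs, key="parent_eventid"):
    """Keep only the first doc seen for each value of key."""
    seen, out = set(), []
    for d in docs:
        if d[key] not in seen:
            seen.add(d[key])
            out.append(d)
    return out


# Hypothetical event with two date ranges, stored as eight-digit integers.
event = {"id": "e1", "title": "News broadcast",
         "dateranges": [(19820402, 19820614), (19900000, 20000000)]}
docs = to_sub_events(event)

print([d["id"] for d in search(docs, 19850000)])   # []
print([d["id"] for d in search(docs, 19950000)])   # ['e1-1']
print([d["id"] for d in collapse(docs)])           # ['e1-0']
```

Because each sub-event carries exactly one range, a date query can no longer mix the start of one range with the end of another; the cost is duplicated fields and the need to collapse results when no date filter is applied.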
>> 2010/6/22 Mark Allan <>
>>> Hi all,
>>> Firstly, I apologise for the length of this email but I need to describe
>>> properly what I'm doing before I get to the problem!
>>> I'm working on a project just now which requires the ability to store and
>>> search on temporal coverage data - ie. a field which specifies a date range
>>> during which a certain event took place.
>>> I hunted around for a few days and couldn't find anything which seemed to
>>> fit, so I had a go at writing my own field type based on solr.PointType.
>>> It's used as follows:
>>> schema.xml
>>>      <fieldType name="temporal" class="solr.TemporalCoverage"
>>>                 dimension="2" subFieldSuffix="_i"/>
>>>      <field name="daterange" type="temporal" indexed="true" stored="true"
>>>             multiValued="true"/>
>>> data.xml
>>>      <add>
>>>      <doc>
>>>      ...
>>>      <field name="daterange">1940,1945</field>
>>>      </doc>
>>>      </add>
>>> Internally, this gets stored as:
>>>  <arr name="daterange"><str>1940,1945</str></arr>
>>>  <int name="daterange_0_i">19400000</int>
>>>  <int name="daterange_1_i">19450000</int>
>>> In due course, I'll declare the subfields as a proper date type, but in the
>>> meantime, this works absolutely fine.  I can search for an individual date
>>> and Solr will check (queryDate > daterange_0 AND queryDate < daterange_1)
>>> and the correct documents are returned.  My code also allows the user to
>>> input a date range in the query but I won't complicate matters with that
>>> just now!
>>> The problem arises when a document has more than one "daterange" field
>>> (imagine a news broadcast which covers a variety of topics and hence time
>>> periods).
>>> A document with two daterange fields
>>>      <doc>
>>>      ...
>>>      <field name="daterange">19820402,19820614</field>
>>>      <field name="daterange">1990,2000</field>
>>>      </doc>
>>> gets stored internally as
>>>  <arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
>>>  <arr name="daterange_0_i"><int>19820402</int><int>19900000</int></arr>
>>>  <arr name="daterange_1_i"><int>19820614</int><int>20000000</int></arr>
>>> In this situation, searching for 1985 should yield zero results as it is
>>> contained within neither daterange, however, the above document is returned
>>> in the result set.  What Solr is doing is checking that the queryDate (1985)
>>> is greater than *any* of the values in daterange_0 AND queryDate is less
>>> than *any* of the values in daterange_1.
>>> How can I get Solr to respect the positions of each item in the daterange_0
>>> and _1 arrays?  Ideally I'd like the search to use the following logic, thus
>>> preventing the above document from being returned in a search for 1985:
>>>      (queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR
>>>      (queryDate > daterange_0[1] AND queryDate < daterange_1[1])
>>> Someone else had a very similar problem recently on the mailing list with a
>>> multiValued PointType field but the thread went cold without a final
>>> solution.
>>> While I could filter the results when they get back to my application
>>> layer, it seems like it's not really the right place to do it.
>>> Any help getting Solr to respect the positions of items in arrays would be
>>> very gratefully received.
>>> Many thanks,
>>> Mark
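The mismatch Mark describes can be reproduced with plain lists. This is a minimal Python sketch of the two matching rules only; it does not call Solr, the function names are hypothetical, and it assumes a query for 1985 is padded to the eight-digit integer 19850000 in the same way the stored values are.

```python
# The two sub-field arrays from the example document with two dateranges.
starts = [19820402, 19900000]   # daterange_0_i
ends = [19820614, 20000000]     # daterange_1_i


def matches_any(query, starts, ends):
    """Independent checks: some start is below the query AND some end is above."""
    return any(s < query for s in starts) and any(query < e for e in ends)


def matches_paired(query, starts, ends):
    """Position-aware check: start and end must belong to the same range."""
    return any(s < query < e for s, e in zip(starts, ends))


query = 19850000  # assumed padded form of "1985"
print(matches_any(query, starts, ends))     # True, the unwanted hit
print(matches_paired(query, starts, ends))  # False, the desired result
```

The first function pairs the start of the 1982 range with the end of the 1990-2000 range, which is why the document comes back; the second only compares values that share an index, which is the logic requested in the message.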
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
