lucene-solr-user mailing list archives

From Bharani <bharani_...@yahoo.com>
Subject Re: Multiple Values -Structured?
Date Tue, 04 Sep 2007 14:55:41 GMT

No, size is not an issue, at least for now. But I am thinking of implementing
some sort of duplicate removal based on a field. I happened to look at this
thread:
http://www.nabble.com/Group-results-by-field--tf3683765.html#a10296394

Tom mentions some changes to the code to do that, so I was thinking along those
lines too. Any idea how you can do this without changes to Solr?

Thanks
Bharani
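
One way to do this without touching Solr, sketched here under assumptions, is to let the client collapse the de-normalized hits itself. The sketch assumes each occurrence document carries a hypothetical `primary_id` field naming its primary document, and that the hits arrive as a list of dicts already in ranked order. Neither the field name nor the function comes from this thread.

```python
def collapse_by_field(hits, field="primary_id"):
    """Keep only the first (best-ranked) hit for each value of `field`.

    `hits` is assumed to be a list of dicts in ranked order; `field` is a
    hypothetical field that identifies the primary document.
    """
    seen = set()
    collapsed = []
    for hit in hits:
        key = hit.get(field)
        if key is None:
            # No grouping value: keep the hit rather than silently drop it.
            collapsed.append(hit)
            continue
        if key in seen:
            # A better-ranked occurrence of this primary document was kept.
            continue
        seen.add(key)
        collapsed.append(hit)
    return collapsed


# Illustrative data only: two occurrences of primary 123, one of primary 456.
hits = [
    {"id": "123-1", "primary_id": "123", "type": "review"},
    {"id": "123-2", "primary_id": "123", "type": "revision"},
    {"id": "456-1", "primary_id": "456", "type": "review"},
]
unique_hits = collapse_by_field(hits)
print([h["id"] for h in unique_hits])
```

Because the collapsing happens after Solr has returned its rows, a page can end up with fewer results than requested, so the client would probably need to fetch more rows than it intends to display.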



Jed Reynolds-2 wrote:
> 
> Bharani wrote:
>> Hi,
>>
>> I have got two sets of documents
>>
>> 1) Primary Document
>> 2) Occurrences of primary document
>>
>> Since there is no such thing as a "join", I can either
>>
>> a) Post the primary document with occurrences as a multi-valued field
>>  or
>> b) Post the primary document once for every occurrence, i.e. the classic
>> de-normalized route
>>
>> My problem with 
>>
>> Option a) This works great as long as the occurrence is a single field,
>> but if I have a group of fields that describes the occurrence, then the
>> search returns wrong results because of the nature of text search
>>
>> i.e <date>1 Jan 2007</date>
>> <type> review</type>
>>
>> <date> 2 Jan 2007 </date>
>> <type> revision</type>
>>
>> If I search for 2 Jan 2007 and <date> 1 Jan 2007 </date> I will get a hit
>> (which is wrong) because there is no grouping of fields to associate date
>> and type as one unit. If I merge them into one entity then I can't use
>> range queries for date
>>
>> Option b) This would result in a large number of documents, and even if I
>> index only and do not store, I still have to deal with duplicate hits,
>> because all I want is the primary document
>>
>>
>> Is there a better approach to the problem?
>>   
> 
> Are you concerned about the size of your index?
> 
> One of the difficulties that you're going to find with multi-valued 
> fields is that they are an unordered collection without relation. If you 
> have a document with a list of editors and revisions, the two fields 
> have no inherent correlation unless your application can extract it from 
> the data itself.
> 
> [doc]
>     [id]123[/id]
>     [str name=name]hello world[/str]
>     [array name=editor]
>         [str name=editor]Fred[/str]
>         [str name=editor]Bob[/str]
>     [/array]
>     [array name=revisiondate]
>        [date name=revisiondate]2006-01-01T00:00:00Z[/date]
>        [date name=revisiondate]2006-01-02T00:00:00Z[/date]
>     [/array]
> [/doc]
> 
> If your application can decipher that and do a slice on it showing a 
> revision...then brilliant! But if the multi-value fields are out of 
> order, that might make a significant difference.
> 
> I would create a document per revision and take advantage of range 
> queries and sorting available at the query level.
> 
> 
> 
> 
> Jed
> 
> 
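
To make the cross-matching problem concrete, here is a small sketch in plain Python (no Solr involved) that mimics how a query on two independent multi-valued fields behaves, next to the one-document-per-occurrence layout Jed suggests. The function names and the dict layout are illustrative assumptions, and the matching is plain equality rather than real Solr analysis or scoring.

```python
def matches_multivalued(doc, date, type_):
    """Each field is checked on its own, so a date from one occurrence
    can pair up with a type from a different occurrence."""
    return date in doc["date"] and type_ in doc["type"]


def matches_per_occurrence(docs, date, type_):
    """Date and type must match inside the same occurrence document."""
    return [d["primary_id"] for d in docs
            if d["date"] == date and d["type"] == type_]


# Option a): one primary document with two parallel multi-valued fields.
primary = {
    "id": "123",
    "date": ["2007-01-01", "2007-01-02"],
    "type": ["review", "revision"],
}

# Option b): one document per occurrence, pointing back at the primary.
occurrences = [
    {"primary_id": "123", "date": "2007-01-01", "type": "review"},
    {"primary_id": "123", "date": "2007-01-02", "type": "revision"},
]

# No review happened on 2 Jan, yet the multi-valued layout still matches.
false_hit = matches_multivalued(primary, "2007-01-02", "review")
exact = matches_per_occurrence(occurrences, "2007-01-02", "review")
real = matches_per_occurrence(occurrences, "2007-01-02", "revision")
print(false_hit, exact, real)
```

With one document per occurrence, the date stays an ordinary date field, so a standard Solr range query such as `date:[2007-01-01T00:00:00Z TO 2007-01-02T00:00:00Z]` remains available. The cost is the duplicate hits per primary document discussed at the top of this message.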

-- 
View this message in context: http://www.nabble.com/Multiple-Values--Structured--tf4370282.html#a12479721
Sent from the Solr - User mailing list archive at Nabble.com.

