lucene-solr-user mailing list archives

From Jed Reynolds <li...@benrey.is-a-geek.net>
Subject Re: Multiple Values -Structured?
Date Tue, 04 Sep 2007 04:37:32 GMT
Bharani wrote:
> Hi,
>
> I have got two sets of documents
>
> 1) Primary Document
> 2) Occurrences of primary document
>
> Since there is no such thing as "join" i can either 
>
> a) Post the primary document with occurrences as multi valued field
>  or
> b) Post the primary document for every occurrence i.e. classic
> de-normalized route
>
> My problem with 
>
> Option a) This works great as long as the occurrence is a single field but
> if i had a group of fields that describes the occurrence then the search
> returns wrong results because of the nature of text search
>
> i.e <date>1 Jan 2007</date>
> <type> review</type>
>
> <date> 2 Jan 2007 </date>
> <type> revision</type>
>
> if i search for 2 Jan 2007 and <date> 1 Jan 2007 </date> i will get a hit
> (which is wrong) because there is no grouping of fields to associate date
> and type as one unit. If i merge them as one entity then i can't use the
> range queries for date
>
> Option B) This would result in a large number of documents and even if i try
> with index only and not store i still have to deal with duplicate hits -
> because all i want is the primary document
>
>
> Is there a better approach to the problem?
>   

Are you concerned about the size of your index?

One of the difficulties that you're going to find with multi-valued 
fields is that they are an unordered collection without relation. If you 
have a document with a list of editors and revisions, the two fields 
have no inherent correlation unless your application can extract it from 
the data itself.

[doc]
    [id]123[/id]
    [str name=name]hello world[/str]
    [array name=editor]
        [str name=editor]Fred[/str]
        [str name=editor]Bob[/str]
    [/array]
    [array name=revisiondate]
       [date name=revisiondate]2006-01-01T00:00:00Z[/date]
       [date name=revisiondate]2006-01-02T00:00:00Z[/date]
    [/array]
[/doc]

If your application can decipher that and do a slice on it showing a 
revision...then brilliant! But if the multi-value fields are out of 
order, that might make a significant difference.
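
To make the problem concrete, here is a minimal Python sketch of why 
independent multi-valued fields produce the false hit Bharani describes. 
It is not Solr code: the dict layout and the matches_multivalued helper 
are hypothetical, and the values are the date/type pairs from the 
question.

```python
# Hypothetical in-memory model of ONE document carrying two independent
# multi-valued fields. The real occurrences were (1 Jan, review) and
# (2 Jan, revision), but that pairing is not recorded anywhere.
doc = {
    "id": "123",
    "date": ["2007-01-01", "2007-01-02"],
    "type": ["review", "revision"],
}

def matches_multivalued(doc, date, type_):
    # Each field is tested on its own; nothing ties a date value
    # to the type value of the same occurrence.
    return date in doc["date"] and type_ in doc["type"]

# "2 Jan 2007" + "review" never occurred together, yet the document matches.
false_hit = matches_multivalued(doc, "2007-01-02", "review")

# "2 Jan 2007" + "revision" is a genuine occurrence and also matches,
# so the two cases cannot be told apart at query time.
true_hit = matches_multivalued(doc, "2007-01-02", "revision")
```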

I would create a document per revision and take advantage of range 
queries and sorting available at the query level.
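
A rough Python sketch of the document-per-revision idea follows, again 
as a plain in-memory model rather than actual Solr calls. The 
primary_id field, the editor/date pairings, the second primary document 
"456" and both helper functions are assumptions made up for 
illustration. Collapsing hits back to unique primary IDs is shown as an 
application-side step, which is one possible way to handle the 
duplicate hits mentioned in the question.

```python
from datetime import date

# One record per revision. Every record carries the (hypothetical)
# primary_id of the document it belongs to, so editor and revisiondate
# stay grouped as one unit.
revision_docs = [
    {"id": "123-1", "primary_id": "123", "editor": "Fred", "revisiondate": date(2006, 1, 1)},
    {"id": "123-2", "primary_id": "123", "editor": "Bob", "revisiondate": date(2006, 1, 2)},
    {"id": "456-1", "primary_id": "456", "editor": "Fred", "revisiondate": date(2006, 1, 2)},
]

def search(docs, start, end, editor=None):
    """Inclusive date-range filter, optionally restricted to one editor,
    with results sorted by revision date."""
    hits = [
        d for d in docs
        if start <= d["revisiondate"] <= end
        and (editor is None or d["editor"] == editor)
    ]
    return sorted(hits, key=lambda d: d["revisiondate"])

def primary_ids(hits):
    """Collapse per-revision hits to unique primary IDs, keeping first-seen order."""
    seen = []
    for d in hits:
        if d["primary_id"] not in seen:
            seen.append(d["primary_id"])
    return seen

# Fred on 2 Jan 2006: only 456-1 qualifies, because editor and date
# now live in the same record.
fred_hits = search(revision_docs, date(2006, 1, 2), date(2006, 1, 2), editor="Fred")

# Any revision in the two-day range: three records, two primary documents.
all_hits = search(revision_docs, date(2006, 1, 1), date(2006, 1, 2))
unique_primaries = primary_ids(all_hits)
```

The trade-off is the one raised in the question: the index holds more 
documents, and the application (or the query layer) has to fold 
repeated hits back into a single primary document.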




Jed
