lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Date Tue, 29 Jul 2008 20:16:46 GMT
On 28-Jul-08, at 11:16 PM, Britske wrote:

>
> That sounds interesting. Let me explain my situation, which may be a
> variant of what you are proposing. My documents contain more than
> 10,000 fields, but these fields are divided like this:
>
> 1. About 20 general-purpose fields, of which more than one can be
> selected in a query.
> 2. About 10,000 fields, of which each query selects exactly one,
> based on some criteria.
>
> Obviously 2. is killing me here, but given the above, perhaps it
> would be possible to make 10,000 vertical slices/indices and, based
> on the field to be selected (from point 2), select the slice/index
> to search in. The 10,000 indices would run on the same box, and the
> 20 general-purpose fields would have to be copied to all slices
> (which means some increase in overall index size, but manageable).
> This would give me far more reasonably sized and compact documents,
> which means documents are far more likely to be in the same cache
> slot and be accessed in the same disk seek.
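The slicing scheme described above amounts to routing each query, by field name, to one small index that stores only that field plus the copied general-purpose fields. A minimal sketch of the routing idea (all names are hypothetical; plain dicts stand in for the per-field Solr indices):

```python
# Sketch of the vertical-slicing idea from the mail: one small index per
# selectable field. Hypothetical stand-in: each "slice" is a dict of
# doc_id -> stored fields, holding the general-purpose fields plus
# exactly one of the ~10k per-query fields.

def slice_name(field):
    """Route a query to the slice/index that stores the given field."""
    return "slice_" + field

# Two example slices; a real deployment would have one index per field,
# each duplicating the ~20 general-purpose fields (here just "title").
slices = {
    "slice_f_0001": {"doc1": {"title": "A", "f_0001": 10}},
    "slice_f_0002": {"doc1": {"title": "A", "f_0002": 20}},
}

def query(field, doc_id):
    """Pick the slice by field name, then fetch the doc from it."""
    return slices[slice_name(field)][doc_id]
```

Each retrieved document is then small (about 21 fields instead of 10,000-plus), which is the locality gain the mail is after.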

Are all 10k values equally likely to be retrieved?

> Does this make sense?

Well, I would probably split into two indices, one containing the 20
fields and one containing the 10k. However, if the 10k fields are
equally likely to be chosen, this will not help in the long term,
since the working set of disk blocks is still going to be all of them.
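The working-set point can be made concrete with back-of-envelope arithmetic (all sizes below are hypothetical): if every one of the 10k fields is equally likely to be requested, then over many queries every stored value eventually gets touched, so vertical partitioning changes the layout of the data but not the total volume of blocks that must end up cached.

```python
# Back-of-envelope: total stored bytes that queries eventually touch.
# Hypothetical sizes; the point is that slicing does not shrink the
# union of disk blocks needed when every field is equally likely.

num_docs = 1_000_000
num_fields = 10_000
bytes_per_value = 8

# One monolithic index: working set over all possible queries.
monolithic = num_docs * num_fields * bytes_per_value

# 10k vertical slices: each slice holds one field value per doc,
# and across all slices every value is stored exactly once.
per_slice = num_docs * 1 * bytes_per_value
sliced_total = num_fields * per_slice

assert monolithic == sliced_total  # same long-run working set either way
```

What slicing does buy is per-query locality: a single query touches one compact slice rather than seeking through enormous documents, which is a different benefit from shrinking the overall working set.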

> Am I correct that this has nothing to do with distributed search,
> since that is really about horizontal splitting/sharding of the
> index, while what I'm suggesting is splitting vertically? Is there
> some other part of Solr that I can use for this, or would it all be
> home-grown?

There is some stuff coming down the pipeline in Lucene, but nothing
is currently there. Honestly, it sounds like these extra fields
should just be stored in a separate file/database. I also wonder
whether solving the underlying problem really requires storing 10k
values per doc (you haven't given us many clues in this regard)?
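The separate file/database suggestion could look like the following sketch (hypothetical: sqlite3 stands in for the external store, and the field/value names are made up). The search index would then carry only the ~20 general-purpose fields, and the one per-query value is fetched by `(doc_id, field)` after the search returns:

```python
import sqlite3

# Hypothetical sketch of keeping the 10k per-doc values outside the
# search index, in a store keyed by (doc_id, field).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE extra (doc_id TEXT, field TEXT, value REAL, "
    "PRIMARY KEY (doc_id, field))"
)
conn.execute("INSERT INTO extra VALUES ('doc1', 'f_0042', 19.95)")

def lookup(doc_id, field):
    """After the index returns matching doc ids, fetch the one value needed."""
    row = conn.execute(
        "SELECT value FROM extra WHERE doc_id = ? AND field = ?",
        (doc_id, field),
    ).fetchone()
    return row[0] if row else None
```

Since each query needs exactly one of the 10k values per matching document, this turns the problem into a cheap point lookup instead of loading oversized documents.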

-Mike
