lucene-solr-user mailing list archives

From Britske <gbr...@gmail.com>
Subject Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Date Tue, 29 Jul 2008 06:16:34 GMT

That sounds interesting. Let me explain my situation, which may be a variant
of what you are proposing. My documents contain more than 10,000 fields, but
these fields are divided as follows:

1. About 20 general-purpose fields, more than one of which can be selected
in a query.
2. About 10,000 fields, of which each query, based on some criteria, selects
exactly one.

Obviously point 2 is what is killing me here, but given the above, perhaps it
would be possible to make 10,000 vertical slices/indices and, based on the
field to be selected (from point 2), pick the slice/index to search in.
The 10,000 indices would run on the same box, and the 20 general-purpose
fields would have to be copied to all slices (which means some increase in
overall index size, but manageable). In return this would give me far more
reasonably sized, compact documents, which means documents are far more
likely to sit in the same cached block and to be read in the same disk seek.
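A minimal sketch of the routing step, assuming a hypothetical naming scheme: the per-query fields are called f_<n>, and the slice holding field f_<n> (plus copies of the general-purpose fields) is a separate Solr core called slice_<n>. Neither name comes from Solr; the code only computes which core to query and what to put in fl.

```java
import java.util.ArrayList;
import java.util.List;

class SliceRouter {
    // Hypothetical naming scheme: the per-query fields are "f_<n>" and the slice
    // holding field f_<n> (plus the ~20 general-purpose fields) is a core "slice_<n>".
    static final String FIELD_PREFIX = "f_";
    static final String CORE_PREFIX = "slice_";

    /** Returns the name of the slice (core) that holds the selected per-query field. */
    static String coreFor(String selectedField) {
        if (!selectedField.startsWith(FIELD_PREFIX)) {
            throw new IllegalArgumentException("not a sliced field: " + selectedField);
        }
        // parseInt throws NumberFormatException (an IllegalArgumentException) on a bad suffix
        int n = Integer.parseInt(selectedField.substring(FIELD_PREFIX.length()));
        return CORE_PREFIX + n;
    }

    /** Builds the fl value: the requested general fields followed by the one sliced field. */
    static String fieldList(List<String> generalFields, String selectedField) {
        List<String> fl = new ArrayList<>(generalFields);
        fl.add(selectedField);
        return String.join(",", fl);
    }

    public static void main(String[] args) {
        String selected = "f_1234";
        // The client would send the query to this core, with fl set as printed.
        System.out.println(coreFor(selected) + " fl=" + fieldList(List.of("id", "name"), selected));
    }
}
```

Each slice then stores only about 21 fields per document, which is what keeps the documents compact on disk.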

Does this make sense? Am I correct that this has nothing to do with
distributed search, since that is really all about horizontal splitting
(sharding) of the index, whereas what I'm suggesting is splitting vertically?
Is there some other part of Solr that I can use for this, or would it all
be home-grown?
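To make the horizontal/vertical distinction concrete, here is a rough plain-Java sketch (no Solr API involved). In a typical sharded setup each document lives on one shard, here assigned by hashing its id (an assumed scheme, not something Solr prescribes), and a query is sent to every shard. In the proposed vertical split every slice holds all documents, so a query touches only the one slice for its selected field.

```java
import java.util.ArrayList;
import java.util.List;

class SplitContrast {
    // Horizontal split (sharding): each document lives on exactly one shard.
    // Hashing the id is an assumed assignment scheme for illustration.
    static int shardForDoc(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Horizontal split: every shard holds every field, so a query goes to all
    // shards and the partial results are merged.
    static List<Integer> shardsQueried(int numShards) {
        List<Integer> targets = new ArrayList<>();
        for (int s = 0; s < numShards; s++) {
            targets.add(s);
        }
        return targets;
    }

    // Vertical split (as proposed): every slice holds every document but only one
    // of the per-query fields, so a query goes only to that field's slice.
    static List<Integer> slicesQueried(int selectedFieldNumber) {
        return List.of(selectedFieldNumber);
    }

    public static void main(String[] args) {
        System.out.println("horizontal, 4 shards -> " + shardsQueried(4));
        System.out.println("vertical, field 1234 -> slices " + slicesQueried(1234));
    }
}
```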

Thanks,
Britske


Mike Klaas wrote:
> 
> Another possibility is to partition the stored fields into a
> frequently-accessed set and a full set.  If the frequently-accessed
> set is significantly smaller (in bytes), then the documents
> will be tightly packed on disk and OS caching will be much more
> effective given the same amount of RAM.
> 
> The situation you are experiencing is one-seek-per-doc, which is  
> performance death.
> 
> -Mike
> 
> On 28-Jul-08, at 1:34 PM, Yonik Seeley wrote:
> 
>> That's a bit too tight to have *all* of the index cached. Your best
>> bet is to go to 4 GB+ of RAM, or figure out a way not to have to
>> retrieve so many stored fields.
>>
>> -Yonik
>>
>> On Mon, Jul 28, 2008 at 4:27 PM, Britske <gbrits@gmail.com> wrote:
>>>
>>> Size on disk is 1.84 GB (1.3 GB of which is in the .fdt files, if that
>>> matters).
>>> Physical RAM is 2 GB, with -Xmx800M set for Solr.
>>>
>>>
>>> Yonik Seeley wrote:
>>>>
>>>> A difference that large is due to the part of the index containing
>>>> these particular stored fields not being in the OS cache. What's the
>>>> size on disk of your index compared to your physical RAM?
>>>>
>>>> -Yonik
>>>>
>>>> On Mon, Jul 28, 2008 at 4:10 PM, Britske <gbrits@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> For some queries I need to return a lot of rows at once (say 100).
>>>>> When performing these queries I notice a big difference between qTime
>>>>> (which is mostly in the 15-30 ms range due to caching) and the total
>>>>> time taken to return the response (measured through SolrJ's
>>>>> elapsedTime), which is between 500 and 1600 ms.
>>>>>
>>>>> For queries that return fewer rows the difference is smaller.
>>>>>
>>>>> I presume (after reading some past threads) that this is due to Solr
>>>>> constructing and streaming the response (which includes retrieving the
>>>>> stored fields), which is not included in qTime.
>>>>>
>>>>> Documents have a lot of stored fields (more than 10,000), but in any
>>>>> given query at most about 20 are returned (through the fl parameter) or
>>>>> used (as part of filtering, faceting, or sorting).
>>>>>
>>>>> I would have thought that enabling enableLazyFieldLoading would help a
>>>>> lot in this situation, since so many stored fields can be skipped, but
>>>>> I notice no real difference in total elapsed time (or in qTime, for
>>>>> that matter).
>>>>>
>>>>> Am I missing something here? For instance, what criteria would need to
>>>>> be met for a field not to be loaded? Should I see a big performance
>>>>> boost in this situation?
>>>>>
>>>>> Thanks,
>>>>> Britske
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/big-discrepancy-between-elapsedtime-and-qtime-although-enableLazyFieldLoading%3D-true-tp18698590p18698590.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
> 
> 
> 
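The sizes quoted in the thread can be checked with simple arithmetic. This sketch treats 1 GB as 1024 MB and counts only the 800 MB heap as unavailable to the OS file cache (ignoring the OS itself, JVM overhead beyond the heap, and other processes), so the budget it computes is an upper bound.

```java
class CacheBudget {
    // Figures quoted in the thread; 1 GB = 1024 MB is an assumption.
    static final double PHYSICAL_RAM_MB = 2 * 1024;
    static final double SOLR_HEAP_MB = 800;              // -Xmx800M
    static final double INDEX_MB = 1.84 * 1024;          // whole index on disk
    static final double STORED_FIELDS_MB = 1.3 * 1024;   // .fdt files

    /** Upper bound on RAM left for the OS file cache once the Solr heap is reserved. */
    static double cacheBudgetMb(double ramMb, double heapMb) {
        return ramMb - heapMb;
    }

    public static void main(String[] args) {
        double budget = cacheBudgetMb(PHYSICAL_RAM_MB, SOLR_HEAP_MB);
        System.out.printf("cache budget (upper bound): %.0f MB%n", budget);
        System.out.printf("whole index fits:  %b%n", INDEX_MB <= budget);
        System.out.printf("stored fields fit: %b%n", STORED_FIELDS_MB <= budget);
    }
}
```

Even under these generous assumptions the .fdt files alone exceed the cache budget, which is consistent with the one-seek-per-doc behaviour described above.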

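Mike's idea of a compact, frequently-accessed set of stored fields can be sketched in plain Java. The field names and values are illustrative, and the byte estimate (UTF-8 length of names plus values) is an assumption that ignores Lucene's own per-field overhead; in Solr the compact set might live in a separate index or core.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class StoredFieldPartition {
    /** Keeps only the frequently-accessed fields, preserving field order. */
    static Map<String, String> frequentSubset(Map<String, String> fullDoc,
                                              Set<String> frequentFields) {
        Map<String, String> subset = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fullDoc.entrySet()) {
            if (frequentFields.contains(e.getKey())) {
                subset.put(e.getKey(), e.getValue());
            }
        }
        return subset;
    }

    /** Rough stored size: UTF-8 bytes of every field name and value. */
    static int approxBytes(Map<String, String> doc) {
        int total = 0;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative document: two general fields and two of the many per-query fields.
        Map<String, String> full = new LinkedHashMap<>();
        full.put("id", "42");
        full.put("name", "example");
        full.put("price_1", "100");
        full.put("price_2", "120");

        Map<String, String> compact = frequentSubset(full, Set.of("id", "name"));
        System.out.println(compact + " " + approxBytes(compact) + " of "
                + approxBytes(full) + " bytes");
    }
}
```

With thousands of per-query fields instead of two, the gap between the compact and full documents grows accordingly, so many more compact documents fit in the same amount of OS cache.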

