lucene-solr-user mailing list archives

From Britske <gbr...@gmail.com>
Subject Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Date Thu, 31 Jul 2008 15:26:41 GMT

No, I'm using dynamic fields, which have been around for quite a long time.
I use the int values in the 10k fields for filtering and sorting. On top of
that, I do a lot of full-text filtering on the other fields, as well as
faceting, etc.

I do understand that, at first glance, it seems possible to use multivalued
fields, but with a multivalued field it's not possible to pinpoint the exact
value that I need. Consider the case with 1 multivalued field, category, as
you called it, which would hold at most 10k values. The meaning of each value
within the field is completely lost, although it is a requirement to fetch
products (thus values in the multivalued field) given a specific set of
criteria. In other words, there is no way of getting a specific value from a
multivalued field given a set of criteria. Now, compare that with my current
design, in which these criteria pinpoint a specific field / column to use,
and the difference should be clear.
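
To make that concrete, here is a minimal sketch in Python of how a set of
criteria could resolve to exactly one dynamic field, which is then used for
filtering, sorting and retrieval. Everything specific in it is an assumption
for illustration only: the characteristic names (color, size, region), their
defaults, the "p_..._i" naming scheme and the "id" field are hypothetical and
not my actual schema. The only Solr parts it relies on are the standard q, fq,
fl and sort request parameters.

```python
from urllib.parse import urlencode

# Hypothetical characteristics and defaults (not from the real schema).
DEFAULTS = {"color": "any", "size": "m", "region": "eu"}
ORDER = ("color", "size", "region")


def field_for(criteria):
    """Resolve user criteria to the one dynamic field (column) to use.

    Missing characteristics fall back to defaults, so a field name can
    always be determined.
    """
    unknown = set(criteria) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown characteristics: %s" % sorted(unknown))
    merged = {**DEFAULTS, **criteria}
    # Assumed naming scheme: p_<color>_<size>_<region>_i, matched by a
    # dynamic field pattern such as "p_*_i" of an int type.
    return "p_" + "_".join(merged[k] for k in ORDER) + "_i"


def build_params(criteria, text_query="*:*", max_value=None):
    """Build Solr request parameters that touch only the selected field."""
    field = field_for(criteria)
    params = {
        "q": text_query,
        "fl": "id," + field,     # return only the one column that is needed
        "sort": field + " asc",  # sort on the selected int field
    }
    if max_value is not None:
        params["fq"] = "%s:[* TO %d]" % (field, max_value)  # range filter
    return params


def to_query_string(params):
    """URL-encode the parameters for a request to a /select handler."""
    return urlencode(params)
```

With this, build_params({"color": "red"}, max_value=500) filters, sorts on and
returns only the field p_red_m_eu_i. A single multivalued field offers nothing
equivalent, because the position or meaning of each value is not addressable.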

regards,
Britske


Funtick wrote:
> 
> 
> Yes, it should be extremely simple! I simply can't understand how you
> describe it:
> 
> Britske wrote:
>> 
>> Rows in solr represent productcategories. I will have up to 100k of them. 
>> 
>> - Each product category can have up to 10k products. These are encoded as
>> the 10k columns / fields (all 10k fields are int values) 
>>   
>> - At any given time at most 1 product per productcategory is returned
>> (analogous to selecting 1 out of 10k columns). (This is the requirement
>> that makes this scheme possible) 
>> 
>> - Products in the same column have certain characteristics in common,
>> which are encoded in the column name (using dynamic fields). So the
>> combination of these characteristics uniquely determines 1 out of 10k
>> columns. When the user hasn't supplied all characteristics good defaults
>> for these characteristics can be chosen, so a column can always be
>> determined. 
>> 
>> - on top of that each row has 20 productcategory-fields (which all
>> possible 10k products of that category share). 
>> 
> 
> 1. You can't really define 10.000 columns; you are probably using a
> multivalued field for that. (sorry if I am not familiar with the
> newest-greatest features of SOLR such as 'dynamic fields')
> 
> 2. You are trying to pass to Lucene 'normalized data'
> - But it is indeed the job of Lucene, to normalize data!
> 
> 3. All 10k fields are int values!? Lucene is designed for full-text
> search... are you trying to use Lucene instead of a database?
> 
> Sorry if I don't understand your design...
> 
> 
> 
> 
> Britske wrote:
>> 
>> 
>> 
>> Funtick wrote:
>>> 
>>> 
>>> Britske wrote:
>>>> 
>>>> - Rows in solr represent productcategories. I will have up to 100k of
>>>> them. 
>>>> - Each product category can have up to 10k products. These are encoded
>>>> as the 10k columns / fields (all 10k fields are int values) 
>>>> 
>>> 
>>> You are using multivalued fields, you are not using 10k fields. And 10k
>>> is huge.
>>> 
>>> Design is wrong... you should define two fields only: <Category,
>>> Product>. Lucene will do the rest.
>>> 
>>> -Fuad
>>> 
>> 
>> ;-). Well I wish it was that simple. 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/big-discrepancy-between-elapsedtime-and-qtime-although-enableLazyFieldLoading%3D-true-tp18698590p18757094.html
Sent from the Solr - User mailing list archive at Nabble.com.

