lucene-solr-user mailing list archives

From Marcin Rzewucki <>
Subject Re: SOLR - Documents with large number of fields ~ 450
Date Fri, 22 Mar 2013 13:55:08 GMT

I have a collection with more than 4K fields, mostly Trie*Field types.
It is used for faceting, sorting, searching and the StatsComponent. It works
quite well on 4 x Amazon m1.large (7.5GB RAM) EC2 boxes. I'm using
SolrCloud, a multi-AZ setup and ephemeral storage. The index is managed by
mmap, with 4GB for the Java heap and CMS for GC. Currently there are 800K
records, but there will be about 2M. Query responses are much slower (from a
couple of seconds to a dozen) during bulk loading, but I think this is fairly
typical. Indexing takes much longer than for records with fewer fields. I'm
sending updates in 5MB batches. No OOM issues.
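
In case it helps, the batching idea can be sketched roughly as below. This is
only an illustration, not my actual loader: the function name batch_by_size,
the JSON encoding and the way the size is estimated are all assumptions, and
the sketch leaves out the HTTP call that would post each batch to Solr.

```python
import json

def batch_by_size(docs, max_bytes=5 * 1024 * 1024):
    """Yield lists of docs whose summed JSON-encoded size stays within max_bytes.

    Hypothetical helper: the size is an estimate that ignores the brackets and
    commas of the enclosing JSON array. A single document larger than
    max_bytes is still emitted, alone in its own batch.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Flush the current batch before this doc would push it over the limit.
        if batch and batch_bytes + doc_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    # Emit whatever is left over after the last document.
    if batch:
        yield batch
```

Each yielded batch would then be sent as one update request. With documents
that have thousands of fields, a size-based limit keeps request sizes more
predictable than a fixed document count would.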

Regarding DocValues: I believe they are a great improvement for faceting, but
their limitations are annoying: as far as I can tell, a field has to be
either required or have a default value, which is not possible in my
case (I can't default some figures to 0, as that may affect other
results displayed to the end user). I wish it could


On 21 March 2013 07:56, <> wrote:

> Hello All,
> Scenario:
> My data model consists of approx. 450 fields with different types of data.
> We want to index every field, so each record will become a single
> SOLR document with *450 fields*. The total number of records in the data
> set is *755K*. We will be using features like faceting and sorting on
> approx. 50 fields.
> We are planning to use SOLR 4.1. Following is the hardware configuration of
> the web server that we plan to install SOLR on:
> CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB
> Questions:
> 1) What's the best approach when dealing with documents with a large number
> of fields? What's the drawback of having a single document with a very large
> number of fields? Does SOLR support documents with a large number of fields,
> as in my case?
> 2) Will there be any performance issue if I define all of the 450 fields for
> indexing? Also, what if faceting is done on 50 fields, with documents having
> a large number of fields and a huge number of records?
> 3) The names of the fields in the data set are quite lengthy, around 60
> characters. Will it be a problem to define fields with such long names in
> the schema file? Is there any best practice to follow for naming
> conventions? Will long field names create problems during querying?
> Thanks!
