lucene-solr-user mailing list archives

From xavi jmlucjav <jmluc...@gmail.com>
Subject Re: any changes about limitations on huge number of fields lately?
Date Sat, 30 May 2015 20:27:06 GMT
Thanks Toke for the input.

I think the plan is to facet only on class_u1, class_u2 for queries from
user1, etc. So faceting would not happen on all fields on a single query.
But still.

I did not design the schema; I just found out about the number of fields and
advised against it when they asked for a second opinion. We did not get to
discuss a different schema, but if we get to that point I will certainly take
your plan into consideration.

xavi

On Sat, May 30, 2015 at 10:17 PM, Toke Eskildsen <te@statsbiblioteket.dk>
wrote:

> xavi jmlucjav <jmlucjav@gmail.com> wrote:
> > The reason for such a large number of fields:
> > - users dynamically create 'classes' of documents, say one user creates
> >   10 classes on average
> > - for each 'class', the fields are created like this:
> >   "unique_id_"+fieldname
> > - there are potentially hundreds of thousands of users.
>
> Switch to a scheme where you control the names of fields outside of Solr,
> but share the fields internally:
>
> User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
> Internally they are mapped to class1, class2, class3, ... class10
>
> User 2 uses 2 classes: u2_horses, u2_elephants
> Internally they are mapped to class1, class2
>
> When User 2 queries field u2_horses, you rewrite the query to use class1
> instead.
>
> > There is faceting on each user's fields.
> > So this will result in >1M fields, very sparsely populated.
>
> If you are faceting on all of them and if you are not using DocValues,
> this will explode your memory requirements with vanilla Solr: UnInverted
> faceting maintains a separate map from all documentIDs to field values
> (ordinals for Strings) for _all_ the facet fields. Even if you only had 10
> million documents and even if your 1 million facet fields all had just 1
> value, represented by 1 bit, it would still require 10M * 1M * 1 bits in
> memory, which is 10 terabits, or about 1.25 terabytes, of RAM.
>
> - Toke Eskildsen
>
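
The field-sharing scheme Toke describes can be sketched in a few lines. This is only an illustration of the idea, not code from the thread: the `FieldMapper` class, its method names, the "user2" id and the example query value are all hypothetical, and the mapping is assumed to live in the application, outside of Solr.

```python
class FieldMapper:
    """Hypothetical helper: maps each user's class names to shared field names."""

    def __init__(self, prefix="class"):
        self.prefix = prefix
        self._maps = {}  # user id -> {external name: internal name}

    def register(self, user_id, external_name):
        # Assign the next free internal slot (class1, class2, ...) per user.
        user_map = self._maps.setdefault(user_id, {})
        if external_name not in user_map:
            user_map[external_name] = f"{self.prefix}{len(user_map) + 1}"
        return user_map[external_name]

    def rewrite(self, user_id, field, value):
        # Replace the user-facing field name with the shared internal one.
        internal = self._maps[user_id][field]
        return f"{internal}:{value}"


mapper = FieldMapper()
for name in ("u2_horses", "u2_elephants"):
    mapper.register("user2", name)

# A query on u2_horses is rewritten to use class1 instead.
query = mapper.rewrite("user2", "u2_horses", "arabian")
print(query)
```

With this kind of mapping the number of distinct fields in the index is bounded by the largest number of classes any single user has, rather than by the number of users.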

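The lower bound in Toke's estimate can be checked with plain arithmetic. The sketch below uses only the numbers from his message (10 million documents, 1 million facet fields, 1 bit per entry); the variable names are made up for illustration, and real per-field overhead in Solr would be higher than this idealized 1-bit case.

```python
docs = 10_000_000          # 10M documents
facet_fields = 1_000_000   # 1M facet fields
bits_per_value = 1         # idealized: 1 bit per document per field

total_bits = docs * facet_fields * bits_per_value
total_bytes = total_bits // 8

print(total_bits)             # 10000000000000 bits, i.e. 10 terabits
print(total_bytes / 10**12)   # 1.25 (terabytes)
```

Even at one bit per entry, the structure comes to about 1.25 terabytes, so faceting on all fields without DocValues is impractical at this scale.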