lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-10014) Log a warning when the number of fields in a core exceeds a configurable value
Date Fri, 20 Jan 2017 20:59:26 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Heisey updated SOLR-10014:
--------------------------------
    Attachment: SOLR-10014.patch

Patch with an idea for how to implement the warning. I can see that the initIndex method
has a "firstTime" boolean, but I don't think that method has access to the objects needed
to get the field count, so for now I'm not attempting to suppress the warning on reload.
Also, the configuration option for solrconfig.xml hasn't been worked out, so the threshold
isn't configurable yet. I'm pretty sure that I'm using the searcher object incorrectly, but
I'm not sure how to do it correctly.
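
For illustration only, the check described above could be sketched roughly as follows. This is not the content of SOLR-10014.patch and does not use any Solr API: the class FieldCountCheck, its method names, and the DEFAULT_THRESHOLD constant are all hypothetical, and how the distinct field names are obtained from the core or searcher is deliberately left out. The 10000 default is simply the ballpark figure from the issue description.

```java
import java.util.Collection;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; not the code in SOLR-10014.patch. */
final class FieldCountCheck {
    private static final Logger LOG =
        Logger.getLogger(FieldCountCheck.class.getName());

    /** Assumed default, based on the "somewhere around 10000" suggestion. */
    static final int DEFAULT_THRESHOLD = 10_000;

    private FieldCountCheck() {}

    /**
     * Returns true when the number of field names is strictly greater than
     * the threshold. The caller is assumed to pass distinct names.
     */
    static boolean exceedsThreshold(Collection<String> fieldNames, int threshold) {
        return fieldNames.size() > threshold;
    }

    /**
     * Logs a single warning when the threshold is exceeded and returns whether
     * a warning was logged. The firstTime flag models the idea of warning only
     * when the core is first created and staying quiet on reload.
     */
    static boolean warnIfTooManyFields(String coreName,
                                       Collection<String> fieldNames,
                                       int threshold,
                                       boolean firstTime) {
        if (!firstTime || !exceedsThreshold(fieldNames, threshold)) {
            return false;
        }
        LOG.warning(String.format(
            "Core %s has %d fields, which exceeds the threshold of %d; "
                + "very large field counts can cause performance problems.",
            coreName, fieldNames.size(), threshold));
        return true;
    }
}
```

Passing firstTime as a plain boolean keeps the reload question open: always passing true would give the "log on reloads as well as startup" behavior mentioned in the issue.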

> Log a warning when the number of fields in a core exceeds a configurable value
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-10014
>                 URL: https://issues.apache.org/jira/browse/SOLR-10014
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 4.10.4
>            Reporter: Shawn Heisey
>            Priority: Minor
>         Attachments: SOLR-10014.patch
>
>
> When the number of fields in an index gets extremely large, major performance problems
> can occur. If the number of fields in a core exceeds a configurable number, with a default
> somewhere around 10000, a warning should be logged when the SolrCore is first created. A
> decision needs to be made about whether to repeat the warning on core reload ... my instinct
> is that it should NOT be repeated, but I can see where a repeat might have some value. Logging
> on reloads as well as startup would likely be easier.
> This was discovered by a Solr user who had a 420MB index with 650K documents, but their
> applications were abusing dynamic fields to the point where they had about 2 million unique
> fields in the index. The small size of the index *should* have resulted in extremely fast
> commit times, but commits were taking about 10 seconds because of what Lucene had to do to
> handle all those fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

