lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Problems creating index for suggestions
Date Tue, 04 Apr 2017 23:05:42 GMT
Something's indeed not what I'd expect here. One note: buildOnCommit
will rebuild the suggester every time the index has a document
committed _anywhere_. So if there's any indexing activity at all, your
suggester is being rebuilt. I.e., if you have your
autocommit interval set to 1 minute and are actively indexing, your
suggester gets rebuilt every minute.
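
A minimal sketch of one alternative, not taken from the original configuration: if the suggester only needs refreshing on a schedule, buildOnCommit can be turned off and the build triggered explicitly with the suggest.build request parameter. The host, port, and core name in the comment are hypothetical placeholders; the suggester name, field names, and /suggest handler path are copied from the quoted configuration below.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">_sugerencia_</str>
    <str name="payloadField">idTipoRegistro</str>
    <str name="suggestAnalyzerFieldType">text_suggestion</str>
    <str name="buildOnStartup">false</str>
    <!-- Assumption: commits no longer trigger a rebuild; build on demand instead,
         e.g. from a weekly job (host, port, and core name are placeholders):
         http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.build=true&suggest.dictionary=infixSuggester -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

With this arrangement the rebuild cost is paid only when the build request is sent, rather than on every commit.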

But that's not your problem. How big is the index this suggester is
part of? You say 8 documents. Exclusive of the suggester parts of the
index, how big is the rest of your index on disk?

The suggester re-reads all of the stored values in your entire base
index for the field _sugerencia_ to build itself. So I'm guessing that
when you say the index is 8 documents it's not quite what you think it
is.

On the admin screen, what are numDocs and maxDoc for the index in question?

Best,
Erick

On Tue, Apr 4, 2017 at 2:11 PM, Alexis Aravena Silva
<aaravena@itsofteg.com> wrote:
> Hi,
>
>
> I'm creating an index for suggestions. When I rebuild the index with 8 documents, Solr
> creates a temp file that consumes over 20GB in the process, and it takes more than 10 minutes
> to reindex. What is the problem? It makes no sense that Solr takes so long and consumes so much
> of my disk:
>
>
>
> Field Type Definition:
>
>
> <fieldType name="text_suggestion" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>       <analyzer type="index">
>         <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>
> Suggester Configuration:
>
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>     <lst name="suggester">
>       <str name="name">fuzzySuggester</str>
>       <str name="lookupImpl">FuzzyLookupFactory</str>
>       <str name="indexPath">fuzzy_suggestions</str>
>       <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>       <str name="field">_sugerencia_</str>
>       <str name="payloadField">idTipoRegistro</str>
>       <str name="suggestAnalyzerFieldType">text_suggestion</str>
>       <str name="buildOnStartup">false</str>
>       <str name="buildOnCommit">true</str>
>     </lst>
>     <lst name="suggester">
>       <str name="name">infixSuggester</str>
>       <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>       <str name="indexPath">infix_suggestions</str>
>       <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>       <str name="field">_sugerencia_</str>
>       <str name="payloadField">idTipoRegistro</str>
>       <str name="suggestAnalyzerFieldType">text_suggestion</str>
>       <str name="buildOnStartup">false</str>
>       <str name="buildOnCommit">true</str>
>     </lst>
>   </searchComponent>
>   <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
>     <lst name="defaults">
>       <str name="suggest">true</str>
>       <str name="suggest.dictionary">infixSuggester</str>
>       <str name="suggest.dictionary">fuzzySuggester</str>
>       <str name="suggest.onlyMorePopular">true</str>
>       <str name="suggest.count">10</str>
>       <str name="suggest.collate">true</str>
>     </lst>
>     <arr name="components">
>       <str>suggest</str>
>     </arr>
>   </requestHandler>
>
>
>
> I rebuild the suggestions once a week; that's why I set buildOnCommit = true.
>
>
> Regards.
