lucene-solr-user mailing list archives

From Odysci <ody...@gmail.com>
Subject Re: Search Performance and omitNorms
Date Thu, 05 Dec 2019 12:34:21 GMT
Hi Erick,
thanks for the reply.
Just to follow up, I'm using the "unified" highlighter (fastVector does not
work for my purposes). I search and highlight on a multivalued string
field which contains small strings (usually less than 200 chars).
This multivalued field goes through several analysis steps (tokenizer, word
delimiter, stemming), and termVectors, termPositions, and termOffsets are
all "true".
This is what I'm using:

------------------ schema ------------------
    <fieldType name="documentSearchP" class="solr.TextField"
               positionIncrementGap="100" omitNorms="false">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_p.txt"
                    ignoreCase="true" expand="false" />
            <filter class="solr.FlattenGraphFilterFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterGraphFilterFactory"
                    splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"
                    generateWordParts="1" generateNumberParts="1" catenateWords="0"
                    catenateNumbers="0" catenateAll="1" preserveOriginal="1" />
            <filter class="solr.FlattenGraphFilterFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
            <filter class="solr.PortugueseLightStemFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_p.txt"
                    ignoreCase="true" expand="false" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterGraphFilterFactory"
                    splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"
                    generateWordParts="1" generateNumberParts="1" catenateWords="0"
                    catenateNumbers="0" catenateAll="1" preserveOriginal="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
            <filter class="solr.PortugueseLightStemFilterFactory" />
        </analyzer>
    </fieldType>

    <dynamicField name="*_msearchp" type="documentSearchP" indexed="true"
                  stored="true" required="false" multiValued="true"
                  storeOffsetsWithPositions="true" termVectors="true"
                  termPositions="true" termOffsets="true" />

------------------ schema ------------------

In the Java code I set the following params (assuming the multivalued
field above is called "text_msearchp"):

SolrQuery solrQ = new SolrQuery();
solrQ.setFilterQueries( -- set some filters --);
solrQ.setStart(0);
solrQ.setRows( -- set max rows --);
solrQ.setQuery("text_msearchp" + ":(\"" + string_being_searched + "\")");
// activate highlighting
solrQ.setHighlight(true);
solrQ.setHighlightSnippets(500);   // normally this number is low

// set highlighter type
solrQ.setParam("hl.method", "unified");
// set highlight field to be the same as the search field
solrQ.setParam("hl.fl", "text_msearchp");
// set the term that will generate the highlight
solrQ.setParam("hl.q", "text_msearchp" + ":(\"" + string_being_searched + "\")");

----------------------------------------------------------------------------
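The query and hl.q strings above wrap the search text in a quoted phrase. If that text can itself contain a double quote or a backslash, it would probably need escaping first. Here is a minimal sketch using only the JDK; the class and method names (PhraseQueryBuilder, escapeForPhrase, phraseQuery) are hypothetical and not part of SolrJ or of my actual code:

```java
// Hypothetical helper, not part of SolrJ: builds field:("text") with the
// text escaped so that it cannot terminate the phrase early.
final class PhraseQueryBuilder {
    private PhraseQueryBuilder() {}

    // Escape backslashes first, then double quotes, so the escapes we add
    // are not themselves escaped a second time.
    static String escapeForPhrase(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Produces the same shape as the query above: field:("text")
    static String phraseQuery(String field, String text) {
        return field + ":(\"" + escapeForPhrase(text) + "\")";
    }
}
```

With this, both calls could share one string, e.g. String q = PhraseQueryBuilder.phraseQuery("text_msearchp", string_being_searched); followed by solrQ.setQuery(q) and solrQ.setParam("hl.q", q).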

Still, my tests indicate a significant speed up using omitNorms="false".
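
For anyone who wants to repeat the comparison, one way it could be timed is sketched below: a few warm-up calls, then the median of several timed runs. This is not necessarily how my tests were run. QueryTimer and its methods are hypothetical names, and the Runnable is assumed to wrap whatever SolrJ call executes the query:

```java
import java.util.Arrays;

// Hypothetical timing helper (JDK only); the Runnable is assumed to wrap
// the actual SolrJ query call, which is not shown here.
final class QueryTimer {
    private QueryTimer() {}

    // Median of the samples; the input array is left unmodified.
    static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Runs the query 'warmups' times untimed, then 'runs' times timed,
    // and returns the median elapsed time in milliseconds.
    static double medianMillis(Runnable query, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            query.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        return median(nanos) / 1_000_000.0;
    }
}
```

Running it once against the index built with omitNorms="true" and once against the one with omitNorms="false", using the same queries, gives two medians to compare. The warm-up calls are there so that cold caches do not dominate the first measurements.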
Best,

Reinaldo

On Tue, Dec 3, 2019 at 6:35 PM Erick Erickson <erickerickson@gmail.com>
wrote:

> I suspect this is spurious. Norms are just an encoding
> of the length of a field; offhand I have no clue how having
> them (or not) would affect highlighting at all.
>
> Term _vectors_ OTOH could have a major impact. If
> FastVectorHighlighter is not used, the highlighter has
> to re-analyze the text in order to highlight, and if you’re
> highlighting in large text fields that can be very expensive.
>
> Norms aren’t relevant there.
>
> So let’s see the full highlighter configuration you have, along
> with the field definition for the field you’re highlighting on.
>
> Best,
> Erick
>
> > On Dec 3, 2019, at 4:27 PM, Odysci <odysci@gmail.com> wrote:
> >
> > I'm using solr-8.3.1 on a SolrCloud setup with 2 Solr nodes and 2 ZK nodes.
> > I was experiencing very slow search-with-highlighting on an index that had
> > 'omitNorms="true"' on all fields.
> > At the suggestion of a Stack Overflow post, I changed all fields to be
> > 'omitNorms="false"' and the search-with-highlight time came down to about
> > 1/10th of what it was!
> >
> > This was a relatively small index and I had no issues with memory increase.
> > Now my question is whether I should expect the same speedup on regular
> > search calls, or search with only filters (no query)?
> > This would be on a different, much larger index, and I do not want to incur
> > the memory increase unless the search is significantly faster.
> > Does anyone have any experience in comparing search speed using "omitNorms"
> > true or false in regular search (non-highlight)?
> > Thanks!
> >
> > Reinaldo
>
>
