lucene-solr-user mailing list archives

From Nick D <ndrake0...@gmail.com>
Subject Re: Synonym expansions w/ phrase slop exhausting memory after upgrading to SOLR 7
Date Thu, 19 Dec 2019 02:47:40 GMT
Michael,

Thank you so much, that was extremely helpful. My Google-fu wasn't good
enough, I guess.

1. Setting slop=0 was my initial fix, just to stop it from exploding.

2. Downgrading to 7.5 will be our solution for now, until we can get some
things squared away for 8.0.

It sounds like even in 8.0 any graph query expansion can still grow rather
large, but it just won't consume all available memory. Is that correct?

One final question: why doesn't the maxBooleanClauses value in
solrconfig.xml still apply? Reading through all the JIRAs, I thought it was
supposed to remain a failsafe. Did I miss something?
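A clause-count failsafe of the kind asked about here can be illustrated with a small Python analogy. It is hypothetical, not Lucene's implementation: `expand_with_cap` and `TooManyClausesError` are invented names, and the cap of 1024 mirrors the usual default for maxBooleanClauses. What it shows is that a cap turns an exploding expansion into an early error instead of materializing every clause in memory.

```python
import itertools
from typing import List, Sequence


class TooManyClausesError(Exception):
    """Hypothetical stand-in for a too-many-clauses failure."""


def expand_with_cap(positions: Sequence[Sequence[str]],
                    max_clauses: int = 1024) -> List[str]:
    """Enumerate phrase paths through a toy synonym graph (one list of
    alternatives per position), raising once the cap would be exceeded."""
    phrases: List[str] = []
    # itertools.product yields paths lazily, so paths past the cap are never built.
    for path in itertools.product(*positions):
        if len(phrases) >= max_clauses:
            raise TooManyClausesError(
                f"expansion exceeds max_clauses={max_clauses}")
        phrases.append(" ".join(path))
    return phrases


if __name__ == "__main__":
    # Hypothetical graph: 12 positions x 2 alternatives = 4096 paths, over the cap.
    graph = [["leave", "absence"]] * 12
    try:
        expand_with_cap(graph)
    except TooManyClausesError as err:
        print("rejected:", err)
```

Under this model an expansion stays bounded by the cap, which matches the behavior described for 8.0: affected queries fail fast with an error rather than exhausting the heap.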

Thanks again for your help,

Nick

On Wed, Dec 18, 2019, 8:10 AM Michael Gibney <michael@michaelgibney.net> wrote:

> This is related to this issue:
> https://issues.apache.org/jira/browse/SOLR-13336
>
> Also tangentially relevant:
> https://issues.apache.org/jira/browse/LUCENE-8531
> https://issues.apache.org/jira/browse/SOLR-12243
>
> I think your options include:
> 1. setting slop=0, which restores SpanNearQuery as the graph phrase
> query implementation (see LUCENE-8531)
> 2. downgrading to 7.5 would avoid the OOM, but would cause graph
> phrase queries to be effectively ignored (see SOLR-12243)
> 3. upgrading to 8.0, which will restore the failsafe maxBooleanClauses,
> avoiding OOM but returning an error code for affected queries (which
> in your case sounds like most queries?) (see SOLR-13336)
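Option 1 is a request-parameter change. A minimal sketch, assuming an edismax handler at a hypothetical `http://localhost:8983/solr/mycore/select` and a hypothetical `pf` list modeled on the boosts visible in the Solr 6 parsed query quoted further down; it only builds the URL with the Python standard library and does not contact Solr. timeAllowed (in milliseconds) is included as the guard Nick describes using, not as a fix.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; substitute your own host and core.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_select_url(user_query: str, phrase_slop: int = 0,
                     time_allowed_ms: int = 5000) -> str:
    """Build an edismax /select URL with phrase slop pinned (default 0)."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Hypothetical phrase-boost fields, modeled on the ^10.0 / ^3.0 boosts.
        "pf": "ngs_title^10 ngs_field_description^3",
        # Option 1: slop 0 keeps graph phrase queries on SpanNearQuery (LUCENE-8531).
        "ps": str(phrase_slop),
        # Milliseconds; bounds search time as a guard, not a fix.
        "timeAllowed": str(time_allowed_ms),
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_select_url("bereavement leave type")
    print(url)
    print(parse_qs(urlsplit(url).query)["ps"])  # ['0']
```

Keeping slop as a parameter (rather than hard-coding "0") makes it easy to compare ps=0 against ps=5 on the same query while testing.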
>
> Michael
>
> On Tue, Dec 17, 2019 at 4:16 PM Nick D <ndrake0027@gmail.com> wrote:
> >
> > Hello All,
> >
> > We recently upgraded from Solr 6.6 to Solr 7.7.2 and have since seen
> > memory spikes that eventually caused either an OOM or close to 100%
> > utilization of the available memory. After trying a few things
> > (increasing the JVM heap, making sure docValues were set for all sort and
> > facet fields, in case the fieldCache was blowing up), I was able to
> > isolate a single query that would fully exhaust the available memory and
> > effectively render the instance dead. After applying a timeAllowed value
> > to the query and shortening the query phrase (on longer queries containing
> > synonyms, the system would crash without logging the warning), I was able
> > to identify the following warning in the logs:
> >
> > o.a.s.s.SolrIndexSearcher Query: <____very long synonym expansion____>
> >
> > the request took too long to iterate over terms. Timeout: timeoutAt:
> > 812182664173653 (System.nanoTime(): 812182715745553),
> > TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@7a0db441
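The two numbers in that warning can be compared directly. Assuming both are System.nanoTime() readings in nanoseconds (as the log label suggests), the difference is 51,571,900 ns, so the timeout check fired roughly 51.6 ms after the deadline. This is a back-of-the-envelope reading of the log, not an official interpretation of its format.

```python
# Values copied from the SolrIndexSearcher warning above.
TIMEOUT_AT_NS = 812182664173653
NANO_TIME_NS = 812182715745553


def overrun_ms(timeout_at_ns: int, now_ns: int) -> float:
    """How far past the deadline the check ran, in milliseconds
    (assumes both readings come from the same nanosecond clock)."""
    return (now_ns - timeout_at_ns) / 1_000_000


print(f"{overrun_ms(TIMEOUT_AT_NS, NANO_TIME_NS):.1f} ms past the deadline")
```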
> >
> > I have narrowed the problem down to the way synonyms are being expanded
> > in combination with phrase slop.
> >
> > With ps=5, synonym expansion produces 4096 permutations of the searched
> > phrase, looking similar to:
> > ngs_title:"bereavement leave type build bereavement leave type data p"~5
> > ngs_title:"bereavement leave type build bereavement bereavement type data p"~5
> > ngs_title:"bereavement leave type build bereavement jury duty type data p"~5
> > ngs_title:"bereavement leave type build bereavement maternity leave type data p"~5
> > ngs_title:"bereavement leave type build bereavement paternity type data p"~5
> > ngs_title:"bereavement leave type build bereavement paternity leave type data p"~5
> > ngs_title:"bereavement leave type build bereavement adoption leave type data p"~5
> > ngs_title:"bereavement leave type build jury duty maternity leave type data p"~5
> > ngs_title:"bereavement leave type build jury duty paternity type data p"~5
> > ngs_title:"bereavement leave type build jury duty paternity leave type data p"~5
> > ngs_title:"bereavement leave type build jury duty adoption leave type data p"~5
> > ngs_title:"bereavement leave type build jury duty absence type data p"~5
> > ngs_title:"bereavement leave type build maternity leave leave type data p"~5
> > ngs_title:"bereavement leave type build maternity leave bereavement type data p"~5
> > ngs_title:"bereavement leave type build maternity leave jury duty type data p"~5
> >
> > ....
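The 4096 count follows from multiplication: when each position in the phrase has some number of synonym alternatives, the number of distinct phrases is the product of those counts (4096 could be, for example, 2^12 or 4^6; the actual breakdown depends on synonyms.txt). A small Python sketch over a hypothetical toy graph, not the real synonym file:

```python
import itertools
import math
from typing import List, Sequence


def phrase_variants(positions: Sequence[Sequence[str]]) -> List[str]:
    """Every phrase obtainable by picking one alternative per position."""
    return [" ".join(path) for path in itertools.product(*positions)]


def variant_count(positions: Sequence[Sequence[str]]) -> int:
    """Same count without enumerating: product of alternatives per position."""
    return math.prod(len(alts) for alts in positions)


# Hypothetical synonym group: "leave" can surface as any of four alternatives.
LEAVE = ["leave", "bereavement", "jury duty", "maternity leave"]
toy_graph = [["build"], LEAVE, ["type"], LEAVE, ["type"], ["data"]]

print(variant_count(toy_graph))   # 16 (4 * 4)
print(variant_count([LEAVE] * 6))  # 4096 (4 ** 6)
print(phrase_variants(toy_graph)[:3])
```

The count grows exponentially with the number of expandable positions, which is why a modest synonym file can yield thousands of phrase queries once every variant becomes its own clause.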
> >
> > Previously, in Solr 6, the same query with the same synonyms (and query
> > analysis chain) would produce a parsedQuery like this when using &ps=5:
> > DisjunctionMaxQuery(((ngs_field_description:\"leave leave type build leave leave type data ? p leave leave type type.enabled\"~5)^3.0 |
> > (ngs_title:\"leave leave type build leave leave type data ? p leave leave type type.enabled\"~5)^10.0)
> >
> > The synonym expansion wasn't being applied to the DisjunctionMaxQuery
> > that gets added when adjusting rankings with phrase slop.
> >
> > In general, the parsed queries differ between 6 and 7, with some new
> > `spanNear` clauses showing up, but those don't cause the memory
> > consumption issues I see when a large synonym expansion happens in
> > combination with the ps parameter.
> >
> > I didn't see much in the release notes about synonym changes (other than
> > sow=false becoming the default as of version 7).
> >
> > The field being operated on has the following query analysis chain:
> >
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
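To see why that chain yields a set of alternatives at each position, here is a toy Python model of its stages: stop-word removal, synonym expansion (ignoreCase, expand=true), then lowercasing. It is a deliberate simplification, not how Lucene's filters work internally: it splits on whitespace instead of using StandardTokenizer, matches only single-token synonym keys, and represents the token graph as a list of alternatives per position. The stop words and synonym group are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical stop words and synonym group (not the real stopwords.txt / synonyms.txt).
STOPWORDS: Set[str] = {"a", "an", "the", "of"}
SYNONYM_GROUPS: List[List[str]] = [["leave", "absence", "jury duty"]]


def build_synonym_map(groups: List[List[str]]) -> Dict[str, List[str]]:
    """expand=true: every term in a group maps to the whole group."""
    mapping: Dict[str, List[str]] = {}
    for group in groups:
        for term in group:
            mapping[term.lower()] = group  # ignoreCase=true: match on lowercase keys
    return mapping


def analyze(text: str) -> List[List[str]]:
    """Toy query-time chain: tokenize -> stop filter -> synonyms -> lowercase.
    Returns one list of alternatives per token position."""
    synonyms = build_synonym_map(SYNONYM_GROUPS)
    tokens = text.split()                                       # stand-in for StandardTokenizer
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]  # StopFilter, ignoreCase=true
    graph = [synonyms.get(t.lower(), [t]) for t in tokens]      # synonym step, single-token keys only
    return [[alt.lower() for alt in alts] for alts in graph]    # LowerCaseFilter


if __name__ == "__main__":
    print(analyze("Type of Leave"))  # [['type'], ['leave', 'absence', 'jury duty']]
```

Each position that hits a synonym group multiplies the number of distinct phrases, so a phrase query built from every path through this output grows with the product of the group sizes.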
> >
> > I'm not sure whether phrase slop changed so that it now takes synonyms
> > into account, or whether there is a way to disable that kind of
> > expansion. It may be related to SOLR-10980
> > <https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-10980>;
> > it does seem related, but that issue references Solr 6, which does not
> > do the expansion.
> >
> > Any help would be greatly appreciated.
> >
> > Nick
>
