lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Reduce QueryComponent prepare time
Date Wed, 21 Nov 2012 08:41:06 GMT
Hi Mikhail,

Thanks for sharing your experiences. I'll look into the flexible query parser.

Markus
 
 
-----Original message-----
> From:Mikhail Khludnev <mkhludnev@griddynamics.com>
> Sent: Tue 20-Nov-2012 19:53
> To: solr-user@lucene.apache.org
> Subject: Re: Reduce QueryComponent prepare time
> 
> Markus,
> 
> It seems you face the challenge of optimizing complex eDisMax code for
> your particular use case, which is not so common. I can't help with that
> code, but I can share some experience: we have mind-blowing queries too;
> they span many fields and enumerate many phrase shingles. We have a similar
> counterintuitive hot spot: query parsing takes longer than searching and
> faceting. In our case, though, dictionary lookups (i.e. term substitution
> and transformations) are the main CPU consumers. We built our own query
> parser with something like
> http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html.
> This approach, where you represent the core query structure as a DOM-like
> skeleton of nodes and then transform it into concrete query instances,
> *might be more performant* (and *might not be* for you) than the current
> eDisMax. That's all I can offer.
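Mikhail's suggestion can be pictured as three stages: parse the user text once into a field-independent node skeleton, run processors over that skeleton (where dictionary lookups and term substitution would happen), then build a concrete query per field. The `SkeletonDemo` class and its node types below are hypothetical stand-ins, not the API of Lucene's `org.apache.lucene.queryparser.flexible` packages, and the "query" it builds is a plain string so the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy model of "node skeleton first, concrete queries later". All names are hypothetical. */
final class SkeletonDemo {
    private SkeletonDemo() {}

    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Or implements Node {
        final List<Node> children;
        Or(List<Node> children) { this.children = children; }
    }

    /** Stage 1: turn the user text into a skeleton that knows nothing about fields. */
    static Node parse(String userQuery) {
        List<Node> terms = new ArrayList<>();
        for (String word : userQuery.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(new Term(word));
            }
        }
        return new Or(terms);
    }

    /** Stage 2: a processor that rewrites every term (lower-casing, synonyms, ...). */
    static Node process(Node node, UnaryOperator<String> rewrite) {
        if (node instanceof Term) {
            return new Term(rewrite.apply(((Term) node).text));
        }
        List<Node> out = new ArrayList<>();
        for (Node child : ((Or) node).children) {
            out.add(process(child, rewrite));
        }
        return new Or(out);
    }

    /** Stage 3: build a concrete query for one field from the shared skeleton. */
    static String build(Node node, String field) {
        if (node instanceof Term) {
            return field + ":" + ((Term) node).text;
        }
        List<String> parts = new ArrayList<>();
        for (Node child : ((Or) node).children) {
            parts.add(build(child, field));
        }
        return "(" + String.join(" OR ", parts) + ")";
    }
}
```

For example, `build(process(parse("Foo Bar"), String::toLowerCase), "title")` yields `(title:foo OR title:bar)`. Under these assumptions the potential saving is that parsing and processing happen once per request, however many fields the final query spans; only the cheap build step repeats per field.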
> 
> Bye.
> 
> 
> On Tue, Nov 20, 2012 at 7:01 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> 
> > Hi,
> >
> > Profiling pointed me directly to the method I already suspected:
> > ExtendedDismaxQParser.parse(). I added manual timers in parts of the method
> > and made sure the timers add up to the QueryComponent prepare time. After
> > starting Solr, one small part takes almost 100ms on a fast machine with
> > lots of memory; fortunately this happens only once. KStemmer, the loading
> > of the KStemData and the ThaiWordFilter's init take the bulk of it.
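The manual-timer approach described here can be sketched as a small accumulator that records elapsed time per named section, so the section totals can be checked against the reported prepare time. `SectionTimer` and any section names you pass to it are hypothetical; nothing here is a Solr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock nanoseconds per named section. One instance per request; not thread-safe. */
final class SectionTimer {
    private final Map<String, Long> nanosBySection = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named section, and returns the work's result. */
    <T> T time(String section, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // merge() adds to an existing total, so a section may be timed several times
            nanosBySection.merge(section, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total for one section, or 0 if it was never timed. */
    long nanos(String section) {
        return nanosBySection.getOrDefault(section, 0L);
    }

    /** Sum over all sections; compare this with the total prepare time to check coverage. */
    long totalNanos() {
        long sum = 0;
        for (long n : nanosBySection.values()) {
            sum += n;
        }
        return sum;
    }

    /** Copy of the per-section totals, in the order sections were first timed. */
    Map<String, Long> snapshot() {
        return new LinkedHashMap<>(nanosBySection);
    }
}
```

Wrapping, say, parser construction and the user-query parse as separate sections and logging `snapshot()` per request shows where the time goes. The first request after startup also includes one-time class loading and static initialization, such as the KStem data load mentioned above, so it is worth reporting that request separately.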
> >
> >       ExtendedSolrQueryParser up =
> >         new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
> >       up.addAlias(IMPOSSIBLE_FIELD_NAME,
> >                 tiebreaker, queryFields);
> >       addAliasesFromRequest(up, tiebreaker);
> >       up.setPhraseSlop(qslop);     // slop for explicit user phrase queries
> >       up.setAllowLeadingWildcard(true);
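The one-time startup cost measured above (KStemmer, the KStemData load and the ThaiWordFilter init) is the kind of cost typical of static data loaded on first use: whichever request touches it first pays for it. A minimal sketch of the initialization-on-demand holder idiom, using a hypothetical `StemTable` rather than the real KStem classes, shows the pattern and how an explicit warm-up call moves that cost out of the first user query.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Initialization-on-demand holder idiom. The expensive table is built the first
 * time Holder is referenced, exactly once; JVM class initialization makes that
 * thread-safe without explicit locking. StemTable is hypothetical.
 */
final class StemTable {
    private static int loadCount = 0; // how many times the expensive load ran

    private StemTable() {}

    private static final class Holder {
        // Initialized only when Holder is first accessed, not when StemTable is.
        static final Map<String, String> TABLE = load();
    }

    private static Map<String, String> load() {
        loadCount++;
        Map<String, String> table = new HashMap<>();
        table.put("running", "run"); // stands in for a large dictionary load
        return table;
    }

    /** Returns the stem if the table knows the word, otherwise the word itself. */
    static String stem(String word) {
        return Holder.TABLE.getOrDefault(word, word);
    }

    /** Forces the load, e.g. from a startup or warm-up hook, so no user query pays for it. */
    static void warmUp() {
        stem("");
    }

    static int loadCount() {
        return loadCount;
    }
}
```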
> >
> > After Solr has been running for some time, two parts continue to take a
> > lot of time: parsing the query
> >
> >       if (parsedUserQuery == null) {
> >         sb = new StringBuilder();
> >         for (Clause clause : clauses) {
> >
> >         ....
> >
> >         if (parsedUserQuery instanceof BooleanQuery) {
> >           BooleanQuery t = new BooleanQuery();
> >           SolrPluginUtils.flattenBooleanQuery(t, (BooleanQuery)parsedUserQuery);
> >           SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
> >           parsedUserQuery = t;
> >         }
> >       }
> >
> > and handling the phrase fields (pf, pf2, pf3):
> >
> >       if (allPhraseFields.size() > 0) {
> >         // full phrase and shingles
> >         for (FieldParams phraseField: allPhraseFields) {
> >           Map<String,Float> pf = new HashMap<String,Float>(1);
> >           pf.put(phraseField.getField(),phraseField.getBoost());
> >           addShingledPhraseQueries(query, normalClauses, pf,
> >           phraseField.getWordGrams(),tiebreaker, phraseField.getSlop());
> >         }
> >       }
> >
> > The problem is significant when there are many fields: the prepare time
> > is usually higher than the process times of query, highlight and facet
> > combined.
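To see why prepare time grows with the number of phrase fields, here is a simplified counting model (not Solr's code): `pf` contributes one phrase over all of the query's words, `pf2` and `pf3` contribute one phrase per 2- or 3-word shingle, and each phrase is built once for every field listed in that parameter. `ShingleModel` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified counting model of phrase boosting (not Solr's implementation).
 * wordGrams == 0 means "the whole query as one phrase" (pf); 2 and 3 stand for pf2 and pf3.
 */
final class ShingleModel {
    private ShingleModel() {}

    /** All contiguous word n-grams of the given size, in query order. */
    static List<String> shingles(List<String> terms, int wordGrams) {
        int size = (wordGrams == 0) ? terms.size() : wordGrams;
        List<String> out = new ArrayList<>();
        if (size <= 0 || terms.size() < size) {
            return out; // not enough words for even one shingle
        }
        for (int i = 0; i + size <= terms.size(); i++) {
            out.add(String.join(" ", terms.subList(i, i + size)));
        }
        return out;
    }

    /** Phrase queries per request in this model: fields x shingles, summed over pf, pf2 and pf3. */
    static int phraseQueryCount(List<String> terms, int pfFields, int pf2Fields, int pf3Fields) {
        return pfFields * shingles(terms, 0).size()
                + pf2Fields * shingles(terms, 2).size()
                + pf3Fields * shingles(terms, 3).size();
    }
}
```

In this model a five-word query with, say, 30 fields in each of pf, pf2 and pf3 already yields 30 + 120 + 90 = 240 phrase queries (one full phrase, four bigrams and three trigrams per field), all of which are constructed during prepare.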
> >
> >
> >
> > -----Original message-----
> > > From:Mikhail Khludnev <mkhludnev@griddynamics.com>
> > > Sent: Mon 19-Nov-2012 12:52
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Reduce QueryComponent prepare time
> > >
> > > Markus,
> > >
> > > It's hard to suggest anything until you provide a profiler snapshot that
> > > shows what prepare spends its time on. As far as I know, prepare is where
> > > the queries get parsed; we, for example, have really heavy query parsers,
> > > but I don't think that's very common.
> > >
> > >
> > > On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> > >
> > > > I'd also like to know which parts of the entire query constitute the
> > > > prepare time, and whether it would matter significantly if we extended
> > > > the edismax plugin and hardcoded the parameters we pass into (reusable)
> > > > objects.
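One way to read "hardcode the parameters into reusable objects" is to parse the static parts of the request (for example a `qf`/`pf`-style `field^boost` list) once and share the immutable result across requests. `FieldBoostCache` is a hypothetical sketch, not a Solr API. It deliberately caches only immutable configuration, because query parser instances generally carry per-parse state and Lucene's classic `QueryParser` is documented as not thread-safe.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: parse a "field^boost field2" string once and share the
 * immutable result across requests instead of re-parsing it for every query.
 */
final class FieldBoostCache {
    private static final Map<String, Map<String, Float>> CACHE = new ConcurrentHashMap<>();

    private FieldBoostCache() {}

    /** Returns the cached parse of spec, parsing it on first use. */
    static Map<String, Float> get(String spec) {
        return CACHE.computeIfAbsent(spec, FieldBoostCache::parse);
    }

    /** "title^2 body" -> {title=2.0, body=1.0}; a missing boost defaults to 1.0. */
    static Map<String, Float> parse(String spec) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String token : spec.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty spec splits into a single empty token
            }
            int caret = token.indexOf('^');
            if (caret < 0) {
                boosts.put(token, 1.0f);
            } else {
                boosts.put(token.substring(0, caret), Float.parseFloat(token.substring(caret + 1)));
            }
        }
        return Collections.unmodifiableMap(boosts);
    }
}
```

Whether this saves a meaningful part of the prepare time depends on how much of it goes to parameter parsing rather than query construction; Markus's later profiling in this thread points mostly at the user-query parse and the phrase-field handling.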
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > > > -----Original message-----
> > > > > From:Markus Jelsma <markus.jelsma@openindex.io>
> > > > > Sent: Fri 16-Nov-2012 15:57
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Reduce QueryComponent prepare time
> > > > >
> > > > > Hi,
> > > > >
> > > > > We're seeing high prepare times for the QueryComponent, obviously due
> > > > > to the vast number of fields and queries. It's common to have a prepare
> > > > > time of 70-80ms while the process times drop significantly due to warmed
> > > > > searchers, OS cache, etc. The prepare time is a recurring issue and I
> > > > > hope there are people here who can share some thoughts or hints.
> > > > >
> > > > > We're using a recent checkout on a 10-node test cluster with SSDs
> > > > > (although this is not an IO issue) and edismax on about a hundred
> > > > > different fields; this includes phrase searches over most of those
> > > > > fields and SpanFirst queries on about 25 fields. We'd like to see how
> > > > > we can avoid doing the same prepare procedure over and over again ;)
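Avoiding "the same prepare procedure over and over again" by caching only helps for repeated identical requests. A minimal memoization sketch puts an LRU map in front of the expensive parse step, assuming the cache key captures everything that affects parsing (the `q` string plus parameters such as `qf`, `pf` and `mm`). `ParseMemo` is hypothetical, not a Solr cache. Since Lucene 4.x `Query` objects are mutable (for example via `setBoost`), cached results would have to be treated as read-only by every caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: memoize an expensive parse step for repeated identical
 * keys, evicting the least recently used entry once maxEntries is exceeded.
 */
final class ParseMemo<K, V> {
    private final Function<K, V> parser;
    private final Map<K, V> lru;

    ParseMemo(int maxEntries, Function<K, V> parser) {
        this.parser = parser;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Synchronized because an access-ordered LinkedHashMap is modified even by get(). Null results are not cached. */
    synchronized V get(K key) {
        V cached = lru.get(key);
        if (cached == null) {
            cached = parser.apply(key);
            lru.put(key, cached);
        }
        return cached;
    }

    synchronized int size() {
        return lru.size();
    }
}
```

Such a cache would also need to be cleared whenever the schema or the request handler configuration changes.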
> > > > >
> > > > > Thanks,
> > > > > Markus
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > >  <mkhludnev@griddynamics.com>
> > >
> >
