lucene-dev mailing list archives

From "Trejkaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7260) StandardQueryParser is over 100 times slower in v5 compared to v3
Date Thu, 16 Feb 2017 01:12:41 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868920#comment-15868920 ]

Trejkaz commented on LUCENE-7260:
---------------------------------

Lucene 6.3:

{noformat}
dt: 22307
dt: 21190
dt: 21004
dt: 20972
dt: 21435
dt: 21802
dt: 21487
dt: 21282
dt: 20886
dt: 21386
{noformat}


> StandardQueryParser is over 100 times slower in v5 compared to v3
> -----------------------------------------------------------------
>
>                 Key: LUCENE-7260
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7260
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/queryparser
>    Affects Versions: 5.4.1, 5.5.2, 6.3
>         Environment: Java 8u51
>            Reporter: Trejkaz
>              Labels: performance
>
> The following test code times parsing a large query.
> {code}
> // Lucene 3.x package names; for the 5.x/6.x runs, swap in the commented-out imports.
> import org.apache.lucene.analysis.KeywordAnalyzer;
> //import org.apache.lucene.analysis.core.KeywordAnalyzer;
> import org.apache.lucene.queryParser.standard.StandardQueryParser;
> //import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
> import org.apache.lucene.search.BooleanQuery;
> public class LargeQueryTest {
>     public static void main(String[] args) throws Exception {
>         BooleanQuery.setMaxClauseCount(50_000);
>         StringBuilder builder = new StringBuilder(50_000*10);
>         builder.append("id:( ");
>         boolean first = true;
>         for (int i = 0; i < 50_000; i++) {
>             if (first) {
>                 first = false;
>             } else {
>                 builder.append(" OR ");
>             }
>             builder.append(String.valueOf(i));
>         }
>         builder.append(" )");
>         String queryString = builder.toString();
>         StandardQueryParser parser2 = new StandardQueryParser(new KeywordAnalyzer());
>         for (int i = 0; i < 10; i++) {
>             long t0 = System.currentTimeMillis();
>             parser2.parse(queryString, "nope");
>             long t1 = System.currentTimeMillis();
>             System.out.println(t1-t0);
>         }
>     }
> }
> {code}
> For Lucene 3.6.2, the timings settle down to 200~300 ms, with the fastest being 207 ms.
> For Lucene 5.4.1, the timings settle down to 20000~30000 ms, with the fastest being 22444 ms.
> So at some point, some change made the query parser 100 times slower. I would suspect
> that it has something to do with how the list of children is now handled. Every time someone
> gets the children, it copies the list. Every time someone sets the children, it walks through
> to detach parent references and then reattaches them all again.
> If it were me, I would probably make these collections immutable so that I didn't have
> to defensively copy them.
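
To illustrate why the suspected pattern could matter at this scale, here is a minimal sketch. It is not Lucene's code: `CopyingNode`, `ViewNode`, and `copiesWhileBuilding` are hypothetical names invented for this example, and the assumption that children are read once after each addition is only a stand-in for whatever the real processing pipeline does. It counts how many elements get copied when a getter returns a fresh copy of the child list on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChildListSketch {

    /** Hypothetical node that defensively copies its child list on every read. */
    static final class CopyingNode {
        private final List<String> children = new ArrayList<>();
        long elementsCopied = 0;

        void add(String child) {
            children.add(child);
        }

        List<String> getChildren() {
            elementsCopied += children.size();   // cost of this copy: O(current size)
            return new ArrayList<>(children);
        }
    }

    /** Hypothetical node that hands out a read-only view, so no copy is needed. */
    static final class ViewNode {
        private final List<String> children = new ArrayList<>();
        private final List<String> view = Collections.unmodifiableList(children);

        void add(String child) {
            children.add(child);
        }

        List<String> getChildren() {
            return view;                          // O(1), callers cannot mutate it
        }
    }

    /** Adds n children, reading the child list once after each addition (an assumed access pattern). */
    static long copiesWhileBuilding(int n) {
        CopyingNode node = new CopyingNode();
        for (int i = 0; i < n; i++) {
            node.add(String.valueOf(i));
            node.getChildren();
        }
        return node.elementsCopied;               // 1 + 2 + ... + n = n(n+1)/2
    }

    public static void main(String[] args) {
        // With the 50,000 clauses used in the test above, the total is quadratic in n.
        System.out.println("elements copied: " + copiesWhileBuilding(50_000));
    }
}
```

Under that assumed access pattern the copying getter touches n(n+1)/2 elements in total, which is quadratic in the number of clauses, whereas the read-only view costs nothing per call. Whether the real slowdown comes from this exact pattern would need profiling to confirm.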



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
