lucene-java-user mailing list archives

From Bahaa Eldesouky
Subject QueryParser with CustomAnalyzer wrongly uses PatternReplaceCharFilter
Date Thu, 28 Apr 2016 09:54:22 GMT
I am using org.apache.lucene.queryparser.classic.QueryParser in Lucene
6.0.0 to parse queries with a CustomAnalyzer, as shown below:

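import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public static void testFilmAnalyzer() throws IOException, ParseException {
    // The addCharFilter, withTokenizer and build calls were lost from the
    // archived message; the "standard" tokenizer is an assumption.
    CustomAnalyzer nameAnalyzer = CustomAnalyzer.builder()
            .addCharFilter("patternreplace",
                    "pattern", "(movie|film|picture).*",
                    "replacement", "")
            .withTokenizer("standard")
            .build();

    QueryParser qp = new QueryParser("name", nameAnalyzer);
    String[] strs = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"};

    for (String str : strs) {
        System.out.println("Analyzing \"" + str + "\":");
        showTokens(str, nameAnalyzer);
        Query q = qp.parse(str);
        System.out.println("Parsed query of \"" + str + "\":");
        System.out.println(q + "\n");
    }
}

private static void showTokens(String text, Analyzer analyzer) throws IOException {
    try (TokenStream stream = analyzer.tokenStream("name", new StringReader(text))) {
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset(); // required before the first incrementToken()
        while (stream.incrementToken()) {
            System.out.print("[" + term.toString() + "]");
        }
        stream.end();
        System.out.println();
    }
}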
I get the following output when I invoke testFilmAnalyzer():

Analyzing "avatar film fiction":
[avatar]
Parsed query of "avatar film fiction":
+name:avatar +name:fiction

Analyzing "avatar-film fiction":
[avatar]
Parsed query of "avatar-film fiction":
+name:avatar +name:fiction

Analyzing "avatar-film-fiction":
[avatar]
Parsed query of "avatar-film-fiction":

It seems that the analyzer applies the PatternReplaceCharFilter in its
intended order (i.e. before tokenization), while the QueryParser applies it
afterwards. Does anyone have an explanation for that? Isn't that a bug?
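
One possible explanation, offered only as a sketch: the classic QueryParser splits the query text on whitespace before handing each chunk to the analyzer, so the char filter would see "avatar", "film" and "fiction" separately rather than the whole string. The plain-Java simulation below does not use Lucene at all; charFilter, tokenize, analyzeWhole and analyzePerChunk are hypothetical stand-ins (a regex replace and a crude non-alphanumeric split), not the real Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CharFilterOrderSketch {
    // Same pattern as the char filter in the message above.
    static final String PATTERN = "(movie|film|picture).*";

    // Stand-in for the char filter: strip the pattern from the raw text.
    static String charFilter(String text) {
        return text.replaceAll(PATTERN, "");
    }

    // Crude stand-in for a tokenizer: split on anything that is not a
    // letter or digit and drop empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // What calling the analyzer directly does: filter the whole text, then tokenize.
    static List<String> analyzeWhole(String text) {
        return tokenize(charFilter(text));
    }

    // Assumed query parser behaviour: split on whitespace first, then run
    // the full analysis chain on each chunk independently.
    static List<String> analyzePerChunk(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            tokens.addAll(analyzeWhole(chunk));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] strs = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"};
        for (String str : strs) {
            System.out.println(str + " -> whole: " + analyzeWhole(str)
                    + ", per chunk: " + analyzePerChunk(str));
        }
    }
}
```

Under that assumption the char filter still runs before tokenization in both cases; the difference is only how much text it is given, which would account for "fiction" surviving in the parsed queries whenever it is separated from "film" by a space.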
