jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-2301) QueryEngine should not tokenize fulltext expression by default
Date Mon, 08 Dec 2014 11:17:12 GMT

    [ https://issues.apache.org/jira/browse/OAK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237775#comment-14237775 ]

Chetan Mehrotra commented on OAK-2301:
--------------------------------------

Had to tweak it a bit to get it working for me:
* Changed the default implementation in {{FullTextVisitorBase}}. Otherwise I would get a stack
overflow unless I overrode the method everywhere I extend {{FullTextVisitorBase}}. This also keeps
the change backward compatible for existing users: only code that wants to make use of the new
support needs to override this visitor callback
{code}
        @Override
        public boolean visit(FullTextContains contains) {
            return contains.getBase().accept(this);
        }
{code}
* In most cases we would need to convert a {{FullTextContains}} to a {{FullTextTerm}}, as the same
code is to be invoked for both. So I added a method to {{FullTextContains}}. One thing I was not
sure about was the {{isNot}} flag
{code}
    public FullTextTerm getAsFullTextTerm(){
        return new FullTextTerm(propertyName, rawText, false, true, null);
    }
{code}
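
To illustrate the first point, here is a minimal, self-contained sketch of why a delegating default keeps existing visitors working. All types in it ({{Expr}}, {{Term}}, {{Contains}}, {{VisitorBase}}, {{TermCollector}}) are hypothetical stand-ins written for this example; they are not Oak's classes and only mimic the shape of the change.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not Oak's API.
public class VisitorDefaultSketch {

    interface Expr {
        boolean accept(Visitor v);
    }

    interface Visitor {
        boolean visit(Term term);
        boolean visit(Contains contains);
    }

    /** Leaf expression: a single search term. */
    static final class Term implements Expr {
        final String text;
        Term(String text) { this.text = text; }
        @Override
        public boolean accept(Visitor v) { return v.visit(this); }
    }

    /** Wrapper that keeps the raw text next to the already parsed expression. */
    static final class Contains implements Expr {
        final String rawText;
        final Expr base;
        Contains(String rawText, Expr base) {
            this.rawText = rawText;
            this.base = base;
        }
        Expr getBase() { return base; }
        @Override
        public boolean accept(Visitor v) { return v.visit(this); }
    }

    /** Base visitor whose default for Contains delegates to the wrapped expression. */
    abstract static class VisitorBase implements Visitor {
        @Override
        public boolean visit(Contains contains) {
            // In this toy model, calling contains.accept(this) here instead
            // would re-enter this method and never terminate.
            return contains.getBase().accept(this);
        }
    }

    /** An "existing" visitor that knows nothing about Contains. */
    static final class TermCollector extends VisitorBase {
        final List<String> terms = new ArrayList<>();
        @Override
        public boolean visit(Term term) {
            terms.add(term.text);
            return true;
        }
    }

    /** Wraps a term in a Contains node and runs the unchanged collector over it. */
    public static List<String> collectTerms(String rawText, String term) {
        TermCollector collector = new TermCollector();
        new Contains(rawText, new Term(term)).accept(collector);
        return collector.terms;
    }

    public static void main(String[] args) {
        System.out.println(collectTerms("fox", "fox"));
    }
}
```

In this sketch {{TermCollector}} never mentions {{Contains}}, yet it still reaches the wrapped term, which is the backward-compatibility property described above.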

With these tweaks, the following test case passes in {{LucenePropertyIndexTest}}:

{code}
    @Test
    public void fulltextSearchWithCustomAnalyzer() throws Exception{
        Tree idx = createFulltextIndex(root.getTree("/"), "test");
        TestUtil.useV2(idx);

        Tree anl = idx.addChild(ANALYZERS).addChild(ANL_DEFAULT);
        anl.addChild(ANL_TOKENIZER).setProperty(ANL_NAME, "whitespace");
        anl.addChild(ANL_FILTERS).addChild("stop");

        Tree test = root.getTree("/").addChild("test");
        test.setProperty("foo", "fox jumping");
        root.commit();

        assertQuery("select * from [nt:base] where CONTAINS(*, 'fox was jumping')", asList("/test"));
    }
{code}

However, I ran into another issue. If the text that was originally indexed was _fox is jumping_
and the text I pass in the query is _fox was jumping_, then the test fails. I expected stop-word
removal to work both ways. However, {{LuceneIndex}} currently creates a {{PhraseQuery}} [1] in
all cases. I think it should create a phrase query only if the original text was quoted. [~alex.parvulescu]
Thoughts?


[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L913
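
One possible explanation for the asymmetry can be sketched with a toy model that uses no Lucene classes. It assumes, and this is only an assumption about the behaviour, that the indexed text keeps the position gap left by a removed stop word while the query-side phrase is built from the surviving terms at consecutive positions. The stop list, class name and method names below are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: it imitates whitespace tokenization plus a stop filter,
// and is not how Lucene or Oak are actually implemented.
public class PhraseVsTermsSketch {

    // Hypothetical stop list for this sketch.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("is", "was", "the", "a"));

    /** Index side: drop stop words but remember each surviving token's original position. */
    public static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> positions = new HashMap<>();
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!STOP.contains(tokens[i])) {
                positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
            }
        }
        return positions;
    }

    /** Query side (assumed): surviving tokens only, position gaps discarded. */
    public static List<String> queryTerms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!STOP.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    /** Phrase semantics: the query terms must occur at consecutive positions. */
    public static boolean phraseMatches(String indexed, String query) {
        Map<String, List<Integer>> idx = index(indexed);
        List<String> terms = queryTerms(query);
        if (terms.isEmpty()) {
            return false;
        }
        for (int start : idx.getOrDefault(terms.get(0), Collections.<Integer>emptyList())) {
            boolean ok = true;
            for (int i = 1; i < terms.size() && ok; i++) {
                ok = idx.getOrDefault(terms.get(i), Collections.<Integer>emptyList())
                        .contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Conjunction semantics: every query term must occur somewhere, in any position. */
    public static boolean allTermsMatch(String indexed, String query) {
        List<String> terms = queryTerms(query);
        return !terms.isEmpty() && index(indexed).keySet().containsAll(terms);
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("fox jumping", "fox was jumping"));
        System.out.println(phraseMatches("fox is jumping", "fox was jumping"));
        System.out.println(allTermsMatch("fox is jumping", "fox was jumping"));
    }
}
```

Under these assumptions the phrase check succeeds for indexed _fox jumping_ but fails for indexed _fox is jumping_, while a plain conjunction of the terms matches both. That is consistent with the suggestion to build a phrase query only when the original text was quoted.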

> QueryEngine should not tokenize fulltext expression by default
> --------------------------------------------------------------
>
>                 Key: OAK-2301
>                 URL: https://issues.apache.org/jira/browse/OAK-2301
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>            Reporter: Chetan Mehrotra
>            Assignee: Thomas Mueller
>             Fix For: 1.2
>
>         Attachments: OAK-2301.patch
>
>
> QueryEngine currently parses the fulltext expression on its own. This causes issues
> with index implementations like Lucene, which use different analysis logic. For fulltext search
> to work properly, it should be possible for LuceneIndex to get access to the non-tokenized text.
> For more details refer to http://markmail.org/thread/syoha44std3fm4j2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)