lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <>
Subject [jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis
Date Thu, 14 Jan 2016 01:31:39 GMT


Steve Rowe updated SOLR-4619:
    Attachment: SOLR-4619.patch

Patch that brings Andrzej's patch up to date with trunk, and adds tests for query-time functionality.

I had assumed that {{PreAnalyzedField}}-s would use the {{PreAnalyzedTokenizer}} at query
time, but that is not (currently) the case: instead {{FieldType.DefaultAnalyzer}} is used.
This patch changes the behavior so that, when no analyzer is specified, {{PreAnalyzedTokenizer}} is used instead.

However, there is a chicken-and-egg interaction between {{PreAnalyzedTokenizer}} and {{QueryBuilder.createFieldQuery()}},
which aborts before performing any tokenization if the token stream from the supplied analyzer
doesn't have a {{TermToBytesRefAttribute}}. But {{PreAnalyzedTokenizer}} doesn't have
any attributes defined until the input stream is consumed, in {{reset()}}. [~rcmuir] added
a comment as part of LUCENE-5388 to {{PreAnalyzedTokenizer}}'s ctor, where {{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}}
is set as the attribute factory rather than the default packed implementation: "we don't pack
attributes: since we are used for (de)serialization and dont want bloat."
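To make the ordering problem concrete, here is a toy model in plain Java with no Lucene dependency. It is a sketch under stated assumptions, not Lucene code: {{OrderingBugDemo}}, {{LazyTermStream}} and {{buildTermsOld}} are hypothetical stand-ins for {{PreAnalyzedTokenizer}} and the relevant part of {{QueryBuilder.createFieldQuery()}}, and attributes are modeled as named {{StringBuilder}}-s rather than real attribute classes. Because the stream only gets its term attribute in {{reset()}}, checking for the attribute first makes the builder give up and return {{null}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the ordering bug; all names are hypothetical, not the Lucene API. */
public class OrderingBugDemo {

    // Stand-in for PreAnalyzedTokenizer: no term attribute exists
    // until reset() parses the serialized input.
    static class LazyTermStream {
        private final Map<String, StringBuilder> attrs = new LinkedHashMap<>();
        private final String serialized;
        private Iterator<String> tokens;

        LazyTermStream(String serialized) {
            this.serialized = serialized;
        }

        StringBuilder getAttribute(String name) {
            return attrs.get(name); // null if the attribute was never added
        }

        StringBuilder addAttribute(String name) {
            return attrs.computeIfAbsent(name, k -> new StringBuilder());
        }

        void reset() {
            addAttribute("term"); // the term attribute only appears here
            tokens = Arrays.asList(serialized.split(" ")).iterator();
        }

        boolean incrementToken() {
            if (tokens == null || !tokens.hasNext()) {
                return false;
            }
            StringBuilder term = attrs.get("term");
            term.setLength(0);
            term.append(tokens.next());
            return true;
        }
    }

    // Models the original order: look up the term attribute, add the
    // position-increment attribute, bail out if there is no term attribute,
    // and only then call reset().
    static List<String> buildTermsOld(LazyTermStream stream) {
        StringBuilder term = stream.getAttribute("term");
        stream.addAttribute("posInc");
        if (term == null) {
            return null; // aborts before any tokenization: no query terms
        }
        stream.reset();
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(term.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // The check runs before reset() has defined the term attribute.
        System.out.println(buildTermsOld(new LazyTermStream("foo bar")));
    }
}
```

Running {{main}} prints {{null}}: the stream is never consumed, even though it holds two tokens.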

This patch moves the {{stream.reset()}} call in {{QueryBuilder.createFieldQuery()}} in front
of the {{TermToBytesRefAttribute}} check, so that {{PreAnalyzedTokenizer}} (and other tokenizers
that don't have a pre-added set of attributes) can define their attributes before the check runs.
It also moves the {{addAttribute(PositionIncrementAttribute.class)}} call to after the
{{TermToBytesRefAttribute}} check.
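The reordering can be sketched with a similar toy model. Again this is plain Java and not the actual patch: {{ResetFirstDemo}}, {{LazyTermStream}} and {{buildTermsResetFirst}} are hypothetical names, and attributes are modeled as named {{StringBuilder}}-s. The order is {{reset()}} first, then the term-attribute check, then adding the position-increment attribute, so a stream that defines its attributes lazily yields its terms instead of aborting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the reset-first ordering; all names are hypothetical, not the Lucene API. */
public class ResetFirstDemo {

    // Stand-in for PreAnalyzedTokenizer: attributes appear only in reset().
    static class LazyTermStream {
        private final Map<String, StringBuilder> attrs = new LinkedHashMap<>();
        private final String serialized;
        private Iterator<String> tokens;

        LazyTermStream(String serialized) {
            this.serialized = serialized;
        }

        StringBuilder getAttribute(String name) {
            return attrs.get(name); // null if the attribute was never added
        }

        StringBuilder addAttribute(String name) {
            return attrs.computeIfAbsent(name, k -> new StringBuilder());
        }

        void reset() {
            addAttribute("term"); // defined once the input is parsed
            tokens = Arrays.asList(serialized.split(" ")).iterator();
        }

        boolean incrementToken() {
            if (tokens == null || !tokens.hasNext()) {
                return false;
            }
            StringBuilder term = attrs.get("term");
            term.setLength(0);
            term.append(tokens.next());
            return true;
        }
    }

    // Models the patched order: reset() first, then the term-attribute
    // check, then the position-increment attribute.
    static List<String> buildTermsResetFirst(LazyTermStream stream) {
        stream.reset();
        StringBuilder term = stream.getAttribute("term");
        if (term == null) {
            return null; // still guards streams that never define a term attribute
        }
        stream.addAttribute("posInc");
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(term.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(buildTermsResetFirst(new LazyTermStream("foo bar")));
    }
}
```

Running {{main}} prints {{[foo, bar]}}. The {{null}} guard is kept so that a stream which still lacks a term attribute after {{reset()}} is handled the same way as before.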

An alternate approach to fix the chicken-and-egg problem might be to have {{PreAnalyzedTokenizer}}
always include a dummy {{TermToBytesRefAttribute}} implementation, and then remove it when
{{reset()}} is called, but that seems hackish.

I haven't run the full test suite with this patch yet, but the included query-time {{PreAnalyzedField}}
tests succeed.

I welcome feedback.

> Improve PreAnalyzedField query analysis
> ---------------------------------------
>                 Key: SOLR-4619
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: Trunk
>         Attachments: SOLR-4619.patch, SOLR-4619.patch
> PreAnalyzed field extends plain FieldType and mistakenly uses the DefaultAnalyzer as
> query analyzer, and doesn't allow for customization via <analyzer> schema elements.
> Instead it should extend TextField and support all query analysis supported by that type.

This message was sent by Atlassian JIRA
