lucene-dev mailing list archives

From "Trey Grainger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
Date Wed, 22 Jun 2016 03:01:02 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343254#comment-15343254 ]

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi [~krantiparisa] and [~dannytei1]. Apologies for the long lapse without a response on this
issue. I won't get into the reasons here (combination of personal and professional commitments),
but I just wanted to say that I expect to pick this issue back up in the near future and continue
work on this patch.

In the meantime, I have added an ASL 2.0 license to the current code (from Solr in Action)
so that folks can feel free to use what's there now: https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14

I'll turn what's there now into a patch, update it to Solr trunk, and keep iterating on it
until the folks commenting on this issue are satisfied with the design and capabilities. Stay
tuned...

> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to support one
> or more dynamically-selected analyzers for a field. For example, someone may have a "content"
> field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for Greek),
> a separate document in English (using an English Analyzer), and possibly even a field with
> mixed-language content in Greek and English. This latter case could pass the content separately
> through both an Analyzer defined for Greek and another Analyzer defined for English, stacking
> or concatenating the token streams based upon the use case.
> There are some distinct advantages in terms of index size and query performance which
> can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating
> content in separate fields and searching across multiple fields.
> Other non-multilingual use cases may include things like switching to a different analyzer
> for the same field to remove a feature (e.g., turning query-time synonyms on or off against the
> same field on a per-query basis).
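
To make the "stacking or concatenating" distinction in the description above concrete, here is a
toy sketch in plain Python. It is not the Solr in Action code and does not use any Lucene or Solr
API. The two analyzers, the (term, position) token layout, and the function names are all
hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical token model: (term, position). Real Lucene tokens also carry
# offsets, position increments, and other attributes not modeled here.
Token = Tuple[str, int]
Analyzer = Callable[[str], List[Token]]


def plain_analyzer(text: str) -> List[Token]:
    """Stand-in analyzer: whitespace tokenization plus lowercasing."""
    return [(word.lower(), pos) for pos, word in enumerate(text.split())]


def english_analyzer(text: str) -> List[Token]:
    """Stand-in 'English' analyzer: lowercases and strips a trailing 's'."""
    tokens = []
    for pos, word in enumerate(text.split()):
        term = word.lower()
        if len(term) > 3 and term.endswith("s"):
            term = term[:-1]
        tokens.append((term, pos))
    return tokens


def stack(text: str, analyzers: List[Analyzer]) -> List[Token]:
    """Run every analyzer over the same text and overlay the results.

    Tokens from different analyzers share positions, and an identical term
    at the same position is kept only once.
    """
    seen = set()
    merged = []
    for analyze in analyzers:
        for token in analyze(text):
            if token not in seen:
                seen.add(token)
                merged.append(token)
    # sorted() is stable, so analyzer order is preserved within a position.
    return sorted(merged, key=lambda token: token[1])


def concatenate(text: str, analyzers: List[Analyzer]) -> List[Token]:
    """Append each analyzer's output after the previous one's positions."""
    out = []
    offset = 0
    for analyze in analyzers:
        tokens = analyze(text)
        out.extend((term, pos + offset) for term, pos in tokens)
        if tokens:
            offset += max(pos for _, pos in tokens) + 1
    return out


if __name__ == "__main__":
    analyzers = [plain_analyzer, english_analyzer]
    print(stack("Running Dogs", analyzers))
    print(concatenate("Running Dogs", analyzers))
```

In this sketch, stacking yields three tokens for the two-word input because the shared term is
stored once, while concatenating yields four. That is the intuition behind the index size
advantage mentioned in the description, though actual savings would depend on the analyzers and
content involved.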



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

