lucene-dev mailing list archives

From "Stefan Matheis (steffkes) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-5800) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
Date Sun, 02 Mar 2014 12:55:19 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Matheis (steffkes) updated SOLR-5800:
--------------------------------------------

    Fix Version/s: 5.0
                   4.8

> Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-5800
>                 URL: https://issues.apache.org/jira/browse/SOLR-5800
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.7
>            Reporter: Timothy Potter
>            Priority: Minor
>             Fix For: 4.8, 5.0
>
>         Attachments: SOLR-5800-sample.json, SOLR-5800.patch
>
>
> I have an example in Solr in Action that uses the
> PatternReplaceCharFilterFactory, and it no longer works in 4.7.0.
> Specifically, the <fieldType> is:
>     <fieldType name="text_microblog" class="solr.TextField"
>                positionIncrementGap="100">
>       <analyzer>
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="([a-zA-Z])\1+"
>                     replacement="$1$1"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1"
>                 splitOnCaseChange="0"
>                 splitOnNumerics="0"
>                 stemEnglishPossessive="1"
>                 preserveOriginal="0"
>                 catenateWords="1"
>                 generateNumberParts="1"
>                 catenateNumbers="0"
>                 catenateAll="0"
>                 types="wdfftypes.txt"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.KStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
> The PatternReplaceCharFilterFactory (PRCF) is used to collapse
> repeated letters in a term down to a maximum of two; for example,
> #yummmm becomes #yumm.
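
As a rough illustration of what the charFilter pattern above does, the following Python sketch applies the same regular expression outside Solr. The function name `collapse_repeats` is hypothetical, and Python's `re` module writes the replacement as `\1\1` where the Solr config uses `$1$1`. This only approximates the filter's effect on a string; it is not how Solr implements it.

```python
import re

# Same pattern as the charFilter config: a letter followed by one or more
# repeats of that same letter (the match is case-sensitive).
REPEATED_LETTER = re.compile(r"([a-zA-Z])\1+")


def collapse_repeats(text: str) -> str:
    """Collapse each run of a repeated letter down to two (hypothetical helper)."""
    # r"\1\1" is the Python spelling of the "$1$1" replacement in the config.
    return REPEATED_LETTER.sub(r"\1\1", text)


if __name__ == "__main__":
    print(collapse_repeats("#yummmm"))  # prints "#yumm"
```

Words that already contain exactly two repeated letters, such as "latte", pass through unchanged because the matched pair is replaced by the same pair.
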
> When I run some text through this analyzer using the Analysis form,
> the output looks as if the resulting text were unavailable to the
> tokenizer. In other words, the only results displayed in the output
> on the form are for the PRCF.
> This example stopped working in 4.7.0, and I've verified it worked
> correctly in 4.6.1.
> Initially, I thought this might be an issue with the actual analysis,
> but the analyzer actually works when indexing / querying. Then,
> looking at the JSON response in the Developer console with Chrome, I
> see the JSON that comes back includes output for all the components in
> my chain (see below) ... so looks like a UI rendering issue to me?
> {"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":["org.apache.lucene.analysis.pattern.PatternReplaceCharFilter","#Yumm
> :) Drinking a latte at Caffe Grecco in SF's historic North Beach...
> Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad
> foo5","org.apache.lucene.analysis.core.WhitespaceTokenizer",[{"text":"#Yumm","raw_bytes":"[23
> 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},{"text":":)","raw_bytes":"[3a
> 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},{"text":"Drinking","raw_bytes":"[44
> 72 69 6e 6b 69 6e
> 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},{"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},{"text":"latte","raw_bytes":"[6c
...
> the JSON returned to the browser has evidence that the full analysis chain was applied,
> so this seems to just be a rendering issue.
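
One way to check from the raw response that every stage ran is to walk the `index` array, which in the response quoted above alternates between a component class name and that component's output. The sketch below assumes that alternating layout holds in general. It uses a hand-shortened sample modelled on the quoted response (text and token list trimmed, some fields dropped), and `list_stages` is a hypothetical helper, not part of Solr.

```python
import json

# Hand-shortened sample modelled on the response in the report; the real
# response carries more tokens, more fields, and more stages.
SAMPLE = """
{"analysis": {"field_types": {"text_microblog": {"index": [
  "org.apache.lucene.analysis.pattern.PatternReplaceCharFilter",
  "#Yumm :)",
  "org.apache.lucene.analysis.core.WhitespaceTokenizer",
  [{"text": "#Yumm", "start": 0, "end": 6, "position": 1, "type": "word"},
   {"text": ":)", "start": 7, "end": 9, "position": 2, "type": "word"}]
]}}}}
"""


def list_stages(response: dict, field_type: str) -> list:
    """Pair each component name with its output (hypothetical helper)."""
    stages = response["analysis"]["field_types"][field_type]["index"]
    # Assumed layout: even slots hold class names, odd slots hold outputs.
    return list(zip(stages[0::2], stages[1::2]))


if __name__ == "__main__":
    for name, output in list_stages(json.loads(SAMPLE), "text_microblog"):
        short_name = name.rsplit(".", 1)[-1]
        if isinstance(output, str):
            # In the sample, the CharFilter stage reports a plain string.
            print(f"{short_name}: text {output!r}")
        else:
            # In the sample, the tokenizer stage reports a list of tokens.
            print(f"{short_name}: tokens {[t['text'] for t in output]}")
```

If the tokenizer and later stages show up in this listing, the analysis chain itself ran, which is consistent with the report's reading that the problem lies in how the form renders the response.
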





