lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5800) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
Date Mon, 17 Mar 2014 16:13:46 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937968#comment-13937968 ]

ASF subversion and git services commented on SOLR-5800:
-------------------------------------------------------

Commit 1578444 from [~steffkes] in branch 'dev/branches/lucene_solr_4_7'
[ https://svn.apache.org/r1578444 ]

SOLR-5800: Admin UI - Analysis form doesn't render results correctly when a CharFilter is used (merge r1576652)

> Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-5800
>                 URL: https://issues.apache.org/jira/browse/SOLR-5800
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.7
>            Reporter: Timothy Potter
>            Assignee: Stefan Matheis (steffkes)
>            Priority: Minor
>             Fix For: 4.8, 5.0, 4.7.1
>
>         Attachments: SOLR-5800-sample.json, SOLR-5800.patch
>
>
> I have an example in Solr in Action that uses the
> PatternReplaceCharFilterFactory and now it doesn't work in 4.7.0.
> Specifically, the <fieldType> is:
>     <fieldType name="text_microblog" class="solr.TextField"
>                positionIncrementGap="100">
>       <analyzer>
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="([a-zA-Z])\1+"
>                     replacement="$1$1"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1"
>                 splitOnCaseChange="0"
>                 splitOnNumerics="0"
>                 stemEnglishPossessive="1"
>                 preserveOriginal="0"
>                 catenateWords="1"
>                 generateNumberParts="1"
>                 catenateNumbers="0"
>                 catenateAll="0"
>                 types="wdfftypes.txt"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.KStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
> The PatternReplaceCharFilterFactory (PRCF) is used to collapse
> repeated letters in a term down to a maximum of two; for example,
> #yummmm becomes #yumm.
> When I run some text through this analyzer using the Analysis form,
> the output looks as if the char filter's output never reaches the
> tokenizer. In other words, the only results displayed on the
> form are for the PRCF.
> This example stopped working in 4.7.0 and I've verified it worked
> correctly in 4.6.1.
> Initially, I thought this might be an issue with the analysis itself,
> but the analyzer works correctly when indexing and querying. Then,
> looking at the JSON response in Chrome's developer console, I
> saw that the JSON that comes back includes output for every component
> in my chain (see below), so it looks like a UI rendering issue to me.
> {"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":[
>   "org.apache.lucene.analysis.pattern.PatternReplaceCharFilter",
>   "#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5",
>   "org.apache.lucene.analysis.core.WhitespaceTokenizer",[
>     {"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},
>     {"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},
>     {"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},
>     {"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},
>     {"text":"latte","raw_bytes":"[6c
...
> the JSON returned to the browser shows that the full analysis chain was applied,
> so this seems to be just a rendering issue.
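The char filter's collapse rule from the quoted fieldType (pattern `([a-zA-Z])\1+`, replacement `$1$1`) can be tried outside Solr with plain java.util.regex, since that pattern and its `$1` group reference use Java regex syntax. This is a minimal sketch under that assumption: the class and method names are hypothetical, and it applies the regex to a whole string directly rather than going through Lucene's PatternReplaceCharFilter, so it does not model the offset correction that the real char filter performs.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: mirrors the charFilter settings in text_microblog.
public class CollapseRepeats {
    // An ASCII letter (captured as group 1) followed by one or more copies of itself.
    private static final Pattern REPEATED_LETTER = Pattern.compile("([a-zA-Z])\\1+");

    // Replaces every run of two or more identical letters with exactly two copies.
    static String collapse(String input) {
        return REPEATED_LETTER.matcher(input).replaceAll("$1$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("#yummmm"));     // #yumm
        System.out.println(collapse("Caffe latte")); // Caffe latte (runs of two are kept)
    }
}
```

Because the backreference `\1` is case-sensitive by default, a run such as `Yy` is not treated as a repeat.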



--
This message was sent by Atlassian JIRA
(v6.2#6252)
