lucene-solr-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: Defining tokenizer pattern with < character
Date Fri, 01 Mar 2013 17:21:30 GMT
Kristian,

I think what you want is pattern="&lt;[^&gt;]*&gt;" (untested). That is, you probably
don't want to regex-escape the character class brackets "[" and "]", and you should XML-escape
the angle brackets as &lt; and &gt; inside the attribute value.

Steve
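
A minimal sketch of why the escaped form works, run outside Solr with Python's standard library rather than Solr's own schema loader. It assumes only that the schema is read by a conforming XML parser, which decodes &lt; and &gt; before the regex engine ever sees the pattern. The sample input string is hypothetical, and Python's re module stands in for java.util.regex, which treats this particular pattern the same way.

```python
import re
import xml.etree.ElementTree as ET

# The tokenizer element from the thread, with the angle brackets XML-escaped.
ESCAPED = ('<tokenizer class="solr.PatternTokenizerFactory" '
           'pattern="&lt;[^&gt;]*&gt;" group="0"/>')

# The same element with a raw '<' in the attribute value (not well-formed XML).
UNESCAPED = ('<tokenizer class="solr.PatternTokenizerFactory" '
             'pattern="<[^>]*>" group="0"/>')

# A raw '<' inside an attribute value is rejected by the XML parser itself.
try:
    ET.fromstring(UNESCAPED)
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False

# The escaped form parses, and the parser hands back the decoded pattern.
element = ET.fromstring(ESCAPED)
pattern = element.get("pattern")

# With group="0" the whole match becomes the token, so list the full matches.
sample = "<b>bold</b> and <i>italic</i>"  # hypothetical input
tokens = [m.group(0) for m in re.finditer(pattern, sample)]

print(unescaped_parses)  # False
print(pattern)           # <[^>]*>
print(tokens)            # ['<b>', '</b>', '<i>', '</i>']
```

The regex itself needs no backslashes: "<" and ">" are not regex metacharacters, and "[^>]" has to stay an unescaped character class for "any character except >" to work.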
 
On Mar 1, 2013, at 11:42 AM, "Van Tassell, Kristian" <kristian.vantassell@siemens.com> wrote:

> I'm trying to define the pattern:
> 
>   <tokenizer class="solr.PatternTokenizerFactory" pattern="<\[^\>\]*>" group="0"/>
> 
> But getting an error from Solr:
> 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Schema Parsing
> Failed: The value of attribute "pattern" associated with an element type "null" must not contain
> the '<' character.
> 
> I'm trying to tokenize a CDATA section I am indexing. I've tried escaping the < character
> numerous ways (and used the &lt; entity...) but can't get it to work.
> 
> Any ideas? Thanks in advance!

