lucene-solr-user mailing list archives

From Gora Mohanty <g...@srijan.in>
Subject Data storage, and textual analysis
Date Tue, 19 Jan 2010 18:41:05 GMT
Hi,

Another simple query. I have set up a field to hold phonetic
equivalents, with the relevant part of schema.xml looking like:
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="0" catenateWords="1"
          catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="com.srijan.search.solr.analysis.AspellFilterFactory"/>
</analyzer>

Here, com.srijan.search.solr.analysis.AspellFilterFactory is
a custom filter that provides a phonetic soundslike equivalent for
Indian languages transliterated into English. However, that is
irrelevant here, as the issue below holds even if I use the standard
solr.DoubleMetaphoneFilterFactory.
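
For reference, a minimal sketch of the same chain with the standard
filter swapped in might look like the following. The fieldType name
"phonetic" and the field name "name_phonetic" are hypothetical, and
inject="false" is an assumption (it asks the filter to emit only the
phonetic codes rather than adding them alongside the original tokens):
<fieldType name="phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0" catenateWords="1"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Standard phonetic filter in place of the custom Aspell one -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
<!-- Hypothetical field using the type above; indexed and stored are
     separate attributes of the field declaration -->
<field name="name_phonetic" type="phonetic" indexed="true" stored="true"/>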

I have a data source where all text is upper-case. From various
Solr-related discussions found through Google, I had expected
fields of this type to be stored as the lower-case, soundslike
equivalent. Instead, the data (as seen through the Solr admin
interface, or through a front-end search) seem to be stored
as is.

The analysis view in the Solr admin interface does show the index
and query conversions as I would expect. Also, phonetic matches and
matches with lower-case input work properly. I am just curious as to
how this works.

Regards,
Gora
