lucene-solr-user mailing list archives

From Teresa McMains <ter...@t14-consulting.com>
Subject match string fields with embedded hyphens
Date Fri, 03 Apr 2020 19:40:04 GMT
Forgive me if this is unclear; I am very new here.

I am working with a customer who needs to query various account/customer ID fields
that may or may not contain embedded dashes. They want to be able to search with or
without the dashes, and with either full or partial values.

So we may have an account or customer ID like

1234-56AB45

And they would like to retrieve this by searching for any of the following:
1234-56AB45    (full string match)
1234-56        (partial string match)
123456AB45     (full string, no dashes)
123456         (partial string, no dashes)

I've defined this field type in schema.xml as:

<!-- String replace field for account number searches -->
<fieldType name="TrimmedString" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />

    <!-- Normalizes token text to upper case -->
    <filter class="solr.UpperCaseFilterFactory" />

    <!-- Removes anything that isn't a letter or digit -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
  </analyzer>
</fieldType>

But the behavior I see is completely unexpected:
Full string match works fine in the customer's DEV environment but not in QA (which is running
the same version of Solr).
Partial string match works for some ID fields but not others.
Partial string match never works when the user does not enter the dashes.

I don't even know where to begin.  The behavior is not consistent enough to give me a sense of what is wrong.

So perhaps I will just ask: how would you define a fieldType that ignores special
characters like hyphens or underscores (or anything non-alphanumeric) and works for both
full-string and partial-string search?
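
To make the question concrete, one possible shape for such a definition is sketched below. It is untested against this data and rests on assumptions: the name AccountIdPrefix and the gram sizes are hypothetical, and it treats "partial" as "starts with", which holds for the examples above (1234-56 and 123456 are both prefixes of the ID). The idea is to strip non-alphanumerics on both sides and to index every leading prefix of the normalized ID, so that a normalized query matches a stored prefix exactly.

```xml
<!-- Hypothetical sketch, not a tested definition.
     The name AccountIdPrefix and the gram sizes are assumptions. -->
<fieldType name="AccountIdPrefix" class="solr.TextField" omitNorms="true">
  <!-- Index time: 1234-56AB45 becomes 123456AB45, then every leading
       prefix of it (1, 12, 123, ... 123456AB45) is indexed as a token. -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.UpperCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32"/>
  </analyzer>
  <!-- Query time: same normalization, but no n-grams, so 1234-56
       becomes the single token 123456 and matches an indexed prefix. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.UpperCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

In a sketch like this, maxGramSize would need to be at least the length of the longest normalized ID, otherwise the full-length token is never indexed and full-string matches fail for long IDs. Any change to the index-time analyzer only takes effect for documents indexed after the change, so existing data would have to be reindexed.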

Thank you.


