lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: full name free text search problem
Date Thu, 01 Feb 2018 07:48:21 GMT
You need to tokenize the full name in several different ways and then
search all of the tokenized versions, each with a different boost.

For example, you can index the name as a single string (perhaps
lowercased), also split on whitespace, and maybe even with a phonetic
mapping to catch spelling variants.

You can see something similar in:
https://gist.github.com/arafalov/5e04884e5aefaf46678c
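
A minimal sketch of this multi-tokenization setup might look like the
following. The field and type names (name_exact, name_words,
name_phonetic and their *_text types) and the boost values are
assumptions made up for illustration; they are not taken from the schema
quoted below or from the gist. DoubleMetaphoneFilterFactory is just one
possible choice for the phonetic variant.

```xml
<!-- Schema: three analyses of the same source text (hypothetical names) -->

<!-- 1. Whole name as a single lowercased token, for exact full-name matches -->
<fieldType name="name_exact_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- 2. Name split on whitespace, for matches on individual name parts -->
<fieldType name="name_words_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- 3. Phonetic codes only, to catch alternative spellings -->
<fieldType name="name_phonetic_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<field name="name_exact" type="name_exact_text" indexed="true" stored="false"/>
<field name="name_words" type="name_words_text" indexed="true" stored="false"/>
<field name="name_phonetic" type="name_phonetic_text" indexed="true" stored="false"/>

<!-- Feed the same source field into every variant -->
<copyField source="fullName" dest="name_exact"/>
<copyField source="fullName" dest="name_words"/>
<copyField source="fullName" dest="name_phonetic"/>

<!-- solrconfig.xml: search all variants with different boosts via eDisMax.
     The boost values are placeholders to be tuned against real data. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name_exact^10 name_words^3 name_phonetic</str>
  </lst>
</requestHandler>
```

With something along these lines, a document whose full name is exactly
Dae Kim should score above one that only shares the token Kim, because it
matches in all three fields, including the highest-boosted one. Depending
on the Solr version, the exact-match field may also need sow=false so
that the whole input reaches the keyword analyzer instead of being split
on whitespace first.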

Regards,
   Alex.

On 31 January 2018 at 05:48, Deepak Udapudi <DUdapudi@delta.org> wrote:
> Hi all,
>
> I have the below scenario in full name search that we are trying to implement.
>
> Solr configuration :-
>
> <fieldType name="keywords_text" class="solr.TextField">
>     <analyzer type="index">
>       <tokenizer class="solr.KeywordTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
>
> <field name="keywords" type="keywords_text" indexed="true" stored="false" multiValued="true" />
>   <copyField source="fullName" dest="keywords" docValues="true" />
>   <copyField source="officeName" dest="keywords" docValues="true" />
>   <copyField source="facilityName" dest="keywords" docValues="true" />
>
> Scenario :-
>
> The Solr configuration has the office name, facility name and full name, as shown above.
> We search on the input name, with the records sorted by distance.
>
> Problem :-
>
> I am getting the records matching the full name sorted by distance.
> If an input string (for example, Dae Kim) is provided, I also get records
> other than Dae Kim (for example, Rodney Kim) at the top of the search
> results, placed just before the next Dae Kim. This happens because Kim
> matches in all the fields (full name, facility name and office name), so
> the hit frequency is high and its distance is smaller than that of the
> next Dae Kim, which appears further down the results with a greater distance.
>
> Expected results :-
>
> I want all the records for Dae Kim to appear at the top of the search
> results, sorted by distance, without any irrelevant results.
>
> Queries :-
>
> What is the fix for the above problem if anyone has faced it?
> How do I handle the problem?
>
> Any inputs would be highly appreciated.
>
> Thanks in advance.
>
> Regards,
> Deepak
