lucene-solr-user mailing list archives

From "Maciej Ł. PCSS" <>
Subject Re: SQL-like queries (with percent character) - matching an exact substring, with parts of words
Date Thu, 02 Feb 2017 15:15:56 GMT
Hi Erick, All,

Regardless of the value of such a use case, there is one more thing that
remains unclear to me.

Does SOLR support a simple, naive 'exact substring match'? I mean, is
it possible to search for (actually, filter by) a raw substring without
tokenization and without any processing or simplification of the indexed
text? By a 'raw substring' I mean a character string that can contain,
among other things, non-letters (colons, brackets, etc.): basically
anything the user is able to type on a keyboard.

Is this use case within SOLR's technical capabilities, even if it comes
at a significant performance cost?


On 30.01.2017 at 17:12, Erick Erickson wrote:
> Well, the usual Solr solution to leading and trailing wildcards is to
> ngram the field. You can get the entire field (including spaces) to be
> analyzed as a whole by using KeywordTokenizer. Sometimes you wind up
> using a copyField to support this and search against one or the other
> if necessary.
> You can do this with KeywordTokenizer and '*a bcd ef*', but that'll be
> slow for the exact same reason the SQL query is slow: It has to
> examine every value in every document to find terms that match then
> search on those.
> There's some index size cost here so you'll have to test.
> Really go back to your use-case to see if this is _really_ necessary
> though. Often people think it is because that's the only way they've
> been able to search at all in SQL and it can turn out that there are
> other ways to solve it. IOW, this could be an XY problem.
> Best,
> Erick
> On Mon, Jan 30, 2017 at 1:52 AM, Maciej Ł. PCSS <> wrote:
>> Hi All,
>> What solution have you applied in your implementations?
>> Regards
>> Maciej
>> On 24.01.2017 at 14:10, Maciej Ł. PCSS wrote:
>>> Dear SOLR users,
>>> please point me to the right solution to my problem. I'm using SOLR to
>>> implement a Google-like search in my application and this scenario is
>>> working fine.
>>> However, in specific use-cases I need to filter documents that include a
>>> specific substring in a given field. It's about an SQL-like query similar to
>>> this:
>>> SELECT * FROM table WHERE someField LIKE '%c def g%'
>>> I expect to match documents having someField = 'abc def ghi'. That means I
>>> expect to match parts of words.
>>> As I understand it, SOLR, being an inverted index, works with tokens rather
>>> than character strings and therefore looks for whole words (not substrings).
>>> Is there any solution for such an issue?
>>> Regards
>>> Maciej Łabędzki
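
The two approaches Erick describes (scanning every whole-field value for a wildcard match, versus n-gramming the field) can be illustrated outside Solr. The following is only a minimal plain-Python sketch of the idea, not Solr code or Solr's actual implementation: the class `NGramIndex`, its methods, and the gram size of 3 are hypothetical names and choices made for illustration. Values are stored untokenized and unnormalized, so colons, brackets and spaces are matched literally and case-sensitively.

```python
from collections import defaultdict


def ngrams(text, n):
    """Return the set of all character n-grams of text (no tokenization)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NGramIndex:
    """Hypothetical toy index: maps each character n-gram to document ids."""

    def __init__(self, n=3):
        self.n = n
        self.docs = {}                    # doc_id -> raw field value
        self.postings = defaultdict(set)  # n-gram -> ids of docs containing it

    def add(self, doc_id, value):
        self.docs[doc_id] = value
        for gram in ngrams(value, self.n):
            self.postings[gram].add(doc_id)

    def scan(self, needle):
        """Brute force: examine every stored value, like a '*needle*'
        wildcard over whole-field terms (or SQL LIKE '%needle%')."""
        return {d for d, v in self.docs.items() if needle in v}

    def search(self, needle):
        """Use the n-gram postings to narrow candidates, then verify."""
        if len(needle) < self.n:
            # Too short to form a single n-gram: fall back to the full scan.
            return self.scan(needle)
        grams = ngrams(needle, self.n)
        candidates = set.intersection(
            *(self.postings.get(g, set()) for g in grams)
        )
        # Sharing all n-grams does not guarantee a contiguous match,
        # so confirm each candidate against the raw value.
        return {d for d in candidates if needle in self.docs[d]}


index = NGramIndex(n=3)
index.add(1, "abc def ghi")
index.add(2, "xyz (a:b) [c]")
index.add(3, "abcdef")
print(index.search("c def g"))
print(index.search(":b) ["))
```

The trade-off matches the one in the thread: the n-gram postings take extra space (every value contributes one entry per distinct n-gram), while the scan needs no extra index but has to look at every value on every query.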
