lucene-solr-user mailing list archives

From Saïd Radhouani <r.steve....@gmail.com>
Subject Re: Use free text to search against boolean fields?
Date Sat, 03 Jul 2010 15:32:11 GMT
Hi Jan,

The vocabulary of my domain is very small and pretty controlled. Users will ask queries about
features of our products, and we have fewer than one hundred features. So the idea is to have
a text field "features" storing all the features. And, re: the multilingualism, I can have
"features_en", "features_fr", etc.  

What do you think?
-Saïd
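
A minimal sketch of the per-language "features" fields described above. The
label table, the helper name build_feature_fields and the flag-to-word mapping
are assumptions for illustration; only the field names "features_en" and
"features_fr" and the example words come from this thread.

```python
# Hypothetical label table: boolean flag -> word to index, per language.
FEATURE_LABELS = {
    "is_man": {"en": "man", "fr": "homme"},
    "is_single": {"en": "single", "fr": "célibataire"},
    "has_job": {"en": "job", "fr": "emploi"},
}


def build_feature_fields(flags, languages=("en", "fr")):
    """Return {"features_<lang>": "word word ..."} for the flags that are true."""
    doc = {}
    for lang in languages:
        words = [
            labels[lang]
            for name, labels in FEATURE_LABELS.items()
            if flags.get(name)
        ]
        doc[f"features_{lang}"] = " ".join(words)
    return doc


doc = build_feature_fields({"is_man": True, "is_single": True, "has_job": False})
print(doc)
# {'features_en': 'man single', 'features_fr': 'homme célibataire'}
```

Each language then gets its own analyzed text field, and the query is sent
against the field matching the user's language.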


On Jul 3, 2010, at 5:09 PM, Jan Høydahl / Cominvent wrote:

> Hi,
> 
> It would help to know more about the actual application, and see some use cases in order
> to answer that question. I thought that this would be free-text queries from users, and as
> soon as you have free-text then you WILL get all kinds of stuff in the queries. However, if
> your users are well educated on how to query your system and behave, then what you suggest
> makes more sense. It's quick to test and see how it works.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
> 
> On 3. juli 2010, at 01.11, Saïd Radhouani wrote:
> 
>> Hi Jan,
>> 
>> Thanks for this suggestion. If we choose parsing, then why don't we do it at the
>> indexing side, instead of the querying side, which might slow down the search process? i.e.,
>> if a document has "is_man=true" and "is_single=true", then we populate a text field with the
>> words "man" and "single". Then, during the search, we compare the user query with the text
>> field. There's no "intelligent" query in my application, i.e., users would not ask for "not
>> smoking". If they mention a word, it means that the boolean value is true.
>> 
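
A rough sketch of the index-time idea in the paragraph above, assuming the
indexed word can be derived by dropping the "is_"/"has_" prefix from the flag
name. That naming convention and both helper names are assumptions, and the
matching function only imitates an all-terms-required search; in Solr the
field's analyzer and query parser would do this work.

```python
def populate_features(record):
    """Build the text-field value from the flags that are True."""
    words = []
    for name, value in record.items():
        if value is True and "_" in name:
            # Assumed convention: "is_man" -> "man", "has_job" -> "job".
            words.append(name.split("_", 1)[1])
    return " ".join(words)


def matches(query, features_text):
    """True if every query word occurs in the populated text field."""
    field_terms = set(features_text.lower().split())
    return all(term in field_terms for term in query.lower().split())


record = {"is_man": True, "is_single": True, "has_job": False}
features = populate_features(record)
print(features)                         # man single
print(matches("single man", features))  # True
print(matches("man job", features))     # False
```

A flag set to false contributes no word, so a query mentioning that word
simply does not match, which is the "mentioned means true" behaviour described.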
>> I don't have many fields, so populating a text field will not dramatically increase
>> the size of my index.
>> 
>> What do you think?
>> 
>> -Saïd
>> 
>> On Jul 3, 2010, at 12:36 AM, Jan Høydahl / Cominvent wrote:
>> 
>>> Hi,
>>> 
>>> I would rather go for the boolean variant and spend some time writing a query
>>> parser which tries to understand all kinds of input people may make, mapping it into boolean
>>> filters. In this way you can support both navigation and search and keep both in sync whatever
>>> people prefer to start with. I'm not saying it is easy to write such a parser, but you know
>>> the domain and the users...
>>> 
>>> Another reason for doing it this way is that if you have a field does_smoke=true,
>>> you still want to match if someone writes "not smoking". Your parser would have to understand
>>> negations, e.g. through a set of regex ((not|non|no) (smoker|smoking|smoke))...
>>> 
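
One way the negation handling above could be sketched with Python's re module.
The rule table, the extra example fields and the function name parse_query are
hypothetical; the thread only gives the negation pattern and the does_smoke
field. Negated rules are listed first so that "not smoking" is consumed before
the plain "smoking" rule can fire.

```python
import re

# Hypothetical rule table: (pattern, field, value). Order matters.
RULES = [
    (re.compile(r"\b(?:not|non|no)[\s-]+(?:smoker|smoking|smoke)\b", re.I),
     "does_smoke", "false"),
    (re.compile(r"\b(?:smoker|smoking|smoke)\b", re.I), "does_smoke", "true"),
    (re.compile(r"\bsingle\b", re.I), "is_single", "true"),
    (re.compile(r"\bman\b", re.I), "is_man", "true"),
]


def parse_query(text):
    """Return (filters, leftover): 'field:value' filters plus unparsed text."""
    filters = {}
    for pattern, field, value in RULES:
        if pattern.search(text):
            # The first rule that matches a field wins (negation beats plain).
            filters.setdefault(field, value)
            text = pattern.sub(" ", text)
    leftover = " ".join(text.split())
    return [f"{field}:{value}" for field, value in filters.items()], leftover


print(parse_query("single man not smoking"))
# (['does_smoke:false', 'is_single:true', 'is_man:true'], '')
print(parse_query("non-smoker with a dog"))
# (['does_smoke:false'], 'with a dog')
```

The returned filters could be sent as filter queries, while the leftover text
is whatever the rules did not understand.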
>>> You could always do a mix also - to keep a free-text field as well, and any words
>>> that your parser does not understand can be passed through to the free-text as a "should"
>>> term with a boost.
>>> 
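
A possible sketch of the mixed approach: words the parser knows become boolean
filters, and everything else is passed to a free-text field as an optional,
boosted clause. The vocabulary, the field name "text", the boost value and the
function name build_params are all assumptions. The q string relies on Lucene
query syntax with the default OR operator, where `*:*` matches every document
and the extra clauses only influence ranking.

```python
import re

# Hypothetical vocabulary: query word -> boolean filter.
KNOWN = {
    "man": "is_man:true",
    "single": "is_single:true",
    "job": "has_job:true",
}


def build_params(user_query, text_field="text", boost=0.5):
    """Split a query into filter queries (fq) and an optional boosted q."""
    filters, unknown = [], []
    # \w+ keeps only word characters, so no query-syntax escaping is needed.
    for word in re.findall(r"\w+", user_query.lower()):
        if word in KNOWN:
            if KNOWN[word] not in filters:
                filters.append(KNOWN[word])
        else:
            unknown.append(word)
    # Unknown words become optional ("should") clauses next to match-all.
    clauses = ["*:*"] + [f"{text_field}:{word}^{boost}" for word in unknown]
    return {"q": " ".join(clauses), "fq": filters}


print(build_params("single man with dog"))
# {'q': '*:* text:with^0.5 text:dog^0.5', 'fq': ['is_single:true', 'is_man:true']}
```

Documents are selected by the filters alone; the unknown words only move
documents that contain them higher in the ranking.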
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> Training in Europe - www.solrtraining.com
>>> 
>>> On 2. juli 2010, at 18.36, Saïd Radhouani wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have the following kind of data to index in a multilingual context: is_man,
>>>> is_single, has_job, etc.
>>>> 
>>>> Logically, the underlying fields have a value of "yes" or "no." That's why
>>>> the boolean type would be appropriate. But my problem is, in addition to being able to filter
>>>> on these fields, I would like to give my users the possibility to search against these fields
>>>> using free text. i.e., a query might be "single man having job." Therefore, I think that the
>>>> boolean type is not appropriate anymore. Instead, I'm thinking of using the string type, and
>>>> each field will be either empty (the "no" case), or populated by its own tag. e.g., if the
>>>> document is about a man, the field is_man will contain the string "man." Then, I copy all
>>>> these fields into a text field that I can use for free text search.
>>>> 
>>>> Does that make sense?
>>>> 
>>>> Does that make sense in a multilingual context, i.e., field tags can be different
>>>> in each language (EN => man, single, job, FR => homme, célibataire, emploi, etc.)?
>>>> 
>>>> Thanks!
>>>> 
>>>> -Saïd
>>> 
>> 
> 

