lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2015) add a config hook for autoGeneratePhraseQueries
Date Mon, 26 Jul 2010 14:11:50 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892314#action_12892314 ]

Yonik Seeley commented on SOLR-2015:
------------------------------------

bq. is wi fi, then this will not turn into a phrase.

Right - but there's just a lack of information that can't be helped?
So while one might want stuff like this as a phrase, I don't think it's a bug that it's not.

What *is* a problem though is the lack of ability for the user to add additional context to
fix the issue (i.e. a SynonymFilter to manually map "wi fi" wouldn't work, since it would get
"wi" and then "fi" in separate runs).
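
To make the "separate runs" point concrete, here is a minimal toy sketch in Python. It is not Lucene or Solr code: the SYNONYMS table, analyze() and both parse functions are hypothetical names invented for illustration, and the synonym handling is reduced to two-token rules.

```python
# Hypothetical multi-word synonym rule: the token pair ("wi", "fi") maps to "wifi".
SYNONYMS = {("wi", "fi"): "wifi"}


def analyze(text):
    """Toy analysis chain: lowercase, split on whitespace, apply two-token synonym rules."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            # Both halves of the rule are visible in this run, so the rule fires.
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse_splitting_on_whitespace(query):
    """Toy query parser: each whitespace-separated chunk is analyzed in its own run."""
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


def parse_whole_query(query):
    """Hypothetical alternative: the analyzer sees the full query text in one run."""
    return analyze(query)
```

With the whitespace-splitting parser, the rule never sees "wi" and "fi" together, so "wi fi" stays as two terms; only when the whole query goes through one run can the rule produce "wifi".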

What is also a problem is that if the original doc contained "wifi" then a query of "wi-fi"
won't match (since it queries for "wi fi").  We work around this today (for people that really
need it) by indexing a second field that catenates instead of splits the parts of a split
token.  It's certainly not ideal, but people tend to be happy with the cases we can match.
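
A rough sketch of that workaround, again as a toy model in Python rather than actual Solr analysis code. The two analyzers below (split_parts and catenate_parts) and the matching helpers are hypothetical stand-ins for a splitting field and a catenating field.

```python
import re


def split_parts(text):
    """Toy 'splitting' field: lowercase, then split each word on non-alphanumerics."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(p for p in re.split(r"[^a-z0-9]+", word) if p)
    return tokens


def catenate_parts(text):
    """Toy 'catenating' field: lowercase, then join the parts of each word."""
    joined = ("".join(re.split(r"[^a-z0-9]+", word)) for word in text.lower().split())
    return [t for t in joined if t]


def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run inside doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


def matches_either_field(doc, query):
    """Search both toy fields and accept a hit in either one."""
    split_hit = contains_phrase(split_parts(doc), split_parts(query))
    catenated_hit = contains_phrase(catenate_parts(doc), catenate_parts(query))
    return split_hit or catenated_hit
```

In this model a doc containing "wifi" is missed by the splitting field for the query "wi-fi" but found through the catenating field. A query of "wi fi" still misses it, which is the lack of information mentioned at the top.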

So while our current system is far from perfect (and we should work on improving it),
the problem is not that we have an incorrect solution, but an incomplete one.
Let's assume we had a QP that didn't split on whitespace (or whatever our optimal solution
is).
IMO, I would still want tokens joined by a dash to form a phrase query, just like tokens surrounded
by quotes.
It's important information and shouldn't be discarded.
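
As a toy illustration of what is kept or discarded, the Python sketch below models a query parser that splits on whitespace and then either turns a multi-token chunk into a phrase or into independent terms. The auto_phrase flag, the clause tuples and the rule that every clause is required are assumptions made to keep the example small; this is not the Lucene QueryParser.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split on any run of non-alphanumerics."""
    return [p for p in re.split(r"[^a-z0-9]+", text.lower()) if p]


def parse(query, auto_phrase):
    """Split on whitespace, analyze each chunk, and build (kind, tokens) clauses."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if len(tokens) > 1 and auto_phrase:
            # The chunk produced several tokens: keep their adjacency as a phrase.
            clauses.append(("phrase", tokens))
        else:
            clauses.extend(("term", [t]) for t in tokens)
    return clauses


def clause_matches(clause, doc_tokens):
    kind, tokens = clause
    if kind == "term":
        return tokens[0] in doc_tokens
    n = len(tokens)
    return any(doc_tokens[i:i + n] == tokens
               for i in range(len(doc_tokens) - n + 1))


def matches(query, doc, auto_phrase):
    """Assumption for this toy: every clause must match."""
    doc_tokens = analyze(doc)
    return all(clause_matches(c, doc_tokens) for c in parse(query, auto_phrase))
```

In this model, "wi-fi" with auto_phrase on only matches docs where "wi" and "fi" are adjacent; with it off, a doc such as "fi hotspot wi" also matches, because the adjacency carried by the dash has been thrown away.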

bq.  there's no evidence auto-phrase-gen actually improves relevance even for English.

IMO, it's a case of "the customer is always right".   Many people have asked how to do this
sort of matching over the years and I think there is plenty of evidence that it increases
relevancy.

bq. Maybe we insert a "cp {english,cjk}schema.xml schema.xml" in between those two steps?
This would avoid the global default, ie, force an explicit choice.

And the tutorial that's in English would tell them to copy the English one... that only hurts
English speakers and doesn't help anyone else.
We can have different text field types in a single schema - it's just a matter of adding another
one that's good for non-whitespace delimited languages?


> add a config hook for autoGeneratePhraseQueries
> -----------------------------------------------
>
>                 Key: SOLR-2015
>                 URL: https://issues.apache.org/jira/browse/SOLR-2015
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 3.1, 4.0
>            Reporter: Koji Sekiguchi
>            Assignee: Yonik Seeley
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2015.patch, SOLR-2015.patch, SOLR-2015.patch
>
>
> After LUCENE-2458 was committed, a hook for autoGeneratePhraseQueries would be convenient
> for some situations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


