lucene-java-user mailing list archives

From Namgyu Kim <kng0...@gmail.com>
Subject Re: ElasticSearch Query Relevancy
Date Tue, 28 May 2019 18:28:56 GMT
Hi Alicia,

I'm not sure how much this will help, but here is my answer.

A query searches for *"Terms"* in the index.
Developers who are new to Elasticsearch often confuse full text
queries with term level queries.
The two are very different.

Please check.
Full text queries :
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html

Term level queries :
https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
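
To make the difference concrete, here is a small toy model in Python. It is only a sketch, not how Elasticsearch is implemented: the analyzer (lowercase, then split on whitespace), the two sample product names, and the function names are all my own assumptions. The point it shows is that a term level query looks up the value exactly as given, while a full text query analyzes the input first.

```python
# Toy model of an analyzed text field; not Elasticsearch's actual implementation.

def analyze(text):
    """Hypothetical analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Map each indexed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, value):
    """Term level query: look up the value exactly as given, no analysis."""
    return index.get(value, set())

def match_query(index, text):
    """Full text query: analyze the input first, then look up each term (OR)."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

# Hypothetical sample documents.
docs = {1: "Down Jacket", 2: "Rain jacket"}
index = build_index(docs)

print(term_query(index, "Jacket"))   # set(): the index only holds "jacket"
print(match_query(index, "Jacket"))  # {1, 2}
```

This is why a term level query against an analyzed text field can return nothing even though the word is clearly present in the document.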


About ranking:
the default ranking function in Elasticsearch is BM25 (if you didn't
configure anything else).
Here is the Wikipedia link.
(https://en.wikipedia.org/wiki/Okapi_BM25)

If you would rather skip the mathematics, read on.
BM25 prefers documents that meet the following conditions.

1. The document contains your search keyword many times.
Example: search for "BM25"
1) *BM25* is a ranking function. *BM25* is a popular ranking method.
vs
2) There are two famous ranking functions: TF/IDF and *BM25*.
The first sentence wins because the keyword "BM25" appears in it more
frequently.
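
The counting behind condition 1 can be sketched in a few lines of Python. The tokenizer here (lowercase, keep runs of word characters) is an assumption for illustration and is much simpler than a real Elasticsearch analyzer.

```python
import re
from collections import Counter

def term_counts(text):
    """Count how often each lowercased word occurs in the text."""
    return Counter(re.findall(r"\w+", text.lower()))

first = "BM25 is a ranking function. BM25 is a popular ranking method."
second = "There are two famous ranking functions: TF/IDF and BM25."

print(term_counts(first)["bm25"])   # 2
print(term_counts(second)["bm25"])  # 1
```

BM25 does not use this raw count directly. It dampens it, so the tenth occurrence of a keyword adds much less than the second one.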

2. Few documents contain the term (meaning your keyword is rare).
Words that appear frequently across many documents carry little
weight. (a, the, is, ...)
This is called IDF (inverse document frequency).
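
As a rough illustration of condition 2, the sketch below computes the BM25 form of IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. The document counts are made-up numbers, chosen only to contrast a common word with a rare one.

```python
import math

def idf(doc_freq, doc_count):
    """BM25 inverse document frequency: rare terms get a larger weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical numbers: an index of 1000 documents.
doc_count = 1000
common = idf(doc_freq=900, doc_count=doc_count)  # a word like "the"
rare = idf(doc_freq=5, doc_count=doc_count)      # a word like "BM25"

print(f"common word idf: {common:.3f}")
print(f"rare word idf:   {rare:.3f}")
```

The rare word ends up with a weight many times larger than the common one, which is why matching on "the" barely moves the score.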

3. The document is short.
Example: search for "BM25"
1) *BM25* rank
vs
2) *BM25* is a ranking function. It is a popular ranking method.
The shorter text scores higher because the keyword makes up a larger
share of it.
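
Condition 3 can be sketched with the term-frequency part of the classic BM25 formula, tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)), where dl is the document length and avgdl the average length. The values k1 = 1.2 and b = 0.75 are the commonly cited defaults, the tokenizer is a simplification, and the average length is taken over just these two texts, so treat the numbers as illustrative only. IDF is left out because it is identical for both texts.

```python
import re

K1 = 1.2   # term-frequency saturation (commonly cited default)
B = 0.75   # strength of length normalization (commonly cited default)

def tokenize(text):
    """Simplified tokenizer: lowercase, keep runs of word characters."""
    return re.findall(r"\w+", text.lower())

def tf_component(term, doc_tokens, avg_len):
    """Term-frequency part of classic BM25, including length normalization."""
    tf = doc_tokens.count(term)
    norm = 1 - B + B * len(doc_tokens) / avg_len
    return tf * (K1 + 1) / (tf + K1 * norm)

short_doc = tokenize("BM25 rank")
long_doc = tokenize("BM25 is a ranking function. It is a popular ranking method.")
avg_len = (len(short_doc) + len(long_doc)) / 2

short_score = tf_component("bm25", short_doc, avg_len)
long_score = tf_component("bm25", long_doc, avg_len)

print(f"short: {short_score:.3f}")
print(f"long:  {long_score:.3f}")
```

Both texts contain "BM25" exactly once, yet the short one gets the larger value because its length is below the average.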

In conclusion, the better a document satisfies conditions 1, 2 and 3, the
higher it ranks.

Please give feedback if something is wrong.
I hope it helps.

Warm regards,
Namgyu Kim

On Wed, May 29, 2019 at 2:32 AM Doug Turnbull <
dturnbull@opensourceconnections.com> wrote:

> Hi Alicia,
>
> You might want to ask your question at the Elasticsearch mailing list (
> http://discuss.elastic.co) or at Magento's (https://community.magento.com/
> ).
> Lucene is really just a library, with a very open-ended way of
> doing document scoring that could mix in any number of ways of doing
> ranking (text scoring, numerical attributes, etc). It will depend on how
> Elasticsearch is using Lucene, and probably more importantly, how Magento
> is configured to use Elasticsearch.
>
> More concretely, a set of articles to get you started:
>
> https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html
> My book "Relevant Search" (happy to give you a discount code if you email
> me directly)
>
> But I suspect you probably want to configure things at a higher level than
> all of this...
>
> Hope that's helpful!
> -Doug
>
>
> On Tue, May 28, 2019 at 1:22 PM Alicia Watkinson <
> Alicia.Watkinson@kathmandu.co.nz> wrote:
>
> > Hello,
> >
> > We have recently configured Magento 2 with ElasticSuite, however our
> > search logic does not match expected behaviour.
> >
> > After reading through countless documents, I have been unable to find any
> > answers as to the logic behind search result relevancy, or how a search
> > query is matched and ranked against the Index.
> >
> > I found a document that stated that ElasticSearch uses Lucene to perform
> > its scoring logic.
> >
> > We are extremely keen on fixing our current search logic! Are you able to
> > please provide me with any CLEAR documentation on how search queries are
> > matched against the index and then scored? Is this via attributes? Or on
> > page text?
> >
> > If you could please get back to me as a matter of high importance that
> > would be great.
> >
> > Kindest,
> >
> > Alicia
> >
>
>
> --
> *Doug Turnbull **| CTO* | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
