lucene-java-user mailing list archives

From Saar Carmi <saarca...@gmail.com>
Subject Re: New type of proximity/fuzzy search
Date Thu, 01 Sep 2016 19:29:15 GMT
Thanks Allison and Uwe.
Yes, indeed the SpanNearQuery is probably a better starting point as it
already considers the proximity.

I see you have opened an issue to track it.
Are you looking to add the functionality to the existing SpanNearQuery or
subclass it?

Anyway, looking at the issue you created, I am not sure its description
exactly matches what I had in mind:
"Add minNumberShouldMatch parameter to SpanNearQuery"

My understanding is that SpanNearQuery applies the maximum slop between
each pair of adjacent elements. So if we have 10 SpanTermQuery clauses with
a slop of 3, the distance between the first and the last occurrence could be
as large as 30. What I was looking for is a way to define the maximum
distance between the first and last matching terms and additionally require
the minNumber. That would probably require adding two parameters:
minNumberShouldMatch and maxSearchWindow.
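
To make the intended semantics concrete, here is a minimal sketch in plain
Java. It is not Lucene code and does not use any Lucene API; the class name
WindowMatcher and the method matches are hypothetical, and it assumes the
tokens have already been analyzed (lowercased, stemmed) so that terms can be
compared by exact equality. It only illustrates what minNumberShouldMatch and
maxSearchWindow would mean: some run of at most maxSearchWindow consecutive
tokens must contain at least minNumberShouldMatch distinct search terms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only (not part of Lucene).
final class WindowMatcher {

    /**
     * Returns true if some window of at most maxSearchWindow consecutive
     * tokens contains at least minNumberShouldMatch distinct search terms.
     */
    static boolean matches(List<String> tokens, Set<String> terms,
                           int maxSearchWindow, int minNumberShouldMatch) {
        // Occurrence count of each search term inside the current window.
        Map<String, Integer> counts = new HashMap<>();
        for (int right = 0; right < tokens.size(); right++) {
            // Token entering the window on the right.
            String in = tokens.get(right);
            if (terms.contains(in)) {
                counts.merge(in, 1, Integer::sum);
            }
            // Token leaving the window on the left, once the window is full.
            int left = right - maxSearchWindow;
            if (left >= 0) {
                String out = tokens.get(left);
                if (terms.contains(out)) {
                    // Drop the entry entirely when its count reaches zero.
                    counts.computeIfPresent(out, (k, v) -> v == 1 ? null : v - 1);
                }
            }
            // counts.size() is the number of distinct terms in the window.
            if (counts.size() >= minNumberShouldMatch) {
                return true;
            }
        }
        return false;
    }
}
```

With the example from my first mail (assuming "files" is stemmed to "file"),
the tokens "expand your search to include other text based file" match for
maxSearchWindow=10 and minNumberShouldMatch=3, but not for maxSearchWindow=8,
because no 8-token window holds three of the terms.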

Thanks
Saar

On Thu, Sep 1, 2016 at 9:14 PM, Allison, Timothy B. <tallison@mitre.org>
wrote:

> https://issues.apache.org/jira/browse/LUCENE-7434
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:tallison@mitre.org]
> Sent: Wednesday, August 31, 2016 3:41 PM
> To: java-user@lucene.apache.org
> Subject: RE: New type of proximity/fuzzy search
>
> Doh, sorry, Uwe, didn't see your response first.
>
> Scratch SpanOr, take a look at SpanNear.  This would be a great capability
> to have!
>
> -----Original Message-----
> From: Allison, Timothy B.
> Sent: Wednesday, August 31, 2016 3:30 PM
> To: java-user@lucene.apache.org
> Subject: RE: New type of proximity/fuzzy search
>
> Unfortunately, that does require a new type of query.  As you probably
> know, you can do the "at least" (minimum number should match) with regular
> BooleanQueries, but you can't yet do the "at least" with SpanQuery.  You
> might want to look at modifying the SpanOrQuery to get this functionality.
> It would be a great capability to have.  Perhaps open an issue and submit a
> patch?
>
> -----Original Message-----
> From: Saar Carmi [mailto:saarcarmi@gmail.com]
> Sent: Tuesday, August 30, 2016 11:03 PM
> To: java-user@lucene.apache.org
> Subject: New type of proximity/fuzzy search
>
> Hi
> I will appreciate some guidance for implementing the following type of
> query.
>
> Given a set of search terms (t1, t2, t3, ..., ti), return all documents
> in which some sequence of x=10 consecutive tokens contains at least c=3 of
> the search terms.
>
> So for example the following document matches the search (expand,
> discount, file, search, lookup)
>
> "Many of us rely on Windows Search to find files and launch programs, but
> searching for text within files is limited to specific filetypes by
> default. Here’s how you can *expand* your *search* to include other text
> based *files*."
>
> Within the sequence of the last 10 words of the document the expand,
> files, and search terms appear so there is a match.
>
> Does any documentation exist on adding new types of queries to the
> Lucene engine?
>
> Saar
>
