lucene-java-user mailing list archives

From Evert Wagenaar <evert.wagen...@gmail.com>
Subject Re: Lucene custom Query - efficiently and compare retrieve multiple document fields
Date Mon, 12 Feb 2018 21:26:12 GMT
Use a multi-field query (MultiFieldQueryParser in Lucene, or multi_match in the Elasticsearch query DSL).
Like this:

{
    "multi_match": {
        "query":    "quick brown fox",
        "fields": [ "title", "body" ]
    }
}
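
For reference, the Hamming-distance comparison described in the quoted messages below can be sketched in plain Java. This is only an illustrative sketch under stated assumptions: the class and method names (HammingDistance, distance, matches) are hypothetical and not part of Lucene, and it assumes both keyword values are equal-length strings.

```java
/**
 * Illustrative helper, NOT part of Lucene: Hamming distance between two
 * equal-length keyword values. All names here are hypothetical.
 */
final class HammingDistance {

    private HammingDistance() {}

    /** Counts the positions at which the two strings differ. */
    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException(
                "Hamming distance requires equal-length inputs");
        }
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                d++;
            }
        }
        return d;
    }

    /** Matching rule from the thread: distance is at most the threshold. */
    static boolean matches(String docValue, String queryValue, int threshold) {
        return distance(docValue, queryValue) <= threshold;
    }
}
```

Unlike the Levenshtein distance used by FuzzyQuery, this counts substitutions only, which is why the inputs must have the same length. A custom Query would presumably apply such a check per document to the coarse_grained value.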


On Mon, 12 Feb 2018 at 22:04, Dominik Safaric <dominiksafaric@gmail.com> wrote:

> Unfortunately, you've misunderstood my question. FuzzyQuery does not
> satisfy my requirements: it is based on Levenshtein distance, not Hamming
> distance. Hence the need to implement a custom Query.
>
> As asked, how does Lucene internally store multi-valued fields, and is it
> possible to retrieve the values in the order in which they were stored? In
> particular, I'd like to retrieve a multi-valued keyword field that way.
>
> Kind regards,
> Dominik
>
> > On 12 Feb 2018, at 19:34, Adrien Grand <jpountz@gmail.com> wrote:
> >
> > Filtering by one query and scoring by a different query is easy: just put
> > the filter in a FILTER clause of a BooleanQuery and the scoring query in a
> > SHOULD clause. Documents that do not match the SHOULD clause will have a
> > score of zero.
> >
> > I wonder whether you are looking for something like this:
> >
> > Query q = new BooleanQuery.Builder()
> >     .add(new FuzzyQuery(new Term("coarse_grained", "search_term")), Occur.FILTER)
> >     .add(new FuzzyQuery(new Term("fine_grained", "search_term")), Occur.SHOULD)
> >     .build();
> >
> > It's not clear to me why you need to retain order: the order of your values
> > should not matter?
> >
> > On Mon, 12 Feb 2018 at 11:23, Dominik Safaric <dominiksafaric@gmail.com> wrote:
> >
> >> In particular, I have a document schema as follows:
> >>
> >> {
> >>   "images": [{
> >>     "image_id": 1,
> >>     "features": {
> >>       "coarse_grained": <keyword>,
> >>       "fine_grained": [<keyword>]
> >>     }
> >>   }]
> >> }
> >>
> >> In the first run, using a custom Query instance, I'd like to hit documents
> >> by matching the coarse_grained field. A document is said to match if the
> >> Hamming distance between the value of its coarse_grained field and the
> >> value passed through the REST API is less than or equal to a set
> >> threshold. In addition, I'd like to score the hit documents using the
> >> fine_grained field values, which are an array of keywords. A similar
> >> method, using Hamming distance as the similarity measure, applies in this
> >> case as well.
> >>
> >> What I'm concerned with is the following: in the second (scoring) phase,
> >> I'd like to score documents using all values of the fine_grained array of
> >> keywords. How can I efficiently retrieve these values for each document,
> >> such that their order is the same as the order in which they were inserted?
> >>
> >> Thanks in advance,
> >> Dominik
> >>
> >> 2018-02-12 8:56 GMT+01:00 Adrien Grand <jpountz@gmail.com>:
> >>
> >>> Whether this is doable is going to depend on what you mean by "match[ing]
> >>> documents according to criteria X". Can you give an example?
> >>>
> >>> On Fri, 9 Feb 2018 at 14:47, Dominik Safaric <dominiksafaric@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I intend to implement a custom Query using Lucene 6.x and, due to the
> >>>> lack of documentation on this particular topic, I have the following
> >>>> questions.
> >>>>
> >>>> The query is expected to implement a two-phase search, in the sense that
> >>>> during the first phase it matches documents according to criteria X,
> >>>> whereas during the second it matches according to criteria Y on another
> >>>> document field. Can this be accomplished by using the TwoPhaseIterator?
> >>>>
> >>>> Secondly, the query as expressed through the API will not specify a
> >>>> specific query field, but instead a field that stores an array of
> >>>> objects. From an implementation point of view, can I use the LeafReader
> >>>> to retrieve an object that would map to a Java Map, which I can later use
> >>>> for accessing a certain field within the object? Or is it perhaps more
> >>>> advisable to get the document instance using the LeafReader's
> >>>> document(int docID) method, and then load particular fields? I'm afraid
> >>>> that might hurt overall performance because the documents would need to
> >>>> be loaded from disk.
> >>>>
> >>>> Thanks in advance,
> >>>> Dominik
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>
> >>
>
>
