ignite-dev mailing list archives

From Yuriy Shuliga <shul...@gmail.com>
Subject Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)
Date Fri, 22 Nov 2019 17:59:20 GMT
Dear Igniters,

The first part of the TextQuery improvement, a result limit, has been
developed and merged.
Now we have to develop the most important functionality here: proper sorting
of Lucene index responses and correct reduction of them for distributed
queries.

*There are two Lucene-based aspects*

1. When no sort fields are used, the documents in the response are still
ordered by relevance.
This is in fact the ScoreDoc.score value.
To reduce the distributed results correctly, the score should be
passed along with the response.

2. When sorting by conventional fields, Lucene should have these
fields properly indexed, and the
corresponding Sort object should be passed to Lucene's search call.
To mark those fields, a new annotation such as '@SortField' may be
introduced.
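
To illustrate the annotation idea, here is a minimal sketch of how such a
marker could be declared and discovered. Everything in it is an assumption
made for illustration: '@SortField' does not exist in Ignite today, and the
Instrument class and its field names are made up. It only shows how
annotated fields could be collected by reflection; how they would then be
added to the Lucene document is left open.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortFieldSketch {
    /** Hypothetical marker for sortable fields; not an existing Ignite annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SortField {}

    /** Made-up value class used only for this example. */
    static class Instrument {
        String name;               // full-text field, not sortable
        @SortField long marketCap; // numeric field marked for sorting
        @SortField String ticker;  // string field marked for sorting
    }

    /** Collects the names of all fields of cls that carry the marker. */
    static List<String> sortableFields(Class<?> cls) {
        List<String> names = new ArrayList<>();

        for (Field f : cls.getDeclaredFields())
            if (f.isAnnotationPresent(SortField.class))
                names.add(f.getName());

        // getDeclaredFields() gives no ordering guarantee, so sort for stable output.
        Collections.sort(names);

        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortableFields(Instrument.class)); // prints [marketCap, ticker]
    }
}
```

An indexing routine could use such a list to decide which fields need an
extra sortable (non-tokenized) representation in the index.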

*Reducing on Ignite*

The obvious place for reducing the distributed response is the class
GridCacheDistributedQueryFuture.
However, @Ivan Pavlukhin mentioned a class with similar functionality:
ReduceIndexSorted.
As far as I can see, it is tangled with H2-related classes
(org.h2.result.Row) and might not be unifiable with TextQuery reduction.
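
For discussion, here is a minimal sketch of a score-based reduce that has no
dependency on H2 classes. It is an illustration under assumptions, not a
proposal for the final code: ScoreReducer, ScoredEntry and Cursor are
hypothetical names, and the sketch assumes that every node returns its hits
already sorted by descending score and sends the score along with each hit.
Note also that Lucene computes scores from per-index statistics, so scores
coming from different nodes are only approximately comparable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical reducer, not an existing Ignite class. */
class ScoreReducer {
    /** One hit as a data node might send it: cache key plus Lucene score. */
    static final class ScoredEntry {
        final String key;
        final float score;

        ScoredEntry(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Read position inside one node's response. */
    private static final class Cursor {
        final List<ScoredEntry> hits;
        int pos;

        Cursor(List<ScoredEntry> hits) {
            this.hits = hits;
        }

        ScoredEntry head() {
            return hits.get(pos);
        }
    }

    /** K-way merge of per-node lists, each sorted by descending score. */
    static List<ScoredEntry> reduce(List<List<ScoredEntry>> perNode, int limit) {
        // Max-heap ordered by the score of each cursor's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head().score).reversed());

        for (List<ScoredEntry> hits : perNode)
            if (!hits.isEmpty())
                heap.add(new Cursor(hits));

        List<ScoredEntry> merged = new ArrayList<>();

        while (!heap.isEmpty() && merged.size() < limit) {
            Cursor c = heap.poll();   // node holding the best remaining hit
            merged.add(c.head());

            if (++c.pos < c.hits.size())
                heap.add(c);          // re-insert with its next hit as head
        }

        return merged;
    }

    /** Helper for printing: extracts the keys in merged order. */
    static List<String> keys(List<ScoredEntry> entries) {
        List<String> keys = new ArrayList<>();

        for (ScoredEntry e : entries)
            keys.add(e.key);

        return keys;
    }

    public static void main(String[] args) {
        List<ScoredEntry> node1 = Arrays.asList(new ScoredEntry("a", 0.9f), new ScoredEntry("c", 0.4f));
        List<ScoredEntry> node2 = Arrays.asList(new ScoredEntry("b", 0.7f), new ScoredEntry("d", 0.1f));

        System.out.println(keys(reduce(Arrays.asList(node1, node2), 3))); // prints [a, b, c]
    }
}
```

The same merge would work for sorting by conventional fields if the score
comparator were replaced by a comparator over the sort field values.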

Support is still needed here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery
sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com>
wrote:

> Hi Dmitry, Yuriy.
>
> I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger
> 'total' field and a 'limit' field as a primitive int.
>
> Both fields are used inside synchronized block only.
> So, we can make both private and downgrade AtomicInteger to primitive int.
>
> Most likely, these fields can be replaced with one field.
>
>
>
> On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org>
> wrote:
>
> > Hi Andrey,
> >
> > I've checked this ticket comments, and there is a TC Bot visa (with no
> > blockers).
> >
> > Do you have any concerns related to this patch?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com>:
> >
> >>   Andrey,
> >>
> >> Per your request, I created ticket
> >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >>
> >> Could you please proceed with the PR merge?
> >>
> >> BR,
> >> Yuriy Shuliha
> >>
> >> Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com>
> >> wrote:
> >>
> >> > Hi Yuri,
> >> >
> >> > To get access to TC Bot you should register as a TeamCity user [1], if
> >> > you haven't done so already.
> >> > Then you will be able to log in on the Ignite TC Bot page with the same
> >> > credentials.
> >> >
> >> > [1] https://ci.ignite.apache.org/registerUser.html
> >> >
> >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >
> >> >> Andrew,
> >> >>
> >> >> I have corrected the PR according to your notes. Please review.
> >> >> What will be the next steps in order to merge it?
> >> >>
> >> >> Y.
> >> >>
> >> >> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Yuri,
> >> >> >
> >> >> > I'm done with the review.
> >> >> > Nothing criminal found, only a trivial compatibility bug.
> >> >> >
> >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >> >
> >> >> > > Denis,
> >> >> > >
> >> >> > > Thank you for your attention to this.
> >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > > ticket is still pending review.
> >> >> > > Do we have a chance to move it forward somehow?
> >> >> > >
> >> >> > > BR,
> >> >> > > Yuriy Shuliha
> >> >> > >
> >> >> > > Mon, 30 Sep 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
> >> >> > >
> >> >> > > > Yuriy,
> >> >> > > >
> >> >> > > > I've seen you opening a pull-request with the first changes:
> >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > > >
> >> >> > > > Alex Scherbakov and Ivan, are you the right guys to do the review?
> >> >> > > >
> >> >> > > > -
> >> >> > > > Denis
> >> >> > > >
> >> >> > > >
> >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com>
> >> >> > > > wrote:
> >> >> > > >
> >> >> > > > > Yuriy,
> >> >> > > > >
> >> >> > > > > Thank you for providing details! Quite interesting.
> >> >> > > > >
> >> >> > > > > Yes, we already have support for a distributed limit and for
> >> >> > > > > merging sorted subresults for SQL queries. E.g. ReduceIndexSorted
> >> >> > > > > and MergeStreamIterator are used for merging sorted streams.
> >> >> > > > >
> >> >> > > > > Could you please also clarify about score/relevance? Is it
> >> >> > > > > provided by the Lucene engine for each query result? I am thinking
> >> >> > > > > about how to do a sorted merge properly in this case.
> >> >> > > > >
> >> >> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com>:
> >> >> > > > > >
> >> >> > > > > > Ivan,
> >> >> > > > > >
> >> >> > > > > > Thank you for the interesting question!
> >> >> > > > > >
> >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented,
> >> >> > > > > > and the point of the user's interest is the topmost part of the
> >> >> > > > > > response.
> >> >> > > > > > The user can then read it, evaluate it and use the given records
> >> >> > > > > > for further purposes.
> >> >> > > > > >
> >> >> > > > > > In our case in particular, we use Ignite for operations with
> >> >> > > > > > financial data, and there is a lot of text there, such as asset
> >> >> > > > > > names, financial instruments, companies, etc.
> >> >> > > > > > In order to operate with this quickly and reliably, users are
> >> >> > > > > > used to working with text search, type-ahead completions and
> >> >> > > > > > suggestions.
> >> >> > > > > >
> >> >> > > > > > For these purposes we index particular string data in separate
> >> >> > > > > > caches.
> >> >> > > > > >
> >> >> > > > > > Sorting capabilities and response size limits are very important
> >> >> > > > > > there, as our API has to provide the most relevant information
> >> >> > > > > > in a view of limited size.
> >> >> > > > > >
> >> >> > > > > > Now let me comment on the Ignite/Lucene perspective.
> >> >> > > > > > Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs*
> >> >> > > > > > already sorted by *score* (relevance), so the most relevant
> >> >> > > > > > documents are at the top.
> >> >> > > > > > Currently, distributed query responses from different nodes are
> >> >> > > > > > merged into the final query cursor queue in arbitrary order,
> >> >> > > > > > so in fact the score order is already ruined here. Also, Ignite
> >> >> > > > > > requests all possible documents from Lucene, which is redundant
> >> >> > > > > > and not good for performance.
> >> >> > > > > >
> >> >> > > > > > I'm implementing the *limit* parameter as part of *TextQuery*
> >> >> > > > > > and have to note that we still have to add sorting to text query
> >> >> > > > > > processing in order to get usable results.
> >> >> > > > > >
> >> >> > > > > > The *limit* parameter itself should alleviate some of the issues
> >> >> > > > > > above, but sorting, at least by document score, should
> >> >> > > > > > definitely be implemented along with the limit.
> >> >> > > > > >
> >> >> > > > > > This is a pretty short commentary; if you still have any
> >> >> > > > > > questions, please do not hesitate to ask.
> >> >> > > > > >
> >> >> > > > > > BR,
> >> >> > > > > > Yuriy Shuliha
> >> >> > > > > >
> >> >> > > > > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван <vololo100@gmail.com>
> >> >> > > > > > wrote:
> >> >> > > > > >
> >> >> > > > > > > Yuriy,
> >> >> > > > > > >
> >> >> > > > > > > Greatly appreciate your interest.
> >> >> > > > > > >
> >> >> > > > > > > Could you please elaborate a little on sorting? What tasks
> >> >> > > > > > > does it help to solve, and how? It would be great if you could
> >> >> > > > > > > provide an example.
> >> >> > > > > > >
> >> >> > > > > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> >> >> > > > > > > alexey.scherbakoff@gmail.com>:
> >> >> > > > > > > >
> >> >> > > > > > > > Denis,
> >> >> > > > > > > >
> >> >> > > > > > > > I like the idea of throwing an exception when text queries
> >> >> > > > > > > > are enabled on persistent caches.
> >> >> > > > > > > >
> >> >> > > > > > > > Also I'm fine with the proposed limit for unsorted searches.
> >> >> > > > > > > >
> >> >> > > > > > > > Yury, please proceed with ticket creation.
> >> >> > > > > > > >
> >> >> > > > > > > > Tue, 17 Sep 2019, 22:06 Denis Magda <dmagda@apache.org>:
> >> >> > > > > > > >
> >> >> > > > > > > > > Igniters,
> >> >> > > > > > > > >
> >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding the
> >> >> > > > > > > > > evolution of the full-text search API, as long as Yury is
> >> >> > > > > > > > > ready to push it forward.
> >> >> > > > > > > > >
> >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for
> >> >> > > > > > > > > in-memory data grid deployments where Ignite caches data
> >> >> > > > > > > > > of an underlying DB like Postgres.
> >> >> > > > > > > > > As part of the changes, I would simply throw an exception
> >> >> > > > > > > > > (by default) if one attempts to use text indices with
> >> >> > > > > > > > > native persistence enabled.
> >> >> > > > > > > > > If the person is ready to live with that limitation, then
> >> >> > > > > > > > > an explicit configuration change is needed to get around
> >> >> > > > > > > > > the exception.
> >> >> > > > > > > > >
> >> >> > > > > > > > > Thoughts?
> >> >> > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > > > > -
> >> >> > > > > > > > > Denis
> >> >> > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <
> >> >> > > > > > > > > shuliga@gmail.com> wrote:
> >> >> > > > > > > > >
> >> >> > > > > > > > > > Hello to all again,
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Thank you for the important comments and notes given
> >> >> > > > > > > > > > below!
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Let me answer and continue the discussion.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Alexei has referred to
> >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where
> >> >> > > > > > > > > > the absence of index persistence was declared an
> >> >> > > > > > > > > > obstacle to further development.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> >> >> > > > > > > > > > b) There are definite needs (in our project as well) for
> >> >> > > > > > > > > > in-memory-only indexing of selected data.
> >> >> > > > > > > > > > We intend to use search capabilities for fetching a
> >> >> > > > > > > > > > limited number of records to be used in type-ahead
> >> >> > > > > > > > > > search / suggestions.
> >> >> > > > > > > > > > Not all of the data will be indexed, and there is no
> >> >> > > > > > > > > > need for the Lucene index to be persistent. I hope this
> >> >> > > > > > > > > > is a widespread pattern of text-search usage.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems
> >> >> > > > > > > > > > not to be required in text-search tasks for now)
> >> >> > > > > > > > > > I have investigated the data flow for distributed text
> >> >> > > > > > > > > > queries. It was a simple test prefix query, like
> >> >> > > > > > > > > > 'name'='ene*'
> >> >> > > > > > > > > > For now each server node returns all response records to
> >> >> > > > > > > > > > the client node, and the response may contain thousands
> >> >> > > > > > > > > > or hundreds of thousands of records,
> >> >> > > > > > > > > > even if we need only the first 10-100. Again, all the
> >> >> > > > > > > > > > results are added to the queue in
> >> >> > > > > > > > > > GridCacheQueryFutureAdapter in arbitrary order, by pages.
> >> >> > > > > > > > > > I did not find any means here to deliver a deterministic
> >> >> > > > > > > > > > result.
> >> >> > > > > > > > > > So implementing the limit as part of the query (and of
> >> >> > > > > > > > > > GridCacheQueryRequest) will not change the nature of the
> >> >> > > > > > > > > > response, but will limit the load on nodes and
> >> >> > > > > > > > > > networking.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Can we consider opening a ticket for this?
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > (III) Further extension of Lucene API exposure in Ignite
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > a) Sorting
> >> >> > > > > > > > > > The solution for this could be:
> >> >> > > > > > > > > > - Make entities comparable
> >> >> > > > > > > > > > - Add a custom comparator to the entity
> >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene
> >> >> > > > > > > > > > indexing
> >> >> > > > > > > > > > - Use comparators when merging responses or reducing to
> >> >> > > > > > > > > > the desired limit on the client node.
> >> >> > > > > > > > > > This will require the full result set to be loaded into
> >> >> > > > > > > > > > memory, though it can be used for relatively small
> >> >> > > > > > > > > > limits.
> >> >> > > > > > > > > > BR,
> >> >> > > > > > > > > > Yuriy Shuliha
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <
> >> >> > > > > > > > > > alexey.scherbakoff@gmail.com> wrote:
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > > Yuriy,
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > Note that one of the major blockers for text queries
> >> >> > > > > > > > > > > is [1], which makes Lucene indexes unusable with
> >> >> > > > > > > > > > > persistence and is the main reason for
> >> >> > > > > > > > > > > discontinuation.
> >> >> > > > > > > > > > > It should probably be addressed first to make text
> >> >> > > > > > > > > > > queries a valid product feature.
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed
> >> >> > > > > > > > > > > not a trivial task.
> >> >> > > > > > > > > > > Some kind of merging must be implemented on the
> >> >> > > > > > > > > > > query-originating node.
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > Thu, 29 Aug 2019 at 23:38, Denis Magda <
> >> >> > > > > > > > > > > dmagda@apache.org>:
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > > Yuriy,
> >> >> > > > > > > > > > > >
> >> >> > > > > > > > > > > > If you are ready to take over the full-text search
> >> >> > > > > > > > > > > > indexes, then please go ahead. The primary reasons
> >> >> > > > > > > > > > > > why the community wants to discontinue them first
> >> >> > > > > > > > > > > > (and, probably, resurrect them later) are the
> >> >> > > > > > > > > > > > limitations listed by Andrey and minimal support
> >> >> > > > > > > > > > > > from the community end.
> >> >> > > > > > > > > > > >
> >> >> > > > > > > > > > > > -
> >> >> > > > > > > > > > > > Denis
> >> >> > > > > > > > > > > >
> >> >> > > > > > > > > > > >
> >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <
> >> >> > > > > > > > > > > > andrey.mashenkov@gmail.com> wrote:
> >> >> > > > > > > > > > > >
> >> >> > > > > > > > > > > > > Hi Yuriy,
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue
> >> >> > > > > > > > > > > > > TextQueries in Ignite [1].
> >> >> > > > > > > > > > > > > The motivation is that text indexes are not
> >> >> > > > > > > > > > > > > persistent, not transactional and can't be used
> >> >> > > > > > > > > > > > > together with SQL or inside SQL,
> >> >> > > > > > > > > > > > > and there is a lack of interest from the
> >> >> > > > > > > > > > > > > community side.
> >> >> > > > > > > > > > > > > You are welcome to take on these issues and make
> >> >> > > > > > > > > > > > > TextQueries great.
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result
> >> >> > > > > > > > > > > > > set.
> >> >> > > > > > > > > > > > > Query results are returned from a data node to
> >> >> > > > > > > > > > > > > the client-side cursor in a page-by-page manner,
> >> >> > > > > > > > > > > > > and this parameter is designed to control the
> >> >> > > > > > > > > > > > > page size. It is supposed that the query executes
> >> >> > > > > > > > > > > > > lazily on the server side, and
> >> >> > > > > > > > > > > > > it is not expected that the full result set is
> >> >> > > > > > > > > > > > > loaded into memory on the server side at once,
> >> >> > > > > > > > > > > > > but by pages.
> >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the
> >> >> > > > > > > > > > > > > entire result set into memory before the first
> >> >> > > > > > > > > > > > > page is sent to the client?
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > I'd think a new parameter should be added to
> >> >> > > > > > > > > > > > > limit the result. The best solution is to use
> >> >> > > > > > > > > > > > > query language commands for this, e.g.
> >> >> > > > > > > > > > > > > "LIMIT/OFFSET" in SQL.
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a
> >> >> > > > > > > > > > > > > distributed operation, and the same user query
> >> >> > > > > > > > > > > > > will be executed on data nodes,
> >> >> > > > > > > > > > > > > and then the results from all nodes should be
> >> >> > > > > > > > > > > > > correctly merged before being returned via the
> >> >> > > > > > > > > > > > > client cursor.
> >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and
> >> >> > > > > > > > > > > > > then again in the merge phase.
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > Also, this may be non-obvious, but limiting
> >> >> > > > > > > > > > > > > results makes no sense without sorting,
> >> >> > > > > > > > > > > > > as there is no guarantee that every next query
> >> >> > > > > > > > > > > > > run will return the same data, because of page
> >> >> > > > > > > > > > > > > reordering.
> >> >> > > > > > > > > > > > > Basically, the merge phase receives results from
> >> >> > > > > > > > > > > > > data nodes asynchronously, and messages from
> >> >> > > > > > > > > > > > > different nodes can't be ordered.
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > 2.
> >> >> > > > > > > > > > > > > a. The "tokenize" param name (for
> >> >> > > > > > > > > > > > > @QueryTextField) looks more verbose, doesn't it?
> >> >> > > > > > > > > > > > > b,c. What about distributed queries? How will
> >> >> > > > > > > > > > > > > partial results from nodes be merged?
> >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for
> >> >> > > > > > > > > > > > > data sorting?
> >> >> > > > > > > > > > > > > What comparator should Ignite choose to sort the
> >> >> > > > > > > > > > > > > result in the merge phase?
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > 3. For now the Lucene engine is not configurable
> >> >> > > > > > > > > > > > > at all. E.g. it is impossible to configure the
> >> >> > > > > > > > > > > > > Tokenizer.
> >> >> > > > > > > > > > > > > I'd think about possible ways to configure the
> >> >> > > > > > > > > > > > > engine first, and only then go further to
> >> >> > > > > > > > > > > > > discuss/implement complex features
> >> >> > > > > > > > > > > > > that may depend on the engine config.
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <
> >> >> > > > > > > > > > > > > shuliga@gmail.com> wrote:
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > Dear community,
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > By starting this thread I'd like to open a
> >> >> > > > > > > > > > > > > > discussion that would lead to contributions in
> >> >> > > > > > > > > > > > > > the subject area.
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > Ignite has indexing capabilities, backed by
> >> >> > > > > > > > > > > > > > different mechanisms, including Lucene.
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's
> >> >> > > > > > > > > > > > > > release).
> >> >> > > > > > > > > > > > > > This is a widespread and mature technology that
> >> >> > > > > > > > > > > > > > covers the text search area and beyond (e.g.
> >> >> > > > > > > > > > > > > > spatial data indexing).
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality
> >> >> > > > > > > > > > > > > > to Ignite indexing and query mechanisms for
> >> >> > > > > > > > > > > > > > text data*.
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > It's quite a simple request at the current
> >> >> > > > > > > > > > > > > > stage. It comes from our project's needs but,
> >> >> > > > > > > > > > > > > > I believe, will be useful for a lot more people.
> >> >> > > > > > > > > > > > > > Let's walk through the items and vote on or
> >> >> > > > > > > > > > > > > > discuss Jira tickets for them.
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to
> >> >> > > > > > > > > > > > > > limit search response items inside
> >> >> > > > > > > > > > > > > > GridLuceneIndex.query(). Currently it calls
> >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*),
> >> >> > > > > > > > > > > > > > so basically all scored matches will be
> >> >> > > > > > > > > > > > > > returned, which we do not need in most cases.
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable
> >> >> > > > > > > > > > > > > > search call can be executed:
> >> >> > > > > > > > > > > > > > *IndexSearcher.search(query, count, sort)*
> >> >> > > > > > > > > > > > > > Implementation steps:
> >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in
> >> >> > > > > > > > > > > > > > the *@QueryTextField* annotation. If *true*,
> >> >> > > > > > > > > > > > > > the field will be indexed but not tokenized.
> >> >> > > > > > > > > > > > > > Number types are preferred here.
> >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery*
> >> >> > > > > > > > > > > > > > constructor. It should define the desired sort
> >> >> > > > > > > > > > > > > > fields used for querying.
> >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in
> >> >> > > > > > > > > > > > > > GridLuceneIndex.query().
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with
> >> >> > > > > > > > > > > > > > *TextQuery*, including terms/queries boosting.
> >> >> > > > > > > > > > > > > > *This section is for voting only, as it
> >> >> > > > > > > > > > > > > > requires more detailed work. It should be
> >> >> > > > > > > > > > > > > > extended if the community is interested in it.*
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > Looking forward to your comments!
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > > BR,
> >> >> > > > > > > > > > > > > > Yuriy Shuliha
> >> >> > > > > > > > > > > > > >
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > > > --
> >> >> > > > > > > > > > > > > Best regards,
> >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> >> >> > > > > > > > > > > > >
> >> >> > > > > > > > > > > >
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > --
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > Best regards,
> >> >> > > > > > > > > > > Alexei Scherbakov
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > >
> >> >> > > > > > >
> >> >> > > > > > >
> >> >> > > > > > > --
> >> >> > > > > > > Best regards,
> >> >> > > > > > > Ivan Pavlukhin
> >> >> > > > > > >
> >> >> > > > >
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > --
> >> >> > > > > Best regards,
> >> >> > > > > Ivan Pavlukhin
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Best regards,
> >> >> > Andrey V. Mashenkov
> >> >> >
> >> >>
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > Andrey V. Mashenkov
> >> >
> >>
> >
>
> --
> Best regards,
> Andrey V. Mashenkov
>
