ignite-dev mailing list archives

From Andrey Mashenkov <andrey.mashen...@gmail.com>
Subject Re: [DISCUSS] IEP-71 Public API for secondary index search
Date Mon, 12 Apr 2021 10:45:59 GMT
Maksim,

> Am I right that you meant this ticket [1], *IGNITE-12291 Create controllable
> paged query requests / responses for TextQuery similar to current SQL
> result processing*, when you talked about the limit working incorrectly for
> TextQueries?
>
Yes, sure, that's it.

On Mon, Apr 12, 2021 at 12:15 PM Maksim Timonin <timonin.maxim@gmail.com>
wrote:

> Hi, Andrey!
>
> Am I right that you meant this ticket [1], *IGNITE-12291 Create
> controllable paged query requests / responses for TextQuery similar to
> current SQL result processing*, when you talked about the limit working
> incorrectly for TextQueries?
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12291
>
> On Thu, Apr 8, 2021 at 4:32 PM Maksim Timonin <timonin.maxim@gmail.com>
> wrote:
>
>> Hi, Andrey!
>>
>> >> ScanQuery, TextQuery and partially SQL query share the same
>> >> infrastructure
>> I think I understand what you mean. I debugged the query processing and
>> now agree that it's a nice idea to try to reuse the infrastructure of scan
>> and text queries. Also, as far as I can see, Reducer functionality already
>> exists there, so I hope we can use it. I'm not absolutely confident yet that
>> it will work fine, but I'm going to start there. Thanks for pointing me in
>> this direction!
>>
>> >> I don't like the idea a user code will be executed inside BTree
>> >> operation
>> On the Confluence page I've shown that a predicate is passed as a
>> TreeRowClosure. In this case you're right: any exception in a predicate
>> will lead to a CorruptedTreeException. But I see another legal way to
>> implement the predicate operation. BPlusTree.find accepts the X param that
>> is passed to IO.getRow(). As I understand it, this param controls how
>> completely the returned row is filled. So we can use it to return an object
>> that contains only basic info: link, pageAddr, offset. The predicate
>> operation will then be applied at a higher level, on a cursor returned by
>> the tree (like H2TreeIndex does). It's safe to run user code there, and we
>> can handle exceptions there.
>>
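A minimal sketch of the alternative Maksim describes, assuming the tree hands back a plain `Iterator` of lightweight rows: the user predicate is evaluated on the cursor above the tree, so a failing predicate fails only the query instead of a B+tree operation. `FilteringCursor` and `UserPredicateException` are hypothetical names, not Ignite classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: the index tree only locates rows; the user predicate
// runs on the cursor above the tree, outside any B+tree operation.
public class FilteringCursor<T> implements Iterator<T> {
    /** Hypothetical query-level error: fails the query, not the tree operation. */
    public static class UserPredicateException extends RuntimeException {
        public UserPredicateException(Throwable cause) {
            super("User predicate failed", cause);
        }
    }

    private final Iterator<T> treeCursor;         // rows already returned by the tree
    private final Predicate<? super T> predicate; // user code
    private T next;
    private boolean ready;

    public FilteringCursor(Iterator<T> treeCursor, Predicate<? super T> predicate) {
        this.treeCursor = treeCursor;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        while (!ready && treeCursor.hasNext()) {
            T candidate = treeCursor.next();
            boolean accepted;
            try {
                accepted = predicate.test(candidate);
            } catch (RuntimeException e) {
                // The tree lookup has already completed, so only this query fails.
                throw new UserPredicateException(e);
            }
            if (accepted) {
                next = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext())
            throw new NoSuchElementException();
        T res = next;
        next = null;
        ready = false;
        return res;
    }
}
```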
>>
>>
>> On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <
>> andrey.mashenkov@gmail.com> wrote:
>>
>>> Maksim,
>>>
>>> > The ScanQuery API provides a filter as a
>>> > param that, in the case of an index query, should be split into such
>>> > conditions.
>>> > It looks like a non-trivial task.
>>> >
>>> ScanQuery, TextQuery and, partially, SQL queries share the same
>>> infrastructure.
>>> I thought we could extend, improve and reuse some ScanQuery code that
>>> already works fine: mapping a query on the topology, IO, batching.
>>> We could add an IndexCondition alongside the Filter, and abstract the query
>>> executor from the source (primary and secondary indexes).
>>> We could also add a sorted merge algorithm to the query merge stage. It can
>>> also be very useful for TextQueries, which suffer from the absence of a
>>> sorted merge, so the "limit" condition works incorrectly.
>>>
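The sorted merge Andrey proposes can be illustrated with a standard k-way merge over per-node result streams that are already sorted by the index key; the reducer stops as soon as `limit` rows are produced. `SortedMerge` is a hypothetical helper, not existing Ignite reducer code, and per-node results are modeled as plain iterators.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical reducer-side k-way merge of sorted per-node streams with a limit.
public final class SortedMerge {
    private SortedMerge() {
    }

    /** Current head row of one node's stream, plus the rest of that stream. */
    private static final class Head<T> {
        final T row;
        final Iterator<T> rest;

        Head(T row, Iterator<T> rest) {
            this.row = row;
            this.rest = rest;
        }
    }

    public static <T> List<T> merge(List<? extends Iterator<T>> nodeStreams,
                                    Comparator<? super T> cmp, int limit) {
        Comparator<Head<T>> byRow = (a, b) -> cmp.compare(a.row, b.row);
        PriorityQueue<Head<T>> heap = new PriorityQueue<>(byRow);

        // Seed the heap with the first row of every non-empty node stream.
        for (Iterator<T> stream : nodeStreams) {
            if (stream.hasNext())
                heap.add(new Head<>(stream.next(), stream));
        }

        List<T> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Head<T> head = heap.poll();
            out.add(head.row);
            // Pull the next row only from the stream that just produced one.
            if (head.rest.hasNext())
                heap.add(new Head<>(head.rest.next(), head.rest));
        }
        return out;
    }
}
```

Because each input is sorted, the first `limit` rows of the merged output are the globally smallest ones; concatenating per-node results without a sorted merge gives no such guarantee.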
>>> If you think it will be harder than creating it from scratch, I'm ok.
>>>
>>> > 3. Ignite creates a proxy object that is filled with objects that are
>>> > inlined. If a user tries to access a field that isn't inlined or isn't
>>> > indexed, then deserialization will start and Ignite will log.warn()
>>> > about that.
>>> >
>>> Agreed, this can be faster.
>>> But I don't like the idea that user code will be executed inside a BTree
>>> operation: any exception can trigger the FailureHandler and stop the node.
>>>
>>> There is one more thing that could be improved.
>>> ScanQuery now iterates over per-partition PK Hash index trees and has
>>> performance issues on a small grid with a large number of partitions:
>>> there are many partitions on every node, so many trees have to be
>>> scanned.
>>> In this case, a scan over a secondary index gives a significant boost even
>>> if every row is materialized, because we only need to traverse a single
>>> tree per node.
>>> Having the ability to run a ScanQuery over a secondary index (if one
>>> exists) instead of the PK Hash index would be great.
>>>
>>>
>>> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <timonin.maxim@gmail.com>
>>> wrote:
>>>
>>> > Hi, Andrey!
>>> >
>>> > Thanks for the review and your comments!
>>> >
>>> > >> Is it possible to extend ScanQuery functionality to pass index
>>> > >> condition
>>> > I investigated this approach and see some issues:
>>> > 1. Querying an index is not actually a scan. It's a tree traversal
>>> > (the predicate operation is an exception; other operations like gt, lt,
>>> > min, max have explicit boundaries). An index query consists of
>>> > conditions that match the index structure. In general, for a multi-key
>>> > index there can be multiple conditions. The ScanQuery API provides a
>>> > filter as a param that, in the case of an index query, should be split
>>> > into such conditions. It looks like a non-trivial task.
>>> > 2. Querying an index requires a sorted result, while ScanQuery doesn't
>>> > care about that. So the iterator would behave differently when scanning
>>> > a cache and when querying indexes. It's not much to implement, I think,
>>> > but it can make ScanQuery unclear for a user.
>>> >
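To illustrate the difference Maksim draws between a traversal with explicit boundaries and a predicate scan, here is a sketch that uses a `NavigableMap` (for example `java.util.TreeMap`) as a stand-in for a sorted index mapping an index key to a cache key. `IndexOps` is hypothetical; Ignite's B+tree API is different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Predicate;

// Hypothetical sketch; a NavigableMap stands in for a sorted secondary index.
public final class IndexOps {
    private IndexOps() {
    }

    /**
     * Range condition (key > gt AND key < lt): explicit boundaries, so only the
     * matching key range is traversed and results come back in index order.
     * Requires gt <= lt (subMap throws IllegalArgumentException otherwise).
     */
    public static List<String> between(NavigableMap<Integer, String> index, int gt, int lt) {
        return new ArrayList<>(index.subMap(gt, false, lt, false).values());
    }

    /** Predicate: no boundaries can be derived, so every index entry is visited. */
    public static List<String> scan(NavigableMap<Integer, String> index, Predicate<Integer> pred) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : index.entrySet()) {
            if (pred.test(e.getKey()))
                out.add(e.getValue());
        }
        return out;
    }
}
```

The range variant touches only the entries between the bounds, while the predicate variant has to visit every entry; both return rows in index order, which a cache-level ScanQuery does not promise.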
>>> > Maybe it makes sense to separate the traverse (gt, lt, in, etc.) and scan
>>> > (predicate) index operations into different APIs. Then there will still
>>> > be a new query type for traversing.
>>> >
>>> > But we could introduce some inheritors of ScanQuery, like TableScanQuery
>>> > and IndexScanQuery, for scan and filter. Then the question is about
>>> > ordering: cache and table scans aren't ordered, but an index scan is. So
>>> > we can introduce an optional "order" param for ScanQuery too.
>>> >
>>> > WDYT?
>>> >
>>> > >> Functional indices
>>> > >> This task looks like a huge one because the lifecycle of such
>>> > >> classes should be described first
>>> > I agree with you that this part should be investigated more deeply than
>>> > I did. So let's postpone the discussion about functional indexes for a
>>> > while. IEP-71 declares several phases; functional indexes are part of
>>> > the 2nd phase, but users will already get new functionality from the 1st
>>> > phase. Then I'll dig into the things you mentioned. Thanks for pointing
>>> > them out.
>>> >
>>> > >> IndexScan by the predicate is questionable
>>> > Also, in the comments to the IEP on Confluence, you mentioned the
>>> > deserialization that is required to get an object for the predicate
>>> > function.
>>> > Now I see it like this:
>>> > 1. The predicate should operate only on indexed fields;
>>> > 2. Users benefit from the predicate only if the index is inlined
>>> > properly (even if part of the rows aren't inlined due to varlen
>>> > columns, it can still be faster than running a ScanQuery);
>>> > 3. Ignite creates a proxy object that is filled with objects that are
>>> > inlined. If a user tries to access a field that isn't inlined or isn't
>>> > indexed, then deserialization will start and Ignite will log.warn()
>>> > about that.
>>> >
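Point 3 can be sketched as a row view that serves inlined values directly and falls back to full deserialization, with a warning, only when a non-inlined field is accessed. `InlinedRowView` is hypothetical; the full deserialization is modeled by a `Supplier`, and `java.util.logging` stands in for Ignite's logger.

```java
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical proxy over an index row: inlined values first, lazy full row second.
public final class InlinedRowView {
    private static final Logger LOG = Logger.getLogger(InlinedRowView.class.getName());

    private final Map<String, Object> inlined;                // values read from the index page
    private final Supplier<Map<String, Object>> deserializer; // stands in for full-row deserialization
    private Map<String, Object> fullRow;                      // cached after the first fallback

    public InlinedRowView(Map<String, Object> inlined, Supplier<Map<String, Object>> deserializer) {
        this.inlined = inlined;
        this.deserializer = deserializer;
    }

    public Object field(String name) {
        if (inlined.containsKey(name))
            return inlined.get(name);

        if (fullRow == null) {
            // Slow path: warn once per row and deserialize the whole entry.
            LOG.warning("Field '" + name + "' is not inlined, deserializing the whole row");
            fullRow = deserializer.get();
        }
        return fullRow.get(name);
    }

    public boolean isDeserialized() {
        return fullRow != null;
    }
}
```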
>>> > So, I think it's a valid use case. Is there something I'm missing?
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <
>>> > andrey.mashenkov@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi Maksim,
>>> > >
>>> > > Nice idea, I'd like to see this feature in Ignite.
>>> > > The motivation is clear to me: it would be nice to have fast scans and
>>> > > omit the SQL overhead of planning, parsing, etc. in some simple use
>>> > > cases.
>>> > >
>>> > > I've left a few minor comments on the IEP, but I have the following
>>> > > questions whose answers I failed to find in the IEP.
>>> > > 1. Is it possible to extend ScanQuery functionality to pass an index
>>> > > condition as a hint/parameter rather than create a separate query type?
>>> > > This would allow a user to run a query over a particular table (for the
>>> > > multi-table-per-cache case) and use an index for some types of
>>> > > conditions.
>>> > >
>>> > > 2. Functional indices, as you wrote, should use functions distributed
>>> > > via the peerClassLoading mechanism.
>>> > > This means there will be no class with the function on the server side,
>>> > > and such classes are not persistent. It seems they can't survive a grid
>>> > > restart.
>>> > > This task looks like a huge one because the lifecycle of such classes
>>> > > should be described first.
>>> > > Possible pitfalls are:
>>> > > * Durability. Function code MUST be persistent to survive a node
>>> > > restart, as there is no guarantee that the classes are available on
>>> > > the server side.
>>> > > * Consistency. Server (and maybe client) nodes MUST have the same class
>>> > > code at any given time.
>>> > > * Code ownership. Would class code be shared or per-cache? If shared,
>>> > > you can't just change the class code by loading a new one, because
>>> > > other caches may use this function.
>>> > > If per-cache, different caches may have different code/behavior, which
>>> > > may be non-obvious to the end user.
>>> > >
>>> > > 3. IndexScan by the predicate is questionable.
>>> > > Maybe it can be faster if there are multiple tables in a cache, but it
>>> > > looks similar to a ScanQuery with a filter.
>>> > >
>>> > > Also, I believe we can have a common API (configuring, creating, using)
>>> > > for all types of indices, but some types (e.g. functional) will be
>>> > > ignored in SQL due to limited support on the H2 side, and other types
>>> > > will be shared and could be used by the ScanQuery engine as well as by
>>> > > the SQL engine.
>>> > >
>>> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <timonin.maxim@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hi, Igniters!
>>> > > >
>>> > > > I'd like to propose a new feature: the ability to query and create
>>> > > > indexes from the public API.
>>> > > >
>>> > > > It will help in cases where:
>>> > > > 1. SQL is not applicable by design of the user application;
>>> > > > 2. IndexScan is preferable to ScanQuery for performance reasons;
>>> > > > 3. Functional indexes are required.
>>> > > >
>>> > > > Also, it would be great to have transactional support for such
>>> > > > queries, like the "select for update" query provides. But I haven't
>>> > > > dug into that much. It will be the next step if this API is
>>> > > > implemented.
>>> > > >
>>> > > > I've prepared IEP-71 for that [1] with more details. Please share
>>> > > > your thoughts.
>>> > > >
>>> > > >
>>> > > > [1]
>>> > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>> > > >
>>> > >
>>> > >
>>> > > --
>>> > > Best regards,
>>> > > Andrey V. Mashenkov
>>> > >
>>> >
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>

-- 
Best regards,
Andrey V. Mashenkov
