ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Roman Kondakov <kondako...@mail.ru.INVALID>
Subject Re: New SQL execution engine
Date Mon, 18 Nov 2019 10:03:43 GMT
Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try 
to explain.

When query parallelism = N is turned on, it means that each cache is 
divided into N parts from the SQL point of view. Every SQL query is 
executed independently over each particular part, and then results are 
merged together during the reducer step.

This is absolutely identical to the distributed query execution, where 
instead of a single node with query parallelism = N, we have N nodes 
with query parallelism = 1. SQL query is executed over each partition of 
data on all nodes and then results are merged on reducer.

As we can see, query parallelism is equivalent to the distributed query 
execution. When we do joins over distributed tables, we need to think 
about the collocation of data [1]. If data is not collocated, we get a 
wrong result. This happens silently, which is not good, IMO.

I reworked your example a bit in order to impose collocation on the 
joining key and now join returns correct result [2].

Current approach in configuration and query execution looks very 
uncomfortable and should be completely redesigned in the new engine.

[1] https://apacheignite-sql.readme.io/docs/distributed-joins

[2] https://github.com/hostettler/igniteParallelQueries/pull/1

Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostettler@gmail.com wrote:
> Actually I am now wondering whether this is not just a bug and that I should
> record it as such. As the behavior is different with and without the
> parallelism and there is no warning during execution or in the api.
> Any thought?
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

View raw message