drill-dev mailing list archives

From Sudheesh Katkam <skat...@maprtech.com>
Subject Re: Improving Performance of SELECT * FROM hive.table LIMIT 0
Date Fri, 02 Oct 2015 23:31:39 GMT
Hey y’all,

I see that DRILL-1617 <https://issues.apache.org/jira/browse/DRILL-1617> disables producer-consumer
because of correctness issues. Should we enable this visitor (as Venki suggested) and resolve
the issues?

Thank you,
Sudheesh
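For reference, a minimal sketch of the producer-consumer scan model under discussion (all class and method names here are illustrative, not Drill's actual ScanBatch or ProducerConsumer operators): a producer thread owns reader setup and pushes batches onto a bounded queue, while the downstream consumer simply drains it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the producer-consumer scan model: the producer
// thread performs (lazy) reader setup and pushes record batches; the
// consumer drains them. Names are illustrative, not Drill's API.
public class ProducerConsumerScan {
    // Poison pill signaling end-of-stream.
    private static final String DONE = "__DONE__";

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

    public void startProducer(Iterable<String> batches) {
        Thread producer = new Thread(() -> {
            try {
                // Reader setup would happen here, on the producer's own
                // thread, inside its own security context.
                for (String batch : batches) {
                    queue.put(batch);
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Returns the next batch, or null when the stream is exhausted. */
    public String next() throws InterruptedException {
        String batch = queue.take();
        return DONE.equals(batch) ? null : batch;
    }

    public static void main(String[] args) throws Exception {
        ProducerConsumerScan scan = new ProducerConsumerScan();
        scan.startProducer(java.util.Arrays.asList("batch-1", "batch-2"));
        System.out.println(scan.next()); // batch-1
        System.out.println(scan.next()); // batch-2
        System.out.println(scan.next()); // null
    }
}
```

The bounded queue gives natural backpressure: a slow consumer stalls the producer instead of letting setup-and-read run arbitrarily far ahead.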

> On Sep 29, 2015, at 6:00 PM, Sudheesh Katkam <skatkam@maprtech.com> wrote:
> 
> My initial work <https://github.com/sudheeshkatkam/drill/commit/7a5fa64828408842ab515734004433747a2b3ef0> on this brought down the execution time from 1203 seconds to ~20 seconds (most of this is planning time). As Jinfeng pointed out, the planning time can be reduced using the parquet metadata feature.
> 
> Now, given the limitation that Venki pointed out, how do we optimize LIMIT n (where n > 0)? Is there a reason why we did not opt for the producer-consumer model?
> 
> Thank you,
> Sudheesh
> 
>> On Sep 25, 2015, at 11:35 AM, Venki Korukanti <venki.korukanti@gmail.com> wrote:
>> 
>> One issue with moving RecordReader creation to setup is chained
>> impersonation support. The fragment thread can be running within the
>> query user's doAs block, but the setup has to happen in the doAs block
>> of the user (possibly not the query user) whom we want to impersonate
>> when reading the underlying data. Maybe we should move towards the
>> producer-consumer model, where the scan batch always runs in a separate
>> thread; that way we can lazily set up readers, and each runs within its
>> own doAs block.
>> 
>> Thanks
>> Venki
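Venki's impersonation point can be illustrated with a thread-local stand-in for Hadoop's UserGroupInformation.doAs (this sketch is not Hadoop's API, just an illustration): reader setup done eagerly on the fragment thread inherits the query user's context, while setup deferred to the scan's own thread can run under the impersonated user.

```java
// Illustrative stand-in for Hadoop's UserGroupInformation.doAs: a
// thread-local "current user" shows why reader setup done eagerly on
// the fragment thread runs as the query user, while setup deferred to
// the scan's own thread can run as the impersonated user.
public class DoAsSketch {
    private static final ThreadLocal<String> CURRENT_USER =
        ThreadLocal.withInitial(() -> "anonymous");

    /** Runs the action with the given user set for the current thread. */
    public static <T> T doAs(String user, java.util.function.Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    /** A "reader" that records which user it was set up as. */
    public static String setupReader() {
        return "reader-setup-as:" + CURRENT_USER.get();
    }

    public static void main(String[] args) throws Exception {
        // Eager setup on the fragment thread: runs as the query user.
        String eager = doAs("queryUser", DoAsSketch::setupReader);

        // Deferred setup on the scan's own thread: runs as the table owner.
        final String[] lazy = new String[1];
        Thread scanThread = new Thread(
            () -> lazy[0] = doAs("tableOwner", DoAsSketch::setupReader));
        scanThread.start();
        scanThread.join();

        System.out.println(eager);    // reader-setup-as:queryUser
        System.out.println(lazy[0]);  // reader-setup-as:tableOwner
    }
}
```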
>> 
>> On Fri, Sep 25, 2015 at 6:48 AM, Jacques Nadeau <jacques@dremio.com> wrote:
>> 
>>> Another thought: record batch tree creation time should be short. If any
>>> substantial work needs to be done, we should move it to setup.
>>> On Sep 25, 2015 6:47 AM, "Jacques Nadeau" <jacques@dremio.com> wrote:
>>> 
>>>> Limit zero shouldn't use any readers if we know the schema. Look at the
>>>> upstream constant reduction rule. We should be able to go straight from
>>>> Calcite algebra to result without hitting any execution code. Think
>>>> direct response, same as explain.
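A minimal sketch of this direct-response idea (names are hypothetical, not Drill's planner API): when the schema of a LIMIT 0 query is already known at plan time, the planner can answer with a schema-only, zero-row result, just as it does for EXPLAIN, without creating a single reader.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the direct-response idea (illustrative names, not Drill's
// planner API): when the schema of a LIMIT 0 query is known at plan
// time, answer with a schema-only, zero-row result -- the same kind of
// direct response used for EXPLAIN -- and build no execution tree.
public class LimitZeroShortCircuit {

    static final class Result {
        final List<String> columns;
        final List<List<Object>> rows;
        Result(List<String> columns, List<List<Object>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Returns a direct response for LIMIT 0 when the schema is known,
     *  or null to signal that normal execution is required. */
    static Result tryDirectResponse(List<String> knownSchema, long limit) {
        if (limit == 0 && knownSchema != null) {
            return new Result(knownSchema, Collections.emptyList());
        }
        return null; // fall back to normal execution
    }

    public static void main(String[] args) {
        Result r = tryDirectResponse(
            Arrays.asList("r_regionkey", "r_name", "r_comment"), 0);
        System.out.println(r.columns);      // [r_regionkey, r_name, r_comment]
        System.out.println(r.rows.size());  // 0
    }
}
```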
>>>> On Sep 24, 2015 10:46 PM, "Jinfeng Ni" <jinfengni99@gmail.com> wrote:
>>>> 
>>>>> The query itself is quite simple; it normally should not take 60
>>>>> seconds for planning. I guess most of the planning time is spent on
>>>>> reading parquet metadata. The metadata caching that Steven worked on
>>>>> should help in this case.
>>>>> 
>>>>> 
>>>>> On Thu, Sep 24, 2015 at 10:42 PM, Sudheesh Katkam <skatkam@maprtech.com> wrote:
>>>>>> For the table below, 33 seconds for execution (includes parquet reader
>>>>>> initialization) and 60 seconds for planning.
>>>>>> 
>>>>>>> On Sep 24, 2015, at 10:01 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>>>>>> 
>>>>>>> "FragmentExecutor took 1,070,926 ms to create RecordBatch tree."
>>>>>>> 
>>>>>>> 1,070,926 ms ~ 17.x minutes. In other words, the majority of the 18
>>>>>>> minutes of execution in the hive case is spent on the initialization
>>>>>>> of the Hive readers. If we want to improve "limit n", we should
>>>>>>> probably make the initialization of the Hive readers "lazy": only
>>>>>>> when Drill has to read rows from a reader do we initialize it.
>>>>>>> Otherwise, initializing all the readers before reading any single
>>>>>>> row means a long setup time for a limit "n" query when n is
>>>>>>> relatively small.
>>>>>>> 
>>>>>>> For the second case, the 94 seconds of query time seems too long as
>>>>>>> well. I guess most of the time is spent on parquet reader
>>>>>>> initialization (?)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Sep 24, 2015 at 9:32 PM, Sudheesh Katkam <skatkam@maprtech.com> wrote:
>>>>>>>> Hey y'all,
>>>>>>>> 
>>>>>>>> ### Short Question:
>>>>>>>> 
>>>>>>>> How do we improve the performance of SELECT * FROM plugin.table LIMIT 0?
>>>>>>>> 
>>>>>>>> ### Extended Question:
>>>>>>>> 
>>>>>>>> While investigating DRILL-3623 <https://issues.apache.org/jira/browse/DRILL-3623>, I did an analysis to see where we spend time for a SELECT * FROM hive.table LIMIT 0 query.
>>>>>>>> ## Setup:
>>>>>>>> Copy drill/sample-data/region.parquet (x 20000) into a DFS (MapR-FS in my case) directory named region. Create a Hive external table pointing to region. Run Drill with the default configuration.
>>>>>>>> 
>>>>>>>> ## Now there are two ways to query this table:
>>>>>>>> 
>>>>>>>>> SELECT * FROM hive.region LIMIT 0;
>>>>>>>> +--------------+---------+------------+
>>>>>>>> | r_regionkey  | r_name  | r_comment  |
>>>>>>>> +--------------+---------+------------+
>>>>>>>> +--------------+---------+------------+
>>>>>>>> No rows selected (1203.179 seconds)
>>>>>>>> ...
>>>>>>>> 
>>>>>>>>> SELECT * FROM dfs.test.region LIMIT 0;
>>>>>>>> +--------------+---------+------------+
>>>>>>>> | r_regionkey  | r_name  | r_comment  |
>>>>>>>> +--------------+---------+------------+
>>>>>>>> +--------------+---------+------------+
>>>>>>>> No rows selected (94.396 seconds)
>>>>>>>> 
>>>>>>>> Currently, we use HiveRecordReader for the first case and ParquetRecordReader in the second case. With DRILL-3209 <https://issues.apache.org/jira/browse/DRILL-3209>, both queries will use ParquetRecordReader. However, for formats that are non-native to Drill or for other storage plugins, we still face this problem. Summarizing the query profiles:
>>>>>>>> +-------+-----------+---------------+----------------+
>>>>>>>> | Query | Fragments | Planning time | Execution time |
>>>>>>>> +-------+-----------+---------------+----------------+
>>>>>>>> | hive  | 1         | ~2 min        | ~18 min        |
>>>>>>>> | dfs   | 1         | ~1 min        | ~33 sec        |
>>>>>>>> +-------+-----------+---------------+----------------+
>>>>>>>> 
>>>>>>>> ## The time hogs:
>>>>>>>> 
>>>>>>>> # Planning time in both cases needs to improve. How?
>>>>>>>> 
>>>>>>>> # With respect to execution, in the first case
>>> ImplCreator.getExec(…)
>>>>> call in the FragmentExecutor took 1,070,926 ms to create RecordBatch
>>> tree.
>>>>> There are 20,000 readers being initialized in HiveScanBatchCreator. How
>>> do
>>>>> we avoid this? What are the implications of chained impersonation
>>> (opening
>>>>> readers in ctor() rather than in setup())?
>>>>>>>> 
>>>>>>>> ### Extending further:
>>>>>>>> 
>>>>>>>> This can be generalized to any "LIMIT n" query where n is a small number. For n > 0, we parallelize scanning, so a LIMIT 1 query runs faster than LIMIT 0. However, there is a sweet spot "n" after which parallelization hurts.
>>>>>>>> 
>>>>>>>> ###
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> Sudheesh
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 

