calcite-dev mailing list archives

From Jesus Camacho Rodriguez <jcama...@apache.org>
Subject Re: Updates on Benchmarking and Optimization Research for Calcite
Date Mon, 05 Mar 2018 17:41:13 GMT
Hi Ashwin,

1) Table and column statistics need to be available so that Calcite can correctly trigger
its cost-based optimizations. You can gather them either manually, by running an
ANALYZE ... COMPUTE STATISTICS FOR COLUMNS statement, or automatically, by enabling
hive.stats.autogather.

2) The Calcite-based optimizer is enabled by default, so you do not need to set any other flag.
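
To make the two points above concrete, here is a minimal sketch in Python that only builds
the HiveQL statements for one benchmark run; it does not connect to Hive. The helper name
cbo_session_statements, the table store_sales and the sample query are hypothetical and
used for illustration only. hive.cbo.enabled is the flag from your message.

```python
def cbo_session_statements(table, cbo_enabled, query):
    """Return the HiveQL statements for one benchmark run, in order."""
    flag = "true" if cbo_enabled else "false"
    return [
        # Gather column statistics manually so the cost model has data to work with.
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS",
        # Switch the Calcite-based optimizer on or off for this session.
        f"SET hive.cbo.enabled={flag}",
        # The query being measured.
        query,
    ]


# Hypothetical table and query, for illustration only.
for cbo in (True, False):
    for stmt in cbo_session_statements(
        "store_sales", cbo, "SELECT COUNT(*) FROM store_sales"
    ):
        print(stmt + ";")
```

Running the same query once with the flag set to true and once with it set to false, after
statistics have been gathered, should give you the two sets of metrics you want to compare.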

Calcite logs messages during optimization, so if you set an appropriate logger level for
Calcite (e.g. DEBUG), you will see messages such as which Calcite rules have been triggered.
In addition, the time spent in every optimization stage is recorded using PerfLogger, so you
will be able to see this information in the logs (or you could add your own instrumentation
if you need to).
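
As a rough sketch of how those timings could be pulled out of a log file, the Python below
sums the reported duration per PerfLogger method. The line format it matches
(</PERFLOG method=... duration=...>) is an assumption on my side, and the sample lines are
made up, so please check both against your own logs. The names PERFLOG_END and
stage_durations are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a PerfLogger end marker; verify against your own Hive logs.
PERFLOG_END = re.compile(r"</PERFLOG method=(?P<method>\S+) .*?duration=(?P<ms>\d+)")


def stage_durations(lines):
    """Sum the reported duration (in ms) per PerfLogger method name."""
    totals = defaultdict(int)
    for line in lines:
        match = PERFLOG_END.search(line)
        if match:
            totals[match.group("method")] += int(match.group("ms"))
    return dict(totals)


# Made-up sample lines, not real Hive output.
sample = [
    "INFO log.PerfLogger: </PERFLOG method=parse start=100 end=112 duration=12 "
    "from=org.apache.hadoop.hive.ql.Driver>",
    "INFO log.PerfLogger: </PERFLOG method=compile start=100 end=350 duration=250 "
    "from=org.apache.hadoop.hive.ql.Driver>",
    "INFO ql.Driver: unrelated message",
]
print(stage_durations(sample))
```

Lines that do not match the pattern are skipped, so the function can be fed a whole log file.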

If you have more questions about the Hive optimizer vs. Calcite in general, I would suggest
asking them on the Hive dev list, as you may be able to get more help over there.

-Jesús


On 3/3/18, 7:40 AM, "AshwinKumar AshwinKumar" <aashwin@g.clemson.edu> wrote:

    Hello Dev Team,
    
    I am trying to run queries on Apache HIVE by setting the flag
    *hive.cbo.enabled* to true and also to false and then compare the metrics.
    I have a few questions regarding the same -
    1. Do I need to set *hive.stats.autogather* (to gather the table statistics)
    to true as well before turning on the CBO?
    2. Are there any other flags which I need to set to activate the Calcite CBO?
    
    Also, could you please let me know the best way to obtain
    instrumentation data from the Calcite process?
    
    Thanks,
    Ashwin
    
    On Thu, Mar 1, 2018 at 2:26 AM, Riccardo Tommasini <
    riccardo.tommasini@polimi.it> wrote:
    
    > Hello,
    >
    > I can definitely help if you need me to do something.
    >
    > And I would also like to join the online meeting.
    >
    > Cheers,
    >
    > On 20 Feb 2018, 22:13 +0100, Edmon Begoli <ebegoli@gmail.com>, wrote:
    > Just a quick update on the progress of benchmarking setup for Calcite, and
    > a call to you for feedback and participation:
    >
    > 1. We (Ashwin Vajantri, a member of my team) have installed Postgres and Hive
    > on our servers, and he has loaded TPC-DS benchmark data and run some test
    > queries. He also installed Calcite on top of Postgres so we can compare
    > performance through Calcite vs. native.
    > (We have full documentation for all this in a Google Doc I shared with
    > those interested in this work. We'll make it public once complete.)
    >
    > 2. Another colleague, Dr. Seung-Hwan Lim, is ready to look into more
    > detailed benchmarking and optimization aspects, as well as into
    > other engines that we work with and know -- MapD, Spark, Druid, Cassandra,
    > or Flink.
    >
    > All this so far is based on, and in support of, the following JIRA issues:
    > https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2168
    > https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2169
    >
    > My questions to the community are:
    >
    > 1. Does anyone have any feedback on specific queries or engines we want to
    > target, and start with?
    >
    > 2. How can we meaningfully turn on and turn off Hive optimizer to measure
    > the performance?
    >
    > 3. Does anyone want to pitch in and help in any area?
    >
    > I am planning to schedule an online meeting next week so that those
    > interested can connect and discuss.
    >
    


