flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] liyafan82 edited a comment on issue #8397: [FLINK-11421][Table SQL/Runtime]Add compilation options to allow comp…
Date Thu, 16 May 2019 04:47:18 GMT
liyafan82 edited a comment on issue #8397: [FLINK-11421][Table SQL/Runtime]Add compilation
options to allow comp…
URL: https://github.com/apache/flink/pull/8397#issuecomment-492914042
 
 
   > Hi @liyafan82 Thanks for your nice work~ I think I have a clearer understanding now.
   > I think your benchmarks illustrate two points:
   > 
   > 1. JCA is much better than Janino in vector computation. (150% - 200%)
   > 2. JCA compilation time is about 0.1 second - 1 second slower.
   > 
   > But we probably don't have any vector computing right now. Can you continue to benchmark how our table works right now? (It would be better if you could provide reproducible code)
   
   Hi @JingsongLi and @KurtYoung , I have run the benchmark on the table workload, as you
suggested. The results are as follows. Please give your valuable feedback:
   
   We use the same benchmark (TPC-H Q1, 1TB) to evaluate the effects of compilation options.
This time, we use the original Flink runtime engine, instead of vectorization. The effects
are notable, but not as significant as with vectorization. 
   
   Likewise, we only consider operators Calc (ID = 5) and LongHashAggregate (ID = 6) in our
analysis. 
   
   The table below shows the average time (in ms) for each operator in processing Q1 in our
cluster:
   
   
   Operator\Compiler | JCA | Janino
   -- | -- | --
   Calc (ID = 5) | 4902.18 | 5482.86
   LongHashAggregate (ID = 6) | 2967.2 | 3257.92
   
   The code compiled by JCA takes about 9.8% less time on average (roughly 10.6% less for Calc
and 8.9% less for LongHashAggregate). 
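
   The percentage can be re-derived from the table with a few lines of Java. This is only a
sketch of the arithmetic: the class and method names are made up for illustration, and it
assumes the figure is the mean of the per-operator time reductions relative to Janino.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RuntimeComparison {
    // Percentage by which the JCA time is lower than the Janino time.
    static double timeReductionPercent(double jcaMs, double janinoMs) {
        return (janinoMs - jcaMs) / janinoMs * 100.0;
    }

    public static void main(String[] args) {
        // Average per-operator times in ms, copied from the table: {JCA, Janino}.
        Map<String, double[]> timings = new LinkedHashMap<>();
        timings.put("Calc (ID = 5)", new double[] {4902.18, 5482.86});
        timings.put("LongHashAggregate (ID = 6)", new double[] {2967.2, 3257.92});

        double sum = 0.0;
        for (Map.Entry<String, double[]> e : timings.entrySet()) {
            double pct = timeReductionPercent(e.getValue()[0], e.getValue()[1]);
            sum += pct;
            System.out.printf("%s: %.1f%% less time with JCA%n", e.getKey(), pct);
        }
        // Unweighted mean over the two operators.
        System.out.printf("Mean reduction: %.1f%%%n", sum / timings.size());
    }
}
```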
   
   The following table shows the compilation time (in ms) with different compilers. The results
are similar to the previous benchmark results. 
   
   
   Operator\Compiler | JCA | Janino
   -- | -- | --
   Calc (ID = 5) | 124 | 12
   LongHashAggregate (ID = 6) | 850 | 31
   GlobalHashAggregate (ID = 8) | 225 | 100
   Calc (ID = 11) | 105 | 14
   SinkConversion (ID = 12) | 100 | 5
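
   For anyone who wants to time the JCA side locally, a minimal sketch follows. It is not the
harness used for the numbers above. `JcaCompileTimer` and `HypotheticalCalc` are placeholder
names, and the tiny sample class only stands in for a generated operator. Timing Janino would
need the Janino library on the classpath and is left out here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class JcaCompileTimer {
    // Compiles one source file with the JDK compiler (javax.tools) and
    // returns the elapsed wall-clock time in milliseconds.
    static long compileMillis(Path sourceFile, Path outputDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        long start = System.nanoTime();
        int status = compiler.run(null, null, null,
                "-d", outputDir.toString(), sourceFile.toString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (status != 0) {
            throw new IllegalStateException("Compilation failed with status " + status);
        }
        return elapsedMs;
    }

    // Writes a small placeholder class to a temp directory and times its compilation.
    static long compileSampleMillis() {
        try {
            Path dir = Files.createTempDirectory("jca-timing");
            Path src = dir.resolve("HypotheticalCalc.java");
            String code = "public class HypotheticalCalc {\n"
                    + "    public long apply(long x) { return x * 2 + 1; }\n"
                    + "}\n";
            Files.write(src, code.getBytes(StandardCharsets.UTF_8));
            return compileMillis(src, dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first call also pays for loading the compiler itself.
        System.out.println("first compile:  " + compileSampleMillis() + " ms");
        System.out.println("second compile: " + compileSampleMillis() + " ms");
    }
}
```

   The first measurement in a fresh JVM is usually the largest, so it helps to report warm and
cold compilations separately.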
   
   Inspecting the compiled class files shows that the two compilers produce class files of
different sizes. 
   
   
   Operator\Compiler | JCA | Janino
   -- | -- | --
   Calc (ID = 5) | 4 KB | 3 KB
   LongHashAggregate (ID = 6) | 10 KB | 8 KB
   GlobalHashAggregate (ID = 8) | 17 KB | 12 KB
   Calc (ID = 11) | 4 KB | 3 KB
   SinkConversion (ID = 12) | 2 KB | 2 KB
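
   The JCA column can be checked without touching the file system by compiling in memory and
counting the emitted bytes. The sketch below assumes a JDK is available. `ClassSizeProbe` and
the `Probe` sample class are illustrative names, not part of the attached code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class ClassSizeProbe {
    // Compiles one class in memory with the JDK compiler and returns
    // the size in bytes of every class file it emits, keyed by class name.
    static Map<String, Integer> classFileSizes(String className, final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        final Map<String, ByteArrayOutputStream> outputs = new LinkedHashMap<>();
        try (StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null)) {
            // Redirect every generated class file into a byte buffer.
            JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
                @Override
                public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                        String name, JavaFileObject.Kind kind, FileObject sibling) {
                    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                    outputs.put(name, bytes);
                    URI uri = URI.create("mem:///" + name.replace('.', '/') + kind.extension);
                    return new SimpleJavaFileObject(uri, kind) {
                        @Override
                        public OutputStream openOutputStream() {
                            return bytes;
                        }
                    };
                }
            };
            URI srcUri = URI.create("string:///" + className.replace('.', '/') + ".java");
            JavaFileObject src = new SimpleJavaFileObject(srcUri, JavaFileObject.Kind.SOURCE) {
                @Override
                public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                    return source;
                }
            };
            boolean ok = compiler.getTask(null, fm, null, null, null,
                    Collections.singletonList(src)).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, ByteArrayOutputStream> e : outputs.entrySet()) {
            sizes.put(e.getKey(), e.getValue().size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        String code = "class Probe { int twice(int x) { return 2 * x; } }";
        System.out.println(classFileSizes("Probe", code));
    }
}
```

   Debug-information settings (for example javac's `-g` options) also change class file size,
so they should be held constant when comparing compilers.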
   
   By analyzing the bytecode, we found differences in code structure that account for the
performance differences. For example, the following figure shows the bytecode of the
processElement method of Calc (ID = 5):
   
   ![image](https://user-images.githubusercontent.com/42827532/57826752-360ce700-77d7-11e9-881e-dae138c6f195.png)
   
   To make the results easier to reproduce, we have attached the source code of the generated
operators. Compiling this code with each compiler and generating a test data set should
reproduce the above results locally (our results were obtained on the cluster). 
   
   [code.zip](https://github.com/apache/flink/files/3185415/code.zip)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
