spark-user mailing list archives

From "冯宝利" <>
Subject spark3.0 read kudu data
Date Tue, 08 Dec 2020 02:45:55 GMT
    We are upgrading Spark from 2.4 to 3.0 and found a performance problem during testing.
In a side-by-side comparison, Spark 3.0 reads Kudu data much more slowly than Spark 2.4: reading
the same amount of data normally takes 0.1-1 s on Spark 2.4 but 1 to 2 minutes on Spark 3.0.
Both versions use the same spark-submit parameters and run in local mode, and the Kudu cluster,
tables, and query conditions are identical.
    The only difference is the kudu-spark package: Spark 2.4 uses kudu-spark2_2.11 (Scala 2.11),
while Spark 3.0 uses kudu-spark3_2.12 (Scala 2.12). The latter package was built from the
Kudu 1.13 Java source, using Spark 3.0.0 and Scala 2.12 in the pom.xml file.
    Our cluster runs CDH 6.3.1 with Kudu 1.10. Given this setup, what can we optimize, or what
would you suggest, to improve the performance of reading Kudu data?