spark-user mailing list archives

From Artemis User <>
Subject Performance Improvement with Hive/Thrift Server
Date Mon, 12 Jul 2021 15:14:23 GMT
We are trying to switch from Postgres to Spark's built-in Hive with the 
Thrift server as the data sink for persisting ML result data, in the 
hope that Hive would improve the ML pipeline's performance. However, it 
turned out that persisting dataframes to Hive (via Spark SQL's 
saveAsTable API) took significantly longer than persisting them to 
Postgres using JDBC.  Has anyone experienced similar problems with Hive?  
Any recommendations for improving performance would be highly appreciated.
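
For reference, the comparison described above could be timed roughly as 
in the sketch below. It is only an illustration under assumptions: the 
table name "ml_results", the helper names time_write and compare_sinks, 
and the jdbc_url / jdbc_props arguments are hypothetical placeholders, 
and df stands for whatever DataFrame the pipeline produces.

```python
import time
from typing import Callable, Dict


def time_write(label: str, write_fn: Callable[[], None],
               timings: Dict[str, float]) -> float:
    """Run one write action and record its wall-clock duration in seconds."""
    start = time.perf_counter()
    write_fn()
    elapsed = time.perf_counter() - start
    timings[label] = elapsed
    return elapsed


def compare_sinks(df, jdbc_url: str, jdbc_props: dict) -> Dict[str, float]:
    """Time the same DataFrame written to Hive and to Postgres.

    Assumes `df` is a Spark DataFrame from a Hive-enabled session.
    "ml_results" is a placeholder table name, not one from the pipeline.
    """
    timings: Dict[str, float] = {}
    # Path 1: managed table registered in the (built-in) Hive metastore.
    time_write(
        "hive_saveAsTable",
        lambda: df.write.mode("overwrite").saveAsTable("ml_results"),
        timings,
    )
    # Path 2: the same rows written to Postgres over JDBC.
    time_write(
        "postgres_jdbc",
        lambda: df.write.mode("overwrite").jdbc(
            jdbc_url, "ml_results", properties=jdbc_props
        ),
        timings,
    )
    return timings
```

Since both writes trigger a full evaluation of df, caching the DataFrame 
before calling compare_sinks would keep upstream computation from being 
counted against whichever sink happens to run first.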

We are using Spark in standalone mode.  I would assume that running 
Spark against a real Hive database, or simply on Hadoop, would be 
preferable.  Has anyone compared the performance of Spark with built-in 
Hive (with just the metastore), Spark on a full-fledged Hive DB, and 
Spark with built-in Hive on Hadoop? Thanks!

-- ND
