spark-user mailing list archives

From Manjunath Shetty H <manjunathshe...@live.com>
Subject Re: Saving Spark run stats and run watermark
Date Wed, 18 Mar 2020 12:07:10 GMT
Thanks for the suggestion, Netanel.

Sorry for not giving enough detail: I am specifically looking for something inside the Hadoop ecosystem.


-
Manjunath
________________________________
From: Netanel Malka <netanel246@gmail.com>
Sent: Wednesday, March 18, 2020 5:26 PM
To: Manjunath Shetty H <manjunathshetty@live.com>
Subject: Re: Saving Spark run stats and run watermark

You can try an RDBMS such as PostgreSQL or MySQL.
I would use a regular table.
Spark has built-in integration for that:
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
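
The "regular table" idea above could look roughly like the sketch below: one row per batch run, plus a query for the latest watermark at the start of the next run. This is only an illustration under stated assumptions, not something from the thread: the table name `spark_run_stats`, its columns, and the helpers `record_run` and `last_watermark` are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL/MySQL so the example runs without a database server. With a real server driver the placeholder style (`?` here) may differ.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table layout; the column names are illustrative only.
DDL = """
CREATE TABLE IF NOT EXISTS spark_run_stats (
    run_id     TEXT PRIMARY KEY,
    started_at TEXT NOT NULL,
    ended_at   TEXT NOT NULL,
    watermark  TEXT NOT NULL
)
"""


def _iso(ts):
    # Fixed-width ISO-8601 text so that string ordering matches time ordering.
    # Assumes every timestamp is timezone-aware and in UTC.
    return ts.astimezone(timezone.utc).isoformat(timespec="microseconds")


def record_run(conn, run_id, started_at, ended_at, watermark):
    """Insert one row describing a finished batch run."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO spark_run_stats VALUES (?, ?, ?, ?)",
            (run_id, _iso(started_at), _iso(ended_at), _iso(watermark)),
        )


def last_watermark(conn):
    """Return the highest stored watermark, or None before the first run."""
    row = conn.execute("SELECT MAX(watermark) FROM spark_run_stats").fetchone()
    return datetime.fromisoformat(row[0]) if row[0] is not None else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real RDBMS connection
    conn.execute(DDL)
    start = datetime(2020, 3, 18, 12, 0, tzinfo=timezone.utc)
    end = datetime(2020, 3, 18, 12, 5, tzinfo=timezone.utc)
    record_run(conn, "run-001", start, end, watermark=start)
    print(last_watermark(conn))
```

Because each run writes a single small row, the write can be issued from the Spark driver once the batch finishes; the JDBC data source linked above is the alternative if the stats are already in a DataFrame.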


On Wed, Mar 18, 2020, 13:03 Manjunath Shetty H <manjunathshetty@live.com> wrote:
Hi All,

I want to save stats for each Spark batch run (start time, end time, run ID, etc.) along with
a watermark (the last processed timestamp from an external data source).

We have tried Hive JDBC, but it is very slow because of the MapReduce jobs it triggers. We can't
save to normal Hive tables either, as that would create lots of small files in HDFS.

What is the recommended way to do this? Any pointers would be helpful.

Thanks and regards
Manjunath
