spark-user mailing list archives

From Jacek Laskowski
Subject Re: Spark RDD + HBase: adoption trend
Date Wed, 20 Jan 2021 14:14:48 GMT
Hi Marco,

IMHO the RDD API is only for very sophisticated use cases that very few
Spark devs would be capable of handling. I consider the RDD API a sort of
Spark assembler, and most Spark devs should stick to the Dataset API.

Speaking of HBase, see
where you can find a demo that I worked on last year and made sure that:

"Apache HBase™ Spark Connector implements the DataSource API for Apache
HBase and allows executing relational queries on data stored in Cloud

That makes hbase-rdd even more obsolete but not necessarily unusable (I am
too little skilled in the HBase space to comment on this).

I think you should consider merging your hbase-rdd project with the
official Apache HBase™ Spark Connector at (as they seem
to lack active development IMHO).

Jacek Laskowski
"The Internals Of" Online Books


On Wed, Jan 20, 2021 at 2:44 PM Marco Firrincieli wrote:

> Hi, my name is Marco and I'm one of the developers behind
> a project we are currently reviewing for various reasons.
> We were basically wondering if RDD "is still a thing" nowadays (we see
> lots of usage of DataFrames and Datasets), and we're not sure how much of
> the community still works with RDDs.
> Also, for lack of time, we have mainly worked with Cloudera-flavored
> Hadoop/HBase & Spark versions. We were hoping the community would then
> help us organize the project in a more "generic" way, but that didn't
> happen.
> So I figured I would ask here what the gut feeling of the Spark community
> is, so as to better define the future of our little library.
> Thanks
> -Marco
