I'm very late to this party, and I get hbase-spark, but what's the recommendation for PySpark + HBase? I realize this isn't necessarily a concern of the Spark project, but it would be nice to at least document it here with a short answer. I haven't found anything useful in the wild besides the approach in the examples that uses pythonconverters, which were dropped in 2.0.


On Thu, Apr 21, 2016 at 1:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
I have mentioned the JIRA numbers in the thread starting with (note the typo in subject of this thread):

RFC: Remove ...

On Thu, Apr 21, 2016 at 1:28 PM, Zhan Zhang <zzhang@hortonworks.com> wrote:
FYI: There are several pending patches for DataFrame support on top of HBase.


Zhan Zhang

On Apr 20, 2016, at 2:43 AM, Saisai Shao <sai.sai.shao@gmail.com> wrote:

+1. HBaseTest in the Spark examples is old and obsolete, and the HBase connector in the HBase repo has evolved a lot. It would be better to point users to that connector rather than to the Spark example, so it's good to remove it.


On Wed, Apr 20, 2016 at 1:41 AM, Josh Rosen <joshrosen@databricks.com> wrote:
+1; I think that it's preferable for code examples, especially third-party integration examples, to live outside of Spark.

On Tue, Apr 19, 2016 at 10:29 AM Reynold Xin <rxin@databricks.com> wrote:
Yeah, in general I feel examples that bring in a large number of dependencies should live outside Spark.

On Tue, Apr 19, 2016 at 10:15 AM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
Hey all,

Two reasons why I think we should remove that from the examples:

- HBase now has Spark integration in its own repo, so that really
should be the template for how to use HBase from Spark, making that
example less useful, even misleading.

- It brings in a lot of extra dependencies that make the size of the
Spark distribution grow.

Any reason why we shouldn't drop that example?

