spark-user mailing list archives

From "Sun, Rui" <rui....@intel.com>
Subject RE: Including additional scala libraries in sparkR
Date Mon, 13 Jul 2015 09:27:03 GMT
Hi, Michal,

SparkR comes with a JVM backend that supports instantiating Java objects and calling Java instance
and static methods from the R side. As defined in https://github.com/apache/spark/blob/master/R/pkg/R/backend.R:
newJObject() creates an instance of a Java class;
callJMethod() calls an instance method of a Java object;
callJStatic() calls a static method of a Java class.
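For example, a minimal sketch (these functions are not exported, so you would typically reach
them via the SparkR::: namespace, and the JVM backend must already be running, e.g. inside the
sparkR shell or after sparkR.init()):

  lst  <- SparkR:::newJObject("java.util.ArrayList")            # new java.util.ArrayList()
  SparkR:::callJMethod(lst, "add", "hello")                     # lst.add("hello")
  size <- SparkR:::callJMethod(lst, "size")                     # lst.size()
  ver  <- SparkR:::callJStatic("java.lang.System", "getProperty", "java.version")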

If what you need is as simple as data visualization, you can use the above low-level functions
to create an instance of your HBase RDD on the JVM side, collect the data to the R side, and
visualize it there.
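As a rough sketch of that flow (the class and method names below are placeholders for whatever
your own Scala library exposes, and it assumes sc is the context object returned by sparkR.init()):

  helper <- SparkR:::newJObject("com.example.hbase.HBaseScanHelper", sc)  # placeholder class from your jar
  csv    <- SparkR:::callJMethod(helper, "scanTableAsCsv", "my_table")    # return a plain String to the R side
  df     <- read.csv(text = csv)                                          # now an ordinary R data.frame
  plot(df$timestamp, df$value)                                            # visualize with standard R tools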

However, if you want to do HBase RDD transformations and HBase table updates, things are more
involved at the moment. SparkR supports the majority of the RDD API (though it is not exposed
publicly in the 1.4 release), allowing transformation functions written in R, but it currently
only supports creating RDDs from text files and SparkR DataFrames, so your HBase RDDs can't be
used by the SparkR RDD API for further processing.

You can use --jars to include your Scala library so that it can be accessed by the JVM backend.
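For example (the jar path and script name are placeholders):

  ./bin/sparkR --jars /path/to/your-hbase-lib.jar
  # or, for a batch job:
  ./bin/spark-submit --jars /path/to/your-hbase-lib.jar your_analysis.R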

________________________________
From: Michal Haris [michal.haris@visualdna.com]
Sent: Sunday, July 12, 2015 6:39 PM
To: user@spark.apache.org
Subject: Including additional scala libraries in sparkR

I have a Spark program with a custom optimised RDD for HBase scans and updates. I have a small
library of objects in Scala to support efficient serialisation, partitioning etc. I would
like to use R as an analysis and visualisation front-end. I have tried to use rJava (i.e.
not using SparkR) and I got as far as initialising the Spark context, but I encountered
problems with HBase dependencies (HBaseConfiguration: Unsupported major.minor version 51.0),
so I tried SparkR, but I can't figure out how to make my custom Scala classes available to SparkR
other than re-implementing them in R. Is there a way to include and invoke additional Scala
objects and RDDs within the SparkR shell/job? Something similar to additional jars and an init
script in normal spark-submit/shell..

--
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
31 Old Nichol Street
London
E2 7HR

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

