Hi, yes you can. I've also developed an engine to perform ETL.

I've built a REST service with Akka, with a method called "execute" that receives a JSON structure representing the ETL.
You just need to configure your embedded standalone Spark. I did something like this, in Scala:

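import org.apache.spark.sql.SparkSession

// `config` is the application's own settings object (presumably Typesafe Config).
val spark = SparkSession
      .builder()
      .master("local[*]") // assumption: run embedded in this JVM, using all cores
      .config("spark.sql.warehouse.dir", config.getString("cache.directory"))
      .config("spark.driver.memory", "11g")
      .config("spark.executor.memory", "11g")
      .config("spark.cores.max", "12")
      .config("spark.deploy.defaultCores", "3")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.unsafe", true)
      .getOrCreate()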

(This will crash if you already have a Spark instance running ...)

And use the spark variable as you wish.
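
For example, a small ETL step with the Dataset API could look roughly like the sketch below. The Order and CustomerTotal types, the sample rows and the 3.0 threshold are all made up for illustration, and the local[*] master is an assumption on my side (Spark runs inside the application's JVM), so adapt it to your own data and settings:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical record types, used only for this illustration.
case class Order(id: Long, customer: String, amount: Double)
case class CustomerTotal(customer: String, total: Double)

object EmbeddedEtlSketch {

  // Runs a tiny extract/transform/load pipeline on the given session.
  def customerTotals(spark: SparkSession): Seq[CustomerTotal] = {
    import spark.implicits._

    // Extract: an in-memory data set standing in for a real source.
    val orders: Dataset[Order] = Seq(
      Order(1L, "alice", 10.0),
      Order(2L, "bob", 5.5),
      Order(3L, "alice", 2.5)
    ).toDS()

    // Transform: keep orders above 3.0, then sum the amounts per customer.
    val totals: Dataset[CustomerTotal] = orders
      .filter($"amount" > 3.0)
      .groupBy($"customer")
      .agg(sum($"amount").as("total"))
      .as[CustomerTotal]

    // Load: collect to the driver; a real job would write to a sink instead.
    totals.collect().sortBy(_.customer).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local[*] runs Spark embedded in this JVM, using all cores.
    val spark = SparkSession.builder()
      .appName("embedded-etl-sketch")
      .master("local[*]")
      .getOrCreate()

    try customerTotals(spark).foreach(println)
    finally spark.stop()
  }
}
```

Collecting to the driver is fine for a few GB at most, as in your case; for anything bigger you would write the Dataset out instead of calling collect().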

On Thu, Dec 6, 2018 at 9:23 PM sparkuser99 wrote:

I have a use case to process simple ETL-like jobs. The data volume is very
small (less than a few GB) and can easily fit in my running Java application's
memory. I would like to take advantage of the Spark Dataset API, but don't need
any Spark setup (standalone / cluster). Can I embed Spark in an existing Java
application and still use it?

I heard local Spark mode is only for testing. For small data sets like this, can
it still be used in production? Please advise if there are any disadvantages.


Ing. Ivaldi Andres