spark-user mailing list archives

From wdaehn <>
Subject Scala code as "spark view"
Date Tue, 19 Jul 2016 08:26:21 GMT
Using Spark via the Thrift server works well, but it limits you to
simple SQL queries. For any complex Spark logic you first have to submit a
job, write the result into a table and then query that table.
This obviously has the following limitations:
a) The user executing the query cannot pass in information
b) The user executing the query has no idea how current the intermediate
table is
c) It requires compiling the analytic logic into a jar file, uploading it
and submitting it

Using spark-shell, everything is obviously much more interactive: you write the
Scala code line by line and visualize the result. But obviously not via

Wouldn't it make sense to support a syntax like 

create temporary table myview using Scala options (sourcecode
'val dataframe = sqlContext......
%table dataframe')

There is a directory with millions of files.
A trained MLlib model is used to categorize these files, output is a
Via JDBC you want to get the categorization of all files with the name
text_2016_07_*.txt only.
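
The file-name restriction is the kind of information the querying user would
want to pass in. As a rough plain-Scala sketch of just that filter (no Spark
involved; the object FileNameFilter and the method matching are hypothetical
names made up for illustration, not part of any Spark API), the glob could be
applied with the JDK's PathMatcher:

```scala
import java.nio.file.{FileSystems, Paths}

object FileNameFilter {
  // Hypothetical helper: keep only the file names that match a glob pattern
  // such as "text_2016_07_*.txt". Uses the JDK's built-in glob support.
  def matching(glob: String, names: Seq[String]): Seq[String] = {
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
    names.filter(name => matcher.matches(Paths.get(name)))
  }
}
```

In the proposed "spark view", a value like the glob string would have to reach
the Scala code from the SQL query; how that would be passed is exactly the
open question here.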

Does it make sense? I can't see how this could be done today without a lot
of disadvantages but I am far from being an expert, so please bear with me.
