spark-user mailing list archives

From AJT <>
Subject Spark SQL query
Date Thu, 06 Oct 2016 13:40:34 GMT
From what I have read on Spark SQL, you need to already have a DataFrame
which you can then query, e.g. select * from myDataframe where ...
The DataFrame is backed by a Hive table, an Avro file, etc.
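As a concrete sketch of that pattern (the path, format string, and view name here are illustrative, not from the post; createOrReplaceTempView is the Spark 2.x spelling, registerTempTable on 1.x):

```python
def query_dataframe(spark, avro_path, table_name="myDataframe"):
    # Load the Avro data into a DataFrame
    # ("com.databricks.spark.avro" is the separate spark-avro package).
    df = spark.read.format("com.databricks.spark.avro").load(avro_path)
    # Register it so SQL can see it by name.
    df.createOrReplaceTempView(table_name)
    # Now "select * from myDataframe" style queries work against it.
    return spark.sql("SELECT * FROM {0}".format(table_name))
```

The same registration step is what makes a DataFrame visible to anything that talks SQL to the session.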

What if you want to create a DataFrame from your underlying data on the
fly, with input parameters passed into your job? For example:
1. Read the data files (e.g. Avro) into a DataFrame, depending on the
arguments passed (e.g. a date range)
2. Perform map / mapPartitions / filter / groupBy operations on that
DataFrame to create a new DataFrame
3. Output this new DataFrame
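The three steps above can be sketched as a small PySpark job; the paths, column names, and argument values below are placeholders, not details from the post:

```python
from datetime import date

def date_filter(start, end, col="event_date"):
    # Build the SQL predicate for a date range passed in as job arguments.
    # The column name is a placeholder.
    return "{c} >= '{s}' AND {c} <= '{e}'".format(
        c=col, s=start.isoformat(), e=end.isoformat())

def run_job(spark, in_path, out_path, start, end):
    # 1. Read the Avro files into a DataFrame, filtered by the date-range
    #    arguments ("com.databricks.spark.avro" is the spark-avro package).
    df = (spark.read.format("com.databricks.spark.avro")
               .load(in_path)
               .where(date_filter(start, end)))
    # 2. filter / groupBy transformations produce a new DataFrame.
    result = df.groupBy("some_key").count()
    # 3. Output the new DataFrame, e.g. as Parquet.
    result.write.mode("overwrite").parquet(out_path)

# Example wiring (needs a live SparkSession):
#   run_job(spark, "/data/events", "/out/summary",
#           date(2016, 9, 1), date(2016, 9, 30))
```

This is the spark-submit shape of the job; the open question below is how to trigger the same parameterised pipeline from a SQL-only client.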

I can see how to do this in a standard Spark application (e.g. run via
spark-submit), but what if I want to use one of the myriad tools
(Tableau/Qlik etc.) that are Spark SQL compliant and run my job from
there? Is there a way I can do:

select * from

Appreciate any help
