spark-user mailing list archives

From Նարեկ Գալստեան <>
Subject Interactively search Parquet-stored data using Spark Streaming and DataFrames
Date Mon, 28 Sep 2015 15:45:13 GMT
I have a significant amount of data stored on HDFS as Parquet files.
I am using Spark Streaming to interactively receive queries from a web
server and transform the received queries into SQL to run on my data using

In this process I need to run several SQL queries and then return some
aggregate result by merging or subtracting the results of individual queries.

Is there a way to speed up this process, for example by running queries
on DataFrames that have already been retrieved rather than on the whole
dataset?
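
To make the question concrete, here is a minimal sketch of what reusing already loaded DataFrames could look like. It assumes the SparkSession API (Spark 2.0 and later, so newer than the version available when this message was sent). The path, view name, column names, and helper names are hypothetical placeholders, not taken from the actual setup. The idea is to read the Parquet data once, mark it for caching, and run the individual queries against the cached view, then merge and subtract their results.

```python
# Sketch only: helper names, paths, view names, and columns are hypothetical.

def register_cached_view(spark, parquet_path, view_name):
    """Read Parquet data once, mark it for caching, and expose it to SQL."""
    df = spark.read.parquet(parquet_path)
    df.cache()  # lazy: the data is materialized by the first action that scans it
    df.createOrReplaceTempView(view_name)
    return df


def merge_minus(spark, include_sqls, exclude_sql):
    """Union the results of several queries, then subtract another query's rows."""
    parts = [spark.sql(q) for q in include_sqls]
    merged = parts[0]
    for part in parts[1:]:
        # union() matches columns by position and keeps duplicates (UNION ALL)
        merged = merged.union(part)
    # subtract() behaves like EXCEPT DISTINCT, so the result has no duplicates
    return merged.subtract(spark.sql(exclude_sql))


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("interactive-parquet").getOrCreate()
    register_cached_view(spark, "hdfs:///data/events.parquet", "events")  # hypothetical path
    result = merge_minus(
        spark,
        ["SELECT user_id FROM events WHERE kind = 'view'",
         "SELECT user_id FROM events WHERE kind = 'click'"],
        "SELECT user_id FROM events WHERE kind = 'bounce'",
    )
    result.show()
```

Calling main() needs a working Spark installation and a real Parquet path. All queries in merge_minus must return the same number of columns with compatible types, since both union() and subtract() compare rows by position.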

Is there a better way to interactively query the Parquet-stored data and
return results?

Thank you!

Narek Galstyan

Նարեկ Գալստյան
