spark-user mailing list archives

From Նարեկ Գալստեան <ngalsty...@gmail.com>
Subject Interactively search Parquet-stored data using Spark Streaming and DataFrames
Date Mon, 28 Sep 2015 15:45:13 GMT
I have a significant amount of data stored on HDFS as Parquet files.
I am using Spark Streaming to interactively receive queries from a web
server and translate the received queries into SQL to run on my data using
Spark SQL.

In this process I need to run several SQL queries and then return an
aggregate result by merging or subtracting the results of the individual
queries.

Is there any way to speed up this process, for example by running queries
on DataFrames I have already received rather than on the whole database?

Is there a better way to interactively query the Parquet-stored data and
return results?

Thank you!



Narek Galstyan

Նարեկ Գալստյան
