spark-user mailing list archives

From Donni Khan <prince.don...@googlemail.com>
Subject run huge number of queries in Spark
Date Wed, 04 Apr 2018 08:56:46 GMT
Hi all,

I want to run a huge number of queries on a DataFrame in Spark. I have a
large collection of text documents; I loaded all of them into a Spark
DataFrame and registered it as a temp table.

dataFrame.registerTempTable("table1");

I have more than 50,000 terms, and I want to get the document frequency of
each term (the number of documents that contain it) by querying "table1".

For each term I run the following:

DataFrame df = sqlContext.sql(
    "select count(ID) from table1 where text like '%" + term + "%'");

but this approach takes a long time to finish, because I have to submit a
separate query from the Spark driver for each term.


Does anyone have an idea how I can run all the queries in a distributed way?
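
To make the goal concrete, here is a minimal plain-Java sketch (no Spark API)
of the result being asked for: the document frequency of every term computed
in a single pass over the documents instead of one query per term. The class
name, method name, and sample data are hypothetical, and String.contains only
approximates the "like '%term%'" match. In Spark, the same per-document step
could presumably be written as a flatMap over the rows with the term list
broadcast to the executors, followed by a count per term; that part is an
assumption and is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; plain Java, not part of Spark.
final class DocumentFrequency {

    // For every term, count the documents whose text contains it.
    // Terms that match no document are absent from the returned map.
    static Map<String, Long> count(List<String> documents, List<String> terms) {
        return documents.parallelStream()
                // emit each distinct matching term once per document
                .flatMap(text -> terms.stream().distinct().filter(text::contains))
                // group identical terms and count how many documents emitted them
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> documents =
                Arrays.asList("spark runs sql", "sql on spark", "plain text");
        List<String> terms = Arrays.asList("spark", "sql", "text", "missing");
        // prints {spark=2, sql=2, text=1}
        System.out.println(count(documents, terms));
    }
}
```

The point of the sketch is the inverted loop: each document is scanned once
against the whole term list, so the number of passes over the data no longer
grows with the number of terms.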

Thank you && Best Regards,

Donni
