spark-user mailing list archives

From Paul Tremblay <paulhtremb...@gmail.com>
Subject Re: Alternatives for dataframe collectAsList()
Date Tue, 04 Apr 2017 00:44:49 GMT
What do you want to do with the results of the query?
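
The answer matters because each use case has a different alternative to
collectAsList(). Below is a minimal sketch of the usual options, assuming the
Spark 2.x Java API. The class name, method names, and the column and path
arguments are made up for illustration; only the Dataset calls themselves
(agg, toLocalIterator, write, limit) come from Spark.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.functions;

// Hypothetical helper class; not part of Spark.
public class CollectAlternatives {

    // Option 1: aggregate on the cluster so that only one small row
    // reaches the driver. Assumes `column` holds an integral type,
    // for which sum() returns a long.
    static long sumOnCluster(Dataset<Row> df, String column) {
        Row result = df.agg(functions.sum(column)).first();
        return result.isNullAt(0) ? 0L : result.getLong(0);
    }

    // Option 2: iterate over the rows on the driver without building one
    // big list. toLocalIterator() fetches partitions one at a time, so the
    // driver needs memory for the largest partition rather than for the
    // whole result. Returns the number of rows handled.
    static long forEachRowOnDriver(Dataset<Row> df, Consumer<Row> handler) {
        long seen = 0;
        Iterator<Row> rows = df.toLocalIterator();
        while (rows.hasNext()) {
            handler.accept(rows.next());
            seen++;
        }
        return seen;
    }

    // Option 3: do not bring the rows to the driver at all; let the
    // executors write the result to storage in parallel.
    static void saveAsParquet(Dataset<Row> df, String path) {
        df.write().mode(SaveMode.Overwrite).parquet(path);
    }

    // Option 4: collect only a bounded sample when the goal is to
    // inspect the data.
    static List<Row> preview(Dataset<Row> df, int n) {
        return df.limit(n).collectAsList();
    }
}
```

With 20-30 million rows, options 1 and 3 are usually the ones to try first,
since they keep the work on the executors. Option 2 still moves every row to
the driver, only in smaller pieces.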

Henry

On Wed, Mar 29, 2017 at 12:00 PM, szep.laszlo.it <szep.laszlo.it@gmail.com>
wrote:

> Hi,
>
> After I create a dataset
>
> Dataset<Row> df = sqlContext.sql("query");
>
> I need the result values, so I call collectAsList():
>
> List<Row> list = df.collectAsList();
>
> But it's very slow when I work with large datasets (20-30 million records).
> I know that the result is not held in the driver application, and that it
> takes a long time because collectAsList() collects all the data from the
> worker nodes.
>
> But then what is the right way to get the result values? Is there another
> way to iterate over the rows of a result dataset, or to get the values? Can
> anyone post a small, working example?
>
> Thanks & Regards,
> Laszlo Szep


-- 
Paul Henry Tremblay
Robert Half Technology
