spark-user mailing list archives

From Eike von Seggern <eike.segg...@sevenval.com>
Subject Re: Alternatives for dataframe collectAsList()
Date Tue, 04 Apr 2017 14:38:34 GMT
Hi,

Depending on what you're trying to achieve, `RDD.toLocalIterator()` might
help you.
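A minimal Java sketch of this suggestion follows. It assumes Spark 2.x or later, where `Dataset` itself exposes `toLocalIterator()` returning a `java.util.Iterator<Row>` (from an RDD, `df.javaRDD().toLocalIterator()` gives the same kind of iterator). The class name `LocalIteratorExample`, the helper `countRows`, and the `spark.range(...)` data standing in for `sqlContext.sql("query")` are hypothetical, for illustration only.

```java
import java.util.Iterator;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LocalIteratorExample {

    // Pulls rows to the driver one partition at a time via Dataset.toLocalIterator(),
    // instead of building a single List holding every row as collectAsList() does.
    public static long countRows(Dataset<Row> df) {
        long seen = 0;
        Iterator<Row> it = df.toLocalIterator();
        while (it.hasNext()) {
            Row row = it.next(); // replace with real per-row processing (hypothetical)
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("toLocalIterator-sketch")
                .master("local[*]") // local mode, assumed for this sketch
                .getOrCreate();

        // Hypothetical stand-in for sqlContext.sql("query") from the question.
        Dataset<Row> df = spark.range(0, 1000).toDF("id");

        // toLocalIterator() runs one job per partition; caching avoids
        // recomputing the query for each of them.
        df.cache();

        System.out.println("rows seen: " + countRows(df));
        spark.stop();
    }
}
```

The driver still needs enough memory for the largest partition, so this helps when the total result is too big to collect at once, not when a single partition is huge. If you only need aggregates, computing them inside Spark avoids moving the rows to the driver at all.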

Best

Eike


2017-03-29 21:00 GMT+02:00 szep.laszlo.it <szep.laszlo.it@gmail.com>:

> Hi,
>
> After creating a dataset
>
> Dataset<Row> df = sqlContext.sql("query");
>
> I need the result values, so I call collectAsList():
>
> List<Row> list = df.collectAsList();
>
> But it's very slow when I work with large datasets (20-30 million records).
> I know that the results aren't held in the driver app, which is why it
> takes so long: collectAsList() collects all the data from the worker nodes.
>
> But then what is the right way to get the result values? Is there another
> way to iterate over the rows of a result dataset, or to get the values? Can
> anyone post a small, working example?
>
> Thanks & Regards,
> Laszlo Szep
>
