Depending on what you're trying to achieve, `RDD.toLocalIterator()` might help you: it brings the rows to the driver one partition at a time instead of all at once.
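Here is a minimal sketch of that idea in Java. It assumes Spark 2.x with a `SparkSession` (the question uses `sqlContext`). `spark.range(...)` stands in for the real query, and the class and method names are hypothetical. It uses `Dataset.toLocalIterator()`, which returns a `java.util.Iterator<Row>`; `df.javaRDD().toLocalIterator()` is the RDD equivalent. The driver only needs enough memory for the largest partition, not the whole result. The same amount of data still travels to the driver, though, so this limits memory use rather than making the transfer faster.

```java
import java.util.Iterator;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LocalIteratorExample {

    // Pulls rows to the driver one partition at a time and sums the
    // first column (assumed here to be of type long).
    static long sumFirstColumn(Dataset<Row> df) {
        long total = 0L;
        Iterator<Row> it = df.toLocalIterator();
        while (it.hasNext()) {
            Row row = it.next();
            total += row.getLong(0);
        }
        return total;
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("LocalIteratorExample")
                .master("local[*]")
                .getOrCreate();

        // Stand-in for sqlContext.sql("query"): one "id" column with values 0..999.
        Dataset<Row> df = spark.range(0, 1000).toDF("id");

        System.out.println(sumFirstColumn(df));
        spark.stop();
    }
}
```

Running `main` should print 499500, the sum of 0 through 999. In a real job, you would replace the summing loop with whatever per-row processing you need.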



2017-03-29 21:00 GMT+02:00 szep.laszlo.it <szep.laszlo.it@gmail.com>:

After I create a Dataset

Dataset<Row> df = sqlContext.sql("query");

I need the result values, so I call collectAsList():

List<Row> list = df.collectAsList();

But it's very slow when I work with large datasets (20-30 million records). I
know the result isn't present in the driver application, which is why it takes
so long: collectAsList() collects all the data from the worker nodes.

But then what is the right way to get the result values? Is there another
way to iterate over the rows of a result Dataset, or to get the values? Can anyone
post a small, working example?

Thanks & Regards,
Laszlo Szep