spark-user mailing list archives

From 15313776907 <>
Subject Re: how to generate a larg dataset paralleled
Date Fri, 14 Dec 2018 08:39:45 GMT

I have the same problem and hope it can be solved here. Thank you.
On 12/14/2018 10:38, lk_spark<> wrote:
    I want to generate some test data containing about one hundred million rows.
    I created a dataset with ten rows and call df.union on it in a 'for' loop, but
this causes the operation to happen only on the driver node.
    How can I do this across the whole cluster?
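
One possible approach, sketched here rather than confirmed anywhere in this thread, is to skip the union loop and start from spark.range, which produces the row ids lazily in partitions on the executors. Each row is then derived from its id. The helper names make_row and generate, the three-column schema, and the partition count of 200 are all hypothetical choices for illustration.

```python
from typing import Tuple


def make_row(i: int) -> Tuple[int, str, float]:
    """Derive one synthetic row from its id alone (hypothetical schema)."""
    name = f"user_{i % 1000}"             # cycles through 1000 fake names
    amount = ((i * 37) % 10000) / 100.0   # deterministic value in [0, 100)
    return (i, name, amount)


def generate(spark, num_rows: int = 100_000_000, num_partitions: int = 200):
    """Build the test DataFrame on the executors.

    `spark` is an existing SparkSession. The row count and partition
    count are assumptions; tune them for the actual cluster.
    """
    # spark.range does not materialize ids on the driver; each task
    # generates only the ids belonging to its own partition.
    ids = spark.range(0, num_rows, 1, num_partitions)
    # The map runs inside the executor tasks, one id at a time.
    rows = ids.rdd.map(lambda r: make_row(r.id))
    return rows.toDF(["id", "name", "amount"])
```

Because every row depends only on its id, no partition needs data from another one, so the work spreads over however many executors are available. Using built-in column expressions on the result of spark.range instead of a Python map would likely be faster, at the cost of expressing the row logic in Spark SQL functions.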