spark-user mailing list archives

From Jean Georges Perrin <...@jgp.net>
Subject Re: how to generate a larg dataset paralleled
Date Fri, 14 Dec 2018 03:10:04 GMT
Do you just want to generate some data in Spark, or ingest a large dataset from outside
of Spark? What’s the ultimate goal you’re pursuing?

jg


> On Dec 13, 2018, at 21:38, lk_spark <lk_spark@163.com> wrote:
> 
> Hi all,
>     I want to generate some test data containing about one hundred million rows.
>     I created a dataset with ten rows and call df.union in a 'for' loop, but this
> causes the operation to happen only on the driver node.
>     How can I do it on the whole cluster?
>  
> 2018-12-14
> lk_spark
