spark-user mailing list archives

From "lk_spark"<lk_sp...@163.com>
Subject Re: Re: how to generate a large dataset in parallel
Date Fri, 14 Dec 2018 04:04:45 GMT
I want to generate some data in Spark.

2018-12-14 

lk_spark 



From: Jean Georges Perrin <jgp@jgp.net>
Sent: 2018-12-14 11:10
Subject: Re: how to generate a large dataset in parallel
To: "lk_spark"<lk_spark@163.com>
Cc: "user.spark"<user@spark.apache.org>

Do you just want to generate some data in Spark, or ingest a large dataset from outside Spark? What's
the ultimate goal you're pursuing?


jg



On Dec 13, 2018, at 21:38, lk_spark <lk_spark@163.com> wrote:


hi, all:
    I want to generate some test data containing about one hundred million rows.
    I created a dataset with ten rows and then called df.union on it in a 'for' loop, but
this causes the whole operation to run only on the driver node.
    How can I do this across the whole cluster?
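One way to keep the work on the cluster instead of the driver is to start from spark.range, which produces the ids in a distributed way, and derive the test columns from the id. Below is a minimal Scala sketch, assuming Spark 2.x or later; the column names, partition count, and output path are only placeholders, not anything from this thread.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit}

object GenerateTestData {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("generate-test-data").getOrCreate()

    val numRows = 100000000L  // one hundred million rows

    // spark.range creates the ids on the executors, so nothing large is built on the driver;
    // the last argument controls how many partitions (and therefore tasks) share the work.
    val df = spark.range(0L, numRows, 1L, 200)
      .select(
        col("id"),
        concat(lit("user_"), col("id").cast("string")).as("name"),  // placeholder column
        (col("id") % 100).as("bucket")                               // placeholder column
      )

    df.write.mode("overwrite").parquet("/tmp/test_data")  // placeholder output path
    spark.stop()
  }
}

Because the rows are computed partition by partition on the executors, no driver-side loop or union is needed; raising or lowering the partition count passed to range changes how many tasks share the generation.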

2018-12-14


lk_spark 