spark-user mailing list archives

From Cesar Flores <ces...@gmail.com>
Subject Dataframe random permutation?
Date Mon, 01 Jun 2015 19:49:32 GMT
I would like to know the best approach to randomly permute the rows of a
DataFrame. I have tried:

    df.sample(false,1.0,x).show(100)

where x is the seed. However, it gives the same result no matter the value
of x (the result only varies when the fraction is smaller than 1.0).
I have also tried:

    hc.createDataFrame(df.rdd.repartition(100),df.schema)

which appears to be a random permutation. Can someone confirm that this
last line is in fact a random permutation, or point me to a better
approach?
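
One alternative that might work is to sort the rows by a seeded random
column. This is only a sketch: it assumes a Spark version (1.4 or later)
where rand(seed) is available in org.apache.spark.sql.functions, and
shuffleRows is a hypothetical helper name, not part of Spark:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.rand

// Hypothetical helper (not a Spark API): order the rows by a random value
// generated per row from the given seed. Assumes Spark 1.4+ for rand(seed).
def shuffleRows(df: DataFrame, seed: Long): DataFrame =
  df.orderBy(rand(seed))

// Usage sketch, with x as the seed from the attempts above:
// shuffleRows(df, x).show(100)
```

Unlike sample with fraction 1.0, the seed here should change the row order,
although whether the same seed reproduces the same order may depend on the
input partitioning staying the same between runs.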


Thanks!!!!
-- 
Cesar Flores
