spark-user mailing list archives

From "Chen, Mingrui" <>
Subject Cannot convert from JavaRDD to Dataframe
Date Sun, 23 Apr 2017 16:13:47 GMT
Hello everyone!

I am a new Spark learner trying to do a task that seems very simple. I want to read a text
file, save the content into a JavaRDD, and convert it to a DataFrame so I can use it for a
Word2Vec model later. The code looks pretty simple, but I cannot make it work:

SparkSession spark = SparkSession.builder().appName("Word2Vec").getOrCreate();
JavaRDD<String> lines = spark.sparkContext().textFile("input.txt", 10).toJavaRDD();
JavaRDD<Row> rows = lines.map(new Function<String, Row>() {
    public Row call(String line) {
        return RowFactory.create(Arrays.asList(line.split(" ")));
    }
});
StructType schema = new StructType(new StructField[] {
    new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
});
Dataset<Row> input = spark.createDataFrame(rows, schema);
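For what it's worth, the per-line transform inside call() only splits each line on single spaces and wraps the tokens in a List, which is what RowFactory.create(...) receives. A minimal plain-Java sketch of just that step, with no Spark involved (the class name SplitDemo is mine, for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Mirrors the body of call() above: split a line on single spaces
    // and wrap the resulting tokens in a List.
    static List<String> tokenize(String line) {
        return Arrays.asList(line.split(" "));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hi I heard about Spark"));
        // prints: [Hi, I, heard, about, Spark]
    }
}
```

So the rows themselves should each hold one array-of-strings column, matching the "text" field in the schema.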

It throws an exception:

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq
in instance of org.apache.spark.rdd.MapPartitionsRDD

It seems there is a problem converting the JavaRDD<Row> to a DataFrame, but I cannot figure
out what mistake I am making, and the exception message is hard to understand. Can anyone help?
