spark-user mailing list archives

From Yong Zhang <java8...@hotmail.com>
Subject Re: Java to show struct field from a Dataframe
Date Sun, 18 Dec 2016 03:28:16 GMT
Why not just return the struct you defined, instead of an array?


            @Override
            public Row call(Double x, Double y) throws Exception {
                Row row = RowFactory.create(x, y);
                return row;
            }
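
To illustrate why the array version behaves differently, here is a minimal, Spark-free sketch. It assumes RowFactory.create takes varargs (Object... values); the create method below is a hypothetical stand-in with that shape, not the Spark API itself. It shows that a double[] passed to a varargs Object... parameter is packed as one field, while passing x and y separately gives two fields.

```java
public class VarargsRowDemo {
    // Hypothetical stand-in for a varargs factory with the same shape as
    // RowFactory.create(Object... values); it just returns the packed fields.
    static Object[] create(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        Double x = 116.4, y = 39.9;

        // A double[] is not an Object[], so the whole array becomes ONE field.
        Object[] asArray = create(new double[] { x, y });
        System.out.println(asArray.length);                 // 1
        System.out.println(asArray[0] instanceof double[]); // true

        // Passing the values separately yields two Double fields.
        Object[] asFields = create(x, y);
        System.out.println(asFields.length);                // 2
        System.out.println(asFields[0] instanceof Double);  // true
    }
}
```

With two separate Double fields, the row lines up with a struct schema that declares two double columns.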


________________________________
From: Richard Xin <richardxin168@yahoo.com>
Sent: Saturday, December 17, 2016 8:53 PM
To: Yong Zhang; zjp_jdev@163.com; user
Subject: Re: Java to show struct field from a Dataframe

I tried to transform
root
 |-- latitude: double (nullable = false)
 |-- longitude: double (nullable = false)
 |-- name: string (nullable = true)

to:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: double (nullable = true)

The code snippet is as follows:

        sqlContext.udf().register("toLocation", new UDF2<Double, Double, Row>() {
            @Override
            public Row call(Double x, Double y) throws Exception {
                Row row = RowFactory.create(new double[] { x, y });
                return row;
            }
        }, DataTypes.createStructType(new StructField[] {
                new StructField("longitude", DataTypes.DoubleType, true, Metadata.empty()),
                new StructField("latitude", DataTypes.DoubleType, true, Metadata.empty())
            }));

        DataFrame transformedDf1 = citiesDF.withColumn("location",
                callUDF("toLocation", col("longitude"), col("latitude")));
        transformedDf1.drop("latitude").drop("longitude").schema().printTreeString();  // prints schema tree OK as expected

        transformedDf.show();  // java.lang.ClassCastException: [D cannot be cast to java.lang.Double


It seems to me that the return type of the UDF2 might be the root cause, but I'm not sure how to correct it.

Thanks,
Richard




On Sunday, December 18, 2016 7:15 AM, Yong Zhang <java8964@hotmail.com> wrote:


"[D" is the JVM name for the double array type. So this error simply means you have double[] data,
but Spark needs to cast it to Double, as your schema defines.

The error message clearly indicates that the data doesn't match the type specified in the
schema.
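
As a rough illustration of the point above, the following plain-Java sketch (no Spark involved; the class and method names are made up for this example) prints the JVM class name of a double[] and shows that casting such a value to Double fails, which is the same kind of cast the stack trace reports.

```java
public class DoubleArrayCastDemo {
    // Returns the JVM's internal class name for the given value.
    static String jvmName(Object value) {
        return value.getClass().getName();
    }

    // Attempts a cast to Double, similar in spirit to what happens when a
    // schema declares a double field; reports whether the cast succeeded.
    static boolean castsToDouble(Object field) {
        try {
            Double d = (Double) field;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(jvmName(new double[] { 1.0, 2.0 })); // [D
        System.out.println(jvmName(1.0));                        // java.lang.Double
        System.out.println(castsToDouble(new double[] { 1.0 })); // false
        System.out.println(castsToDouble(1.0));                  // true
    }
}
```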

How are you so sure about your data? Did you check it with another tool?

Yong


________________________________
From: Richard Xin <richardxin168@yahoo.com.INVALID>
Sent: Saturday, December 17, 2016 10:56 AM
To: zjp_jdev@163.com; user
Subject: Re: Java to show struct field from a Dataframe

The data is good.


On Saturday, December 17, 2016 11:50 PM, "zjp_jdev@163.com" <zjp_jdev@163.com> wrote:


I think the cause is invalid Double data. Have you checked your data?

________________________________
zjp_jdev@163.com

From: Richard Xin<mailto:richardxin168@yahoo.com.INVALID>
Date: 2016-12-17 23:28
To: User<mailto:user@spark.apache.org>
Subject: Java to show struct field from a Dataframe
Let's say I have a DataFrame with the following schema:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: double (nullable = true)

df.show(); throws the following exception:

java.lang.ClassCastException: [D cannot be cast to java.lang.Double
    at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:119)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getDouble(rows.scala:44)
    at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getDouble(rows.scala:221)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
....

Any advice?
Thanks in advance.
Richard




