spark-user mailing list archives

From Paul Stewart <paul.stew...@imperva.com>
Subject Spark 2.0 Encoder().schema() is sorting StructFields
Date Wed, 12 Oct 2016 14:56:17 GMT
Hi all,

I am using Spark 2.0 to read a CSV file into a Dataset in Java.  This works fine if I define
the StructType by hand, with the StructField array in the same order as the CSV columns.
What I would like to do is use a bean class for both the schema and the Dataset row type.  For example,

// path is a placeholder for the location of the CSV file
Dataset<Bean> beanDS = spark.read().schema(Encoders.bean(Bean.class).schema()).csv(path).as(Encoders.bean(Bean.class));

When using Encoders.bean(Bean.class).schema() to generate the StructType, the
StructFields for the class attributes are returned sorted by name and not in
the order in which they are defined in Bean.class.  This makes the schema
unusable for reading a CSV file, where the ordering of the attributes is
significant.

Is there any way to make the schema() method return the array of StructFields
in the order of the original bean class definition?  (Aside from prefixing every
attribute name to maintain order.)
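
One possible workaround, sketched in plain Java without Spark: keep an explicit
list of the CSV column names and reorder the discovered bean properties to match
it.  The sketch assumes that the bean encoder finds properties through
java.beans.Introspector, which in common JDKs tends to report them by name
rather than in declaration order; that is an assumption about internals, not
something confirmed in this thread.  The Bean class, the BeanFieldOrder class,
and the helpers propertyTypes and inColumnOrder are hypothetical names made up
for illustration.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BeanFieldOrder {

    // Hypothetical bean: properties are deliberately declared out of alphabetical order.
    public static class Bean {
        private String zip;
        private int age;
        private String name;

        public String getZip() { return zip; }
        public void setZip(String zip) { this.zip = zip; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Property name -> property type, in whatever order the Introspector reports them.
    public static Map<String, Class<?>> propertyTypes(Class<?> beanClass) {
        try {
            Map<String, Class<?>> result = new LinkedHashMap<>();
            // Stopping at Object.class leaves out the synthetic "class" property.
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                result.put(pd.getName(), pd.getPropertyType());
            }
            return result;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
    }

    // Re-key the entries so they follow an explicit column order; fail on unknown names.
    public static <T> Map<String, T> inColumnOrder(Map<String, T> byName, List<String> columns) {
        Map<String, T> ordered = new LinkedHashMap<>();
        for (String column : columns) {
            if (!byName.containsKey(column)) {
                throw new IllegalArgumentException("No bean property named " + column);
            }
            ordered.put(column, byName.get(column));
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> discovered = propertyTypes(Bean.class);
        System.out.println("introspected order: " + discovered.keySet());

        // Assumed CSV column order, matching the declaration order in Bean.
        List<String> csvColumns = Arrays.asList("zip", "age", "name");
        System.out.println("csv column order:   " + inColumnOrder(discovered, csvColumns).keySet());
    }
}
```

With Spark on the classpath, the same idea could presumably be applied to the
StructType itself: look up each StructField by name in the schema returned by
Encoders.bean(Bean.class).schema() and build a new StructType from the fields
in CSV column order.  The column list still has to be maintained by hand.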

Would this be considered a bug/enhancement?

Regards,
Paul

