spark-user mailing list archives

From Ewan Leith <ewan.le...@realitymine.com>
Subject Re: Spark 2.0.0 - Apply schema on few columns of dataset
Date Mon, 08 Aug 2016 05:56:50 GMT
Looking at the Encoders API documentation at

http://spark.apache.org/docs/latest/api/java/

Java: Encoders are specified by calling static methods on Encoders <http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Encoders.html>.

List<String> data = Arrays.asList("abc", "abc", "xyz");
Dataset<String> ds = context.createDataset(data, Encoders.STRING());

I think you should be calling

.as(Encoders.tuple(Encoders.STRING(), Encoders.STRING()))

or similar
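
To make that concrete, here is a minimal sketch of the read, select, then as() sequence from the original question. It is untested against your data and assumes a CSV with a header row; the column names "name" and "city", the class name SelectCsvColumns and the helper nameAndCity are made up for illustration. It relies on the behaviour described in the Dataset.as documentation, where columns are mapped to a tuple by position.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class SelectCsvColumns {

    // Reads the whole CSV, keeps two columns, and types them as (String, String).
    // "name" and "city" are hypothetical column names; substitute your own.
    static Dataset<Tuple2<String, String>> nameAndCity(SparkSession spark, String path) {
        Dataset<Row> all = spark.read()
                .option("header", "true") // first line holds the column names
                .csv(path);               // with no schema given, every column is read as a string
        // For a tuple encoder the selected columns are matched by position,
        // so "name" becomes _1 and "city" becomes _2.
        return all.select("name", "city")
                .as(Encoders.tuple(Encoders.STRING(), Encoders.STRING()));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: SelectCsvColumns <path-to-csv>");
            return;
        }
        SparkSession spark = SparkSession.builder()
                .appName("select-csv-columns")
                .master("local[*]") // local mode, for trying this out only
                .getOrCreate();
        nameAndCity(spark, args[0]).show();
        spark.stop();
    }
}
```

This uses only the built-in expression encoders, so there is no need to extend org.apache.spark.sql.Encoder yourself. If you would rather map columns by name onto a Java bean, Encoders.bean(YourClass.class) may be a better fit.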

Ewan

On 8 Aug 2016 06:10, Aseem Bansal <asmbansal2@gmail.com> wrote:
Hi All

Has anyone done this with the Java API?

On Fri, Aug 5, 2016 at 5:36 PM, Aseem Bansal <asmbansal2@gmail.com> wrote:
I need to use only a few columns out of a CSV. As there is no option to read only a few columns
from a CSV, I am:
 1. reading the whole CSV using SparkSession.read().csv()
 2. selecting a few of the columns using DataFrame.select()
 3. applying a schema using the .as() function of Dataset<Row>. I tried to extend
    org.apache.spark.sql.Encoder as the input for the as function

But I am getting the following exception:

Exception in thread "main" java.lang.RuntimeException: Only expression encoders are supported today

So my questions are:
1. Is it possible to read only a few columns instead of the whole CSV? I cannot change the CSV,
as it is upstream data.
2. How do I apply a schema to a few columns if I cannot write my own encoder?

