spark-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Does DataFrame.collect() maintain the underlying schema?
Date Wed, 02 Mar 2016 23:21:39 GMT
Hi list,

*Scenario :*
I am creating a DStream by reading Avro objects from a Kafka topic and
then converting it into a DataFrame to perform some operations on the data.
I then call DataFrame.collect() and perform the intended operation on each
Row of the Array[Row] it returns.

*Problem :*
Calling DataFrame.collect() changes the schema of the underlying record,
making it impossible to get the columns by index (the column order gets
changed).

*Query :*
Is this how DataFrame.collect() is supposed to behave, or am I doing
something wrong here? In the former case, is there any way I can maintain
the schema while getting each Row?
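
To make the access pattern concrete, here is a minimal sketch of reading
fields from collected Rows by name instead of by position, so that the code
does not depend on column order. It is an illustration under assumptions, not
the actual job: the column names ("id", "name") and the sample data are
hypothetical stand-ins for the Avro-derived DataFrame, and the SparkSession
entry point assumes Spark 2.x or later (on 1.x a SQLContext would play the
same role).

```scala
import org.apache.spark.sql.{Row, SparkSession}

object CollectByName {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumes Spark 2.x or later).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-by-name")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame built from the Avro records.
    val df = Seq((1L, "alice"), (2L, "bob")).toDF("id", "name")

    val rows: Array[Row] = df.collect()
    rows.foreach { row =>
      // Look fields up by name rather than by position.
      val id = row.getAs[Long]("id")
      val name = row.getAs[String]("name")
      // fieldIndex maps a column name to its position in this Row's schema.
      val nameIdx = row.fieldIndex("name")
      println(s"id=$id name=$name (name is at index $nameIdx)")
    }

    // Print the column order seen by the DataFrame and by a collected Row,
    // which makes any difference between the two visible.
    println(df.schema.fieldNames.mkString(", "))
    println(rows.head.schema.fieldNames.mkString(", "))

    spark.stop()
  }
}
```

Comparing the two printed field lists is a quick way to check whether the
order really differs after collect() or was already different upstream.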

Any pointers/suggestions would be really helpful. Many thanks!

Tariq, Mohammad
about.me/mti