spark-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Does DataFrame.collect() maintain the underlying schema?
Date Wed, 02 Mar 2016 23:40:01 GMT
Hi Sainath,

Thank you for the prompt response!

Could you please elaborate on your answer a bit? I'm sorry, I didn't quite get
it. What kind of operations can I perform using SQLContext? It just helps us
with things like DF creation, schema application, etc., IMHO.
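For anyone following along, here is a minimal Scala sketch of the two approaches under discussion: querying through the SQLContext instead of collecting, and accessing collected Rows by field name rather than by position. The DataFrame `df`, the table name `events`, and the column names `user` and `count` are hypothetical stand-ins for the Avro-derived frame described in the quoted message; this is a sketch against the Spark 1.x API, not a confirmed fix.

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

// Hypothetical DataFrame built from the Avro records in the scenario:
// val df: DataFrame = ...
// val sqlContext: SQLContext = ...

// Option 1: query through the SQLContext instead of collecting.
df.registerTempTable("events")  // Spark 1.x API for exposing a DF to SQL
val selected = sqlContext.sql("SELECT user, count FROM events")

// Option 2: after collect(), resolve fields by name, not by position.
val rows: Array[Row] = df.collect()
rows.foreach { row =>
  // getAs looks the field up via the Row's schema, so it is
  // unaffected by any change in column order.
  val user  = row.getAs[String]("user")
  val count = row.getAs[Long]("count")
  println(s"$user -> $count")
}
```

If positional access is unavoidable, `row.fieldIndex("user")` returns the index of a named field at runtime, which sidesteps the ordering concern raised below.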



Tariq, Mohammad
about.me/mti


On Thu, Mar 3, 2016 at 4:59 AM, Sainath Palla <pallasainath@gmail.com>
wrote:

> Instead of collecting the data frame, you can try using a sqlContext on
> the data frame. But it depends on what kind of operations you are trying
> to perform.
>
> On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>
>> Hi list,
>>
>> *Scenario :*
>> I am creating a DStream by reading an Avro object from a Kafka topic and
>> then converting it into a DataFrame to perform some operations on the data.
>> I call DataFrame.collect() and perform the intended operation on each Row
>> of Array[Row] returned by DataFrame.collect().
>>
>> *Problem : *
>> Calling DataFrame.collect() changes the schema of the underlying record,
>> thus making it impossible to get the columns by index (as the order gets
>> changed).
>>
>> *Query :*
>> Is this how DataFrame.collect() behaves, or am I doing something wrong
>> here? If it's the former, is there any way I can maintain the schema while
>> getting each Row?
>>
>> Any pointers/suggestions would be really helpful. Many thanks!
>>
>>
>> Tariq, Mohammad
>> about.me/mti
>>
>>
>
>
