spark-user mailing list archives

From Pranav Agrawal <pranav.mn...@gmail.com>
Subject Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)
Date Mon, 04 Jun 2018 12:04:52 GMT
Yes, the issue is with the array type only; I have confirmed that.
I exploded the array<struct> into a struct but am still getting the same error:


*Exception in thread "main" org.apache.spark.sql.AnalysisException: Union
can only be performed on tables with the compatible column types.
struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:int>
<>
struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:int>
at the 21th column of the second table;;*

On Mon, Jun 4, 2018 at 2:55 PM, Jorge Machado <jomach@me.com> wrote:

> Have you tried to narrow down the problem so that we can be 100% sure that
> it lies in the array types? Just exclude them for the sake of testing.
> If we know 100% that it is the array columns, try to explode those columns
> into simple types.
>
> Jorge Machado
>
> On 4 Jun 2018, at 11:09, Pranav Agrawal <pranav.mnnit@gmail.com> wrote:
>
> I am ordering the columns before doing the union, so I think that should
> not be an issue:
>
>         // Keep the original column order so it can be restored afterwards
>         String[] columns_original_order = baseDs.columns();
>         String[] columns = baseDs.columns();
>         Arrays.sort(columns);
>         // Select the columns in the same sorted order on both sides
>         baseDs = baseDs.selectExpr(columns);
>         incDsForPartition = incDsForPartition.selectExpr(columns);
>         if (baseDs.count() > 0) {
>             return baseDs.union(incDsForPartition).selectExpr(columns_original_order);
>         } else {
>             return incDsForPartition.selectExpr(columns_original_order);
>         }
>
>
> On Mon, Jun 4, 2018 at 2:31 PM, Jorge Machado <jomach@me.com> wrote:
>
>> Try the same union with a DataFrame without the array types. There could
>> be something strange there, such as the column ordering.
>>
>> Jorge Machado
>>
>> On 4 Jun 2018, at 10:17, Pranav Agrawal <pranav.mnnit@gmail.com> wrote:
>>
>> The schema is exactly the same; I'm not sure why it is failing, though.
>>
>> root
>>  |-- booking_id: integer (nullable = true)
>>  |-- booking_rooms_room_category_id: integer (nullable = true)
>>  |-- booking_rooms_room_id: integer (nullable = true)
>>  |-- booking_source: integer (nullable = true)
>>  |-- booking_status: integer (nullable = true)
>>  |-- cancellation_reason: integer (nullable = true)
>>  |-- checkin: string (nullable = true)
>>  |-- checkout: string (nullable = true)
>>  |-- city_id: integer (nullable = true)
>>  |-- cluster_id: integer (nullable = true)
>>  |-- company_id: integer (nullable = true)
>>  |-- created_at: string (nullable = true)
>>  |-- discount: integer (nullable = true)
>>  |-- feedback_created_at: string (nullable = true)
>>  |-- feedback_id: integer (nullable = true)
>>  |-- hotel_id: integer (nullable = true)
>>  |-- hub_id: integer (nullable = true)
>>  |-- month: integer (nullable = true)
>>  |-- no_show_reason: integer (nullable = true)
>>  |-- oyo_rooms: integer (nullable = true)
>>  |-- selling_amount: integer (nullable = true)
>>  |-- shifting: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- id: integer (nullable = true)
>>  |    |    |-- booking_id: integer (nullable = true)
>>  |    |    |-- shifting_status: integer (nullable = true)
>>  |    |    |-- shifting_reason: integer (nullable = true)
>>  |    |    |-- shifting_metadata: integer (nullable = true)
>>  |-- suggest_oyo: integer (nullable = true)
>>  |-- tickets: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- ticket_source: integer (nullable = true)
>>  |    |    |-- ticket_status: string (nullable = true)
>>  |    |    |-- ticket_instance_source: integer (nullable = true)
>>  |    |    |-- ticket_category: string (nullable = true)
>>  |-- updated_at: timestamp (nullable = true)
>>  |-- year: integer (nullable = true)
>>  |-- zone_id: integer (nullable = true)
>>
>> root
>>  |-- booking_id: integer (nullable = true)
>>  |-- booking_rooms_room_category_id: integer (nullable = true)
>>  |-- booking_rooms_room_id: integer (nullable = true)
>>  |-- booking_source: integer (nullable = true)
>>  |-- booking_status: integer (nullable = true)
>>  |-- cancellation_reason: integer (nullable = true)
>>  |-- checkin: string (nullable = true)
>>  |-- checkout: string (nullable = true)
>>  |-- city_id: integer (nullable = true)
>>  |-- cluster_id: integer (nullable = true)
>>  |-- company_id: integer (nullable = true)
>>  |-- created_at: string (nullable = true)
>>  |-- discount: integer (nullable = true)
>>  |-- feedback_created_at: string (nullable = true)
>>  |-- feedback_id: integer (nullable = true)
>>  |-- hotel_id: integer (nullable = true)
>>  |-- hub_id: integer (nullable = true)
>>  |-- month: integer (nullable = true)
>>  |-- no_show_reason: integer (nullable = true)
>>  |-- oyo_rooms: integer (nullable = true)
>>  |-- selling_amount: integer (nullable = true)
>>  |-- shifting: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- id: integer (nullable = true)
>>  |    |    |-- booking_id: integer (nullable = true)
>>  |    |    |-- shifting_status: integer (nullable = true)
>>  |    |    |-- shifting_reason: integer (nullable = true)
>>  |    |    |-- shifting_metadata: integer (nullable = true)
>>  |-- suggest_oyo: integer (nullable = true)
>>  |-- tickets: array (nullable = true)
>>  |    |-- element: struct (containsNull = true)
>>  |    |    |-- ticket_source: integer (nullable = true)
>>  |    |    |-- ticket_status: string (nullable = true)
>>  |    |    |-- ticket_instance_source: integer (nullable = true)
>>  |    |    |-- ticket_category: string (nullable = true)
>>  |-- updated_at: timestamp (nullable = false)
>>  |-- year: integer (nullable = true)
>>  |-- zone_id: integer (nullable = true)
>>
>> On Sun, Jun 3, 2018 at 8:05 PM, Alessandro Solimando <
>> alessandro.solimando@gmail.com> wrote:
>>
>>> Hi Pranav,
>>> I don't have an answer to your issue, but what I generally do in these
>>> cases is first try to simplify the problem to a point where it is easier
>>> to check what's going on, and then add back "pieces" one by one until I
>>> spot the error.
>>>
>>> In your case, I would suggest:
>>>
>>> 1) project the dataset to the problematic column only (column 21 from
>>> your log)
>>> 2) use the explode function to get one element of the array per row
>>> 3) flatten the struct
>>>
>>> At each step use printSchema() to double check if the types are as you
>>> expect them to be, and if they are the same for both datasets.
>>>
>>> Best regards,
>>> Alessandro
>>>
>>> On 2 June 2018 at 19:48, Pranav Agrawal <pranav.mnnit@gmail.com> wrote:
>>>
>>>> I can't get around this error when performing a union of two datasets
>>>> (ds1.union(ds2)) that have complex data types (struct, list):
>>>>
>>>>
>>>> *18/06/02 15:12:00 INFO ApplicationMaster: Final app status: FAILED,
>>>> exitCode: 15, (reason: User class threw exception:
>>>> org.apache.spark.sql.AnalysisException: Union can only be performed on
>>>> tables with the compatible column types.
>>>> array<struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:string>>
>>>> <>
>>>> array<struct<id:int,booking_id:int,shifting_status:int,shifting_reason:int,shifting_metadata:string>>
>>>> at the 21th column of the second table;;*
>>>> As far as I can tell, they are the same. What am I doing wrong? Any
>>>> help / workaround appreciated!
>>>>
>>>> spark version: 2.2.1
>>>>
>>>> Thanks,
>>>> Pranav
>>>>
>
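
The two printSchema() outputs quoted above are long enough that a difference is easy to miss by eye; as quoted, they differ on the nullability of updated_at (true in the first, false in the second). Whether that is related to the error at column 21 is not established in this thread. A minimal sketch of a line-by-line comparison follows. SchemaDiff and diff are hypothetical names, not Spark APIs, and the assumption is that the two strings come from something like ds.schema().treeString() or a saved printSchema() dump.

```java
import java.util.ArrayList;
import java.util.List;

/** Line-by-line comparison of two schema dumps (hypothetical helper, not part of Spark). */
final class SchemaDiff {

    /** Returns one entry for every line position at which the two dumps differ. */
    static List<String> diff(String left, String right) {
        String[] a = left.split("\\R");   // \R matches any line break
        String[] b = right.split("\\R");
        List<String> differences = new ArrayList<>();
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // A dump that ends early is reported as "<missing>" on that side
            String l = i < a.length ? a[i].trim() : "<missing>";
            String r = i < b.length ? b[i].trim() : "<missing>";
            if (!l.equals(r)) {
                differences.add("line " + (i + 1) + ": " + l + " <> " + r);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the two schema dumps in this thread
        String base = String.join("\n",
            "root",
            " |-- updated_at: timestamp (nullable = true)",
            " |-- year: integer (nullable = true)");
        String inc = String.join("\n",
            "root",
            " |-- updated_at: timestamp (nullable = false)",
            " |-- year: integer (nullable = true)");
        diff(base, inc).forEach(System.out::println);
    }
}
```

An empty result means the two dumps are textually identical. It does not by itself prove that Spark considers the types equal, since the comparison only sees what the dump prints.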
