spark-user mailing list archives

From max square <max2subscr...@gmail.com>
Subject Re: Getting a TreeNode Exception while saving into Hadoop
Date Mon, 08 Aug 2016 21:07:54 GMT
Here's the complete stacktrace -
https://gist.github.com/rohann/649b0fcc9d5062ef792eddebf5a315c1

For reference, here's the complete function -

  def saveAsLatest(df: DataFrame, fileSystem: FileSystem, bakDir: String) = {

    fileSystem.rename(
      new Path(bakDir + latest),
      new Path(bakDir + "/" + ScalaUtil.currentDateTimeString))

    df.write
      .format("com.databricks.spark.csv")
      .options(Map("mode" -> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true"))
      .save(bakDir + latest)

  }

On Mon, Aug 8, 2016 at 3:41 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Mind showing the complete stack trace?
>
> Thanks
>
> On Mon, Aug 8, 2016 at 12:30 PM, max square <max2subscribe@gmail.com>
> wrote:
>
>> Thanks Ted for the prompt reply.
>>
>> There are three or four DFs that are coming from various sources and I'm
>> doing a unionAll on them.
>>
>> val placesProcessed = placesUnchanged
>>   .unionAll(placesAddedWithMerchantId)
>>   .unionAll(placesUpdatedFromHotelsWithMerchantId)
>>   .unionAll(placesUpdatedFromRestaurantsWithMerchantId)
>>   .unionAll(placesChanged)
>>
>> I'm using Spark 1.6.2.
>>
>> On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> Can you show the code snippet for the unionAll operation?
>>>
>>> Which Spark release do you use?
>>>
>>> BTW please use user@spark.apache.org in the future.
>>>
>>> On Mon, Aug 8, 2016 at 11:47 AM, max square <max2subscribe@gmail.com>
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I'm trying to save a DataFrame in CSV format after performing unionAll
>>>> operations on it.
>>>> But I get this exception -
>>>>
>>>> Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>>>> TungstenExchange hashpartitioning(mId#430,200)
>>>>
>>>> I'm saving it by
>>>>
>>>> df.write
>>>>   .format("com.databricks.spark.csv")
>>>>   .options(Map("mode" -> "DROPMALFORMED", "delimiter" -> "\t", "header" -> "true"))
>>>>   .save(bakDir + latest)
>>>>
>>>> It works perfectly if I don't do the unionAll operation.
>>>> Printing part of the results shows that the format is unchanged.
>>>>
>>>> Any help regarding this would be appreciated.
>>>>
>>>>
>>>
>>
>
