spark-dev mailing list archives

From Bernard Jesop <bernard.je...@gmail.com>
Subject Re: Dataset API Question
Date Wed, 25 Oct 2017 17:39:05 GMT
Actually, I realized that keeping track of that info would not be enough,
since I also need to locate the checkpoint files in order to delete them :/
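
A minimal sketch of what that manual bookkeeping could look like, assuming the checkpoint directory of each Dataset is known when it is checkpointed (for example derived from SparkContext.getCheckpointDir, which is an assumption about the setup and not something verified here). The names CheckpointRegistry, register, isCheckpointed and delete are hypothetical, and the sketch only handles local paths through the JDK, not HDFS:

```scala
import java.io.File
import scala.collection.mutable

// Hypothetical helper: remembers which datasets were checkpointed and where.
// It does not talk to Spark; the caller supplies the directory.
final class CheckpointRegistry {
  private val dirs = mutable.Map.empty[String, File]

  // Record that the dataset called `name` was checkpointed under `dir`.
  def register(name: String, dir: File): Unit = dirs.update(name, dir)

  // Manual stand-in for the missing Dataset.isCheckpointed.
  def isCheckpointed(name: String): Boolean = dirs.contains(name)

  // Delete the recorded directory tree and forget the entry.
  // Returns true if an entry existed for `name`.
  def delete(name: String): Boolean = dirs.remove(name) match {
    case Some(dir) => deleteTree(dir); true
    case None      => false
  }

  // Depth-first delete: children first, then the directory itself.
  private def deleteTree(f: File): Unit = {
    // listFiles returns null for plain files, hence the Option wrapper.
    Option(f.listFiles()).foreach(_.foreach(deleteTree))
    f.delete()
    ()
  }
}
```

With something like this, the call site would register a name right after checkpointing and call delete once the Dataset is no longer needed. For checkpoints on a distributed file system, the deletion step would have to go through that file system's API instead of java.io.File.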

2017-10-25 19:07 GMT+02:00 Bernard Jesop <bernard.jesop@gmail.com>:

> As far as I understand, Dataset.rdd is not the same as InternalRDD.
> It is just another RDD representation of the same Dataset and is created
> on demand (lazy val) when Dataset.rdd is called.
> This totally explains the observed behavior.
>
> But how would it be possible to know that a Dataset has been
> checkpointed?
> Should I manually keep track of that info?
>
> 2017-10-25 15:51 GMT+02:00 Bernard Jesop <bernard.jesop@gmail.com>:
>
>> Hello everyone,
>>
>> I have a question about checkpointing Datasets.
>>
>> In 2.1.0 there is a Dataset.checkpoint(); however, unlike RDD, there
>> is no Dataset.isCheckpointed().
>>
>> I wonder if Dataset.checkpoint is syntactic sugar for
>> Dataset.rdd.checkpoint.
>> When I do (with ds being a Dataset):
>>
>> ds.checkpoint(); ds.count()
>> ds.rdd.isCheckpointed // result: false
>>
>> However, when I explicitly do:
>> ds.rdd.checkpoint(); ds.rdd.count()
>> ds.rdd.isCheckpointed // result: true
>>
>> Could someone explain this behavior to me, or provide some references?
>>
>> Best regards,
>> Bernard
>>
>
>
