spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Feature request: split dataset based on condition
Date Sat, 02 Feb 2019 14:29:40 GMT
I think the problem is that you can't produce multiple Datasets from one source
in one operation: consider that reproducing one of them would mean
reproducing all of them. You can write a method that does the filtering
multiple times, but it wouldn't be faster. What do you have in mind that's
different?
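A minimal sketch of the "filter multiple times" method described above. It assumes Scala with the spark-sql dependency on the classpath, and `splitBy` is a hypothetical helper, not part of Spark's API. Each returned Dataset is still its own query over the source, which is why this is not faster than calling `filter` yourself; caching the source only avoids recomputing the source's own lineage once per output.

```scala
import org.apache.spark.sql.{Column, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

object SplitByConditions {
  // Hypothetical helper (not a Spark API): one filtered Dataset per condition.
  // Every result is evaluated separately; nothing is shared between them except
  // the cached source, if it has been materialized by an earlier action.
  def splitBy[T](ds: Dataset[T], conditions: Seq[Column]): Seq[Dataset[T]] = {
    // cache() is lazy: the first action on any output materializes the source,
    // later actions read it from the cache instead of recomputing it.
    val source = ds.cache()
    conditions.map(cond => source.filter(cond))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-by-conditions")
      .getOrCreate()
    try {
      val numbers = spark.range(0, 10).toDF("id")
      val parts = splitBy(numbers, Seq(col("id") < 5, col("id") >= 5))
      // Two separate jobs, one per output Dataset.
      println(s"small=${parts(0).count()} large=${parts(1).count()}") // prints small=5 large=5
    } finally {
      spark.stop()
    }
  }
}
```

Whether caching pays off depends on the source: it trades memory or disk for not re-reading the input, and the cached data stays around until `unpersist()` is called.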

On Sat, Feb 2, 2019 at 12:19 AM Moein Hosseini <moein7tl@gmail.com> wrote:

> I've seen many applications that need to split a dataset into multiple
> datasets based on some conditions. As there is no method to do this in one
> place, developers use the *filter* method multiple times. I think it would
> be useful to have a method that splits a dataset based on conditions in one
> iteration, something like Scala's *partition* method (of course, Scala's
> partition just splits a list into two lists, but something more general
> could be more useful).
> If you think it would be helpful, I can create a Jira issue and work on it
> to send a PR.
>
> Best Regards
> Moein
>
> --
>
> Moein Hosseini
> Data Engineer
> mobile: +98 912 468 1859
> site: www.moein.xyz
> email: moein7tl@gmail.com
> LinkedIn: https://www.linkedin.com/in/moeinhm
> Twitter: https://twitter.com/moein7tl
>
>
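For comparison, the standard Scala collections `partition` mentioned in the proposal splits an in-memory sequence into two in a single pass. The sketch below is plain Scala with no Spark; `splitBy` is a hypothetical n-way generalization that puts each element in the bucket of the first predicate it satisfies, plus a final bucket for elements that match none. This single-pass trick works only because a local collection is traversed once; as the reply notes, separate Spark Datasets cannot be produced from one source in one operation.

```scala
object SplitCollections {
  // Hypothetical helper: single pass over `xs`. Element goes to the bucket of
  // the first matching predicate; the last bucket holds elements matching none.
  def splitBy[A](xs: Seq[A], predicates: Seq[A => Boolean]): Seq[Seq[A]] = {
    // fill evaluates its argument once per slot, so each bucket gets its own builder.
    val buckets = Vector.fill(predicates.size + 1)(Vector.newBuilder[A])
    for (x <- xs) {
      val i = predicates.indexWhere(p => p(x))
      val target = if (i >= 0) i else predicates.size
      buckets(target) += x
    }
    buckets.map(_.result())
  }

  def main(args: Array[String]): Unit = {
    val xs = (1 to 10).toList
    // Standard library: two-way split into (matching, non-matching).
    val (evens, odds) = xs.partition(_ % 2 == 0)
    println(evens) // List(2, 4, 6, 8, 10)
    println(odds)  // List(1, 3, 5, 7, 9)
    // Generalized n-way split, still one pass over xs.
    val parts = splitBy(xs, Seq[Int => Boolean](_ < 4, _ < 8))
    println(parts) // Vector(Vector(1, 2, 3), Vector(4, 5, 6, 7), Vector(8, 9, 10))
  }
}
```

The first-match rule is a design choice that keeps the outputs disjoint; calling `filter` once per condition behaves differently when conditions overlap, since one element can then land in several outputs.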
