kudu-user mailing list archives

From: Adar Lieber-Dembo <a...@cloudera.com>
Subject: Re: Check existing range partitions using the Java API
Date: Wed, 06 Mar 2019 09:07:20 GMT
FWIW, you can use a newer Kudu client with an older server, as we take care
to preserve backward compatibility. Decoupling the client and server
artifacts sort of makes sense anyway: the server artifacts are installed on
the cluster nodes, while the client artifacts are typically distributed
along with the application.

In any case, I agree: I don't see an obvious way to get at the underlying
per-row errors if you're using the KuduContext. Maybe someone more familiar
with the Kudu Spark bindings can chime in with suggestions.
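A minimal plain-Java sketch of option 2 (write optimistically, then repair) may help make the per-row flow concrete. It does not use the Kudu client at all: FakeTable, Range and writeWithRepair are hypothetical stand-ins, the range key is assumed to be a date, and the range partitions are assumed to be month-aligned. In the real Java client, the rejected rows would come from KuduSession.getPendingErrors() (a RowError whose getErrorStatus().isNotFound() is true), and each partition would be created with AlterTableOptions.addRangePartition(lower, upper) passed to KuduClient.alterTable(...). The point is the shape of the loop: collect the failed rows, deduplicate the partitions they need, add each one once, then retry only the failed rows.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, dependency-free model of "write first, add missing range partitions, retry".
// Nothing here calls the Kudu client; FakeTable stands in for a range-partitioned table.
final class OptimisticUpsertSketch {

    /** Half-open range partition [lower, upper) on a date key. */
    static final class Range {
        final LocalDate lower;
        final LocalDate upper;

        Range(LocalDate lower, LocalDate upper) {
            this.lower = lower;
            this.upper = upper;
        }

        boolean covers(LocalDate key) {
            return !key.isBefore(lower) && key.isBefore(upper);
        }

        @Override
        public String toString() {
            return "[" + lower + ", " + upper + ")";
        }
    }

    /** Hypothetical stand-in for a table: keeps covered rows, rejects the rest. */
    static final class FakeTable {
        final List<Range> ranges = new ArrayList<>();
        final List<LocalDate> rows = new ArrayList<>();

        /** Applies each row and returns the keys rejected because no range partition covers them. */
        List<LocalDate> write(List<LocalDate> keys) {
            List<LocalDate> rejected = new ArrayList<>();
            for (LocalDate key : keys) {
                if (ranges.stream().anyMatch(r -> r.covers(key))) {
                    rows.add(key);
                } else {
                    rejected.add(key); // plays the role of a per-row "non-covered range" error
                }
            }
            return rejected;
        }
    }

    /** Writes optimistically, adds one month-aligned partition per missing month, retries the rejects. */
    static List<Range> writeWithRepair(FakeTable table, List<LocalDate> keys) {
        List<LocalDate> rejected = table.write(keys);

        // Many rejected rows can need the same partition, so deduplicate by month start.
        TreeSet<LocalDate> missingMonths = new TreeSet<>();
        for (LocalDate key : rejected) {
            missingMonths.add(key.withDayOfMonth(1));
        }

        List<Range> added = new ArrayList<>();
        for (LocalDate start : missingMonths) {
            Range range = new Range(start, start.plusMonths(1));
            table.ranges.add(range); // where real code would alter the table to add the range
            added.add(range);
        }

        // Retry only the rows that failed the first time.
        List<LocalDate> stillRejected = table.write(rejected);
        if (!stillRejected.isEmpty()) {
            throw new IllegalStateException("rows still not covered: " + stillRejected);
        }
        return added;
    }
}
```

Deriving one month per failed row only works if the existing partitions are also month-aligned, since Kudu rejects a new range partition that overlaps an existing one. Through KuduContext the per-row errors appear to be folded into the task's RuntimeException message (the "sample errors" text quoted in this thread), which is why a loop like this is easier to write against a KuduSession directly.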

On Wed, Mar 6, 2019 at 12:57 AM Nabeelah Harris <nabeelah.harris@impact.com>
wrote:

> Hi Adar
>
> Thanks
>
> Option 1 isn't really viable, since we're running Cloudera with Kudu 1.7
> and are therefore using the 1.7 client libraries. Option 2 seems to be the
> way to go, though since I'm using KuduContext, I'm not sure there is a clean
> way for me to check for errors row by row. I naively wrapped my
> kuduContext.upsert call in a try...catch and ran an alterTable whenever a
> SparkException was caught. With that, I'm able to catch the SparkException
> raised on the tasks ('java.lang.RuntimeException: failed to write 1 rows
> from DataFrame to Kudu; sample errors: Not found: non-covered range'), but
> of course I still end up with a bunch of failed tasks, and the partition is
> only added once all my tasks have failed.
>
> Do you perhaps have some guidance in this regard?
>
> On Wed, Mar 6, 2019 at 7:58 AM Adar Lieber-Dembo <adar@cloudera.com>
> wrote:
>
>> Here are some other options:
>> 1. Use the new KuduPartitioner class, available in master but not yet
>> in any releases. Given a PartialRow (i.e. a row to be inserted), you
>> can find its "partition index" and, more importantly for your use
>> case, receive an exception if no partition exists for the row.
>> 2. Insert the data anyway, and rely on per-row errors to tell you that
>> a partition is missing. This is a more "optimistic" approach, but a
>> somewhat expensive one at that.
>>
>> Would either of these work for you?
>>
>> On Tue, Mar 5, 2019 at 6:33 AM Nabeelah Harris
>> <nabeelah.harris@impact.com> wrote:
>> >
>> > Hi there
>> >
>> > Currently, the only method available on KuduTable to check which
>> > partitions already exist is 'KuduTable.getFormattedRangePartitions'.
>> > However, it looks to be experimental and intended only for use by
>> > Impala. Other than replicating the logic used in that method, is there
>> > any way I can easily retrieve the range partitions (or any partitions
>> > at all) using the Java API? My use case at the moment is to create
>> > range partitions based on the data I am about to insert, and to do so
>> > I want to first check whether that range partition already exists, to
>> > prevent errors.
>> >
>> > Thanks
>> > Nabeelah
>>
>
>
> --
> Nabeelah Harris
> nabeelah.harris@impact.com |
> https://impact.com
>
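Option 1 from this thread (check coverage before writing) can be sketched in the same way. The KuduPartitioner class mentioned above exposes, in the Java client, partitionRow(PartialRow), which returns a partition index and throws NonCoveredRangeException when no range partition covers the row; the plain-Java sketch below only models that lookup. It does not call the Kudu client: Range, partitionIndex and monthsToCreate are hypothetical, the range key is assumed to be a date, and the existing partitions are assumed to be sorted by lower bound, non-overlapping and month-aligned. Collecting the uncovered months up front lets each missing partition be added once, before any Spark task writes, instead of after tasks have already failed.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, dependency-free model of "check coverage before writing".
// The ranges stand in for a table's existing range partitions; they must be
// sorted by lower bound and non-overlapping for the binary search to be valid.
final class RangeCoverageCheck {

    /** Half-open range partition [lower, upper) on a date key. */
    static final class Range {
        final LocalDate lower;
        final LocalDate upper;

        Range(LocalDate lower, LocalDate upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /**
     * Returns the index of the range covering the key, or throws if none does
     * (loosely mirroring how a partitioner reports a non-covered row).
     */
    static int partitionIndex(List<Range> ranges, LocalDate key) {
        int lo = 0;
        int hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Range r = ranges.get(mid);
            if (key.isBefore(r.lower)) {
                hi = mid - 1;            // key lies left of this range
            } else if (!key.isBefore(r.upper)) {
                lo = mid + 1;            // key lies at or beyond the exclusive upper bound
            } else {
                return mid;              // lower <= key < upper
            }
        }
        throw new NoSuchElementException("no range partition covers " + key);
    }

    /** Month starts for which a month-aligned partition must be created before writing the keys. */
    static SortedSet<LocalDate> monthsToCreate(List<Range> ranges, List<LocalDate> keys) {
        SortedSet<LocalDate> starts = new TreeSet<>();
        for (LocalDate key : keys) {
            try {
                partitionIndex(ranges, key);
            } catch (NoSuchElementException e) {
                starts.add(key.withDayOfMonth(1));
            }
        }
        return starts;
    }
}
```

Each returned month start would become one [start, start + 1 month) range partition to add before the write; with real, non-aligned bounds the computed range would first need to be checked for overlap with its neighbours.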
