kudu-user mailing list archives

From Nabeelah Harris <nabeelah.har...@impact.com>
Subject Re: Check existing range partitions using the Java API
Date Wed, 06 Mar 2019 08:57:14 GMT
Hi Adar

Thanks

Option 1 isn't really viable: we're running Cloudera with Kudu 1.7, so we're
limited to the 1.7 client libraries. Option 2 seems to be the way to go, but
since I'm using KuduContext, I'm not sure there is a clean way for me to
check for errors row by row. As a naive first attempt, I wrapped my
kuduContext.upsert call in a try...catch and ran an alterTable whenever a
SparkException was caught. I am able to catch the SparkException that the
tasks raise with 'java.lang.RuntimeException: failed to write 1 rows from
DataFrame to Kudu; sample errors: Not found: non-covered range', but of
course I still end up with a bunch of failed tasks, and the partition is
only added once all my tasks have failed.
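
To make that flow concrete, here is a minimal, self-contained sketch of the
control logic only. It does not call Kudu or Spark: BatchWriter and
PartitionAdder are hypothetical stand-ins for the kuduContext.upsert call and
the alterTable call, and matching on the "non-covered range" message text is
an assumption based on the error quoted above.

```java
import java.util.ArrayList;
import java.util.List;

class UpsertWithPartitionRetry {

    /** Hypothetical stand-in for the write (e.g. a wrapper around kuduContext.upsert). */
    interface BatchWriter {
        void write() throws Exception;
    }

    /** Hypothetical stand-in for the alterTable call that adds the missing range. */
    interface PartitionAdder {
        void addMissingPartition() throws Exception;
    }

    /** True if any exception in the cause chain mentions the non-covered-range error. */
    static boolean isNonCoveredRange(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && msg.contains("non-covered range")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Writes once; on a non-covered-range failure, adds the partition and
     * retries exactly once. Returns the number of write attempts made.
     */
    static int writeWithRetry(BatchWriter writer, PartitionAdder adder) throws Exception {
        try {
            writer.write();
            return 1;
        } catch (Exception e) {
            if (!isNonCoveredRange(e)) {
                throw e; // unrelated failure: do not mask it
            }
            adder.addMissingPartition();
            writer.write(); // second and final attempt
            return 2;
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        boolean[] partitionExists = {false};

        // Fake writer: fails the way the quoted error does until the partition exists.
        BatchWriter writer = () -> {
            if (!partitionExists[0]) {
                throw new RuntimeException("job aborted", new RuntimeException(
                        "failed to write 1 rows from DataFrame to Kudu; "
                        + "sample errors: Not found: non-covered range"));
            }
            log.add("written");
        };
        PartitionAdder adder = () -> {
            partitionExists[0] = true;
            log.add("partition added");
        };

        System.out.println(writeWithRetry(writer, adder) + " attempts: " + log);
    }
}
```

The drawback is visible in the sketch: the partition is only added after the
first write has already failed as a whole, which is why the failed tasks
still show up.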

Do you perhaps have some guidance in this regard?

On Wed, Mar 6, 2019 at 7:58 AM Adar Lieber-Dembo <adar@cloudera.com> wrote:

> Here are some other options:
> 1. Use the new KuduPartitioner class, available in master but not yet
> in any releases. Given a PartialRow (i.e. a row to be inserted), you
> can find its "partition index" and, more importantly for your use
> case, receive an exception if no partition exists for the row.
> 2. Insert the data anyway, and rely on per-row errors to tell you that
> a partition is missing. This is a more "optimistic" approach, but a
> somewhat expensive one at that.
>
> Would either of these work for you?
>
> On Tue, Mar 5, 2019 at 6:33 AM Nabeelah Harris
> <nabeelah.harris@impact.com> wrote:
> >
> > Hi there
> >
> > Currently, the only method available on KuduTable to check which
> > partitions already exist is 'KuduTable.getFormattedRangePartitions'.
> > This however looks to be experimental and only intended for use by
> > Impala. Other than replicating the logic used in the above-mentioned
> > method, is there any way I can easily retrieve the range partitions
> > (or partitions at all) using the Java API? My use-case at the moment
> > is to create range partitions based on the data I am about to insert,
> > and to do so I want to first check if that range partition already
> > exists, to prevent errors.
> >
> > Thanks
> > Nabeelah
>


-- 
Nabeelah Harris
nabeelah.harris@impact.com |
https://impact.com
