kafka-users mailing list archives

From Andrew Otto <ao...@wikimedia.org>
Subject Re: integrate Camus and Hive?
Date Wed, 11 Mar 2015 14:24:59 GMT
> Hive provides the ability to provide custom patterns for partitions. You
> can use this in combination with MSCK REPAIR TABLE to automatically detect
> and load the partitions into the metastore.

I tried this yesterday, and as far as I can tell it doesn’t work with a custom partition
layout, at least not with external tables.  MSCK REPAIR TABLE reported that there were directories
in the table’s location that were not partitions of the table, but it would not actually
add a partition unless the directory layout matched Hive’s default (key1=value1/key2=value2,
etc.).
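
Since MSCK REPAIR TABLE will not pick up the Camus layout, the alternative raised in this thread is to add each partition explicitly. The following is a minimal sketch of that approach, not a tested integration: it takes file paths (for example, collected from a directory listing), extracts the distinct YYYY/MM/DD values, and prints one ALTER TABLE ... ADD PARTITION statement per day. The table name `events`, the `/camus/topic` prefix, the helper names, and the assumption that year, month and day are string-typed partition columns are all hypothetical.

```python
import re

# Assumed Camus layout from the thread: <root>/YYYY/MM/DD/<file>.
DATE_DIR = re.compile(r"/(\d{4})/(\d{2})/(\d{2})(?:/|$)")


def partitions_from_paths(paths):
    """Return the sorted, de-duplicated (year, month, day, directory) tuples."""
    found = set()
    for path in paths:
        match = DATE_DIR.search(path)
        if match:
            year, month, day = match.groups()
            # Keep everything up to and including the DD directory.
            found.add((year, month, day, path[:match.end(3)]))
    return sorted(found)


def add_partition_statements(table, paths):
    """Build one Hive ALTER TABLE statement per distinct day directory."""
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{y}', month='{m}', day='{d}') LOCATION '{loc}';"
        for y, m, d, loc in partitions_from_paths(paths)
    ]


if __name__ == "__main__":
    # Hypothetical paths; a single Camus run can touch more than one day.
    sample = [
        "/camus/topic/2015/03/11/part-0.avro",
        "/camus/topic/2015/03/11/part-1.avro",
        "/camus/topic/2015/03/12/part-0.avro",
    ]
    for statement in add_partition_statements("events", sample):
        print(statement)
```

Because the statements use IF NOT EXISTS, re-running the script after every Camus job should be harmless, and the de-duplication covers the case where one job writes into several day directories.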



> On Mar 9, 2015, at 17:16, Pradeep Gollakota <pradeepg26@gmail.com> wrote:
> 
> If I understood your question correctly, you want to be able to read the
> output of Camus in Hive and be able to know partition values. If my
> understanding is right, you can do so by using the following.
> 
> Hive provides the ability to provide custom patterns for partitions. You
> can use this in combination with MSCK REPAIR TABLE to automatically detect
> and load the partitions into the metastore.
> 
> Take a look at this SO
> http://stackoverflow.com/questions/24289571/hive-0-13-external-table-dynamic-partitioning-custom-pattern
> 
> Does that help?
> 
> 
> On Mon, Mar 9, 2015 at 1:42 PM, Yang <teddyyyy123@gmail.com> wrote:
> 
>> I believe many users like us would export the output from camus as a hive
>> external table. but the dir structure of camus is like
>> /YYYY/MM/DD/xxxxxx
>> 
>> while hive generally expects /year=YYYY/month=MM/day=DD/xxxxxx if you
>> define that table to be
>> partitioned by (year, month, day). otherwise you'd have to add those
>> partitions created by camus through a separate command. but in the latter
>> case, would a camus job create >1 partitions ? how would we find out the
>> YYYY/MM/DD values from outside ? ---- well you could always do something by
>> hadoop dfs -ls and then grep the output, but it's kind of not clean....
>> 
>> 
>> thanks
>> yang
>> 

