kafka-users mailing list archives

From Bhavesh Mistry <mistry.p.bhav...@gmail.com>
Subject Re: integrate Camus and Hive?
Date Wed, 11 Mar 2015 15:29:09 GMT
Hi Yang,

We do this today, going from Camus to Hive without Avro, using plain old
tab-separated log lines.

We use the hive -f command to add the dynamic partitions to the Hive table.
A bash shell script adds the time buckets to the Hive table before the
Camus job runs:

# Emit one ALTER TABLE statement per slash-separated partition path
# argument and feed them all to Hive in a single session.
# ${env:TABLE_NAME} is substituted by Hive itself, so it is escaped
# from the shell here.
for partition in "${@//\//,}"; do
   echo "ALTER TABLE \${env:TABLE_NAME} ADD IF NOT EXISTS PARTITION ($partition);"
done | hive -f /dev/stdin


e.g., a directory produced by the Camus job:  /user/[hive.user]/output/
partition_month_utc=2015-03/partition_day_utc=2015-03-11/partition_minute_bucket=2015-03-11-02-09/
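For illustration, here is a self-contained sketch of how the loop above turns
a slash-separated partition path into a Hive PARTITION spec. The table name
my_table is a hypothetical stand-in for ${env:TABLE_NAME}, and the path is
the example directory above with quoted values, which is an assumption about
how the script is invoked:

```shell
#!/usr/bin/env bash
# Sketch only: my_table stands in for the Hive-substituted table name.

make_partition_statements() {
  # "${@//\//,}" replaces every "/" in each argument with ",",
  # turning key1='v1'/key2='v2' into key1='v1',key2='v2'.
  for partition in "${@//\//,}"; do
    echo "ALTER TABLE my_table ADD IF NOT EXISTS PARTITION ($partition);"
  done
}

make_partition_statements \
  "partition_month_utc='2015-03'/partition_day_utc='2015-03-11'/partition_minute_bucket='2015-03-11-02-09'"
# Emits:
# ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (partition_month_utc='2015-03',partition_day_utc='2015-03-11',partition_minute_bucket='2015-03-11-02-09');
```

Because ADD IF NOT EXISTS is idempotent, re-running the script for buckets
that were already registered is harmless.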

The above adds the Hive partitions dynamically before the Camus job runs.
It works, and you can use any schema:

CREATE EXTERNAL TABLE IF NOT EXISTS ${env:TABLE_NAME} (
  SOME Table FIELDS...
  )
  PARTITIONED BY (
    partition_month_utc STRING,
    partition_day_utc STRING,
    partition_minute_bucket STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS SEQUENCEFILE
  LOCATION '${env:TABLE_LOCATION_CAMUS_OUTPUT}'
;
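Once the partitions are registered, queries can prune on the partition
columns so Hive only reads the matching directories. A sketch, where
my_table stands in for ${env:TABLE_NAME} and the bucket values are taken
from the example directory above:

```sql
-- Sketch only: count the rows in a single minute bucket; partition
-- pruning limits the scan to that one Camus output directory.
SELECT COUNT(*)
FROM my_table
WHERE partition_day_utc = '2015-03-11'
  AND partition_minute_bucket = '2015-03-11-02-09';
```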


I hope this helps!  You will have to construct the Hive query according to
the partitions defined.

Thanks,

Bhavesh

On Wed, Mar 11, 2015 at 7:24 AM, Andrew Otto <aotto@wikimedia.org> wrote:

> > Hive provides the ability to provide custom patterns for partitions. You
> > can use this in combination with MSCK REPAIR TABLE to automatically
> detect
> > and load the partitions into the metastore.
>
> I tried this yesterday, and as far as I can tell it doesn’t work with a
> custom partition layout.  At least not with external tables.  MSCK REPAIR
> TABLE reports that there are directories in the table’s location that are
> not partitions of the table, but it wouldn’t actually add the partition
> unless the directory layout matched Hive’s default
> (key1=value1/key2=value2, etc.)
>
>
>
> > On Mar 9, 2015, at 17:16, Pradeep Gollakota <pradeepg26@gmail.com>
> wrote:
> >
> > If I understood your question correctly, you want to be able to read the
> > output of Camus in Hive and be able to know partition values. If my
> > understanding is right, you can do so by using the following.
> >
> > Hive provides the ability to provide custom patterns for partitions. You
> > can use this in combination with MSCK REPAIR TABLE to automatically
> detect
> > and load the partitions into the metastore.
> >
> > Take a look at this SO
> >
> http://stackoverflow.com/questions/24289571/hive-0-13-external-table-dynamic-partitioning-custom-pattern
> >
> > Does that help?
> >
> >
> > On Mon, Mar 9, 2015 at 1:42 PM, Yang <teddyyyy123@gmail.com> wrote:
> >
> >> I believe many users like us would export the output from camus as a
> hive
> >> external table. but the dir structure of camus is like
> >> /YYYY/MM/DD/xxxxxx
> >>
> >> while hive generally expects /year=YYYY/month=MM/day=DD/xxxxxx if you
> >> define that table to be
> >> partitioned by (year, month, day). otherwise you'd have to add those
> >> partitions created by camus through a separate command. but in the
> latter
> >> case, would a camus job create >1 partitions ? how would we find out the
> >> YYYY/MM/DD values from outside ? ---- well you could always do
> something by
> >> hadoop dfs -ls and then grep the output, but it's kind of not clean....
> >>
> >>
> >> thanks
> >> yang
> >>
>
>
