hadoop-common-issues mailing list archives

From "Rekha Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-10084) Hcat alter table add partition: add skip header/row feature
Date Tue, 05 Nov 2013 11:39:19 GMT
Rekha Joshi created HADOOP-10084:
------------------------------------

             Summary: Hcat alter table add partition: add skip header/row feature
                 Key: HADOOP-10084
                 URL: https://issues.apache.org/jira/browse/HADOOP-10084
             Project: Hadoop Common
          Issue Type: Improvement
          Components: conf
    Affects Versions: 0.5.0
            Reporter: Rekha Joshi
            Priority: Minor


Creating an HCatalog table with CREATE TABLE and then adding data with ALTER TABLE ... ADD PARTITION is the most commonly used approach. However,
the incoming files sometimes arrive with a header row of column names.

In such cases, it would be a useful feature to be able to skip the header or a given number of leading rows.
Suggestions below:

hcat "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'
-skip header"

hcat "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'
-skip [n]"

hcat "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'"
-DskipRow=1
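A minimal Python sketch of the skip semantics proposed above (-skip header, -skip [n], -DskipRow=1), assuming each row is one newline-delimited text line. skip_leading_rows is a hypothetical helper for illustration only, not an existing hcat or Hive API.

```python
from itertools import islice
from typing import Iterable, Iterator


def skip_leading_rows(lines: Iterable[str], n: int = 1) -> Iterator[str]:
    """Yield rows after dropping the first n.

    Hypothetical semantics: n=1 corresponds to "-skip header" / -DskipRow=1,
    and any other n to "-skip [n]".
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # islice lazily discards the first n items, so large files are streamed
    return islice(lines, n, None)
```

Because it works on any iterable, the same helper would apply to an open file handle without loading the whole partition into memory.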

-- alternatively, a bounded range, rows[start:end], could select which rows are included in the table
hcat "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'
-rows[2:]"  // from row 2 (the first row after the header) to the last row

hcat "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'
-rows[2:100]"  // from row 2 through row 100
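A sketch of the proposed -rows[start:end] selection in Python. The proposal does not say whether row numbers are 0- or 1-based or whether the end bound is inclusive; this sketch assumes 1-based numbers and an inclusive end, so rows[2:100] keeps rows 2 through 100. select_rows is a hypothetical helper, not part of hcat or Hive.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def select_rows(lines: Iterable[str], start: int,
                end: Optional[int] = None) -> Iterator[str]:
    """Yield rows start..end (assumed 1-based, inclusive).

    end=None means "to the last row", matching the proposed rows[2:] form.
    """
    if start < 1:
        raise ValueError("start is 1-based and must be >= 1")
    if end is not None and end < start:
        raise ValueError("end must be >= start")
    # islice uses 0-based, end-exclusive indices: row k sits at index k - 1,
    # so an inclusive 1-based end of `end` maps directly to stop=end.
    return islice(lines, start - 1, end)
```

Under these assumptions, rows[2:] is equivalent to skipping a single header row, so the skip options could be treated as a special case of the range form.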

Is the correct place for this feature Hive or HCatalog? Or could it be handled in hcat with a -D option?

Thanks
Rekha



--
This message was sent by Atlassian JIRA
(v6.1#6144)
