falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-143) Enable Late data handling for hive tables
Date Fri, 01 Nov 2013 04:41:17 GMT

    [ https://issues.apache.org/jira/browse/FALCON-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811036#comment-13811036 ]

Venkatesh Seetharam commented on FALCON-143:

Thanks [~sriksun] for reviewing the patches.

bq. AbstractRerunHandler, LateRerunEvent, RetryHandler: since the number of parameters is
down to 7, can the checkstyle be re-enabled?

bq. I see two properties being added to the coord: falconFeedStorageType, falconInputFeedStorageTypes
in OozieFeedMapper. They seem to hold similar values. Are they redundant? If so, can we avoid them?

They are not the same and are used in different contexts.
* falconFeedStorageType is used in FeedEvictor and FeedReplicator for a particular feed in
* falconInputFeedStorageTypes is used in LateDataHandler. It holds the storage type for
each input feed in a given process.
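
To illustrate the difference between the two properties, here is a minimal sketch. It is not taken from the patch: the property values (FILESYSTEM, TABLE), the '#' separator, the companion falconInputFeeds property, and the feed names are all assumptions made for the example. The point is only that a feed-level workflow sees a single storage type, while a process-level workflow needs one storage type per input feed.

```python
# Hypothetical sketch: the values, the '#' separator, the falconInputFeeds
# property, and the feed names are assumptions, not taken from the patch.
SEPARATOR = "#"


def storage_types_by_input(input_feeds, input_feed_storage_types):
    """Pair each input feed name with its storage type, position by position."""
    names = input_feeds.split(SEPARATOR)
    types = input_feed_storage_types.split(SEPARATOR)
    if len(names) != len(types):
        raise ValueError("expected one storage type per input feed")
    return dict(zip(names, types))


# Feed-level workflow (eviction, replication): a single value for one feed.
feed_props = {"falconFeedStorageType": "TABLE"}

# Process-level workflow (late data handling): one value per input feed.
process_props = {
    "falconInputFeeds": "clicks#impressions",
    "falconInputFeedStorageTypes": "FILESYSTEM#TABLE",
}

by_input = storage_types_by_input(
    process_props["falconInputFeeds"],
    process_props["falconInputFeedStorageTypes"],
)
```

With a mapping like by_input, a late data check could pick a file system or table code path per input feed, which a single feed-level value cannot express.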

> Enable Late data handling for hive tables
> -----------------------------------------
>                 Key: FALCON-143
>                 URL: https://issues.apache.org/jira/browse/FALCON-143
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-143-r0.patch, FALCON-143.patch
> Neither HCat nor Hive APIs expose internal stats about a given partition. The only way to get
the partition size is to get the location of the partition on HDFS and then use globStatus
and contentSummary APIs.
> With the addition of HIVE-5317, this is going to get more complicated with deltas and
minor and major compactions with no locking.
> We need to work with Hive to see whether there will be an API, or whether Falcon needs to
understand the layout of the data on the file system.
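
The size computation in the description (resolve the partition location, glob for matching paths, sum their content sizes) can be sketched as follows. This is a local file system analogue using only the Python standard library, not the Falcon implementation. On HDFS the corresponding Hadoop calls would be FileSystem.globStatus and FileSystem.getContentSummary. The function names and the partition directory layout are assumptions for illustration.

```python
import glob
import os


def content_length(path):
    """Total bytes of all files under path, or of path itself if it is a file."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def partition_size(location_pattern):
    """Sum the content length of every path matching the glob pattern.

    Returns 0 when nothing matches, for example when the partition has
    not arrived yet.
    """
    return sum(content_length(p) for p in glob.glob(location_pattern))
```

A late data check could record this size when the process runs and compare it with a later measurement. Note that the sketch counts every file under the location, so delta and compaction files of the kind HIVE-5317 introduces would be included in the total.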

This message was sent by Atlassian JIRA
