falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-169) multiple "/" in target for replication for multi target feed
Date Wed, 06 Nov 2013 18:58:17 GMT

    [ https://issues.apache.org/jira/browse/FALCON-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815150#comment-13815150 ]

Venkatesh Seetharam commented on FALCON-169:
--------------------------------------------

I cannot reproduce this issue in my environment, and it beats me how this could happen. With
FALCON-168, I changed the base method to collapse repeated slashes and strip the trailing slash:

{code}
        private String getPathsWithPartitions(Cluster srcCluster, Cluster trgCluster,
                                              Feed feed) throws FalconException {
            String srcPart = FeedHelper.normalizePartitionExpression(
                    FeedHelper.getCluster(feed, srcCluster.getName()).getPartition());
            srcPart = FeedHelper.evaluateClusterExp(srcCluster, srcPart);

            String targetPart = FeedHelper.normalizePartitionExpression(
                    FeedHelper.getCluster(feed, trgCluster.getName()).getPartition());
            targetPart = FeedHelper.evaluateClusterExp(trgCluster, targetPart);

            StringBuilder pathsWithPartitions = new StringBuilder();
            pathsWithPartitions.append("${coord:dataIn('input')}/")
                    .append(FeedHelper.normalizePartitionExpression(srcPart, targetPart));

            String parts = pathsWithPartitions.toString().replaceAll("//+", "/");
            parts = StringUtils.stripEnd(parts, "/");
            return parts;
        }
{code}

The generated replication action is below:
{code}
<java xmlns="uri:oozie:workflow:0.3">
  <job-tracker>localhost:20300</job-tracker>
  <name-node>hdfs://localhost:20020</name-node>
  <configuration>
    <property>
      <name>mapred.job.queue.name</name>
      <value>default</value>
    </property>
    <property>
      <name>oozie.launcher.mapred.job.priority</name>
      <value>NORMAL</value>
    </property>
  </configuration>
  <main-class>org.apache.falcon.replication.FeedReplicator</main-class>
  <arg>-Dfalcon.include.path=hftp://localhost:10070/falcon/test/primary-cluster/customer_raw/2012-10-01-12/west-coast</arg>
  <arg>-Dmapred.job.queue.name=default</arg>
  <arg>-Dmapred.job.priority=NORMAL</arg>
  <arg>-maxMaps</arg>
  <arg>5</arg>
  <arg>-sourcePaths</arg>
  <arg>hftp://localhost:10070/falcon/test/primary-cluster/customer_raw/2012-10-01-12</arg>
  <arg>-targetPath</arg>
  <arg>hdfs://localhost:20020/localDC/rc/billing/ua1/2012-10-01-12/</arg>
  <arg>-falconFeedStorageType</arg>
  <arg>FILESYSTEM</arg>
  <file>/apps/falcon/target-cluster-alpha/working/lib/hadoop-distcp.jar</file>
  <file>hdfs://localhost:20020/apps/falcon/target-cluster-alpha/working/libext/kahadb.jar</file>
  <file>hdfs://localhost:20020/apps/falcon/target-cluster-alpha/working/libext/FEED/replication/falcon-hadoop-dependencies-0.4-incubating-SNAPSHOT.jar</file>
</java>
{code}

Notice that *-Dfalcon.include.path=hftp://localhost:10070/falcon/test/primary-cluster/customer_raw/2012-10-01-12/west-coast*
contains neither a double slash (//) nor a trailing slash (/).
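
To make the effect of the last two lines of the method concrete, here is a minimal standalone sketch. It is not the Falcon code: the class name and the `stripTrailingSlashes` helper are hypothetical, the helper is a plain-Java stand-in for `StringUtils.stripEnd(parts, "/")`, and the `host:8020` URI is made up for illustration.

```java
// Hypothetical standalone sketch; not part of Falcon.
class PathNormalizationSketch {

    // Collapse every run of two or more slashes into one, then drop trailing slashes.
    static String normalize(String path) {
        String collapsed = path.replaceAll("//+", "/");
        return stripTrailingSlashes(collapsed);
    }

    // Plain-Java stand-in for StringUtils.stripEnd(s, "/").
    static String stripTrailingSlashes(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // Shape of the string built in getPathsWithPartitions, with an empty
        // partition segment in the middle and a trailing slash.
        System.out.println(normalize("${coord:dataIn('input')}//ua3/"));
        // Caveat: applied to a fully qualified URI, the same regex would also
        // collapse the "//" that follows the scheme.
        System.out.println(normalize("hdfs://host:8020/a//b/"));
    }
}
```

The first call prints `${coord:dataIn('input')}/ua3`. The second prints `hdfs:/host:8020/a/b`, which suggests the normalization is only safe while the path still carries the `${coord:dataIn('input')}` placeholder rather than a resolved URI.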

> multiple "/" in target for replication for multi target feed 
> -------------------------------------------------------------
>
>                 Key: FALCON-169
>                 URL: https://issues.apache.org/jira/browse/FALCON-169
>             Project: Falcon
>          Issue Type: Bug
>          Components: replication
>         Environment: QA
>            Reporter: Samarth Gupta
>            Assignee: Venkatesh Seetharam
>
> Multiple "/" are getting appended to the target dir before the partition expression postfix is concatenated.
>
> For example, while running a single-source multi-target test, the following value is passed to DistCp, as seen in the tasktracker logs:
> ** for patch from FALCON-163
> {quote} 
> -Dfalcon.include.path=hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3
> At the bottom of the logs the following can be seen:
> 2013-11-05 06:33:20,219 INFO  - Inclusion pattern = hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3 (FilteredCopyListing:59)
> 2013-11-05 06:33:20,219 INFO  - Regex pattern = (hdfs://gs1001\.grid\.corp\.inmobi\.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3/)|(hdfs://gs1001\.grid\.corp\.inmobi\.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3$) (FilteredCopyListing:60)
> 2013-11-05 06:33:20,460 INFO  - Number of paths considered for copy: 0 (CustomReplicator:57)
> 2013-11-05 06:33:20,461 INFO  - Number of bytes considered for copy: 0 (Actual number of bytes copied depends on whether any files are skipped or overwritten.) (CustomReplicator:58)
> 2013-11-05 06:33:21,212 INFO  - DistCp job-id: job_201310290719_0445 (DistCp:146)
> 2013-11-05 06:33:21,213 INFO  - DistCp job may be tracked at: http://ivoryqa-1.corp.inmobi.com:50030/jobdetails.jsp?jobid=job_201310290719_0445 (DistCp:147)
> 2013-11-05 06:33:21,213 INFO  - To cancel, run the following command: hadoop job -kill job_201310290719_0445 (DistCp:148)
> 2013-11-05 06:33:21,213 INFO  - Running job: job_201310290719_0445 (JobClient:1315)
> 2013-11-05 06:33:22,216 INFO  -  map 0% reduce 0% (JobClient:1328)
> 2013-11-05 06:33:33,244 INFO  - Job complete: job_201310290719_0445 (JobClient:1383)
> 2013-11-05 06:33:33,252 INFO  - Counters: 4 (JobClient:589)
> 2013-11-05 06:33:33,252 INFO  -   Job Counters  (JobClient:591)
> 2013-11-05 06:33:33,253 INFO  -     SLOTS_MILLIS_MAPS=5822 (JobClient:593)
> 2013-11-05 06:33:33,253 INFO  -     Total time spent by all reduces waiting after reserving slots (ms)=0 (JobClient:593)
> 2013-11-05 06:33:33,254 INFO  -     Total time spent by all maps waiting after reserving slots (ms)=0 (JobClient:593)
> 2013-11-05 06:33:33,255 INFO  -     SLOTS_MILLIS_REDUCES=0 (JobClient:593)
> 2013-11-05 06:33:33,307 INFO  - No files present in path: hdfs://ivoryqa-1.corp.inmobi.com:8020/localDC/rc/billing/ua2/2012/10/01/12/10/ua3 (FeedReplicator:146)
> 2013-11-05 06:33:33,308 INFO  - Completed DistCp (FeedReplicator:77)
> {quote}
> Whereas if the same test is run on the current code from trunk, the following value appears in the tasktracker logs:
> {quote}
> -Dfalcon.include.path=hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10/ua3
> {quote}
> and replication is successful ..... 
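
A rough sketch of why the `//` in the reported include path could lead to "Number of paths considered for copy: 0". It rebuilds a regex with the same shape as the logged "Regex pattern" line and tests it against a candidate file path. Everything here is an assumption for illustration: the class and method names are hypothetical, the `part-00000` file name is made up, the use of `Matcher.find()` is a guess at how the pattern is applied, and the candidate path is assumed to come back from the listing with single slashes. The actual FilteredCopyListing logic may differ.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not the FilteredCopyListing implementation.
class InclusionPatternSketch {

    // Build a pattern shaped like the logged one: (<path>/)|(<path>$), dots escaped.
    static Pattern inclusionPattern(String includePath) {
        String escaped = includePath.replace(".", "\\.");
        return Pattern.compile("(" + escaped + "/)|(" + escaped + "$)");
    }

    // Assumption: a candidate is included if the pattern is found anywhere in it.
    static boolean included(String includePath, String candidate) {
        return inclusionPattern(includePath).matcher(candidate).find();
    }

    public static void main(String[] args) {
        String base = "hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10";
        // Hypothetical file under the ua3 partition, listed with single slashes.
        String file = base + "/ua3/part-00000";

        System.out.println(included(base + "//ua3", file)); // include path from the FALCON-163 patch run
        System.out.println(included(base + "/ua3", file));  // include path from the trunk run
    }
}
```

Under these assumptions the first check prints `false` and the second prints `true`, which is consistent with the reported behaviour: the pattern containing `//ua3` selects nothing, while the single-slash pattern selects the partition's files.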



--
This message was sent by Atlassian JIRA
(v6.1#6144)
