[ https://issues.apache.org/jira/browse/FALCON-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815150#comment-13815150 ]
Venkatesh Seetharam commented on FALCON-169:
--------------------------------------------
I cannot reproduce this issue in my environment, and it beats me how this could happen. With
FALCON-168, I added this to the base method:
{code}
private String getPathsWithPartitions(Cluster srcCluster, Cluster trgCluster,
                                      Feed feed) throws FalconException {
    String srcPart = FeedHelper.normalizePartitionExpression(
            FeedHelper.getCluster(feed, srcCluster.getName()).getPartition());
    srcPart = FeedHelper.evaluateClusterExp(srcCluster, srcPart);

    String targetPart = FeedHelper.normalizePartitionExpression(
            FeedHelper.getCluster(feed, trgCluster.getName()).getPartition());
    targetPart = FeedHelper.evaluateClusterExp(trgCluster, targetPart);

    StringBuilder pathsWithPartitions = new StringBuilder();
    pathsWithPartitions.append("${coord:dataIn('input')}/")
            .append(FeedHelper.normalizePartitionExpression(srcPart, targetPart));

    String parts = pathsWithPartitions.toString().replaceAll("//+", "/");
    parts = StringUtils.stripEnd(parts, "/");
    return parts;
}
{code}
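To make the last two steps of that method concrete, here is a minimal standalone sketch of the same normalization. The class name PathNormalizer and the sample input are illustrative only and are not part of Falcon; the trailing-slash loop stands in for StringUtils.stripEnd(parts, "/") so that the snippet has no dependencies.

```java
// Illustrative sketch only: PathNormalizer is a hypothetical helper, not Falcon code.
class PathNormalizer {

    /** Collapses runs of '/' into a single '/' and removes any trailing '/'. */
    static String normalize(String path) {
        // Same regex as in getPathsWithPartitions: two or more slashes become one.
        String collapsed = path.replaceAll("//+", "/");

        // Dependency-free stand-in for StringUtils.stripEnd(collapsed, "/").
        int end = collapsed.length();
        while (end > 0 && collapsed.charAt(end - 1) == '/') {
            end--;
        }
        return collapsed.substring(0, end);
    }

    public static void main(String[] args) {
        // Prints: ${coord:dataIn('input')}/ua3
        System.out.println(normalize("${coord:dataIn('input')}//ua3/"));
    }
}
```

Note that this normalization is only safe on a string without a URI scheme: applied to a full URI it would also collapse the "//" after "hdfs:". In the method above it runs on the unresolved ${coord:dataIn('input')} expression, so that does not arise there.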
Generated replication action is below:
{code}
<java xmlns="uri:oozie:workflow:0.3">
    <job-tracker>localhost:20300</job-tracker>
    <name-node>hdfs://localhost:20020</name-node>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>default</value>
        </property>
        <property>
            <name>oozie.launcher.mapred.job.priority</name>
            <value>NORMAL</value>
        </property>
    </configuration>
    <main-class>org.apache.falcon.replication.FeedReplicator</main-class>
    <arg>-Dfalcon.include.path=hftp://localhost:10070/falcon/test/primary-cluster/customer_raw/2012-10-01-12/west-coast</arg>
    <arg>-Dmapred.job.queue.name=default</arg>
    <arg>-Dmapred.job.priority=NORMAL</arg>
    <arg>-maxMaps</arg>
    <arg>5</arg>
    <arg>-sourcePaths</arg>
    <arg>hftp://localhost:10070/falcon/test/primary-cluster/customer_raw/2012-10-01-12</arg>
    <arg>-targetPath</arg>
    <arg>hdfs://localhost:20020/localDC/rc/billing/ua1/2012-10-01-12/</arg>
    <arg>-falconFeedStorageType</arg>
    <arg>FILESYSTEM</arg>
    <file>/apps/falcon/target-cluster-alpha/working/lib/hadoop-distcp.jar</file>
    <file>hdfs://localhost:20020/apps/falcon/target-cluster-alpha/working/libext/kahadb.jar</file>
    <file>hdfs://localhost:20020/apps/falcon/target-cluster-alpha/working/libext/FEED/replication/falcon-hadoop-dependencies-0.4-incubating-SNAPSHOT.jar</file>
</java>
{code}
Notice that *-Dfalcon.include.path=hftp://localhost:10070/falcon/test/primary-cluster/customer_raw/2012-10-01-12/west-coast*
has neither a doubled / in its path nor a trailing /.
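For context on why a doubled / matters, the "Regex pattern" line logged in the issue description below has the shape (path/)|(path$) with dots escaped. The following sketch rebuilds a pattern of that shape and tests it against a candidate file path. It is an assumption that the pattern is applied with find() against paths that contain single slashes only; the class InclusionPatternDemo, its methods, the host nn.example.com and the file name part-00000 are all hypothetical and are not taken from FilteredCopyListing.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: not the actual FilteredCopyListing implementation.
class InclusionPatternDemo {

    /** Builds a pattern of the shape seen in the logs: (path/)|(path$), dots escaped. */
    static Pattern buildPattern(String includePath) {
        String escaped = includePath.replace(".", "\\.");
        return Pattern.compile("(" + escaped + "/)|(" + escaped + "$)");
    }

    /** Assumption: a candidate is included when the pattern is found anywhere in it. */
    static boolean isIncluded(String includePath, String candidate) {
        return buildPattern(includePath).matcher(candidate).find();
    }

    public static void main(String[] args) {
        String base = "hdfs://nn.example.com:54310/localDC/rc/billing/2012/10/01/12/10";
        String file = base + "/ua3/part-00000";

        // Include path with the doubled slash, as reported in the issue.
        System.out.println(isIncluded(base + "//ua3", file));

        // Include path with a single slash, as generated after FALCON-168.
        System.out.println(isIncluded(base + "/ua3", file));
    }
}
```

Under those assumptions the first check is false and the second is true, which is consistent with the "Number of paths considered for copy: 0" line in the reported logs.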
> multiple "/" in target for replication for multi target feed
> -------------------------------------------------------------
>
> Key: FALCON-169
> URL: https://issues.apache.org/jira/browse/FALCON-169
> Project: Falcon
> Issue Type: Bug
> Components: replication
> Environment: QA
> Reporter: Samarth Gupta
> Assignee: Venkatesh Seetharam
>
> Multiple "/" are getting appended to the target dir before the partition expression postfix is concatenated.
> For example, while running the single-source multi-target test, the following value is passed to DistCp, as can be seen in the tasktracker logs:
> ** for patch from FALCON-163
> {quote}
> -Dfalcon.include.path=hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3
> at the bottom of the logs it can be seen:
> 2013-11-05 06:33:20,219 INFO - Inclusion pattern = hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3 (FilteredCopyListing:59)
> 2013-11-05 06:33:20,219 INFO - Regex pattern = (hdfs://gs1001\.grid\.corp\.inmobi\.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3/)|(hdfs://gs1001\.grid\.corp\.inmobi\.com:54310/localDC/rc/billing/2012/10/01/12/10//ua3$) (FilteredCopyListing:60)
> 2013-11-05 06:33:20,460 INFO - Number of paths considered for copy: 0 (CustomReplicator:57)
> 2013-11-05 06:33:20,461 INFO - Number of bytes considered for copy: 0 (Actual number of bytes copied depends on whether any files are skipped or overwritten.) (CustomReplicator:58)
> 2013-11-05 06:33:21,212 INFO - DistCp job-id: job_201310290719_0445 (DistCp:146)
> 2013-11-05 06:33:21,213 INFO - DistCp job may be tracked at: http://ivoryqa-1.corp.inmobi.com:50030/jobdetails.jsp?jobid=job_201310290719_0445 (DistCp:147)
> 2013-11-05 06:33:21,213 INFO - To cancel, run the following command: hadoop job -kill job_201310290719_0445 (DistCp:148)
> 2013-11-05 06:33:21,213 INFO - Running job: job_201310290719_0445 (JobClient:1315)
> 2013-11-05 06:33:22,216 INFO - map 0% reduce 0% (JobClient:1328)
> 2013-11-05 06:33:33,244 INFO - Job complete: job_201310290719_0445 (JobClient:1383)
> 2013-11-05 06:33:33,252 INFO - Counters: 4 (JobClient:589)
> 2013-11-05 06:33:33,252 INFO - Job Counters (JobClient:591)
> 2013-11-05 06:33:33,253 INFO - SLOTS_MILLIS_MAPS=5822 (JobClient:593)
> 2013-11-05 06:33:33,253 INFO - Total time spent by all reduces waiting after reserving slots (ms)=0 (JobClient:593)
> 2013-11-05 06:33:33,254 INFO - Total time spent by all maps waiting after reserving slots (ms)=0 (JobClient:593)
> 2013-11-05 06:33:33,255 INFO - SLOTS_MILLIS_REDUCES=0 (JobClient:593)
> 2013-11-05 06:33:33,307 INFO - No files present in path: hdfs://ivoryqa-1.corp.inmobi.com:8020/localDC/rc/billing/ua2/2012/10/01/12/10/ua3 (FeedReplicator:146)
> 2013-11-05 06:33:33,308 INFO - Completed DistCp (FeedReplicator:77)
> {quote}
> whereas if the same test is run on the current code from trunk, the following is the value in the tasktracker logs:
> {quote}
> -Dfalcon.include.path=hdfs://gs1001.grid.corp.inmobi.com:54310/localDC/rc/billing/2012/10/01/12/10/ua3
> {quote}
> and replication is successful .....
--
This message was sent by Atlassian JIRA
(v6.1#6144)