falcon-dev mailing list archives

From "Balu Vellanki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-740) Entity kill job calls OozieClient.kill on bundle coord job ids before calling kill on bundle job id
Date Thu, 18 Sep 2014 17:40:34 GMT

    [ https://issues.apache.org/jira/browse/FALCON-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139229#comment-14139229 ]

Balu Vellanki commented on FALCON-740:

[~shwethags] - The bug Venkatesh referred to is an internal bug. When a Falcon user tries
to delete an entity, Falcon catches and rethrows the following exception thrown by Oozie,
which causes the entity delete to fail.

2014-09-10 07:50:06,260 ERROR BundleJobChangeXCommand:540 - USER- GROUP- TOKEN[] APP- JOB0000743-140910031253668-oozie-oozi-B
ACTION- XException, 
org.apache.oozie.command.CommandException: E1320: Bundle Job change error, [[ 0000744-140910031253668-oozie-oozi-C
: Coord is in killed state ]]
at org.apache.oozie.command.bundle.BundleJobChangeXCommand.execute(BundleJobChangeXCommand.java:208)
at org.apache.oozie.command.bundle.BundleJobChangeXCommand.execute(BundleJobChangeXCommand.java:50)
at org.apache.oozie.command.XCommand.call(XCommand.java:281)
at org.apache.oozie.BundleEngine.change(BundleEngine.java:85)
at org.apache.oozie.servlet.V1JobServlet.changeBundleJob(V1JobServlet.java:585)

Bowen from the Oozie team confirmed that this is caused by Falcon killing the coord jobs of
a bundle, then trying to change the bundle job's end time, and finally killing the bundle job.
The change call fails because the coord is already in the killed state; this started failing
after Oozie changed how it handles the bundle change command. The related Oozie JIRA is
https://issues.apache.org/jira/browse/OOZIE-1807

Since you confirmed that we can now remove the code block that sets the end time, I will do
that, create a patch, and test it before submitting.
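
A minimal sketch of what the kill path might look like once the end-time change is removed.
This is not the actual patch. WorkflowClient and RecordingClient are hypothetical stand-ins
for the one OozieClient call used here (kill), so the call order can be checked without a
running Oozie server. The coord kills are kept as in the current code; whether they are
still needed is the open first question in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

public class KillBundleSketch {

    /** Hypothetical stand-in for the single OozieClient method this sketch needs. */
    interface WorkflowClient {
        void kill(String jobId) throws Exception;
    }

    /** Hypothetical test double: records calls instead of talking to Oozie. */
    static class RecordingClient implements WorkflowClient {
        final List<String> calls = new ArrayList<>();

        @Override
        public void kill(String jobId) {
            calls.add("kill:" + jobId);
        }
    }

    /**
     * Kills every coord of the bundle, then the bundle itself.
     * Unlike the current code, there is no change-end-time call in between,
     * so no bundle change command is issued against already-killed coords.
     */
    static void killBundle(WorkflowClient client, String bundleId, List<String> coordIds)
            throws Exception {
        // kill all coords (kept from the existing code)
        for (String coordId : coordIds) {
            client.kill(coordId);
        }
        // kill bundle
        client.kill(bundleId);
    }

    public static void main(String[] args) throws Exception {
        RecordingClient client = new RecordingClient();
        List<String> coords = new ArrayList<>();
        coords.add("coord-1"); // made-up ids, for illustration only
        coords.add("coord-2");
        killBundle(client, "bundle-1", coords);
        System.out.println(client.calls);
    }
}
```

The point of the recording client is only to show that the bundle kill is the last call and
that no change call is made anywhere in the sequence.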


> Entity kill job calls OozieClient.kill on bundle coord job ids before calling kill on bundle job id
> ---------------------------------------------------------------------------------------------------
>                 Key: FALCON-740
>                 URL: https://issues.apache.org/jira/browse/FALCON-740
>             Project: Falcon
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 0.6
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
> When a Falcon user makes an entity kill API call, Falcon does the following in org.apache.falcon.workflow.engine.OozieWorkflowEngine.killBundle(String clusterName, BundleJob job):
> {code}
>             //kill all coords
>             for (CoordinatorJob coord : job.getCoordinators()) {
>                 client.kill(coord.getId());
>                 LOG.debug("Killed coord {} on cluster {}", coord.getId(), clusterName);
>             }
>             //set end time of bundle
>             client.change(job.getId(), OozieClient.CHANGE_VALUE_ENDTIME + "=" + SchemaHelper.formatDateUTC(new
>             LOG.debug("Changed end time of bundle {} on cluster {}", job.getId(), clusterName);
>             //kill bundle
>             client.kill(job.getId());
>             LOG.debug("Killed bundle {} on cluster {}", job.getId(), clusterName);
> {code}
> Two questions.
> 1. Why should we kill the coordinator jobs before killing the bundle job? OozieClient.kill(bundle_job_id) should kill all the bundle's coord jobs.
> 2. Why is the end time changed for the bundle job? https://oozie.apache.org/docs/4.0.1/DG_CommandLineTool.html#Changing_pausetime_of_a_Bundle_Job does not say that the end time can be changed for a bundle job.
> I think this code should be updated; please comment if you think I made any wrong assumptions.
> Thank you

This message was sent by Atlassian JIRA
