sqoop-dev mailing list archives

From "Veena Basavaraj (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SQOOP-1803) JobManager and Execution Engine changes: Support for a injecting and pulling out configs and job output in connectors
Date Mon, 16 Mar 2015 21:56:39 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363551#comment-14363551
] 

Veena Basavaraj edited comment on SQOOP-1803 at 3/16/15 9:56 PM:
-----------------------------------------------------------------

To clarify the point I made a few days ago, which does not seem to have caught your attention,
[~jarcec]: here are the details.

MutableContext today is not persisted, and it only allows certain types such as int/long/boolean/String.
My question was whether we should allow even a list/map, or any object, to be stored in here.
The key-value pairs are already uniquely identified, so any config is underneath a key/value
pair, and we can keep this interface to update or overwrite any of these config values. I do
not see a need for a special API for doing this.

The only additional change is to look up the context map that the initializer has already
set and then persist its entries. We can add a new property to the context indicating whether
each value is "transient" or "persistent", so we don't end up shoving everything in this object
into the repository. Does that make sense?
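To make the idea concrete, here is a minimal sketch of what such a flag could look like. All names here (TaggedContext, persistentEntries) are illustrative, not the actual Sqoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a context whose entries are tagged transient or
// persistent, so only the flagged subset is written to the repository.
public class TaggedContext {
  private final Map<String, Object> values = new HashMap<>();
  private final Map<String, Boolean> persistent = new HashMap<>();

  // Store any object (not just int/long/boolean/String), tagging whether
  // it should survive the job run.
  public void set(String key, Object value, boolean persist) {
    values.put(key, value);
    persistent.put(key, persist);
  }

  public Object get(String key) {
    return values.get(key);
  }

  // Only this subset would be handed to the repository after the job.
  public Map<String, Object> persistentEntries() {
    Map<String, Object> out = new HashMap<>();
    for (Map.Entry<String, Object> e : values.entrySet()) {
      if (Boolean.TRUE.equals(persistent.get(e.getKey()))) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }
}
```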

Second, and most important: the code I posted above in the JobManager class runs only after
the job has completed successfully, so there is no need to worry about any synchronization
issues at this point:
{code}
      RepositoryManager.getInstance().getRepository().updateJobConfig(...)
{code}
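In other words, the persist call sits behind the success branch of the job-completion path, so there is a single writer. A minimal sketch of that guard (names are illustrative; the real call is the updateJobConfig line above):

```java
// Hypothetical sketch: the repository update is triggered only from the
// success branch of the job-completion callback, so exactly one caller
// performs the write and no locking is required.
public class JobCompletionGuard {
  private int persistCount = 0;

  // Invoked by the execution engine when the job finishes.
  public void onFinish(boolean success) {
    if (success) {
      persistContext();
    }
  }

  private void persistContext() {
    // In Sqoop this would be the repository call:
    // RepositoryManager.getInstance().getRepository().updateJobConfig(...)
    persistCount++;
  }

  public int getPersistCount() {
    return persistCount;
  }
}
```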

A few more details after thinking this through further. My thought when I first used the distributed
cache was to do this update in the "output committer", since it is guaranteed to be called once.
Similar to how the current SqoopDestroyerExecutor is invoked, we would need a MutableContextPersistExecutor,
or something along those lines, that invokes the code to persist the context into the repository.
This is probably the only point in the job flow where we are guaranteed to run exactly once. The
advantage of storing the state in HDFS/cache files is that we would have access to this context/state
in the JobContext object, but it should not be too hard to pass the "MutableContext" object
in at the beginning of the job.
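A rough sketch of what such an executor could look like, assuming it is driven from the output committer's commitJob() (which Hadoop invokes at most once per successful job). The Repository interface below is a stand-in for the real repository API, and all names are illustrative:

```java
import java.util.Map;

// Hypothetical sketch of a MutableContextPersistExecutor, modeled on how
// SqoopDestroyerExecutor is invoked today.
public class MutableContextPersistExecutor {
  // Stand-in for the actual Sqoop repository; names are illustrative.
  interface Repository {
    void updateJobConfig(long jobId, Map<String, Object> context);
  }

  private final Repository repository;

  public MutableContextPersistExecutor(Repository repository) {
    this.repository = repository;
  }

  // Would be called from OutputCommitter.commitJob(), which runs at most
  // once per successful job, making it a safe single persist point.
  public void persist(long jobId, Map<String, Object> context) {
    repository.updateJobConfig(jobId, context);
  }
}
```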



> JobManager and Execution Engine changes: Support for injecting and pulling out configs
and job output in connectors
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1803
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1803
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.6
>
>
> The details are in the design wiki; as the implementation happens, more discussions can
happen here.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Howtogetoutputfromconnectortosqoop?
> The goal is to dynamically inject an IncrementalConfig instance into the FromJobConfiguration.
The current MFromConfig and MToConfig can already hold a list of configs, and a strong sentiment
was expressed to keep it as a list, so why not actually make use of it for the first time and
group the incremental-related configs in one config object?
> This task will prepare the FromJobConfiguration from the job config data, and the ExtractorContext
with the relevant values from the previous job run.
> This task will prepare the ToJobConfiguration from the job config data, and the LoaderContext
with the relevant values from the previous job run, if any.
> We will use DistributedCache to get state information out of the Extractor and Loader,
and finally persist it into the Sqoop repository (depending on SQOOP-1804) once the output
committer's commit is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
