sqoop-dev mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Implementing a new connector
Date Sun, 25 May 2014 15:24:44 GMT
Hi Rob,
I'm excited to hear that you're interested in working on a new Sqoop2 connector. The guide
that you've linked is indeed the correct one, and it explains how to build a connector.
Sadly, the connector interfaces are not yet finalized, so the guide does have some rough
edges. I hope that we will finalize them soon.

A couple of answers:

> Firstly I'd like to understand whether you must implement both Importer and
> Exporter or whether you can just do one?

You can implement only a subset. There is no need for a connector to support both the import
and the export path.

> transforming to and from an intermediate format which is discussed on the

This is actually one of the areas currently under development; check out SQOOP-777.

> How exactly is partitioning expected to work particularly with regards to the
> relationship to the Extractor?  The documentation says that a partitioner

It's up to the connector. The partitioner's job is to split the job into up to X partitions,
each of which will be executed by a single Extractor.
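
To illustrate that split of responsibilities, here is a minimal sketch in plain Java. The
names RangePartition, RangePartitioner and RangeExtractor are hypothetical stand-ins and are
not the Sqoop 2 connector interfaces (which are still changing); the sketch also assumes the
source can be split by a numeric id range, which is only one of many possible strategies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partition: a half-open id range [lo, hi). Not a Sqoop class.
final class RangePartition {
    final long lo; // inclusive
    final long hi; // exclusive

    RangePartition(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public String toString() {
        return "[" + lo + ", " + hi + ")";
    }
}

// Hypothetical partitioner: splits [min, max) into AT MOST maxPartitions ranges.
final class RangePartitioner {
    static List<RangePartition> partition(long min, long max, int maxPartitions) {
        List<RangePartition> out = new ArrayList<>();
        long total = max - min;
        if (total <= 0 || maxPartitions <= 0) {
            return out; // nothing to split
        }
        // Never create more partitions than there are ids to process.
        long count = Math.min(total, maxPartitions);
        long base = total / count;
        long extra = total % count; // the first 'extra' ranges get one more id
        long start = min;
        for (long i = 0; i < count; i++) {
            long size = base + (i < extra ? 1 : 0);
            out.add(new RangePartition(start, start + size));
            start += size;
        }
        return out;
    }
}

// Hypothetical extractor: each call processes exactly one partition.
final class RangeExtractor {
    static List<Long> extract(RangePartition p) {
        List<Long> ids = new ArrayList<>();
        for (long id = p.lo; id < p.hi; id++) {
            ids.add(id); // a real connector would read the rows for this id here
        }
        return ids;
    }
}
```

With these definitions, RangePartitioner.partition(0, 10, 3) yields the three ranges
[0, 4), [4, 7) and [7, 10), and each range would be handed to its own extractor call.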

Jarcec

On Thu, May 22, 2014 at 12:11:57PM +0100, Rob Vesse wrote:
> Hey all
> 
> I'm looking to get a better understanding of exactly what is involved in
> implementing a new connector for Sqoop 2.  I've read through the
> documentation at 
> http://sqoop.apache.org/docs/1.99.3/ConnectorDevelopment.html but it seems a
> little light on detail in places so I'd appreciate if people could fill in
> the gaps in my understanding or share their own experiences of creating a
> connector.
> 
> Firstly I'd like to understand whether you must implement both Importer and
> Exporter or whether you can just do one?  The connector I'm interested in
> developing would initially be intended for use only as an output target, i.e.
> taking data from relational databases using existing connectors and then
> outputting them in a suitable format for the databases I'm looking to
> support.
> 
> Whether both are needed or not the documentation makes reference to
> transforming to and from an intermediate format which is discussed on the
> wiki at 
> https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation
> - has the project actually made a decision on what the intermediate
> format looks like?  Is it the CSV-style format described on that page?
> 
> How exactly is partitioning expected to work particularly with regards to the
> relationship to the Extractor?  The documentation says that a partitioner
> creates the partitions and then the extractor gets passed the partition to
> process.  I assume Partition can be defined fairly freely (other than the
> need to be Writable and toString()-able) as the needs of a connector
> dictate.
> 
> The documentation glosses over ConnectionConfiguration (some of the sections
> are empty) but I assume this is the class I would use to pass in connection
> configuration and also whatever mapping rules are necessary to translate the
> data to my target format.  Can I safely sub-class ConnectionConfiguration or
> are there other pre-defined mechanisms for passing connection specific
> configuration?
> 
> Thanks for putting up with so many questions,
> 
> Regards,
> 
> Rob
> 
> 
