sqoop-dev mailing list archives

From Rob Vesse <rve...@dotnetrdf.org>
Subject Implementing a new connector
Date Thu, 22 May 2014 11:11:57 GMT
Hey all

I'm looking to get a better understanding of exactly what is involved in
implementing a new connector for Sqoop 2.  I've read through the
documentation at 
http://sqoop.apache.org/docs/1.99.3/ConnectorDevelopment.html but it seems a
little light on detail in places so I'd appreciate if people could fill in
the gaps in my understanding or share their own experiences of creating a
connector.

Firstly I'd like to understand whether you must implement both Importer and
Exporter or whether you can just do one?  The connector I'm interested in
developing would initially be intended for use only as an output target, i.e.
taking data from relational databases using existing connectors and then
outputting them in a suitable format for the databases I'm looking to
support.

Whether or not both are needed, the documentation makes reference to
transforming to and from an intermediate format which is discussed on the
wiki at
https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation
- has the project actually made a decision on what the intermediate
format looks like?  Is it the CSV style format described on that page?

How exactly is partitioning expected to work, particularly with regard to its
relationship to the Extractor?  The documentation says that a partitioner
creates the partitions and then the extractor gets passed the partition to
process.  I assume Partition can be defined fairly freely (other than the
need to be Writable and toString()-able) as the needs of a connector
dictate.
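
As a concrete illustration of that assumption, a minimal sketch of such a
partition might look like the following.  The class name RowRangePartition,
its fields and the roundTrip helper are all hypothetical, and the class
deliberately stands alone rather than extending any Sqoop class so that it
compiles without Sqoop on the classpath.  The write/readFields pair simply
mirrors the Hadoop Writable contract; whether Sqoop 2 expects exactly this
shape is part of the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical partition describing a half-open range of row offsets.
// Assumption: a real connector would extend the framework's partition type;
// this sketch only shows the serialisation and toString() parts.
class RowRangePartition {
    private long start; // inclusive
    private long end;   // exclusive

    // No-arg constructor so an instance can be created before readFields().
    RowRangePartition() {
    }

    RowRangePartition(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Serialise the partition so it can be shipped to the task that runs it.
    public void write(DataOutput out) throws IOException {
        out.writeLong(start);
        out.writeLong(end);
    }

    // Restore the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        start = in.readLong();
        end = in.readLong();
    }

    @Override
    public String toString() {
        return "rows[" + start + "," + end + ")";
    }

    // Helper for this sketch only: serialise and deserialise in memory.
    static RowRangePartition roundTrip(RowRangePartition p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            p.write(out);
        }
        RowRangePartition copy = new RowRangePartition();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(new RowRangePartition(0, 1000)));
    }
}
```

In this sketch the partitioner would emit one such object per slice of the
input, and the extractor would receive a deserialised copy and read only the
rows in that range.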

The documentation glosses over ConnectionConfiguration (some of the sections
are empty) but I assume this is the class I would use to pass in connection
configuration and also whatever mapping rules are necessary to translate the
data to my target format.  Can I safely sub-class ConnectionConfiguration, or
are there other pre-defined mechanisms for passing connection-specific
configuration?

Thanks for putting up with so many questions,

Regards,

Rob


