sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Masatake Iwasaki (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-390) PostgreSQL connector for direct export with pg_bulkload
Date Fri, 04 Nov 2011 08:43:00 GMT
PostgreSQL connector for direct export with pg_bulkload
-------------------------------------------------------

                 Key: SQOOP-390
                 URL: https://issues.apache.org/jira/browse/SQOOP-390
             Project: Sqoop
          Issue Type: New Feature
            Reporter: Masatake Iwasaki


h1. Features

* Fast data export with pg_bulkload.
  ** http://pgbulkload.projects.postgresql.org/index.html
* User can get benefit of functionality of pg_bulkload such as
** bypassing shared bufferes and WAL,
** ignoring data causing parse error,
** ETL feature with filter functions.


h1. Implementation

* Each map tasks create their own staging table with names based on task attempt id. Then
export data with pg_bulkload invoked as outer process.
* Staging tables are erased on
** Successful completion of export, 
** or Exception in map task.
* Reduce task migrate data from staging tables into destination table
** Number of reduce tasks is internally set to 1 .


h1. Requirements

* pg_bulkload must be installed on DB server and all slave nodes.
* Also PostgreSQL JDBC is required on client. (Same as PostgresqlManager)
* Superuser role of PostgreSQL database is required for pg_bulkload.


h1. Usage

Currently there is no Factory class. Specify connection manager class name with --connection-manager
option to use.
  
{noformat} 
sqoop export --connect jdbc:postgresql://localhost:5432/test \
             --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
             --table test --username postgres --export-dir=/test -m 1
{noformat} 

You can also specify pg_bulkload configuration with Hadoop Configuration properties.

{noformat} 
-Dpgbulkload.bin="/usr/bin/pg_bulkload"
-Dpgbulkload.input.field.delim=$'\t'
-Dpgbulkload.check.constraints="YES"
-Dpgbulkload.parse.errors="INFINITE"
-Dpgbulkload.duplicate.errors="INFINITE"
{noformat} 


h1. Test

There is test class named PGBulkloadManagerTest extending TestExport.

{noformat} 
 ant -Dtestcase=PGBulkloadManagerTest test
{noformat} 

This test requires 
* PostgreSQL running on localhost:5432,
* database named sqooptest,
* super user role named sqooptest with no password, or .pgpass created


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message