sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Seetharam Venkatesh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-390) PostgreSQL connector for direct export with pg_bulkload
Date Thu, 26 Jul 2012 22:57:35 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423528#comment-13423528
] 

Seetharam Venkatesh commented on SQOOP-390:
-------------------------------------------

bq. pg_bulkload must be installed on DB server and all slave nodes.
Is there a way to use DistributedCache to distribute the pg_bulkload tar file to all the slave
nodes instead of requiring an install?
                
> PostgreSQL connector for direct export with pg_bulkload
> -------------------------------------------------------
>
>                 Key: SQOOP-390
>                 URL: https://issues.apache.org/jira/browse/SQOOP-390
>             Project: Sqoop
>          Issue Type: New Feature
>            Reporter: Masatake Iwasaki
>         Attachments: SQOOP-390-1.patch
>
>
> h1. Features
> * Fast data export into PostgreSQL database with [pg_bulkload|http://pgbulkload.projects.postgresql.org/index.html].
> * User can get benefit of functionality of pg_bulkload such as
> ** fast export bypassing shared bufferes and WAL,
> ** removing invalid data which cause parse error,
> ** ETL feature with filter functions.
> h1. Implementation
> h2. PGBulkloadExportMapper
> At first, each map tasks create their own staging table with names based on task attempt
id because pg_bulkload holds table level lock.
> Arguments of pg_bulkload command can be passed via configuration object.
> The Mapper export data by invoking pg_bulkload command as external process.
> Commnad execution is done in the same way as DirectPostgresqlManager.
> h2. PGBulkloadExportReducer
> Reducer migrates data from staging tables into destination table.
> Reducer gets the names of staging tables as map output values.
> In order to do migration in a transaction, the number of reducers must be 1. (It is set
by ConnectionManager internally).
> Migration is done in same way as Sqoop defalt connectors using "INSERT INTO dst ( SELECT
* FROM src )".
> In the cleanup method, staging tables are dropped.
> If exception is raised in the reducer, garbage staging tables are left.
> User can delete them by executing this connector with --clear-staging-table option.
> h1. Requirements
> * pg_bulkload must be installed on DB server and all slave nodes.
> ** RPM for RedHat or CentOS is available in [download page|http://pgfoundry.org/frs/?group_id=1000261].
> * [PostgreSQL JDBC| http://jdbc.postgresql.org/download.html] is also required on client
node (same as PostgresqlManager).
> * Superuser role of PostgreSQL database is required for execution of pg_bulkload.
> h1. Usage
> Currently there is no Factory class.
> Specify connection manager class name with --connection-manager option to use.
>   
> {noformat}
> $ sqoop export --connect jdbc:postgresql://localhost:5432/test \
>                --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
>                --table test --username postgres --export-dir=/test -m 1
> {noformat}
> You can also specify pg_bulkload configuration with Hadoop configuration properties.
> {noformat}
> $ sqoop export \
>     -Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
>     -Dpgbulkload.input.field.delim=$'\t' \
>     -Dpgbulkload.check.constraints="YES" \
>     -Dpgbulkload.parse.errors="INFINITE" \
>     -Dpgbulkload.duplicate.errors="INFINITE" \
>     --connect jdbc:postgresql://localhost:5432/test \
>     --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
>     --table test --username postgres --export-dir=/test -m 1
> {noformat}
> h1. Test
> There is test class named PGBulkloadManagerManualTest extending TestExport.
> This test expects that
> * The major version of PostgreSQL is 9.0,
> * The version of pg_bulkload is 3.1.1,
> * PostgreSQL server is running on localhost:5432 of testing node,
> * there is database named sqooptest,
> * super user role named sqooptest exists,
> * no password is set for the role, or .pgpass file is created.
> Tests can be invoked as below.
> {noformat}
> # ant -Dtestcase=PGBulkloadManagerManualTest test
> {noformat}
> h2. Test on CentOS 6
> Install JDK and Ant:
> {noformat}
> # rpm -ivh jdk-6u29-linux-amd64.rpm
> # yum install ant-junit
> {noformat}
> Setup for PostgreSQL:
> {noformat}
> # wget http://ftp.postgresql.org/pub/source/v9.0.7/postgresql-9.0.7.tar.bz2
> # tar jxf postgresql-9.0.7.tar.bz2
> # cd postgresql-9.0.7
> # ./configure --prefix=/usr/local
> # make
> # make install
> # useradd postgres
> # mkdir /var/pgdata
> # chown postgres:postgres /var/pgdata
> # chmod 700 /var/pgdata
> # sudo -u postgres /usr/local/bin/initdb  -D /var/pgdata -E utf-8
> # sudo -u postgres /usr/local/bin/pg_ctl start -D /var/pgdata -l /var/pgdata/postgresql.log
> # createuser -U postgres -s sqooptest
> # createdb -U sqooptest sqooptest
> {noformat}
> Setup for pg_bulkload and PostgreSQL JDBC Driver:
> {noformat}
> # cd ..
> # wget http://pgfoundry.org/frs/download.php/3176/pg_bulkload-3.1.1.tar.gz
> # tar zxf pg_bulkload-3.1.1.tar.gz
> # mv pg_bulkload-3.1.1 postgresql-9.0.7/contrib/
> # cd postgresql-9.0.7/contrib/pg_bulkload-3.1.1
> # make
> # make install
> # psql -U sqooptest -f /usr/local/share/postgresql/contrib/pg_bulkload.sql sqooptest
> # ln -s /usr/local/bin/pg_bulkload /usr/bin/
> {noformat}
> Setup for PostgreSQL JDBC Driver:
> {noformat}
> # wget http://jdbc.postgresql.org/download/postgresql-9.0-802.jdbc4.jar
> # cp postgresql-9.0-802.jdbc4.jar /usr/local/src/sqoop-trunk/lib/
> {noformat}
> run test.
> {noformat}
> # cd /usr/local/src/sqoop-trunk
> # ant -Dtestcase=PGBulkloadManagerManualTest test
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message