sqoop-dev mailing list archives

From "Masatake Iwasaki" <iwasak...@nttdata.co.jp>
Subject Re: Review Request: SQOOP-390: PostgreSQL connector for direct export with pg_bulkload
Date Wed, 01 Aug 2012 09:46:41 GMT


> On July 27, 2012, 4:34 p.m., Jarek Cecho wrote:
> > /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java, lines 82-86
> > <https://reviews.apache.org/r/2724/diff/3/?file=129306#file129306line82>
> >
> >     Could you add an option to create those temporary tables in a different database?
> 
> Masatake Iwasaki wrote:
>     As far as PostgreSQL is concerned, staging across databases is inefficient because it
causes network data transfer via the client (slave node). This change would also require handling
multiple connections and a lot of code modifications. I would like to leave this as
a future improvement.
>     It may be preferable to handle connecting to multiple databases
for staging in an independent JIRA issue about Sqoop global functionality.
>
> 
> Jarek Cecho wrote:
>     I do not have a strong PostgreSQL background, so please excuse me if this is a stupid
question. The way we do it in other connectors for explicit temporary tables is
that we use just one connection (to the target database specified on the command line)
and an explicit database name in case the user wants the data stored in a different database.
Something like "create table tmp_database.tmp_table like exported_table" and "insert into
exported_table select * from tmp_database.tmp_table". Is something like this possible in PostgreSQL?

In PostgreSQL, users can use a "schema" in the same way, and using a "tablespace" enables physical
separation of the staging table's data from the destination table's. Although stock PostgreSQL has no
problem with schemas and tablespaces, the pg_bulkload connector needs a fix because each map
task of PGBulkloadExportJob creates its own staging table on the fly. I am going to try adding
an option for it.
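
A minimal sketch of the SQL such an option might generate, assuming each map task names its
staging table by appending its task id to the destination table name and that the staged rows
are later merged with INSERT ... SELECT, as in the approach Jarek describes. The function names,
the naming scheme, and the staging_schema and tablespace parameters are hypothetical and are
not taken from the patch.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def staging_statements(dest_table, task_id, staging_schema=None, tablespace=None):
    """Build the SQL for one map task's staging table (hypothetical naming scheme).

    staging_schema and tablespace are illustrative options; when omitted, the
    staging table lands in the default schema and tablespace.
    """
    dest = quote_ident(dest_table)
    # Assumed naming: <destination>_<task id>, so concurrent tasks do not collide.
    staging = quote_ident(f"{dest_table}_{task_id}")
    if staging_schema:
        # A schema-qualified name keeps staging tables out of the destination's schema.
        staging = f"{quote_ident(staging_schema)}.{staging}"

    create = f"CREATE TABLE {staging} (LIKE {dest})"
    if tablespace:
        # TABLESPACE places the staging table's files in a separate physical location.
        create += f" TABLESPACE {quote_ident(tablespace)}"

    return [
        create,
        f"INSERT INTO {dest} SELECT * FROM {staging}",
        f"DROP TABLE {staging}",
    ]


if __name__ == "__main__":
    for stmt in staging_statements("orders", 3, "staging", "fast_disk"):
        print(stmt + ";")
```

Because the schema and the tablespace both live inside the target database, all three
statements can run over the single connection given on the command line.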

References for schema and tablespace:
 http://www.postgresql.org/docs/9.0/interactive/ddl-schemas.html
 http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html


- Masatake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2724/#review9540
-----------------------------------------------------------


On July 26, 2012, 10:41 a.m., Masatake Iwasaki wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2724/
> -----------------------------------------------------------
> 
> (Updated July 26, 2012, 10:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> Patch for SQOOP-390
> https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> This addresses bug SQOOP-390.
>     https://issues.apache.org/jira/browse/SQOOP-390
> 
> 
> Diffs
> -----
> 
>   /src/java/org/apache/sqoop/manager/PGBulkloadManager.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/AutoProgressReducer.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportJob.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportMapper.java PRE-CREATION 
>   /src/java/org/apache/sqoop/mapreduce/PGBulkloadExportReducer.java PRE-CREATION 
>   /src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/2724/diff/
> 
> 
> Testing
> -------
> 
> This patch includes the test class PGBulkloadManagerTest.
> I ran "ant test" and it passed.
> 
> 
> Thanks,
> 
> Masatake Iwasaki
> 
>

