sqoop-user mailing list archives

From Nicolas Paris <nicolas.pa...@riseup.net>
Subject Export PostgreSQL Direct
Date Sat, 10 Nov 2018 16:10:01 GMT
Hi

I think I have spotted some minor errors in the sqoop documentation:
[1] : in the table, it says direct mode is enabled for postgres only for
import. That's wrong, export is enabled too.
[2] : the psql utility is not needed for both import and export

----------
Now my questions:

I have been able to load data from all formats (including orc) to
postgresql with sqoop export in **non-direct** mode. While robust, it
uses JDBC prepared insert statements and it is way too slow, even
when parallelized.

I have been able to load data from **csv format only** with sqoop export
in **direct mode**. While very fast (parallel copy statements!), the
method is not robust when the data has varchar columns. In
particular a varchar column may contain **newlines** and this breaks the
mapper job, since it splits the csv by newlines.
That's too bad, because the *copy* statement can handle *csv with
embedded newlines*.
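
To make the newline problem concrete, here is a minimal Python sketch. It is
not sqoop code, and the sample data is made up for illustration. It only
shows that splitting a csv on newlines cuts a quoted varchar value in two,
while a quote-aware csv parser keeps the record whole:

```python
import csv
import io

# Hypothetical sample: a header plus two records. The "comment" field of
# the first record contains an embedded newline inside quotes.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: treat every newline as a record boundary.
naive_records = data.splitlines()

# Quote-aware approach: a newline inside a quoted field stays in the field.
parsed_records = list(csv.reader(io.StringIO(data)))

print(len(naive_records))    # 4 pieces, the first record is broken in two
print(len(parsed_records))   # 3 rows: header + 2 records
print(parsed_records[1][1])  # the comment value, newline preserved
```

The naive split yields a fragment like `1,"first line`, which is no longer a
valid csv row on its own.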

1) Is there any way to send one whole hdfs file per mapper instead of
splitting the files? That would work well.

2) Are there any plans to allow sqoop export from orc in direct mode?


Thanks,


[1]: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_supported_databases
[2]: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_requirements_2

-- 
nicolas
