sqoop-user mailing list archives

From Preethi Krishnan <pkrish...@pandora.com>
Subject Re: Parquet Format Sqoop
Date Mon, 25 Feb 2019 18:31:27 GMT
Thanks for the response Markus and Grzegorz.

Grzegorz, I’m trying to copy entire tables from Postgres to Google Cloud. Did you mean
the approach should be the following?

  *   Use Sqoop to copy the data to Cloud Storage in text format.
  *   Run a Spark job on the Cloud (Hadoop) cluster that converts the data from text to Parquet.

From: Grzegorz Solecki <gsolecki8@gmail.com>
Reply-To: "user@sqoop.apache.org" <user@sqoop.apache.org>
Date: Monday, February 25, 2019 at 11:23 AM
To: "user@sqoop.apache.org" <user@sqoop.apache.org>
Subject: Re: Parquet Format Sqoop

Based on our experience, it's better not to use Sqoop to create Parquet files.
Even if you manage to create a Parquet file, you will run into awkward data type problems
when it comes to working with the Hive metastore.
I recommend Spark SQL when it comes to creating Parquet files.
It works flawlessly.

On Mon, Feb 25, 2019 at 12:54 PM Markus Kemper <markus@cloudera.com> wrote:
To the best of my knowledge the only way to use Sqoop export with Parquet is via the --hcatalog
options; sample below:

sqoop export --connect $MYSQL_CONN --username $MYSQL_USER --password $MYSQL_PSWD --table t2
--num-mappers 1 --hcatalog-database default --hcatalog-table t1_parquet_table

Markus Kemper
Cloudera Support

On Mon, Feb 25, 2019 at 12:36 PM Preethi Krishnan <pkrishnan@pandora.com> wrote:


I’m using the Sqoop Hadoop jar to import data from Postgres to Google Cloud Storage (GCS). It
works fine for text format, but I’m unable to load the data in Parquet format. The job does
not fail, but it does not load the data either. The jar file I’m using is sqoop-1.4.7-hadoop260.jar.

Is there any specific way I should be loading the data in Parquet format using Sqoop?

