sqoop-user mailing list archives

From Attila Szabó <mau...@apache.org>
Subject Re: Parquet Format Sqoop
Date Mon, 25 Feb 2019 19:36:22 GMT
Though I would caution against using Spark to write Parquet files if
partitioning is important for your use case, as it's impossible to control
the number of files written to storage...

My2cents,
Attila

On Mon, Feb 25, 2019, 7:23 PM Grzegorz Solecki <gsolecki8@gmail.com> wrote:

> Based on our experience, it's better not to use Sqoop to create Parquet
> files.
> Even if you manage to create a Parquet file, you will then run into
> ridiculous data type problems when it comes to working with the Hive
> metastore.
> I recommend Spark SQL when it comes to creating Parquet files.
> It works flawlessly.
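>
> A minimal sketch of that approach in PySpark (the JDBC URL, table name,
> credentials, and output path below are placeholders, not anything from this
> thread):
>
>   from pyspark.sql import SparkSession
>
>   spark = SparkSession.builder.appName("postgres_to_parquet").getOrCreate()
>
>   # Read the source table over JDBC (connection details are illustrative)
>   df = (spark.read.format("jdbc")
>         .option("url", "jdbc:postgresql://dbhost:5432/mydb")
>         .option("dbtable", "public.my_table")
>         .option("user", "dbuser")
>         .option("password", "dbpass")
>         .load())
>
>   # Write the result out as Parquet (the target URI is hypothetical)
>   df.write.mode("overwrite").parquet("gs://my-bucket/my_table_parquet/")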
>
>
> On Mon, Feb 25, 2019 at 12:54 PM Markus Kemper <markus@cloudera.com>
> wrote:
>
>> To the best of my knowledge, the only way to use Sqoop export with Parquet
>> is via the --hcatalog options; sample below:
>>
>> sqoop export \
>>   --connect $MYSQL_CONN --username $MYSQL_USER --password $MYSQL_PSWD \
>>   --table t2 --num-mappers 1 \
>>   --hcatalog-database default --hcatalog-table t1_parquet_table
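>>
>> This assumes t1_parquet_table already exists in the metastore as a
>> Parquet-backed Hive table, roughly along these lines (the columns here are
>> only illustrative, not taken from this thread):
>>
>>   CREATE TABLE default.t1_parquet_table (
>>     id   INT,
>>     name STRING
>>   )
>>   STORED AS PARQUET;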
>>
>>
>> Markus Kemper
>> Cloudera Support
>>
>>
>>
>>
>> On Mon, Feb 25, 2019 at 12:36 PM Preethi Krishnan <pkrishnan@pandora.com>
>> wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I’m using the Sqoop Hadoop jar to sqoop the data from Postgres to Google
>>> Cloud Storage (GCS). It is working fine for text format, but I’m unable to
>>> load the data in Parquet format. It does not fail, but it does not load the
>>> data either. The jar file I’m using is sqoop-1.4.7-hadoop260.jar.
>>>
>>>
>>>
>>> Is there any specific way I should be loading the data in Parquet format
>>> using Sqoop?
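>>>
>>> For instance, is something along these lines the intended way, via the
>>> --as-parquetfile option? (The connection details and target path below are
>>> placeholders, and I haven't been able to confirm whether this behaves the
>>> same against GCS as it does against HDFS.)
>>>
>>>   sqoop import \
>>>     --connect jdbc:postgresql://dbhost:5432/mydb \
>>>     --username dbuser --password dbpass \
>>>     --table my_table \
>>>     --target-dir gs://my-bucket/my_table_parquet \
>>>     --as-parquetfile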
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Preethi
>>>
>>>
>>>
>>>
>>>
>>
