sqoop-user mailing list archives

From <philippe.gib...@orange.com>
Subject sqoop import from mysql and files not compressed problem
Date Wed, 22 Apr 2015 09:33:13 GMT
I am trying to run a Sqoop import from MySQL and compress the files stored in Hive (on HDFS),
but I have not succeeded :(

Any ideas?
Software version: Sqoop on HDP 2.2

The command:
sqoop import \
  -Dmapreduce.output.fileoutputformat.compress=true \
  -Dmapreduce.output.fileoutputformat.compress.type=BLOCK \
  --verbose --connect jdbc:mysql://xxxxx/my_db --username sqoop --password sqoop --table indicators \
  --hcatalog-database my_db --hcatalog-table indicators \
  --hcatalog-storage-stanza "STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB')" \
  -m 4 --create-hcatalog-table

The files remain uncompressed on HDFS:
hadoop fs -ls   /apps/hive/warehouse/my_db.db/indicators
Found 4 items
-rw-r--r--   3 hive hdfs       1032 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00000
-rw-r--r--   3 hive hdfs        848 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00001
-rw-r--r--   3 hive hdfs       1192 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00002
-rw-r--r--   3 hive hdfs        999 2015-04-22 09:09 /apps/hive/warehouse/my_db.db/indicators/part-m-00003

I have checked carefully: they are not compressed. :-(
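
One possible way to double-check this, as a sketch only: ORC files begin with the three-byte magic string "ORC", and ORC compression is applied inside the file, so a compressed ORC file keeps a plain name such as part-m-00000 with no .gz or .deflate extension. The helper name starts_with_orc_magic below is hypothetical; `hive --orcfiledump` is the ORC file dump utility shipped with Hive, and its output should include a "Compression:" line.

```shell
# Hypothetical helper: succeeds if the data on stdin begins with the
# three-byte ORC magic string "ORC".
starts_with_orc_magic() {
    [ "$(head -c 3)" = "ORC" ]
}

# Usage on the cluster (path taken from the listing above; not run here):
#   hadoop fs -cat /apps/hive/warehouse/my_db.db/indicators/part-m-00000 \
#     | starts_with_orc_magic && echo "looks like an ORC file"
#
# To see the compression kind recorded in the file metadata:
#   hive --orcfiledump /apps/hive/warehouse/my_db.db/indicators/part-m-00000
```

With files of only about 1 KB each, the file size alone says little about whether compression was applied, so the dump output is probably the more reliable check.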

The Hive table description looks correct for ZLIB compression:
hive> desc formatted  indicators;
# col_name              data_type               comment
id_indicators           bigint
cd_indicators           varchar(32)
lb_indicators           varchar(64)
descrip_indicators      varchar(255)
# Detailed Table Information
Database:               my_db
Owner:                  hive
CreateTime:             Wed Apr 22 09:08:22 UTC 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://xxxxx:8020/apps/hive/warehouse/my_db.db/indicators
Table Type:             MANAGED_TABLE
Table Parameters:
        orc.compress            ZLIB
        transient_lastDdlTime   1429693702

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.742 seconds, Fetched: 30 row(s)
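
A note on reading this output, offered as an assumption rather than a confirmed diagnosis: the "Compressed: No" line appears to report a flag in the table's storage descriptor, and it is not necessarily set by ORC's internal compression, so the orc.compress entry under "Table Parameters" is likely the more relevant field. A minimal sketch for pulling that value out of the description follows; the helper name table_param is hypothetical.

```shell
# Hypothetical helper: read `desc formatted` output on stdin and print the
# value of the table parameter named by the first argument.
table_param() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Usage on the cluster (not run here):
#   hive -e 'use my_db; desc formatted indicators;' | table_param orc.compress
```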

Philippe Gibert
R&D Engineer
Tel. +33 4 92 94 53 70
Mobile +33 6 73 41 11 18

