sqoop-dev mailing list archives

From "Mark Grover (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-2165) Can't use warehouse-dir with parquet
Date Tue, 03 Mar 2015 23:01:06 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Grover updated SQOOP-2165:
-------------------------------
    Description: 
Gwen and I were working on some code for Data Warehousing that uses sqoop and we found something
interesting.

At one place:
Sqoop1 claims warehouse-dir and target-dir are incompatible:
https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L1006

(We should mention this in the docs btw)

But then, if we specify only the warehouse-dir (and not the target-dir), it complains
that the target-dir needs to be specified. See here:
https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L1019
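
To make the conflict concrete, here is a minimal Python model of the two checks described above. It is only a sketch, not Sqoop's actual Java code: `ImportOptions`, `InvalidOptionsError`, `validate`, and `outcome` are hypothetical names, the error strings are paraphrased, and the idea that the second check applies to free-form `--query` imports is an assumption rather than something stated in this report.

```python
from dataclasses import dataclass
from typing import Optional


class InvalidOptionsError(Exception):
    """Hypothetical stand-in for the tool's option-validation error."""


@dataclass
class ImportOptions:
    # Hypothetical, simplified option holder; not Sqoop's real class.
    query: Optional[str] = None
    target_dir: Optional[str] = None
    warehouse_dir: Optional[str] = None


def validate(opts: ImportOptions) -> None:
    # Check 1 (as reported): the two directory options are mutually exclusive.
    if opts.target_dir is not None and opts.warehouse_dir is not None:
        raise InvalidOptionsError("target-dir and warehouse-dir are incompatible")
    # Check 2 (assumed to apply to free-form query imports): target-dir is required.
    if opts.query is not None and opts.target_dir is None:
        raise InvalidOptionsError("target-dir must be specified")


def outcome(opts: ImportOptions) -> str:
    # Return "ok" when validation passes, otherwise the error message.
    try:
        validate(opts)
        return "ok"
    except InvalidOptionsError as err:
        return str(err)
```

Under this model, a query import given only a warehouse-dir fails the second check, and adding a target-dir to satisfy it trips the first, so there is no combination that lets a query import use the warehouse-dir.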

And, FYI, here is the command we ran:
{code}
sqoop job --create user_upserts_import --meta-connect jdbc:hsqldb:hsql://${SQOOP_METASTORE_HOST}:16000/sqoop \
-- import --connect jdbc:mysql://<MYSQL>:3306/oltp --username root \
-m 8 --incremental append --check-column last_modified --split-by last_modified --as-parquetfile \
--query 'SELECT user.id, user.age, user.gender,
occupation.occupation, zipcode, last_modified FROM user JOIN occupation
ON (user.occupation_id = occupation.id) WHERE $CONDITIONS' \
--hive-import --hive-table user_upserts --warehouse-dir /etl/movielens/
{code}

If we specify just the target-dir, we get a warning about writing to the target-dir, and the
data goes to the default warehouse directory (/usr/hive...), which is pretty unexpected:

15/03/03 14:47:22 WARN util.AppendUtils: Cannot append files to target dir; no such directory: _sqoop/03144643000000600_30456_mgrover-haa2-4.vpc.cloudera.com_f8bf8ac4

Obviously, the directory in the warning is not the target-dir we specified. This looks like
something internal to the Kite/Parquet code.

  was:
Gwen and I were working on some code for Data Warehousing that uses sqoop and we found something
interesting.

At one place:
Sqoop1 claims warehouse-dir and target-dir are incompatible:
https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L1006

(We should mention this in the docs btw)

But then, if we specify only the warehouse-dir (and not the target-dir), it complains
that the target-dir needs to be specified. See here:
https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L1019

And, FYI, here is the command we ran:
{code}
sqoop job --create user_upserts_import --meta-connect jdbc:hsqldb:hsql://${SQOOP_METASTORE_HOST}:16000/sqoop \
-- import --connect jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/oltp --username root \
-m 8 --incremental append --check-column last_modified --split-by last_modified --as-parquetfile \
--query 'SELECT user.id, user.age, user.gender,
occupation.occupation, zipcode, last_modified FROM user JOIN occupation
ON (user.occupation_id = occupation.id) WHERE $CONDITIONS' \
--hive-import --hive-table user_upserts --warehouse-dir /etl/movielens/
{code}

If we specify just the target-dir, we get a warning about writing to the target-dir, and the
data goes to the default warehouse directory (/usr/hive...), which is pretty unexpected:

15/03/03 14:47:22 WARN util.AppendUtils: Cannot append files to target dir; no such directory: _sqoop/03144643000000600_30456_mgrover-haa2-4.vpc.cloudera.com_f8bf8ac4

Obviously, the directory in the warning is not the target-dir we specified. This looks like
something internal to the Kite/Parquet code.


> Can't use warehouse-dir with parquet
> ------------------------------------
>
>                 Key: SQOOP-2165
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2165
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.5
>            Reporter: Mark Grover
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
