sqoop-dev mailing list archives

From "Eric Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3150) issue with sqoop hive import with partitions
Date Sun, 16 Apr 2017 10:58:41 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970334#comment-15970334 ]

Eric Lin commented on SQOOP-3150:

Hi Ankit,

I reviewed the issue you raised. The --target-dir option does not control where the Hive
table is created, nor where the data for the target partition is finally stored. It controls
ONLY where Sqoop writes the imported data before that data is loaded into the Hive table.

For example, you specified --target-dir as "/user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES",
so Sqoop first writes the data into this directory, and the final Hive query that loads the data
into Hive will look something like this:

LOAD DATA INPATH 'hdfs://localhost:9000/user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES'
OVERWRITE INTO TABLE `employees_p` PARTITION (date='10-03-2017');

You have no control over the final directory that the partition goes into in Hive.
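
To make the two steps concrete, here is a minimal Python sketch. It is only an illustration,
not Sqoop's actual code: the function names are hypothetical, and the table location
"/user/hive/warehouse/employees_p" is an assumption (a common default for a managed table),
not something taken from your setup. It also assumes Hive's default layout, where a partition
lives in a key=value subdirectory of the table's location.

```python
import posixpath


def build_load_statement(staging_dir, table, part_key, part_value,
                         overwrite=True, fs_prefix="hdfs://localhost:9000"):
    """Build a LOAD DATA statement shaped like the one quoted above.

    staging_dir plays the role of --target-dir: it is only the source
    of the load, never the final home of the partition.
    """
    mode = "OVERWRITE " if overwrite else ""
    return (
        f"LOAD DATA INPATH '{fs_prefix}{staging_dir}' "
        f"{mode}INTO TABLE `{table}` "
        f"PARTITION ({part_key}='{part_value}')"
    )


def default_partition_dir(table_location, part_key, part_value):
    """Partition directory under Hive's default key=value layout.

    Note that staging_dir does not appear here at all.
    """
    return posixpath.join(table_location, f"{part_key}={part_value}")


if __name__ == "__main__":
    staging = "/user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES"
    print(build_load_statement(staging, "employees_p", "date", "10-03-2017"))
    # Assumed table location, for illustration only.
    print(default_partition_dir("/user/hive/warehouse/employees_p",
                                "date", "10-03-2017"))
```

The point of the sketch is that the partition directory is derived from the table's location,
while the --target-dir value appears only as the INPATH of the load. Running DESCRIBE FORMATTED
on the table in Hive should show the location that applies in your environment.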

Hope that makes sense to you. So this is not a bug; it works as expected.

> issue with sqoop hive import with partitions
> --------------------------------------------
>                 Key: SQOOP-3150
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3150
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.6
>         Environment: Cent-Os
>            Reporter: Ankit Kumar
>            Assignee: Eric Lin
>              Labels: features
> Sqoop Command:
> 	sqoop import \
> 	...
>   --hive-import  \
>   --hive-overwrite  \
>   --hive-table employees_p  \
>   --hive-partition-key date  \
>   --hive-partition-value 10-03-2017  \
>   --target-dir ..\
>   -m 1  
>   hive-table script:
>   employees_p is a partitioned table on date(string) column
>   Issue:
>   Case 1: With --target-dir /user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES,
>   the above sqoop command fails with the error "directory already exists".
>   Case 2: With --target-dir /user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES/anyname,
>   the above sqoop command creates a Hive partition (date=10-03-2017) and the directory
> 	'/user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES/date=10-03-2017'.
> Expected behaviour: since --hive-partition-key and --hive-partition-value are present in
> the sqoop command, it should automatically create the partitioned directory inside EMPLOYEES,
> i.e. '/user/hdfs/landing/staging/Hive/partitioned/EMPLOYEES/date=10-03-2017'.
