sqoop-dev mailing list archives

From "Gwen Shapira (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SQOOP-1277) Import not splitted when using --boundary-query
Date Wed, 06 Aug 2014 03:58:12 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gwen Shapira reassigned SQOOP-1277:
-----------------------------------

    Assignee: Gwen Shapira

> Import not splitted when using --boundary-query
> -----------------------------------------------
>
>                 Key: SQOOP-1277
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1277
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.4
>         Environment: Amazon AWS
>            Reporter: Porati Sébastien
>            Assignee: Gwen Shapira
>            Priority: Critical
>
> I am trying to import MySQL data into a Hive table using a custom boundary query. Result: Sqoop does not split the load into multiple queries, so the import takes too long.
> My creation command:
> {code:none}
> sqoop job -Dsqoop.metastore.client.record.password=true \
>     --create importJobName -- import \
>     --connect jdbc:mysql://some_jdbc_pram \
>     --username user_name \
>     --password MyPassword \
>     --table my_table \
>     --columns "collect_id,collected_data_id,value" \
>     --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'" \
>     --split-by column_name \
>     --num-mappers X \
>     --hive-import \
>     --hive-overwrite \
>     --hive-table hivedb.hibetable --as-textfile --null-string \\\\N --null-non-string \\\\N
> {code}
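For context on what the boundary query feeds into: Sqoop runs it once to obtain the min and max split-by values, then divides that range across the mappers, each of which gets its own WHERE-clause range. A minimal sketch of that division step (not Sqoop's actual Java code; a simplified illustration assuming integer boundaries):

```python
def compute_splits(min_val, max_val, num_mappers):
    """Divide the closed range [min_val, max_val] into num_mappers
    contiguous sub-ranges, mimicking (in simplified form) how a
    data-driven input format turns boundary-query results into
    per-mapper split ranges."""
    if num_mappers < 1:
        raise ValueError("need at least one mapper")
    total = max_val - min_val + 1
    base, extra = divmod(total, num_mappers)
    splits = []
    lo = min_val
    for i in range(num_mappers):
        # Earlier splits absorb the remainder, one extra row each.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        splits.append((lo, hi))
        lo = hi + 1
    return splits

# Each (lo, hi) pair becomes a per-mapper predicate such as
# "column_name >= lo AND column_name <= hi".
print(compute_splits(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
```

The bug reported here is that with --boundary-query this split computation does not happen and the whole table goes through a single query.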
>     
> The following message is displayed:
> {code:none}
> WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'; splits may not partition data.
> {code}
> I tried adding the $CONDITIONS token to the creation command:
> {code:none}
> --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND \$CONDITIONS" \
> {code}
> But the job execution failed:
> {code:none}
> INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND $CONDITIONS
> INFO mapred.JobClient: Cleaning up the staging area hdfs://10.34.140.108:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201401311408_0025
> ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Unknown column '$CONDITIONS' in 'where clause'
> {code}
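The failure above is consistent with $CONDITIONS being a placeholder that Sqoop substitutes only inside a free-form --query, while the boundary query is sent to the database verbatim; the literal token therefore reaches MySQL, which parses it as a column name. A simplified sketch of that substitution step (hypothetical helper for illustration, not Sqoop's actual code):

```python
def apply_conditions(query, predicate):
    """Replace the $CONDITIONS placeholder in a free-form import
    query with a per-split predicate. A boundary query never goes
    through this substitution, so a $CONDITIONS token left in it
    is passed to the database as-is."""
    token = "$CONDITIONS"
    if token not in query:
        raise ValueError("free-form query must contain " + token)
    return query.replace(token, predicate)

q = "SELECT * FROM my_table WHERE $CONDITIONS"
print(apply_conditions(q, "id >= 1 AND id <= 3"))
# SELECT * FROM my_table WHERE id >= 1 AND id <= 3
```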



--
This message was sent by Atlassian JIRA
(v6.2#6252)
