sqoop-user mailing list archives

From Venkat Ranganathan <vranganat...@hortonworks.com>
Subject Re: Import Partitions from Oracle to Hive Partitions
Date Tue, 19 Aug 2014 18:09:27 GMT
You can use OraOop with HCatalog imports and exports.  My only guess
is that the launcher is failing because your path is not set up correctly.

You can precreate the Hive table with partition keys similar to the
Oracle table's (not all Oracle partitioning modes map well, and
composite partitioning in particular may not map cleanly in all
cases).

Here is a simple example: say your partitioning key is country_code
and you have one partition for each country code.  Then you would
create the Hive table with the same partitioning key and use HCatalog
imports.

If you want hcatalog imports to create the table, you have to
explicitly pass --create-hcatalog-table option.
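
As a sketch of that flow (the Hive DDL and column names here are illustrative assumptions, not from the thread):

```shell
# Hypothetical sketch: precreate a Hive table partitioned on the same key
# as the Oracle table, then run an HCatalog import against it.
hive -e "
CREATE TABLE customer (
  customer_id   INT,
  customer_name STRING
)
PARTITIONED BY (country_code STRING);"

sqoop import \
  --connect jdbc:oracle:thin:@host:1521/service \
  --username myid --password mypwd \
  --table ENS.CUSTOMER \
  --hcatalog-table customer \
  --verbose
```

With the table precreated this way, --create-hcatalog-table is not needed; pass it only when you want Sqoop to create the table for you.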

Can you also post the full log from Sqoop with the --verbose option so
we can get a better idea of what the issue is?

Thanks

Venkat

On Tue, Aug 19, 2014 at 11:02 AM, Venkat, Ankam
<Ankam.Venkat@centurylink.com> wrote:
> Thanks Venkat for the inputs.  I just tried HCat import and got errors...
>
> sqoop import  \
> -Doraoop.disabled=true  \
> --connect jdbc:oracle:thin:@XXXXXXX:1521/YYYYYY \
> --table ENS.CUSTOMER \
> --hcatalog-table CUSTOMER \
> --username myid \
> --password mypwd
>
> 14/08/19 11:57:33 INFO mapreduce.Job: Running job: job_1405966899667_0869
> 14/08/19 11:57:53 INFO mapreduce.Job: Job job_1405966899667_0869 running in uber mode : false
> 14/08/19 11:57:53 INFO mapreduce.Job:  map 0% reduce 0%
> 14/08/19 11:57:53 INFO mapreduce.Job: Job job_1405966899667_0869 failed with state FAILED
> due to: Application application_1405966899667_0869 failed 2 times due to AM Container for
> appattempt_1405966899667_0869_000002 exited with exitCode: 1 due to: Exception from
> container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>         at org.apache.hadoop.util.Shell.run(Shell.java:418)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> Can you please send me the syntax for importing Oracle partitions to Hive partitions?
> Also, can I use OraOop?
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Venkat Ranganathan [mailto:vranganathan@hortonworks.com]
> Sent: Tuesday, August 19, 2014 10:45 AM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> What is the partitioning key in your Oracle table?  If you specify the same as the
> partitioning keys on the Hive table, then a simple HCatalog import from the Oracle
> table to Hive will automatically handle all the partitioning (even multiple levels).
>
> You are right that Sqoop supported only one level of partitioning key, but with 1.4.5
> we allow nesting (multiple static partition keys).  Multiple dynamic partition keys
> have been supported with HCatalog since 1.4.4.
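
For reference, the multiple-static-key form might look roughly like this (a sketch; --hcatalog-partition-keys/--hcatalog-partition-values are the 1.4.5 options for multiple static keys, and the key and value names here are hypothetical):

```shell
# Hypothetical example of static multi-key HCatalog partitioning.
sqoop import \
  --connect jdbc:oracle:thin:@host:1521/service \
  --username myid --password mypwd \
  --table ENS.CUSTOMER \
  --hcatalog-table customer \
  --hcatalog-partition-keys country_code,region \
  --hcatalog-partition-values US,WEST
```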
>
> Thanks
>
> Venkat
>
> On Tue, Aug 19, 2014 at 8:30 AM, Venkat, Ankam <Ankam.Venkat@centurylink.com> wrote:
>> David/Gwen,
>>
>> I have an issue with importing Oracle partitions to match Hadoop Files.
>>
>> Sqoop command:
>>
>> sqoop import  \
>> -Doraoop.disabled=false  \
>> -Doraoop.import.partitions='CUSTOMER_P01,CUSTOMER_P02,CUSTOMER_P03,CUSTOMER_P04,CUSTOMER_P05,
>> CUSTOMER_P06,CUSTOMER_P07,CUSTOMER_P08,CUSTOMER_P09,CUSTOMER_P10,CUSTOMER_P11,CUSTOMER_P12,
>> CUSTOMER_P13,CUSTOMER_P14,CUSTOMER_P15,CUSTOMER_P16,CUSTOMER_P17,CUSTOMER_P18,CUSTOMER_P19,
>> CUSTOMER_P20,CUSTOMER_P21,CUSTOMER_P22,CUSTOMER_P23,CUSTOMER_P24,CUSTOMER_P25,CUSTOMER_P26,
>> CUSTOMER_P27,CUSTOMER_P28,CUSTOMER_P29,CUSTOMER_P30,CUSTOMER_P31,CUSTOMER_P32,CUSTOMER_P33,
>> CUSTOMER_P34,CUSTOMER_P35,CUSTOMER_P36,CUSTOMER_P37,CUSTOMER_P38,CUSTOMER_P39,CUSTOMER_P40,
>> CUSTOMER_P41,CUSTOMER_P42,CUSTOMER_P43,CUSTOMER_P44,CUSTOMER_P45,CUSTOMER_P46,CUSTOMER_P47,
>> CUSTOMER_P48,CUSTOMER_P49,CUSTOMER_P50,CUSTOMER_P51,CUSTOMER_P52,CUSTOMER_P53,CUSTOMER_P54,
>> CUSTOMER_P55,CUSTOMER_P56,CUSTOMER_P57,CUSTOMER_P58,CUSTOMER_P59,CUSTOMER_P60,CUSTOMER_P61,
>> CUSTOMER_P62,CUSTOMER_P63,CUSTOMER_P64,CUSTOMER_P65,CUSTOMER_P66,CUSTOMER_P67,CUSTOMER_P68,
>> CUSTOMER_P69,CUSTOMER_P70,CUSTOMER_P71,CUSTOMER_P72,CUSTOMER_P73,CUSTOMER_P74,CUSTOMER_P75' \
>> -Doraoop.chunk.method=PARTITION \
>> -Doraoop.import.consistent.read=true \
>> --connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
>> --table Schema.TableName \
>> --username MyID \
>> --password MyPWD \
>> -m 75
>>
>> Issues:
>> 1) There are 75 partitions on the Oracle table, but only 61 are being imported because
>> 14 partitions are empty.  I have no way to identify which partition a file belongs to.
>> 2) All the partitions are imported with names like part-00001, part-00002 and so on.
>> I thought the -Doraoop.chunk.method=PARTITION option would match the Hadoop files to
>> the Oracle partitions, i.e. import them as HDFS files with the same names.
>>
>> Any workaround for this?
>>
>> Regards,
>> Venkat
>>
>>
>> -----Original Message-----
>> From: David Robson [mailto:David.Robson@software.dell.com]
>> Sent: Tuesday, August 05, 2014 6:08 PM
>> To: user@sqoop.apache.org
>> Subject: RE: Import Partitions from Oracle to Hive Partitions
>>
>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember
>> that!  I would think we could modify Sqoop to create subfolders for each partition,
>> instead of how it now creates a separate file for each partition.  This would probably
>> be limited to the direct (OraOop) connector, as it is aware of partitions (the existing
>> connector doesn't read the data dictionary directly).
>>
>> In the meantime, Venkat, you could look at the option I mentioned and then manually
>> move the files into separate folders; at least you'll have each partition in a
>> separate file rather than spread throughout all files.  The other thing you could
>> look at is the option below: you could run one Sqoop job per partition:
>>
>> Specify The Partitions To Import
>>
>> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>
>> Imports PartitionA and PartitionB of OracleTableName.
>>
>> Notes:
>> You can enclose an individual partition name in double quotes to retain the letter
>> case or if the name has special characters:
>>
>> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>
>> If the partition name is not double quoted then its name will be automatically
>> converted to upper case (PARTITIONB for the above).
>>
>> When using double quotes, the entire list of partition names must be enclosed in
>> single quotes.  If the last partition name in the list is double quoted then there
>> must be a comma at the end of the list:
>>
>> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>
>> Name each partition to be included.  There is no facility to provide a range of
>> partition names.
>>
>> There is no facility to define subpartitions.  The entire partition is included or
>> excluded as per the filter.
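
Sketching the one-job-per-partition approach (not from the thread; it assumes the CUSTOMER_P01..CUSTOMER_P75 naming used earlier, and the connect string and credentials are placeholders):

```shell
# Generate the partition names instead of typing the 75-entry list by hand,
# and run one import per partition so each lands in its own target directory.
for i in $(seq -w 1 75); do
  p="CUSTOMER_P${i}"
  echo "importing partition ${p}"
  # Hypothetical per-partition job; uncomment and adjust on a real cluster:
  # sqoop import \
  #   -Doraoop.disabled=false \
  #   -Doraoop.import.partitions="${p}" \
  #   --connect jdbc:oracle:thin:@host:1521/service \
  #   --username myid --password mypwd \
  #   --table ENS.CUSTOMER \
  #   --target-dir "/data/customer/${p}" \
  #   -m 1
done
```

The same generated list can also feed a single job: join the names with commas and pass them in one -Doraoop.import.partitions value.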
>>
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>> Sent: Wednesday, 6 August 2014 8:44 AM
>> To: user@sqoop.apache.org
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Hive expects a directory for each partition, so getting data with OraOop will require
>> some post-processing: copying files into properly named directories and adding the new
>> partitions to the Hive table.
>>
>> Sqoop has the --hive-partition-key and --hive-partition-value options, but this assumes
>> that all the data sqooped will fit into a single partition.
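
That post-processing might look roughly like this (all paths, table and partition names here are hypothetical):

```shell
# Sketch: move one imported partition's files into a Hive-style directory
# and register it as a partition of a precreated table.
hdfs dfs -mkdir -p /warehouse/customer/country_code=US
hdfs dfs -mv /data/customer/CUSTOMER_P01/part-* /warehouse/customer/country_code=US/

hive -e "ALTER TABLE customer ADD IF NOT EXISTS PARTITION (country_code='US')
         LOCATION '/warehouse/customer/country_code=US';"
```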
>>
>>
>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <David.Robson@software.dell.com> wrote:
>>> Hi Venkat,
>>>
>>>
>>>
>>> I’m not sure what this will do in regard to Hive partitions; I’ll test it out
>>> when I get into the office and get back to you. But this option will make it so
>>> there is one file for each Oracle partition, which might be of interest to you.
>>>
>>>
>>>
>>> Match Hadoop Files to Oracle Table Partitions
>>>
>>>
>>>
>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>
>>>
>>>
>>> To import data from a partitioned table in such a way that the resulting HDFS
>>> folder structure in Hadoop will match the table’s partitions, set the chunk
>>> method to PARTITION. The alternative (default) chunk method is ROWID.
>>>
>>>
>>>
>>> Notes:
>>>
>>> - For the number of Hadoop files to match the number of Oracle partitions, set
>>>   the number of mappers to be greater than or equal to the number of partitions.
>>>
>>> - If the table is not partitioned then the value PARTITION will lead to an error.
>>>
>>>
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>> To: 'user@sqoop.apache.org'
>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>
>>>
>>>
>>> I am trying to import partitions from an Oracle table into Hive partitions.
>>>
>>>
>>>
>>> Can somebody provide the syntax using regular JDBC connector and
>>> Oraoop connector?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Venkat
>>>
>>>
>>>
>>>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it
is addressed and may contain information that is confidential, privileged and exempt from
disclosure under applicable law. If the reader of this message is not the intended recipient,
you are hereby notified that any printing, copying, dissemination, distribution, disclosure
or forwarding of this communication is strictly prohibited. If you have received this communication
in error, please contact the sender immediately and delete it from your system. Thank You.

