sqoop-user mailing list archives

From "Venkat, Ankam" <Ankam.Ven...@centurylink.com>
Subject RE: Import Partitions from Oracle to Hive Partitions
Date Wed, 06 Aug 2014 14:56:54 GMT
Thanks for the response.

I was thinking of using OraOop to automatically import Oracle partitions into Hive partitions.
But, based on the conversation below, I just learned it's not possible.

From an automation perspective, I think running one Sqoop job per partition and creating the
same partition in Hive is the better option.
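That per-partition automation could be sketched roughly as below. This is only a sketch: the table name, JDBC URL, credentials, partition list, and partition-key column are all made-up placeholders, not from this thread, and the commands are echoed rather than run so the loop can be dry-tested.

```shell
#!/bin/sh
# Sketch: one Sqoop job per Oracle partition, each landing in a matching
# Hive partition via --hive-partition-key/--hive-partition-value.
# All names below are hypothetical examples.
TABLE=ORDERS
CMDS=""
for PART in P_2014_01 P_2014_02 P_2014_03; do
  CMD="sqoop import \
    -Doraoop.import.partitions=$PART \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott --password-file /user/scott/.pw \
    --table $TABLE \
    --hive-import --hive-table orders \
    --hive-partition-key part_name \
    --hive-partition-value $PART"
  CMDS="$CMDS$CMD
"
  echo "$CMD"   # echo instead of executing, so the loop is easy to dry-run
done
```

Replacing the `echo` with an actual invocation (and the placeholders with real values) turns this into one Sqoop job per partition.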

Gwen/David: Yes, importing Oracle partitions into Hive partitions would be a good feature to have.
Any idea why there have been no commits to OraOop since 2012?

Regards,
Venkat

-----Original Message-----
From: Gwen Shapira [mailto:gshapira@cloudera.com] 
Sent: Tuesday, August 05, 2014 6:24 PM
To: user@sqoop.apache.org
Subject: Re: Import Partitions from Oracle to Hive Partitions

Having OraOop automatically handle partitions in Hive would be a cool feature. I agree that
this will be limited to OraOop for now.

On Tue, Aug 5, 2014 at 5:08 PM, David Robson <David.Robson@software.dell.com> wrote:
> Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember that!
I would think we could modify Sqoop to create a subfolder for each partition, instead of the
separate file per partition it creates now. This would probably be limited to the direct
(OraOop) connector, as it is aware of partitions (the existing connector doesn't read the data
dictionary directly).
>
> In the meantime, Venkat, you could look at the option I mentioned and then manually move
the files into separate folders - at least you'll have each partition in a separate file rather
than spread throughout all the files. The other thing you could look at is the option below -
running one Sqoop job per partition:
>
> Specify The Partitions To Import
>
> -Doraoop.import.partitions=PartitionA,PartitionB --table 
> OracleTableName
>
> Imports PartitionA and PartitionB of OracleTableName.
>
> Notes:
>
> You can enclose an individual partition name in double quotes to retain
> the letter case, or if the name has special characters:
>
> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>
> If a partition name is not double quoted, its name is automatically
> converted to upper case (PARTITIONB for the example above).
>
> When using double quotes, the entire list of partition names must be
> enclosed in single quotes.
>
> If the last partition name in the list is double quoted, there must be a
> comma at the end of the list:
>
> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>
> Name each partition to be included. There is no facility to provide a range of partition
names.
>
> There is no facility to define sub partitions. The entire partition is included/excluded
as per the filter.
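Putting those quoting rules together, a full invocation might look like the following. This is a hypothetical example (the connection string and target directory are placeholders); note the single quotes around the whole list and the trailing comma after the final double-quoted name.

```shell
#!/bin/sh
# Hypothetical command illustrating the quoting rules above: double quotes
# preserve letter case, the whole list sits inside single quotes, and a
# trailing comma is required because the last name is double quoted.
PARTS='"PartitionA","PartitionB",'
CMD="sqoop import \
  -Doraoop.import.partitions=$PARTS \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --table OracleTableName \
  --target-dir /data/OracleTableName"
echo "$CMD"
```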
>
>
> -----Original Message-----
> From: Gwen Shapira [mailto:gshapira@cloudera.com]
> Sent: Wednesday, 6 August 2014 8:44 AM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Hive expects a directory for each partition, so getting the data in with OraOop will require
some post-processing: copying the files into properly named directories and adding the new
partitions to the Hive table.
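That post-processing might look roughly like this. The warehouse path, staging path, table, and partition names are made-up placeholders, and the commands are echoed rather than executed; the real versions would need a running HDFS and Hive.

```shell
#!/bin/sh
# Sketch of the post-processing described above: move each per-partition
# file under a Hive-style directory, then register the partition.
# All paths and names are hypothetical.
PART=P_2014_01
WAREHOUSE=/user/hive/warehouse/orders
MKDIR="hdfs dfs -mkdir -p $WAREHOUSE/part_name=$PART"
MV="hdfs dfs -mv /staging/orders/$PART/* $WAREHOUSE/part_name=$PART/"
DDL="hive -e \"ALTER TABLE orders ADD PARTITION (part_name='$PART') LOCATION '$WAREHOUSE/part_name=$PART'\""
echo "$MKDIR"
echo "$MV"
echo "$DDL"
```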
>
> Sqoop has the --hive-partition-key and --hive-partition-value options, but these assume that
all the data sqooped will fit into a single partition.
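With the regular JDBC connector, the usual workaround for that single-partition assumption is one import per partition value, carving out each partition's rows with --where. A rough sketch, with all table, column, and connection names hypothetical:

```shell
#!/bin/sh
# Sketch: regular (non-OraOop) connector, one import per Hive partition.
# The --where predicate selects just the rows belonging to the value
# passed as --hive-partition-value. Names are placeholders.
MONTH=2014-01
CMD="sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --table ORDERS \
  --where \"ORDER_MONTH = '$MONTH'\" \
  --hive-import --hive-table orders \
  --hive-partition-key order_month \
  --hive-partition-value $MONTH"
echo "$CMD"
```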
>
>
> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <David.Robson@software.dell.com> wrote:
>> Hi Venkat,
>>
>>
>>
>> I’m not sure what this will do with regard to Hive partitions – I’ll
>> test it out when I get into the office and get back to you. But this
>> option will make it so there is one file for each Oracle partition,
>> which might be of interest to you.
>>
>>
>>
>> Match Hadoop Files to Oracle Table Partitions
>>
>>
>>
>> -Doraoop.chunk.method={ROWID|PARTITION}
>>
>>
>>
>> To import data from a partitioned table in such a way that the resulting
>> HDFS folder structure in Hadoop will match the table’s partitions, set
>> the chunk method to PARTITION. The alternative (default) chunk method
>> is ROWID.
>>
>>
>>
>> Notes:
>>
>> - For the number of Hadoop files to match the number of Oracle
>> partitions, set the number of mappers to be greater than or equal to
>> the number of partitions.
>>
>> - If the table is not partitioned then the value PARTITION will lead
>> to an error.
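As a concrete, hypothetical illustration of those notes: a table with, say, 8 partitions would need at least 8 mappers for the files to line up one-per-partition. Connection details and names below are placeholders, and the command is echoed rather than run.

```shell
#!/bin/sh
# Sketch: PARTITION chunk method with mappers >= partition count (8 here),
# so each Oracle partition maps to its own HDFS output file.
CMD="sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --table OracleTableName \
  --num-mappers 8 \
  --target-dir /data/OracleTableName"
echo "$CMD"
```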
>>
>>
>>
>> David
>>
>>
>>
>>
>>
>> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
>> Sent: Wednesday, 6 August 2014 3:56 AM
>> To: 'user@sqoop.apache.org'
>> Subject: Import Partitions from Oracle to Hive Partitions
>>
>>
>>
>> I am trying to import  partitions from Oracle table to Hive partitions.
>>
>>
>>
>> Can somebody provide the syntax using regular JDBC connector and 
>> Oraoop connector?
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> Regards,
>>
>> Venkat
>>
>>
>>
>>