sqoop-user mailing list archives

From "Venkat, Ankam" <Ankam.Ven...@centurylink.com>
Subject RE: Import Partitions from Oracle to Hive Partitions
Date Thu, 07 Aug 2014 16:56:54 GMT
Gwen,

Created a jira at https://issues.apache.org/jira/browse/SQOOP-1415.

Yes, I would love to contribute a patch for this.  

Regards,
Venkat

-----Original Message-----
From: Gwen Shapira [mailto:gshapira@cloudera.com] 
Sent: Wednesday, August 06, 2014 9:45 AM
To: user@sqoop.apache.org
Subject: Re: Import Partitions from Oracle to Hive Partitions

Venkat,

Running one Sqoop job and moving files to different directories should be faster than a Sqoop
job per partition (at least it was for my customers).

If you are interested in a new OraOop feature, why not open a Jira in issues.apache.org?
You can even contribute a patch if you are so inclined :)

Gwen

On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam <Ankam.Venkat@centurylink.com> wrote:
> Thanks for the response.
>
> I was thinking of using OraOop to automatically import Oracle partitions into Hive
> partitions. But, based on the conversation below, I just learned it's not possible.
>
> From an automation perspective, I think running one Sqoop job per partition and creating
> the same partition in Hive is the better option.
>
> Gwen/David: Yes, importing Oracle partitions into Hive partitions would be a good feature
> to have. Any idea why there have been no commits to OraOop since 2012?
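The one-Sqoop-job-per-partition approach Venkat describes can be sketched as a shell loop. This is a hypothetical sketch only: the table name, partition names, connection string, and Hive layout are all invented for illustration, and `--hive-partition-value` is combined with OraOop's `-Doraoop.import.partitions` as discussed later in this thread.

```shell
#!/bin/sh
# Hypothetical: import each Oracle partition with its own Sqoop job,
# landing each one in its own Hive partition via --hive-partition-value.
for PART in SALES_2014Q1 SALES_2014Q2; do
  sqoop import \
    -Doraoop.import.partitions="$PART" \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --table SALES \
    --hive-import --hive-table sales \
    --hive-partition-key source_partition \
    --hive-partition-value "$PART"
done
```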
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:gshapira@cloudera.com]
> Sent: Tuesday, August 05, 2014 6:24 PM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Having OraOop automatically handle partitions in Hive would be a cool feature. I agree
> that this will be limited to OraOop for now.
>
> On Tue, Aug 5, 2014 at 5:08 PM, David Robson <David.Robson@software.dell.com> wrote:
>> Yes, now that you mention it, I do remember that Sqoop is limited to one partition in
>> Hive! I would think we could modify Sqoop to create a subfolder for each partition,
>> instead of the separate file per partition it creates now. This would probably be
>> limited to the direct (OraOop) connector, as it is aware of partitions (the existing
>> connector doesn't read the data dictionary directly).
>>
>> In the meantime, Venkat, you could look at the option I mentioned and then manually
>> move the files into separate folders; at least you'll have each partition in a separate
>> file rather than spread throughout all the files. The other thing you could look at is
>> the option below: you could run one Sqoop job per partition:
>>
>> Specify The Partitions To Import
>>
>> -Doraoop.import.partitions=PartitionA,PartitionB --table 
>> OracleTableName
>>
>> Imports PartitionA and PartitionB of OracleTableName.
>>
>> Notes:
>> You can enclose an individual partition name in double quotes to
>> retain the letter case or if the name has special characters:
>>
>>   -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>
>> If a partition name is not double quoted, it is automatically
>> converted to upper case (PARTITIONB in the example above).
>> When using double quotes, the entire list of partition names must be
>> enclosed in single quotes.
>> If the last partition name in the list is double quoted, there must
>> be a comma at the end of the list:
>>
>>   -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>
>> Name each partition to be included. There is no facility to provide
>> a range of partition names.
>>
>> There is no facility to define sub-partitions. The entire partition
>> is included or excluded as per the filter.
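A complete invocation combining the `-Doraoop.import.partitions` option above with a connection might look like this hypothetical sketch (host, credentials, table name, and target directory are invented). Note that generic Hadoop `-D` options must come before the tool-specific arguments:

```shell
# Hypothetical: import two named partitions of an Oracle table with the
# direct (OraOop) connector. PartitionA keeps its case (double quoted);
# PartitionB is converted to PARTITIONB.
sqoop import \
  -Doraoop.import.partitions='"PartitionA",PartitionB' \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table OracleTableName \
  --target-dir /user/venkat/OracleTableName
```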
>>
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>> Sent: Wednesday, 6 August 2014 8:44 AM
>> To: user@sqoop.apache.org
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Hive expects a directory for each partition, so getting the data with OraOop will
>> require some post-processing: copying the files into properly named directories and
>> adding the new partitions to the Hive table.
>>
>> Sqoop has the --hive-partition-key and --hive-partition-value options, but these assume
>> that all the sqooped data will fit into a single partition.
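The post-processing Gwen describes can be sketched locally with ordinary filesystem commands; all file, directory, and partition names below are invented, and on a real cluster the moves would use `hdfs dfs -mv`, followed by `ALTER TABLE ... ADD PARTITION` in Hive to register each directory.

```shell
# Simulating the post-processing on the local filesystem: one file per
# Oracle partition is moved into a Hive-style partition directory.
set -e
workdir=$(mktemp -d)
cd "$workdir"
# Stand-ins for the per-partition files a PARTITION-chunked import produces:
touch SALES_2014Q1.txt SALES_2014Q2.txt
for f in SALES_*.txt; do
  part=${f#SALES_}; part=${part%.txt}   # e.g. 2014Q1
  mkdir -p "sales/quarter=${part}"      # Hive-style partition directory
  mv "$f" "sales/quarter=${part}/"
done
ls sales                                # quarter=2014Q1  quarter=2014Q2
```

Each resulting directory would then be registered in Hive with something like `ALTER TABLE sales ADD PARTITION (quarter='2014Q1') LOCATION '.../sales/quarter=2014Q1'`.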
>>
>>
>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <David.Robson@software.dell.com> wrote:
>>> Hi Venkat,
>>>
>>>
>>>
>>> I’m not sure what this will do with regard to Hive partitions – I’ll
>>> test it out when I get into the office and get back to you. But this
>>> option will make it so there is one file for each Oracle partition,
>>> which might be of interest to you.
>>>
>>>
>>>
>>> Match Hadoop Files to Oracle Table Partitions
>>>
>>>
>>>
>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>
>>>
>>>
>>> To import data from a partitioned table in such a way that the
>>> resulting HDFS folder structure in Hadoop will match the table’s
>>> partitions, set the chunk method to PARTITION. The alternative
>>> (default) chunk method is ROWID.
>>>
>>>
>>>
>>> Notes:
>>>
>>> * For the number of Hadoop files to match the number of Oracle
>>>   partitions, set the number of mappers to be greater than or equal
>>>   to the number of partitions.
>>>
>>> * If the table is not partitioned, then the value PARTITION will
>>>   lead to an error.
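A hypothetical invocation of the option David describes (connection details and table name are invented); per the note above, the mapper count is set at least as high as the partition count so each Oracle partition lands in its own HDFS file:

```shell
# Hypothetical: one HDFS file per Oracle partition via PARTITION chunking.
sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SALES \
  --num-mappers 8
```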
>>>
>>>
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>> To: 'user@sqoop.apache.org'
>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>
>>>
>>>
>>> I am trying to import partitions from an Oracle table into Hive partitions.
>>>
>>>
>>>
>>> Can somebody provide the syntax using the regular JDBC connector
>>> and the OraOop connector?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Venkat
>>>
>>>
>>>
>>>