sqoop-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: Import Partitions from Oracle to Hive Partitions
Date Thu, 07 Aug 2014 17:19:50 GMT
Go for it, Venkat!

If you have questions about writing a patch, feel free to ask on the
dev@sqoop.apache.org mailing list.



On Thu, Aug 7, 2014 at 9:56 AM, Venkat, Ankam
<Ankam.Venkat@centurylink.com> wrote:
> Gwen,
>
> Created a jira at https://issues.apache.org/jira/browse/SQOOP-1415.
>
> Yes, I would love to contribute a patch for this.
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:gshapira@cloudera.com]
> Sent: Wednesday, August 06, 2014 9:45 AM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Venkat,
>
> Running one sqoop job and moving files to different directories should be faster than a sqoop job per partition (at least it was for my customers).
>
> If you are interested in a new OraOop feature, why not open a Jira in issues.apache.org?
> You can even contribute a patch if you are so inclined :)
>
> Gwen
>
> On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam <Ankam.Venkat@centurylink.com> wrote:
>> Thanks for the response.
>>
>> I was thinking of using OraOop to automatically import Oracle partitions into Hive partitions. But, based on the conversation below, I just learned it's not possible.
>>
>> From an automation perspective, I think running one Sqoop job per partition and creating the same partition in Hive is the better option.
>>
>> Gwen/David: Yes, importing Oracle partitions into Hive partitions would be a good feature to have. Any idea why there have been no commits to OraOop since 2012?
>>
>> Regards,
>> Venkat
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>> Sent: Tuesday, August 05, 2014 6:24 PM
>> To: user@sqoop.apache.org
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Having OraOop automatically handle partitions in Hive would be a cool feature. I agree that this will be limited to OraOop for now.
>>
>> On Tue, Aug 5, 2014 at 5:08 PM, David Robson <David.Robson@software.dell.com> wrote:
>>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember that! I would think we could modify Sqoop to create subfolders for each partition, instead of how it now creates a separate file for each partition. This would probably be limited to the direct (OraOop) connector, as it is aware of partitions (the existing connector doesn't read the data dictionary directly).
>>>
>>> In the meantime, Venkat, you could look at the option I mentioned and then manually move the files into separate folders - at least you'll have each partition in a separate file rather than spread throughout all the files. The other thing you could look at is the option below - you could run one Sqoop job per partition:
>>>
>>> Specify The Partitions To Import
>>>
>>>   -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>>
>>> Imports PartitionA and PartitionB of OracleTableName.
>>>
>>> Notes:
>>> - You can enclose an individual partition name in double quotes to retain the letter case, or if the name has special characters:
>>>     -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>>   If a partition name is not double quoted then its name will be automatically converted to upper case (PARTITIONB for the above).
>>> - When using double quotes, the entire list of partition names must be enclosed in single quotes.
>>> - If the last partition name in the list is double quoted then there must be a comma at the end of the list:
>>>     -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>>
>>> Name each partition to be included. There is no facility to provide a range of partition names.
>>>
>>> There is no facility to define sub-partitions. The entire partition is included/excluded as per the filter.
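>>>
>>> For example, a complete command using this option might look like the following sketch (the connection string, credentials and target directory are placeholders, and it assumes the OraOop connector is active for the connection; untested):
>>>
>>>   sqoop import \
>>>     -Doraoop.import.partitions='"PartitionA",PartitionB' \
>>>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>>>     --username scott --password tiger \
>>>     --table OracleTableName \
>>>     --target-dir /data/OracleTableName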
>>>
>>>
>>> -----Original Message-----
>>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>>> Sent: Wednesday, 6 August 2014 8:44 AM
>>> To: user@sqoop.apache.org
>>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>>
>>> Hive expects a directory for each partition, so getting data with OraOop will require some post-processing - copying the files into properly named directories and adding the new partitions to the Hive table.
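>>>
>>> A rough sketch of that post-processing for one partition (paths, table and column names are placeholders, and it assumes you know which output file holds which partition; untested):
>>>
>>>   hdfs dfs -mkdir -p /user/hive/warehouse/sales/part_name=PARTITION_A
>>>   hdfs dfs -mv /data/sales/part-m-00000 /user/hive/warehouse/sales/part_name=PARTITION_A/
>>>   hive -e "ALTER TABLE sales ADD PARTITION (part_name='PARTITION_A') LOCATION '/user/hive/warehouse/sales/part_name=PARTITION_A'"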
>>>
>>> Sqoop has the --hive-partition-key and --hive-partition-value options, but this assumes that all the data sqooped will fit into a single partition.
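>>>
>>> For example, one job per target partition could look roughly like this (connection string, credentials, table and column names are placeholders; untested):
>>>
>>>   sqoop import \
>>>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>>>     --username scott --password tiger \
>>>     --table SALES \
>>>     --where "SALE_DATE >= DATE '2014-01-01' AND SALE_DATE < DATE '2014-04-01'" \
>>>     --hive-import --hive-table sales \
>>>     --hive-partition-key sale_quarter \
>>>     --hive-partition-value 2014Q1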
>>>
>>>
>>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <David.Robson@software.dell.com> wrote:
>>>> Hi Venkat,
>>>>
>>>>
>>>>
>>>> I’m not sure what this will do in regards to Hive partitions – I’ll
>>>> test it out when I get into the office and get back to you. But this
>>>> option will make it so there is one file for each Oracle partition –
>>>> which might be of interest to you.
>>>>
>>>>
>>>>
>>>> Match Hadoop Files to Oracle Table Partitions
>>>>
>>>>
>>>>
>>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>>
>>>>
>>>>
>>>> To import data from a partitioned table in such a way that the resulting HDFS folder structure in Hadoop will match the table’s partitions, set the chunk method to PARTITION. The alternative (default) chunk method is ROWID.
>>>>
>>>>
>>>>
>>>> Notes:
>>>>
>>>> - For the number of Hadoop files to match the number of Oracle partitions, set the number of mappers to be greater than or equal to the number of partitions.
>>>>
>>>> - If the table is not partitioned then the value PARTITION will lead to an error.
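>>>>
>>>> For example (connection string, credentials, table and directory are placeholders, and it assumes the OraOop connector handles the connection; untested), with --num-mappers set to at least the number of partitions per the note above:
>>>>
>>>>   sqoop import \
>>>>     -Doraoop.chunk.method=PARTITION \
>>>>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>>>>     --username scott --password tiger \
>>>>     --table SALES \
>>>>     --target-dir /data/sales \
>>>>     --num-mappers 8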
>>>>
>>>>
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
>>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>>> To: 'user@sqoop.apache.org'
>>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>>
>>>>
>>>>
>>>> I am trying to import partitions from an Oracle table into Hive partitions.
>>>>
>>>>
>>>>
>>>> Can somebody provide the syntax using the regular JDBC connector and the OraOop connector?
>>>>
>>>>
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Venkat
>>>>
>>>>
>>>>
>>>>
