sqoop-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: Import Partitions from Oracle to Hive Partitions
Date Wed, 06 Aug 2014 15:44:46 GMT
Venkat,

Running one sqoop job and moving files to different directories should
be faster than a sqoop job per partition (at least it was for my
customers).

If you are interested in a new OraOop feature, why not open a JIRA at
issues.apache.org?
You can even contribute a patch if you are so inclined :)

Gwen

On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam
<Ankam.Venkat@centurylink.com> wrote:
> Thanks for the response.
>
> I was thinking of using OraOop to automatically import Oracle partitions to Hive partitions. But, based on the conversation below, I just learned it's not possible.
>
> From an automation perspective, I think running one Sqoop job per partition and creating the same partition in Hive is the better option.
>
> Gwen/David:  Yes, importing Oracle partitions to Hive partitions would be a good feature to have. Any idea why there have been no commits to OraOop since 2012?
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:gshapira@cloudera.com]
> Sent: Tuesday, August 05, 2014 6:24 PM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Having OraOop automatically handle partitions in Hive would be a cool feature. I agree that this will be limited to OraOop for now.
>
> On Tue, Aug 5, 2014 at 5:08 PM, David Robson <David.Robson@software.dell.com> wrote:
>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember that! We could modify Sqoop to create a subfolder for each partition, instead of the separate file per partition it creates now. This would probably be limited to the direct (OraOop) connector, as it is aware of partitions (the existing connector doesn't read the data dictionary directly).
>>
>> In the meantime, Venkat, you could look at the option I mentioned and then manually move the files into separate folders; at least you'll have each partition in a separate file rather than spread throughout all files. Alternatively, with the option below you could run one Sqoop job per partition:
>>
>> Specify The Partitions To Import
>>
>>   -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>
>> Imports PartitionA and PartitionB of OracleTableName.
>>
>> Notes:
>> - You can enclose an individual partition name in double quotes to retain
>>   the letter case, or if the name has special characters:
>>
>>     -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>
>>   If a partition name is not double quoted, it is automatically converted
>>   to upper case (PARTITIONB in the example above).
>> - When using double quotes, the entire list of partition names must be
>>   enclosed in single quotes.
>> - If the last partition name in the list is double quoted, there must be a
>>   trailing comma at the end of the list:
>>
>>     -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>
>> Name each partition to be included. There is no facility to provide a
>> range of partition names.
>>
>> There is no facility to specify subpartitions. The entire partition is
>> included or excluded as per the filter.
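For context, the option above is a generic Hadoop `-D` property, so it has to appear immediately after `import`, before the tool-specific arguments. A hypothetical full invocation might look like this; the connection string, credentials, and paths are made up for illustration:

```shell
# Sketch of a full OraOop import restricted to two named partitions.
# Host, service name, username, and target directory are hypothetical.
sqoop import \
  -Doraoop.import.partitions='"PartitionA",PartitionB' \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT -P \
  --table OracleTableName \
  --target-dir /user/venkat/OracleTableName
```

Per the notes above, `PartitionB` would be matched as `PARTITIONB` since it is not double quoted.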
>>
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>> Sent: Wednesday, 6 August 2014 8:44 AM
>> To: user@sqoop.apache.org
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Hive expects a directory for each partition, so getting the data in with OraOop will require some post-processing: copying the files into properly named directories and adding the new partitions to the Hive table.
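That post-processing can be sketched as below, using local directories as stand-ins; the table name, partition column, and file names are hypothetical, and on a real cluster the moves would be `hdfs dfs -mv` followed by a Hive `ALTER TABLE ... ADD PARTITION` per directory:

```shell
# Stand-ins for the files a per-partition OraOop import produced,
# one file per Oracle partition (hypothetical names).
mkdir -p import_out
touch import_out/oracletablename-PARTITIONA import_out/oracletablename-PARTITIONB

# Hive expects one directory per partition: <table>/<part_col>=<value>/
for p in PARTITIONA PARTITIONB; do
  mkdir -p "warehouse/mytable/part_col=$p"
  mv "import_out/oracletablename-$p" "warehouse/mytable/part_col=$p/"
done

# Then register each directory as a partition in Hive, e.g.:
#   ALTER TABLE mytable ADD PARTITION (part_col='PARTITIONA')
#     LOCATION '/warehouse/mytable/part_col=PARTITIONA';
```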
>>
>> Sqoop has the --hive-partition-key and --hive-partition-value options, but these assume that all the sqooped data fits into a single partition.
>>
>>
>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <David.Robson@software.dell.com> wrote:
>>> Hi Venkat,
>>>
>>>
>>>
>>> I’m not sure what this will do with regard to Hive partitions – I’ll
>>> test it out when I get into the office and get back to you. But this
>>> option produces one file for each Oracle partition, which might be of
>>> interest to you.
>>>
>>>
>>>
>>> Match Hadoop Files to Oracle Table Partitions
>>>
>>>   -Doraoop.chunk.method={ROWID|PARTITION}
>>>
>>> To import data from a partitioned table in such a way that the resulting
>>> HDFS folder structure in Hadoop matches the table’s partitions, set the
>>> chunk method to PARTITION. The alternative (default) chunk method is ROWID.
>>>
>>> Notes:
>>> - For the number of Hadoop files to match the number of Oracle partitions,
>>>   set the number of mappers to be greater than or equal to the number of
>>>   partitions.
>>> - If the table is not partitioned, the value PARTITION will lead to an
>>>   error.
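Putting those notes together, a hypothetical invocation might look like this, assuming the table has at most 8 partitions; the connection string and paths are made up:

```shell
# Sketch: PARTITION chunk method with mappers >= partition count (8 here),
# so each output file corresponds to one Oracle partition. Connection
# details, table name, and target directory are hypothetical.
sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT -P \
  --table OracleTableName \
  --num-mappers 8 \
  --target-dir /user/venkat/OracleTableName
```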
>>>
>>>
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>> To: 'user@sqoop.apache.org'
>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>
>>>
>>>
>>> I am trying to import partitions from an Oracle table to Hive partitions.
>>>
>>>
>>>
>>> Can somebody provide the syntax using the regular JDBC connector and
>>> the OraOop connector?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Venkat
>>>
>>>
>>>
>>>
