sqoop-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: Import Partitions from Oracle to Hive Partitions
Date Wed, 06 Aug 2014 00:24:21 GMT
Having OraOop automatically handle partitions in Hive would be a cool
feature. I agree that this would be limited to OraOop for now.

On Tue, Aug 5, 2014 at 5:08 PM, David Robson
<David.Robson@software.dell.com> wrote:
> Yes, now that you mention that Sqoop is limited to one partition in Hive, I do remember that!
> I would think we could modify Sqoop to create a subfolder for each partition, instead of
> a separate file for each partition as it does now. This would probably be limited to the
> direct (OraOop) connector, as it is aware of partitions (the existing connector doesn't read
> the data dictionary directly).
>
> In the meantime, Venkat, you could look at the option I mentioned and then manually move
> the files into separate folders. At least you'll have each partition in a separate file rather
> than spread throughout all files. The other thing you could look at is the option below:
> you could run one Sqoop job per partition.
>
> Specify The Partitions To Import
>
> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>
> Imports PartitionA and PartitionB of OracleTableName.
>
> Notes:
> You can enclose an individual partition name in double quotes to retain the letter case
> or if the name has special characters.
> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
> If the partition name is not double quoted then its name will be automatically converted
> to upper case, PARTITIONB for above.
> When using double quotes the entire list of partition names must be enclosed in
> single quotes.
> If the last partition name in the list is double quoted then there must be a comma at
> the end of the list.
> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>
> Name each partition to be included. There is no facility to provide a range of partition
> names.
>
> There is no facility to define sub partitions. The entire partition is included/excluded
> as per the filter.
>
>
> -----Original Message-----
> From: Gwen Shapira [mailto:gshapira@cloudera.com]
> Sent: Wednesday, 6 August 2014 8:44 AM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Hive expects a directory for each partition, so getting data with OraOop will require
> some post-processing: copying files into properly named directories and adding the new
> partitions to a Hive table.
>
> Sqoop has the --hive-partition-key and --hive-partition-value options, but these assume
> that all the data sqooped will fit into a single partition.
>
>
> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <David.Robson@software.dell.com> wrote:
>> Hi Venkat,
>>
>>
>>
>> I’m not sure what this will do in regards to Hive partitions – I’ll
>> test it out when I get into the office and get back to you. But this
>> option will make it so there is one file for each Oracle partition –
>> which might be of interest to you.
>>
>>
>>
>> Match Hadoop Files to Oracle Table Partitions
>>
>>
>>
>> -Doraoop.chunk.method={ROWID|PARTITION}
>>
>>
>>
>> To import data from a partitioned table in such a way that the
>> resulting HDFS folder structure in Hadoop will match the table’s
>> partitions, set the chunk method to PARTITION. The alternative
>> (default) chunk method is ROWID.
>>
>>
>>
>> Notes:
>>
>> - For the number of Hadoop files to match the number of Oracle
>> partitions, set the number of mappers to be greater than or equal to
>> the number of partitions.
>>
>> - If the table is not partitioned then value PARTITION will lead to an
>> error.
>>
>>
>>
>> David
>>
>>
>>
>>
>>
>> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
>> Sent: Wednesday, 6 August 2014 3:56 AM
>> To: 'user@sqoop.apache.org'
>> Subject: Import Partitions from Oracle to Hive Partitions
>>
>>
>>
>> I am trying to import partitions from an Oracle table into Hive partitions.
>>
>>
>>
>> Can somebody provide the syntax for the regular JDBC connector and
>> the OraOop connector?
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> Regards,
>>
>> Venkat
>>
>>
>>
>>
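
The quoting rules for -Doraoop.import.partitions quoted in the thread, and the suggestion to run one Sqoop job per partition, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the output is the text you would type at a shell prompt, and connection options such as --connect (plus a separate output directory per job) are left out and would have to be added.

```python
def partitions_option(names, keep_case=()):
    """Build the -Doraoop.import.partitions argument as typed at a shell prompt.

    names:     partition names, in the order they should be listed.
    keep_case: names to wrap in double quotes so their letter case is kept
               (unquoted names are upper-cased by OraOop, per the thread).
    """
    keep = set(keep_case)
    quoted = [n in keep for n in names]
    items = ['"%s"' % n if q else n for n, q in zip(names, quoted)]
    value = ",".join(items)
    if quoted and quoted[-1]:
        value += ","              # trailing comma when the last name is double quoted
    if any(quoted):
        value = "'%s'" % value    # single quotes around the entire list
    return "-Doraoop.import.partitions=" + value


def one_job_per_partition(table, names, keep_case=()):
    """Return one 'sqoop import' command line per partition.

    Connection options are intentionally omitted (assumption: the caller
    appends them). The generic -D option is placed right after the tool name.
    """
    return [
        "sqoop import %s --table %s" % (partitions_option([n], keep_case), table)
        for n in names
    ]


if __name__ == "__main__":
    print(partitions_option(["PartitionA", "PartitionB"], keep_case=["PartitionA"]))
    for line in one_job_per_partition("OracleTableName", ["PartitionA", "PartitionB"]):
        print(line)
```

The three forms shown in the thread (no quotes, first name quoted, both names quoted with a trailing comma) correspond to calling partitions_option with an empty, partial, or full keep_case list.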

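The post-processing mentioned in the thread (moving each imported file into a properly named directory and adding the partition to the Hive table) might look roughly like the sketch below. It only generates the commands as text and runs nothing. The table name, paths, partition key, and the mapping from Hive partition value to imported file are all hypothetical; the thread does not say which output file holds which Oracle partition, so that mapping has to be worked out by hand. The key=value directory naming follows Hive's usual partition layout.

```python
def partition_dir(table_dir, key, value):
    """Directory for one Hive partition, using the key=value naming convention."""
    return "%s/%s=%s" % (table_dir.rstrip("/"), key, value)


def post_processing_plan(hive_table, table_dir, key, files_by_value):
    """Return the HDFS commands and Hive statements for each partition.

    files_by_value maps a Hive partition value to the HDFS path of the
    imported file that holds the matching Oracle partition (assumed known).
    """
    steps = []
    for value, src in sorted(files_by_value.items()):
        dest = partition_dir(table_dir, key, value)
        steps.append("hdfs dfs -mkdir -p %s" % dest)        # create the partition directory
        steps.append("hdfs dfs -mv %s %s/" % (src, dest))   # move the imported file into it
        steps.append(
            "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s') LOCATION '%s';"
            % (hive_table, key, value, dest)                # register the partition in Hive
        )
    return steps


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    plan = post_processing_plan(
        "sales", "/warehouse/sales", "region",
        {"emea": "/staging/sales/part-m-00000"},
    )
    print("\n".join(plan))
```

Generating the plan as text first makes it easy to review the moves and the ALTER TABLE statements before running anything against HDFS or Hive.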