sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Boris Tyukin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-3185) Oracle clustered tables not supported by oraoop (direct Oracle mode)
Date Mon, 08 May 2017 15:51:10 GMT
Boris Tyukin created SQOOP-3185:
-----------------------------------

             Summary: Oracle clustered tables not supported by oraoop (direct Oracle mode)
                 Key: SQOOP-3185
                 URL: https://issues.apache.org/jira/browse/SQOOP-3185
             Project: Sqoop
          Issue Type: Bug
          Components: connectors/oracle
    Affects Versions: 1.4.6
         Environment: CDH 5.9
            Reporter: Boris Tyukin


Oracle clustered tables not supported by oraoop (direct Oracle mode)
------------------------------------------------------------------------

More about clustered tables here https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5001.htm)

While ingesting data using sqoop's direct mode, we noticed that it does not support these
tables and would error out with error, stating that there are no records in the source table.

The issue is happening because sqoop is unable to determine database extents for such tables
and hence chunks of data for splits in src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java

This code would generate SQL similar to below and dba_extents table would not have segment_name
named as object name. 

For example in our case, ENCOUNTER table is clustered and dba_extents segment_name would differ
from object_name. 

This query below would not return any data:

SELECT * FROM dba_objects o, dba_extents e
WHERE                o.owner               = 'V500'
                              AND o.object_name     = 'ENCOUNTER'
                              AND e.owner = 'V500'
                              AND e.segment_name = 'ENCOUNTER'
                              AND (o.subobject_name = e.partition_name
                              OR (o.subobject_name IS NULL
                              AND e.partition_name IS NULL))
;                             

but this query will (note how Oracle DB named object_name and segment_name):

Here is a modified query that returns data:
SELECT * FROM dba_objects o, dba_extents e
WHERE                o.owner               = 'V500'
                              AND o.object_name     = 'CLU_ENCOUNTER0077'
                              AND e.owner = 'V500'
                              AND e.segment_name = 'CLU_ENCOUNTER0077'
                              AND (o.subobject_name = e.partition_name
                              OR (o.subobject_name IS NULL
                              AND e.partition_name IS NULL))
;                             



sqoop import --direct --connect ....

--- cut from log ---

17/05/01 09:35:09 DEBUG oracle.OraOopOracleQueries: getOracleDataChunksExtent() SQL Query
=
SELECT data_object_id, file_id, relative_fno, file_batch, MIN (start_block_id) start_block_id,
MAX (end_block_id) end_block_id, SUM (blocks) blocks FROM (SELECT o.data_object_id, e.file_id,
e.relative_fno, e.block_id start_block_id, e.block_id + e.blocks - 1 end_block_id, e.blocks,
CEIL (    SUM (       e.blocks)    OVER (PARTITION BY o.data_object_id, e.file_id        
 ORDER BY e.block_id ASC)    / (SUM (e.blocks)          OVER (PARTITION BY o.data_object_id,
e.file_id)       / 17))    file_batch FROM dba_extents e, dba_objects o, dba_tab_subpartitions
tsp WHERE     o.owner = V500 AND o.object_name = ENCOUNTER AND e.owner = V500 AND e.segment_name
= ENCOUNTER AND o.owner = e.owner AND o.object_name = e.segment_name AND (o.subobject_name
= e.partition_name      OR (o.subobject_name IS NULL AND e.partition_name IS NULL)) AND o.owner
= tsp.table_owner(+) AND o.object_name = tsp.table_name(+) AND o.subobject_name = tsp.subpartition_name(+)
) GROUP BY data_object_id,          file_id,          relative_fno,          file_batch ORDER
BY data_object_id,          file_id,          relative_fno,          file_batch
17/05/01 09:35:09 FATAL oracle.OraOopDataDrivenDBInputFormat: The table ENCOUNTER does not
contain any data.
17/05/01 09:35:09 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/oracle/.staging/job_1493213922849_0970
17/05/01 09:35:09 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@2f7a2457
17/05/01 09:35:09 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException:
The table ENCOUNTER does not contain any data.
java.lang.RuntimeException: The table ENCOUNTER does not contain any data.
        at org.apache.sqoop.manager.oracle.OraOopDataDrivenDBInputFormat.getSplits(OraOopDataDrivenDBInputFormat.java:108)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.oracle.OraOopConnManager.importTable(OraOopConnManager.java:284)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message