sqoop-dev mailing list archives

From "praveen m (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-1302) Doesn't run the mapper for remaining splits, when split-by ROWNUM
Date Wed, 02 Apr 2014 19:50:19 GMT
praveen m created SQOOP-1302:
--------------------------------

             Summary: Doesn't run the mapper for remaining splits, when split-by ROWNUM
                 Key: SQOOP-1302
                 URL: https://issues.apache.org/jira/browse/SQOOP-1302
             Project: Sqoop
          Issue Type: Bug
          Components: build
    Affects Versions: 1.4.3
         Environment: CDH 4.6.0-1
            Reporter: praveen m
             Fix For: 1.4.5


When importing a table from an Oracle database to HDFS with Sqoop, I am using ROWNUM
as the split-by column, since the table doesn't have a primary key. With the default
4 mappers, only about a quarter of the data is imported; the remaining three quarters
are not. The table is split correctly into 4 pieces, but only one mapper produces output.
E.g.: the table has a total of 28 records, but only 8 records land in HDFS.

HDFS OUTPUT:

part-m-00000	544 bytes	
part-m-00001	0 bytes	
part-m-00002	0 bytes	
part-m-00003	0 bytes	
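A likely explanation (not confirmed in this report, but consistent with Oracle's documented ROWNUM semantics): ROWNUM is assigned to a row only as it passes the query's WHERE filter, starting at 1 for each query. A per-split predicate such as ROWNUM >= 8 AND ROWNUM < 15 can therefore never match: the first candidate row would be assigned ROWNUM 1, fails the predicate, and no row is ever assigned ROWNUM 2. Only the split whose lower bound is 1 returns data. A minimal Python sketch of this semantics (split boundaries are illustrative, not Sqoop's exact arithmetic):

```python
def oracle_rownum_filter(rows, lo, hi):
    """Simulate Oracle ROWNUM semantics for the predicate
    ROWNUM >= lo AND ROWNUM < hi: the ROWNUM counter advances
    only when a row passes the filter and is returned."""
    out, rownum = [], 1
    for row in rows:
        if lo <= rownum < hi:
            out.append(row)
            rownum += 1  # ROWNUM is assigned only to returned rows
    return out

rows = list(range(28))  # a table with 28 rows, as in the report
# Hypothetical 4-way splits over MIN(ROWNUM)=1 .. MAX(ROWNUM)=28:
for lo, hi in [(1, 8), (8, 15), (15, 22), (22, 29)]:
    print(lo, hi, len(oracle_rownum_filter(rows, lo, hi)))
# Only the first split returns rows; the other three are empty,
# matching the single non-empty part file above.
```

A common workaround (an assumption, not taken from this report) is to materialize the row number in a subquery, e.g. SELECT * FROM (SELECT t.*, ROWNUM rn FROM BILL_FEE_CDE_DIM t) and split by rn, so the bound column is a real value rather than a query-scoped pseudocolumn.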

LOG:
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /hdp2014poc/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/bin/../lib/sqoop/../accumulo
does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/04/02 12:38:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.3-cdh4.6.0
14/04/02 12:38:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure.
Consider using -P instead.
14/04/02 12:38:22 INFO manager.SqlManager: Using default fetchSize of 1000
14/04/02 12:38:22 INFO tool.CodeGenTool: Beginning code generation
14/04/02 12:38:23 INFO manager.OracleManager: Time zone has been set to GMT
14/04/02 12:38:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM BILL_FEE_CDE_DIM
t WHERE 1=0
14/04/02 12:38:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hdp2014poc/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/bin/../lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-dmadmin/compile/6a8ac204650278b4235d199bb6059358/BILL_FEE_CDE_DIM.java uses
or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/04/02 12:38:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-dmadmin/compile/6a8ac204650278b4235d199bb6059358/BILL_FEE_CDE_DIM.jar
14/04/02 12:38:24 INFO mapreduce.ImportJobBase: Beginning import of BILL_FEE_CDE_DIM
14/04/02 12:38:25 INFO manager.OracleManager: Time zone has been set to GMT
14/04/02 12:38:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
Applications should implement Tool for the same.
14/04/02 12:38:26 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ROWNUM),
MAX(ROWNUM) FROM BILL_FEE_CDE_DIM
14/04/02 12:38:26 INFO mapred.JobClient: Running job: job_201403281357_0126
14/04/02 12:38:27 INFO mapred.JobClient:  map 0% reduce 0%
14/04/02 12:38:38 INFO mapred.JobClient:  map 50% reduce 0%
14/04/02 12:38:39 INFO mapred.JobClient:  map 75% reduce 0%
14/04/02 12:38:47 INFO mapred.JobClient:  map 100% reduce 0%
14/04/02 12:38:49 INFO mapred.JobClient: Job complete: job_201403281357_0126
14/04/02 12:38:49 INFO mapred.JobClient: Counters: 23
14/04/02 12:38:49 INFO mapred.JobClient:   File System Counters
14/04/02 12:38:49 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/04/02 12:38:49 INFO mapred.JobClient:     FILE: Number of bytes written=765188
14/04/02 12:38:49 INFO mapred.JobClient:     FILE: Number of read operations=0
14/04/02 12:38:49 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/04/02 12:38:49 INFO mapred.JobClient:     FILE: Number of write operations=0
14/04/02 12:38:49 INFO mapred.JobClient:     HDFS: Number of bytes read=435
14/04/02 12:38:49 INFO mapred.JobClient:     HDFS: Number of bytes written=544
14/04/02 12:38:49 INFO mapred.JobClient:     HDFS: Number of read operations=4
14/04/02 12:38:49 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/04/02 12:38:49 INFO mapred.JobClient:     HDFS: Number of write operations=4
14/04/02 12:38:49 INFO mapred.JobClient:   Job Counters
14/04/02 12:38:49 INFO mapred.JobClient:     Launched map tasks=4
14/04/02 12:38:49 INFO mapred.JobClient:     Total time spent by all maps in occupied slots
(ms)=34570
14/04/02 12:38:49 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots
(ms)=0
14/04/02 12:38:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving
slots (ms)=0
14/04/02 12:38:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after
reserving slots (ms)=0
14/04/02 12:38:49 INFO mapred.JobClient:   Map-Reduce Framework
14/04/02 12:38:49 INFO mapred.JobClient:     Map input records=8
14/04/02 12:38:49 INFO mapred.JobClient:     Map output records=8
14/04/02 12:38:49 INFO mapred.JobClient:     Input split bytes=435
14/04/02 12:38:49 INFO mapred.JobClient:     Spilled Records=0
14/04/02 12:38:49 INFO mapred.JobClient:     CPU time spent (ms)=8030
14/04/02 12:38:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1160265728
14/04/02 12:38:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=6454796288
14/04/02 12:38:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=3032481792
14/04/02 12:38:49 INFO mapreduce.ImportJobBase: Transferred 544 bytes in 24.8174 seconds (21.9201
bytes/sec)
14/04/02 12:38:49 INFO mapreduce.ImportJobBase: Retrieved 8 records.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
