sqoop-user mailing list archives

From: Arvind Prabhakar <arv...@apache.org>
Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even though the table is partitioned
Date: Thu, 11 Aug 2011 19:19:29 GMT
Thanks, Bejoy.

We have considered adding reduce jobs to Sqoop to further partition
the output files. See [SQOOP-137] for more details.

[SQOOP-137] https://issues.cloudera.org/browse/SQOOP-137
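
For anyone curious, the general MapReduce pattern such a change would use is
a reduce phase that only redistributes records: route each record to one of
N reducers, and let each reducer write one output file. A minimal sketch of
that pattern (illustrative only; the class and names below are mine, not the
SQOOP-137 implementation):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RepartitionSketch {

  // Emit each input line as a key; the shuffle then spreads the keys
  // across the reducers (hash-partitioned by default).
  public static class LineMapper
      extends Mapper<Object, Text, Text, NullWritable> {
    @Override
    protected void map(Object key, Text line, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(line, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "repartition-sketch");
    job.setJarByClass(RepartitionSketch.class);
    job.setMapperClass(LineMapper.class);
    job.setReducerClass(Reducer.class); // identity reduce: pass records through
    job.setNumReduceTasks(8);           // each reducer writes one part-r-* file
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Running with zero reduce tasks, as Sqoop does today, skips the shuffle
entirely, which is also why only map progress shows up in the logs below.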

Thanks,
Arvind

On Tue, Aug 9, 2011 at 10:05 AM,  <bejoyks@gmail.com> wrote:
> Moving the discussion to the Apache Sqoop mailing list. Please continue it here.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: bejoyks@gmail.com
> Date: Tue, 9 Aug 2011 16:54:44
> To: <sqoop-user@cloudera.org>
> Reply-To: bejoyks@gmail.com
> Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even though the table is partitioned
>
> Yes, Sqoop imports and exports are purely parallel, map-only processes; no reduce operation is required in these scenarios (a quick sketch of what "map-only" means follows below).
> You are not doing any kind of aggregation while performing imports and exports, so reducers hardly come into play.
> As for Sqoop with a reduce job, I don't have a clue. Are you looking for some specific implementation? If so, please share more details.
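>
> To make "map-only" concrete: it just means the job is configured with zero
> reduce tasks, so there is no shuffle or sort phase at all. A minimal sketch
> (illustrative only, not Sqoop's actual code):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class MapOnlySketch {
>   public static void main(String[] args) throws Exception {
>     Job job = new Job(new Configuration(), "map-only-sketch");
>     job.setJarByClass(MapOnlySketch.class);
>     job.setMapperClass(Mapper.class); // identity map, standing in for real work
>     job.setNumReduceTasks(0);         // zero reducers: each map task writes
>                                       // its part-m-* output file directly
>     FileInputFormat.addInputPath(job, new Path(args[0]));
>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>   }
> }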
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Sonal <imsonalkumar@gmail.com>
> Date: Tue, 9 Aug 2011 07:52:55
> To: Sqoop Users<sqoop-user@cloudera.org>
> Reply-To: sqoop-user@cloudera.org
> Subject: [sqoop-user] Re: Sqoop export not having reduce jobs, even though the table is partitioned
>
> Hi,
>
> Thanks for the reply.
> So is Sqoop just doing parallel processing, even if you have a primary
> key/unique index/partition on the table?
>
> Is there any case in which Sqoop can make use of a reduce job?
> Is there any way we can set the batch size/fetch size in Sqoop?
>
> Thanks & Regards,
> Sonal Kumar
>
>
> On Aug 9, 7:44 pm, bejo...@gmail.com wrote:
>> Hi Sonal
>> AFAIK, Sqoop import and export jobs kick off map tasks alone; both are
>> map-only jobs.
>> In an import, the data set to be imported is distributed evenly across the
>> mappers, and each mapper is responsible for firing its corresponding SQL
>> query and fetching its share of the data to HDFS. No reduce operation is
>> required here, as it is just parallel processing (parallel fetching of
>> data) under the hood. The same applies to Sqoop export: parallel inserts
>> happen under the hood. For plain parallel processing, map tasks alone are
>> enough; no reduce operation is needed.
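>>
>> To illustrate the splitting idea (not Sqoop's actual code; the column and
>> numbers below are made up): given the min and max of a numeric split
>> column and the mapper count, each mapper can be handed one contiguous
>> range and run its own bounded query in parallel:
>>
>> public class SplitSketch {
>>   public static void main(String[] args) {
>>     // Hypothetical bounds, as if from: SELECT MIN(id), MAX(id) FROM some_table
>>     long min = 1, max = 1000000;
>>     int numMappers = 4; // as with -m 4
>>     long size = (max - min + 1 + numMappers - 1) / numMappers; // ceiling split
>>     for (int i = 0; i < numMappers; i++) {
>>       long lo = min + (long) i * size;
>>       long hi = Math.min(lo + size, max + 1);
>>       System.out.printf("mapper %d: ... WHERE id >= %d AND id < %d%n", i, lo, hi);
>>     }
>>   }
>> }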
>>
>> Regards
>> Bejoy K S
>>
>> -----Original Message-----
>> From: Sonal <imsonalku...@gmail.com>
>> Date: Tue, 9 Aug 2011 04:02:10
>> To: Sqoop Users<sqoop-u...@cloudera.org>
>>
>> Reply-To: sqoop-u...@cloudera.org
>> Subject: [sqoop-user] Sqoop export not having reduce jobs, even though the table is partitioned
>>
>> Hi,
>>
>> I am trying to load the data into the database using Sqoop export with the
>> following command:
>>
>> sqoop export \
>>   --connect jdbc:oracle:thin:@adc2190481.us.oracle.com:45773:dooptry \
>>   --username sh --password sh \
>>   --export-dir $ORACLE_HOME/work/SALES_input \
>>   --table SALES_OLH_RANGE -m 4
>>
>> It is able to insert the data, but it runs map tasks only:
>> 11/08/09 03:57:42 WARN tool.BaseSqoopTool: Setting your password on
>> the command-line is insecure. Consider using -P instead.
>> 11/08/09 03:57:42 INFO tool.CodeGenTool: Beginning code generation
>> 11/08/09 03:57:42 INFO manager.OracleManager: Time zone has been set
>> to GMT
>> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement:
>> SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing
>> ResultSet: java.sql.SQLException: Could not commit with auto-commit
>> set on
>> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement:
>> SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing
>> ResultSet: java.sql.SQLException: Could not commit with auto-commit
>> set on
>> 11/08/09 03:57:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/
>> hadoop
>> 11/08/09 03:57:42 INFO orm.CompilationManager: Found hadoop core jar
>> at: /usr/lib/hadoop/hadoop-0.20.2+737-core.jar
>> Note: /net/adc2190481/scratch/sonkumar/view_storage/sonkumar_hadooptry/
>> work/./SALES_OLH_RANGE.java uses or overrides a deprecated API.
>> Note: Recompile with -Xlint:deprecation for details.
>> 11/08/09 03:57:43 INFO orm.CompilationManager: Writing jar file: /tmp/
>> sqoop/compile/SALES_OLH_RANGE.jar
>> 11/08/09 03:57:43 INFO mapreduce.ExportJobBase: Beginning export of
>> SALES_OLH_RANGE
>> 11/08/09 03:57:44 INFO manager.OracleManager: Time zone has been set
>> to GMT
>> 11/08/09 03:57:44 INFO manager.SqlManager: Executing SQL statement:
>> SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:44 WARN manager.SqlManager: SQLException closing
>> ResultSet: java.sql.SQLException: Could not commit with auto-commit
>> set on
>> 11/08/09 03:57:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> processName=JobTracker, sessionId=
>> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 11/08/09 03:57:44 INFO mapred.JobClient: Running job: job_local_0001
>> 11/08/09 03:57:45 INFO mapred.JobClient:  map 0% reduce 0%
>> 11/08/09 03:57:50 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:51 INFO mapred.JobClient:  map 24% reduce 0%
>> 11/08/09 03:57:53 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:54 INFO mapred.JobClient:  map 41% reduce 0%
>> 11/08/09 03:57:56 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:57 INFO mapred.JobClient:  map 58% reduce 0%
>> 11/08/09 03:57:59 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:00 INFO mapred.JobClient:  map 75% reduce 0%
>> 11/08/09 03:58:02 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:02 INFO mapred.JobClient:  map 92% reduce 0%
>> 11/08/09 03:58:03 INFO mapreduce.AutoProgressMapper: Auto-progress
>> thread is finished. keepGoing=false
>> 11/08/09 03:58:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0
>> is done. And is in the process of commiting
>> 11/08/09 03:58:03 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:03 INFO mapred.Task: Task
>> 'attempt_local_0001_m_000000_0' done.
>> 11/08/09 03:58:03 WARN mapred.FileOutputCommitter: Output path is null
>> in cleanup
>> 11/08/09 03:58:04 INFO mapred.JobClient:  map 100% reduce 0%
>> 11/08/09 03:58:04 INFO mapred.JobClient: Job complete: job_local_0001
>> 11/08/09 03:58:04 INFO mapred.JobClient: Counters: 6
>> 11/08/09 03:58:04 INFO mapred.JobClient:   FileSystemCounters
>> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_READ=41209592
>> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=309754
>> 11/08/09 03:58:04 INFO mapred.JobClient:   Map-Reduce Framework
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Map input records=918843
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Spilled Records=0
>> 11/08/09 03:58:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Map output records=918843
>> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in
>> 20.3677 seconds (0 bytes/sec)
>> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Exported 918843
>> records.
>>
>> Why are the reduce jobs not coming up? Do I have to pass some other option
>> as well?
>>
>> A quick reply would be appreciated.
>>
>> Thanks & Regards,
>> Sonal Kumar
>>
>
> --
> NOTE: The mailing list sqoop-user@cloudera.org is deprecated in favor of the Apache Sqoop mailing list sqoop-user@incubator.apache.org. Please subscribe to it by sending an email to incubator-sqoop-user-subscribe@apache.org.
>
