sqoop-user mailing list archives

From: Gwen Shapira <gshap...@cloudera.com>
Subject: Re: sqoop import to S3 hits 5 GB limit
Date: Mon, 04 Aug 2014 19:39:09 GMT
Just for completeness - I often configure Sqoop with a high number of
mappers (so that if a tasktracker fails, it won't lose a huge amount of
work) and then use the fair-scheduler to limit the number of concurrent
mappers to something reasonable for the DB.
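As a rough sketch (connect string, table, pool name, and mapper count are
all placeholders, and the pool's concurrent-task cap would live in the
cluster's fair-scheduler allocations file):

  sqoop import \
      -Dmapred.fairscheduler.pool=sqoop-imports \
      --connect jdbc:mysql://dbhost/mydb \
      --table orders \
      --split-by id \
      --num-mappers 100 \
      --target-dir /staging/orders
  # 100 map tasks get created, but the "sqoop-imports" pool only runs a
  # few at a time, so the DB never sees more concurrent connections than
  # the pool allows.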

On Mon, Aug 4, 2014 at 12:11 PM, Allan Ortiz <aortiz@g2llc.com> wrote:
> Great!  Thanks for the reply, Gwen.  I did not know that Sqoop 2 isn't
> ready for prime time yet.  For various reasons, I am going to use the
> Sqoop-to-HDFS, then copy-to-S3 option (sketched below).  One reason is that
> we are currently doing non-incremental Sqoop imports (so the import time is
> significant), and I've observed that the import run time goes up once the
> number of mappers exceeds the number of cores on my source DB.
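> A sketch of that two-step flow (paths, bucket, and connect string are
> placeholders):
>
>   sqoop import \
>       --connect jdbc:mysql://dbhost/mydb \
>       --table orders \
>       --num-mappers 8 \
>       --target-dir hdfs:///staging/orders
>   hadoop distcp hdfs:///staging/orders s3n://mybucket/orders
>   # Note: each individual file still has to stay under S3's 5 GB
>   # single-upload limit (or multipart upload must be enabled), so the
>   # mapper count still needs to keep the splits small.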
> Thanks again,
> Allan
> ________________________________
> From: "Gwen Shapira" <gshapira@cloudera.com>
> To: user@sqoop.apache.org
> Sent: Sunday, August 3, 2014 12:07:10 PM
> Subject: Re: sqoop import to S3 hits 5 GB limit
> Hi,
> Sqoop 2 is rather experimental and will not solve your problem.
> I'd try to work around the issue by increasing the number of mappers
> until each mapper is writing less than 5 GB of data.
> If this doesn't work for you, then HDFS->S3 is an option.
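> As a rough illustration (sizes and names are made up, and this assumes
> the --split-by column is fairly evenly distributed):
>
>   # A ~400 GB table with 100 mappers gives ~4 GB per split, keeping
>   # each output file under the 5 GB limit.
>   sqoop import \
>       --connect jdbc:mysql://dbhost/mydb \
>       --table orders \
>       --split-by id \
>       --num-mappers 100 \
>       --target-dir s3n://mybucket/orders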
> Gwen
> On Thu, Jul 31, 2014 at 2:32 PM, Allan Ortiz <aortiz@g2llc.com> wrote:
>> I am trying to use Sqoop 1.4.4 to import data from a MySQL DB directly to
>> S3, and I am running into an issue where, if one of the file splits is
>> larger than 5 GB, the import fails.
>> Details for this question are listed here in my SO post - I promise to
>> follow good cross-posting etiquette :)
>> http://stackoverflow.com/questions/25068747/sqoop-import-to-s3-hits-5-gb-limit
>> One of my main questions is: should I be using Sqoop 2 rather than Sqoop
>> 1.4.4?  Also, should I be sqooping to HDFS and then copying the data over
>> to S3 for permanent storage?  Thanks!
