sqoop-user mailing list archives

From Allan Ortiz <aor...@g2llc.com>
Subject Re: sqoop import to S3 hits 5 GB limit
Date Tue, 05 Aug 2014 03:59:59 GMT
Hmm, Sean, this sounds off topic :). I don't use Python for any of my Hadoop-related work, but
on occasion I end up doing some minor scripting with it. 

----- Original Message -----

From: "Sean Franks" <seancfranks@gmail.com> 
To: user@sqoop.apache.org 
Sent: Monday, August 4, 2014 6:38:15 PM 
Subject: RE: sqoop import to S3 hits 5 GB limit 


Do you program - by any chance - in Python? 

- Sean 

Sean Franks | (212) 284-8787 | (908) 310-4200 
“With the addition of a well structured Big Data ecosystem to the Data Highway of an enterprise,
Business Intelligence analytics will take a quantum leap forward.” 

-----Original Message----- 
From: Gwen Shapira [mailto:gshapira@cloudera.com] 
Sent: Monday, August 04, 2014 3:39 PM 
To: user@sqoop.apache.org 
Subject: Re: sqoop import to S3 hits 5 GB limit 

Just for completeness - I often configure Sqoop with a high number of mappers (so if a tasktracker
fails it won't lose a huge amount of work) and then use the fair-scheduler to limit the number
of concurrent mappers to something reasonable for the DB. 

On Mon, Aug 4, 2014 at 12:11 PM, Allan Ortiz <aortiz@g2llc.com> wrote: 
> Great! Thanks for the reply, Gwen. I did not know that sqoop2 isn't 
> ready for prime time yet. For various reasons, I am going to use 
> the "sqoop to HDFS, then copy to S3" option. One reason is that we are 
> currently doing non-incremental sqoop (so the import time is 
> significant), and I've observed that the import run-time goes up as 
> the number of mappers exceeds the number of cores for my source DB. 
> Thanks again, 
> Allan 
> ________________________________ 
> From: "Gwen Shapira" <gshapira@cloudera.com> 
> To: user@sqoop.apache.org 
> Sent: Sunday, August 3, 2014 12:07:10 PM 
> Subject: Re: sqoop import to S3 hits 5 GB limit 
> Hi, 
> Sqoop2 is rather experimental and will not solve your problem. 
> I'd try to work around the issue by increasing the number of mappers 
> until each mapper is writing less than 5 GB worth of data. 
> If this doesn't work for you, then HDFS->S3 is an option. 
> Gwen 
> On Thu, Jul 31, 2014 at 2:32 PM, Allan Ortiz <aortiz@g2llc.com> wrote: 
>> I am trying to use sqoop 1.4.4 to import data from a mysql DB 
>> directly to S3 and I am running into an issue where if one of the 
>> file splits is larger than 5 GB then the import fails. 
>> Details for this question are listed here in my SO post - I promise 
>> to follow good cross-posting etiquette :) 
>> http://stackoverflow.com/questions/25068747/sqoop-import-to-s3-hits-5-gb-limit 
>> One of my main questions is should I be using sqoop 2 rather than 
>> sqoop 1.4.4? Also, should I be sqooping to HDFS, then copying the 
>> data over to S3 for permanent storage? Thanks! 
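
For anyone sizing the workaround suggested in this thread (raise the mapper count until each mapper writes less than 5 GB), the arithmetic can be sketched roughly as below. This is only an illustration, not something posted by anyone in the thread: the function name min_mappers, the 80% headroom, and reading "5 GB" as 5 * 1024**3 bytes are all assumptions. It also assumes the splits come out roughly even, which will not hold if the split column is skewed.

```python
# Hypothetical helper: estimate how many Sqoop mappers are needed so that
# each mapper's output file stays below the 5 GB limit discussed in the thread.

# Assumption: "5 GB" is taken as 5 * 1024**3 bytes.
SIZE_LIMIT_BYTES = 5 * 1024**3


def min_mappers(total_bytes, limit_bytes=SIZE_LIMIT_BYTES, headroom_percent=80):
    """Return the smallest mapper count that keeps an evenly sized split
    under headroom_percent of limit_bytes.

    Assumes the data divides evenly across mappers; skewed split columns
    can still produce an oversized file, hence the headroom.
    """
    if total_bytes <= 0:
        return 1
    # Integer arithmetic throughout to avoid floating-point rounding.
    budget_per_mapper = limit_bytes * headroom_percent // 100
    # Ceiling division: round up so no split exceeds the budget.
    return max(1, -(-total_bytes // budget_per_mapper))


if __name__ == "__main__":
    table_size = 100 * 1024**3  # a made-up 100 GiB table
    print(min_mappers(table_size))
```

The result would be passed to Sqoop through its --num-mappers (-m) option. If that count is more than the source DB can serve concurrently, the fair-scheduler cap mentioned earlier in the thread is one way to keep the load reasonable.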
