sqoop-user mailing list archives

From Allan Ortiz <aor...@g2llc.com>
Subject Re: sqoop import to S3 hits 5 GB limit
Date Mon, 04 Aug 2014 19:11:50 GMT

Great! Thanks for the reply, Gwen. I did not know that Sqoop 2 isn't ready for prime time
yet. For various reasons, I am going to use the Sqoop-to-HDFS, then copy-to-S3 option. One
reason is that we are currently doing non-incremental Sqoop imports (so the import time is
significant), and I've observed that import run time goes up once the number of mappers
exceeds the number of cores on my source DB.

Thanks again, 

----- Original Message -----

From: "Gwen Shapira" <gshapira@cloudera.com> 
To: user@sqoop.apache.org 
Sent: Sunday, August 3, 2014 12:07:10 PM 
Subject: Re: sqoop import to S3 hits 5 GB limit 


Sqoop 2 is rather experimental and will not solve your problem. 

I'd try to work around the issue by increasing the number of mappers until 
each mapper is writing less than 5 GB worth of data. 
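The mapper-count workaround might look roughly like the sketch below. The connection string, credentials, table, split column, and bucket are all placeholders, not details from this thread; the idea is simply that with N mappers each split is about 1/N of the table, so picking N large enough keeps every output file under the 5 GB single-object PUT limit.

```shell
# Hypothetical example: for a ~40 GB table, 16 mappers yields splits
# of roughly 2.5 GB each, safely under the 5 GB limit.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username etl -P \
  --table orders \
  --split-by id \
  --num-mappers 16 \
  --target-dir s3n://my-bucket/orders/
```

`--split-by` should name an evenly distributed indexed column, or the splits will be skewed and the largest one may still exceed 5 GB.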

If this doesn't work for you, then HDFS->S3 is an option. 
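The HDFS-first approach could be sketched as the two steps below (paths, host, and bucket are placeholders). The point of staging in HDFS is that split size no longer matters there, and the copy to S3 can then be done by a tool that handles large objects.

```shell
# Step 1: import into HDFS, where files larger than 5 GB are fine.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username etl -P \
  --table orders \
  --target-dir /staging/orders

# Step 2: copy the staged files to S3 with DistCp. Depending on the
# Hadoop version and S3 filesystem connector in use, uploading objects
# over 5 GB may require multipart uploads to be enabled.
hadoop distcp /staging/orders s3n://my-bucket/orders/
```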


On Thu, Jul 31, 2014 at 2:32 PM, Allan Ortiz <aortiz@g2llc.com> wrote: 
> I am trying to use Sqoop 1.4.4 to import data from a MySQL DB directly to S3, 
> and I am running into an issue where the import fails if one of the file 
> splits is larger than 5 GB. 
> Details for this question are listed here in my SO post - I promise to 
> follow good cross-posting etiquette :) 
> http://stackoverflow.com/questions/25068747/sqoop-import-to-s3-hits-5-gb-limit 
> One of my main questions is should I be using sqoop 2 rather than sqoop 
> 1.4.4? Also, should I be sqooping to HDFS, then copying the data over to S3 
> for permanent storage? Thanks! 
