sqoop-user mailing list archives

From Tanzir Musabbir <tmusab...@outlook.com>
Subject RE: Using Sqoop incremental import as chunk
Date Thu, 09 May 2013 15:31:33 GMT
Sure Jarcec,
Actually, we would like to import data (from Oracle) 4-5 times a day and then process it
(analytics) about the same number of times. Each chunk may have around 10 million records.
Records are continuously added to that table, and sometimes the number added in a given time
frame may exceed 10 M. In that case we will not import all of the records; instead we will
import only 10 M records. That's why we are trying to import them in chunks.
Tanzir
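
[Editorial sketch] The per-run boundaries discussed in this thread (1-20 for the first run, 21-40 for the second) can be derived from the last imported id and a chunk size. The Python below is only a sketch under assumptions: the helper names, connection string, table name, and column name are hypothetical and do not come from the thread. It only builds the argument list and does not run Sqoop. On Oracle a bare `select 1,20` generally needs `from dual`, so the helper takes an optional FROM clause. An id range holds exactly `chunk_size` rows only if the ids are contiguous.

```python
def chunk_bounds(last_value, chunk_size):
    """Return the inclusive (lower, upper) id range of the chunk after last_value."""
    return last_value + 1, last_value + chunk_size


def boundary_query(last_value, chunk_size, from_clause=""):
    """Build the constant query passed to --boundary-query for the next chunk."""
    lower, upper = chunk_bounds(last_value, chunk_size)
    query = f"select {lower},{upper}"
    # Assumption: callers pass from_clause="from dual" when targeting Oracle.
    return f"{query} {from_clause}".strip()


def sqoop_import_args(connect, table, split_by, target_dir, last_value, chunk_size,
                      from_clause=""):
    """Assemble a 'sqoop import' argument list for one chunk (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--split-by", split_by,
        "--boundary-query", boundary_query(last_value, chunk_size, from_clause),
        "--target-dir", target_dir,
        "--append",
    ]


if __name__ == "__main__":
    # Hypothetical values; the scheduler would persist the upper bound between runs.
    args = sqoop_import_args(
        connect="jdbc:oracle:thin:@//dbhost:1521/ORCL",
        table="EVENTS",
        split_by="ID",
        target_dir="/data/events",
        last_value=20,
        chunk_size=20,
        from_clause="from dual",
    )
    print(" ".join(args))
```

The upper bound returned by `chunk_bounds` would become the `last_value` of the following run, which is the value an Oozie coordinator would need to carry forward between Sqoop actions.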

> Date: Wed, 8 May 2013 11:23:25 -0700
> From: jarcec@apache.org
> To: user@sqoop.apache.org
> Subject: Re: Using Sqoop incremental import as chunk
> 
> Hi Tanzir,
> would you mind describing a bit more about your use case? Is there a reason why you do
> not want your Oozie job to import all missing data?
> 
> Jarcec
> 
> On Thu, May 09, 2013 at 12:17:03AM +0600, Tanzir Musabbir wrote:
> > Thanks a lot Felix & Jarcec. So it looks like, if I am running an Oozie coordinator
> > job which periodically imports chunk data through Sqoop, before calling the Sqoop action I
> > need to change the boundary query value every time. Like
> > --boundary-query 'select 1,20' - for the 1st run
> > --boundary-query 'select 21,40' - for the 2nd run
> > Please correct me if I'm wrong. Thanks again.
> > 
> > > Date: Wed, 8 May 2013 11:08:05 -0700
> > > From: jarcec@apache.org
> > > To: user@sqoop.apache.org
> > > Subject: Re: Using Sqoop incremental import as chunk
> > > 
> > > Hi Tanzir,
> > > incremental import does not work in chunks; it always imports everything since the
> > > last import, i.e. everything from --last-value up. You can simulate the chunks if needed
> > > using the --boundary-query argument, as was advised by Felix.
> > > 
> > > Jarcec
> > > 
> > > On Wed, May 08, 2013 at 01:46:47PM -0400, Felix GV wrote:
> > > > --boundary-query
> > > > 
> > > > http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_connecting_to_a_database_server
> > > > 
> > > > --
> > > > Felix
> > > > 
> > > > 
> > > > On Wed, May 8, 2013 at 1:00 PM, Tanzir Musabbir <tmusabbir@outlook.com> wrote:
> > > > 
> > > > >  Hello everyone,
> > > > >
> > > > > Is it really possible to import chunk-wise data through sqoop incremental
> > > > > import?
> > > > >
> > > > > > Say I have a table with id 1,2,3..... N (here N is 100) and now I want to
> > > > > import it as chunk. Like
> > > > > 1st import: 1,2,3.... 20
> > > > > 2nd import: 21,22,23.....40
> > > > > last import: 81,82,83....100
> > > > >
> > > > > > I have read about the Sqoop job with incremental import and also know the
> > > > > > --last-value parameter but do not know how to pass the chunk size. For the
> > > > > above example, chunk size here is 20.
> > > > >
> > > > >
> > > > > Any information will be highly appreciated. Thanks in advance.
> > > > >