sqoop-user mailing list archives

From Chalcy <cha...@gmail.com>
Subject Re: Another sqoop incremental update question
Date Tue, 23 Oct 2012 16:11:19 GMT
Hi Jarcec,

I split this into two questions, but both serve a single objective.

The use case is not to export back to the database.  For huge tables, I do a
one-time pull, then an incremental append based on modified date into a new
table, and merge the two so that I get the updated rows.  I am currently doing
this with a left outer join, which works efficiently, but I would like to try
sqoop merge if it is as easy as giving it the input table and the incremental
table and letting it merge them.
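
A minimal sketch of the merge semantics described above, in plain Python
rather than Sqoop or Hive. The function name merge_by_key, the "id" key and
the sample rows are assumptions for illustration only; the idea is simply
that a row from the incremental pull replaces the base row with the same key.

```python
def merge_by_key(base_rows, incremental_rows, key="id"):
    """Return one row per key, preferring the incremental row on conflict."""
    # Start from the one-time full pull, indexed by the unique key.
    merged = {row[key]: row for row in base_rows}
    # Rows from the incremental pull overwrite older rows or add new ones.
    for row in incremental_rows:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical sample data: id 2 was updated, id 3 is new.
base = [
    {"id": 1, "name": "a", "modified": "2012-10-01"},
    {"id": 2, "name": "b", "modified": "2012-10-01"},
]
incremental = [
    {"id": 2, "name": "b2", "modified": "2012-10-20"},
    {"id": 3, "name": "c", "modified": "2012-10-21"},
]
merged = merge_by_key(base, incremental)
```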

Also, some rows will have been deleted in the database by the time we do the
incremental update to the Hive table, and I should be able to delete those
rows as well.  The way I handle this is to pull only the unique ids from the
database and do another outer join, so that rows deleted in the database do
not appear in the merged Hive table.
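
The delete handling above can be sketched the same way, again in plain
Python and not as Sqoop or Hive code. The function name drop_deleted and the
sample ids are assumptions for illustration; the point is that only rows
whose id still exists in the source database are kept.

```python
def drop_deleted(merged_rows, live_ids, key="id"):
    """Keep only rows whose key is still present in the source database."""
    live = set(live_ids)  # ids pulled from the database in a separate import
    return [row for row in merged_rows if row[key] in live]


# Hypothetical sample data: id 1 has been deleted in the source database.
merged_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 3, "name": "c"},
]
current_ids = [2, 3]
result = drop_deleted(merged_rows, current_ids)
```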

Thanks, Jarcec,

On Tue, Oct 23, 2012 at 11:02 AM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:

> Hi Chalcy,
> I'm afraid that there isn't a way to achieve deletes from within
> Sqoop.
> Just a quick question. It seems to me that you're trying to import data to
> HDFS, do some transformations and put the data back into your database (using
> updates, inserts and deletes). If I understand your use case correctly, I
> would propose truncating the table after your input and using a simple
> export to load the updated data. I believe that such an approach will be
> faster than selective inserts, updates and deletes.
> Jarcec
> On Tue, Oct 23, 2012 at 09:44:04AM -0400, Chalcy wrote:
> > Hello sqoop users,
> >
> > Sqoop incremental append for insert and update works really great.  Is
> > there any way to handle deletes?  I am planning to do it with a left
> > outer join, but am trying to find out if there is any other way.
> >
> > Thanks,
> > Chalcy
