spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From amit kumar singh <amitiem...@gmail.com>
Subject Re: optimize hive query to move a subset of data from one partition table to another table
Date Mon, 12 Feb 2018 00:55:49 GMT
Hi

create table emp as select * from emp_full where join_date
>=date_sub(join_date,2)

i am trying to select from one table insert into another table

i need a way to do select last 2 month of data everytime

table is partitioned on year month day

On Sun, Feb 11, 2018 at 4:30 PM, Richard Qiao <richardqiao2000@gmail.com>
wrote:

> Would you mind share your code with us to analyze?
>
> > On Feb 10, 2018, at 10:18 AM, amit kumar singh <amitiemcal@gmail.com>
> wrote:
> >
> > Hi Team,
> >
> > We have hive external  table which has 50 tb of data partitioned on year
> month day
> >
> > i want to move last 2 month of data into another table
> >
> > when i try to do this through spark ,more than 120k task are getting
> created
> >
> > what is the best way to do this
> >
> > thanks
> > Rohit
>
>

Mime
View raw message