hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: Loading hbase from parquet files
Date Wed, 08 Oct 2014 18:49:08 GMT
For that use case I'd prefer to write new, filtered HFiles with MapReduce and
then import that data into HBase using bulk load. Keep in mind that the
incremental load tool moves files, it does not copy them. So once the HFiles
are written you will not do any additional writes (except for regions that were
split while you were filtering the data). If the imported data set is small,
that should not be a problem.
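
A minimal sketch of that flow, assuming 0.98-era APIs; the table name, the
input/output paths and ParquetFilterMapper are placeholders, not a finished
implementation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import parquet.hadoop.example.ExampleInputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "parquet-filter-to-hfiles");
    job.setJarByClass(BulkLoadDriver.class);

    // Assumes the parquet files can be read with parquet-hadoop's example
    // Group API; use the Avro/Thrift bindings if they were written that way.
    job.setInputFormatClass(ExampleInputFormat.class);

    // ParquetFilterMapper (placeholder) applies the filtering rules and
    // emits (ImmutableBytesWritable rowkey, Put) pairs for matching records.
    job.setMapperClass(ParquetFilterMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    HTable table = new HTable(conf, "mytable");
    // Wires in the reducer, total order partitioner and HFile output format
    // so the generated HFiles line up with the table's region boundaries.
    HFileOutputFormat2.configureIncrementalLoad(job, table);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    if (job.waitForCompletion(true)) {
      // Completes the bulk load; note that the HFiles are moved, not copied.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
    table.close();
  }
}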

On Wed, Oct 8, 2014 at 8:45 PM, Nishanth S <nishanth.2884@gmail.com> wrote:

> Thanks Andrey. In the current system the HBase column families have a TTL
> of 30 days and data gets deleted after that (with Snappy compression).
> Below is what I am trying to achieve.
>
> 1. Export the data from the hbase table before it gets deleted.
> 2. Store it in some format which supports maximum compression (storage cost
> is my primary concern here), so I am looking at parquet.
> 3. Load a subset of this data back into hbase based on certain rules (say I
> want to load all rows which have a particular string in one of the fields).
>
>
> I was thinking of bulk loading this data back into hbase, but I am not sure
> how I can load a subset of the data using the
> org.apache.hadoop.hbase.mapreduce.Driver import tool.
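>
> A rough sketch of the filtering step, assuming the parquet files can be read
> with parquet-hadoop's example Group API; the "rowkey" and "payload" columns,
> the "d" family and the "needle" string are placeholders for the actual rule:
>
> import java.io.IOException;
>
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.mapreduce.Mapper;
> import parquet.example.data.Group;
>
> // Keeps only the parquet records matching the rule and turns them into Puts
> // that HFileOutputFormat2 can write out as HFiles for the bulk load.
> public class ParquetFilterMapper
>     extends Mapper<Void, Group, ImmutableBytesWritable, Put> {
>
>   private static final byte[] CF = Bytes.toBytes("d");
>
>   @Override
>   protected void map(Void key, Group record, Context context)
>       throws IOException, InterruptedException {
>     String payload = record.getString("payload", 0);
>     // The rule: load only rows whose field contains a particular string.
>     if (payload == null || !payload.contains("needle")) {
>       return;
>     }
>     byte[] row = Bytes.toBytes(record.getString("rowkey", 0));
>     Put put = new Put(row);
>     put.add(CF, Bytes.toBytes("payload"), Bytes.toBytes(payload));
>     context.write(new ImmutableBytesWritable(row), put);
>   }
> }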
>
>
>
>
>
>
> On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev <octo47@gmail.com>
> wrote:
>
> > Hi Nishanth.
> >
> > It is not clear what exactly you are building.
> > Can you share a more detailed description of what you are building and how
> > the parquet files are supposed to be ingested?
> > Some questions arise:
> > 1. Is this an online import or a bulk load?
> > 2. Why do the rules need to be deployed to the cluster? Do you intend to do
> > the reading inside the hbase region server?
> >
> > As for deploying filters, you can try to use coprocessors instead. They
> > can be configurable and loadable (but not unloadable, so you need to think
> > about some class-loading magic like ClassWorlds).
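> >
> > A small sketch of loading one dynamically through the table descriptor;
> > the observer class name, the jar location and the "rules" parameter below
> > are made-up placeholders:
> >
> > import java.util.Collections;
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.Coprocessor;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.HTableDescriptor;
> > import org.apache.hadoop.hbase.TableName;
> > import org.apache.hadoop.hbase.client.HBaseAdmin;
> >
> > public class AttachObserver {
> >   public static void main(String[] args) throws Exception {
> >     Configuration conf = HBaseConfiguration.create();
> >     HBaseAdmin admin = new HBaseAdmin(conf);
> >     TableName tn = TableName.valueOf("mytable");
> >
> >     admin.disableTable(tn);
> >     HTableDescriptor desc = admin.getTableDescriptor(tn);
> >     // The jar must be readable by the region servers (e.g. on HDFS);
> >     // the key/value map is passed to the coprocessor as configuration.
> >     desc.addCoprocessor("com.example.RuleObserver",
> >         new Path("hdfs:///libs/rule-observer.jar"),
> >         Coprocessor.PRIORITY_USER,
> >         Collections.singletonMap("rules", "payload contains needle"));
> >     admin.modifyTable(tn, desc);
> >     admin.enableTable(tn);
> >     admin.close();
> >   }
> > }
> >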
> > For bulk imports you can create HFiles directly and add them
> incrementally:
> > http://hbase.apache.org/book/arch.bulk.load.html
> >
> > On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S <nishanth.2884@gmail.com>
> > wrote:
> >
> > > I was thinking of using the org.apache.hadoop.hbase.mapreduce.Driver
> > > import tool. I could see that we can pass filters to this utility, but it
> > > looks less flexible since you need to deploy a new filter every time the
> > > rules for processing records change. Is there some way that we could
> > > define a rules engine?
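> > >
> > > For reference, a rule like "the field contains a given string" can be
> > > expressed with the built-in parameterized filters, so no new class has
> > > to be deployed; the family, qualifier and match string below are
> > > placeholders:
> > >
> > > import org.apache.hadoop.hbase.filter.CompareFilter;
> > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
> > > import org.apache.hadoop.hbase.filter.SubstringComparator;
> > > import org.apache.hadoop.hbase.util.Bytes;
> > >
> > > // Matches rows whose d:payload value contains the given substring.
> > > SingleColumnValueFilter filter = new SingleColumnValueFilter(
> > >     Bytes.toBytes("d"),
> > >     Bytes.toBytes("payload"),
> > >     CompareFilter.CompareOp.EQUAL,
> > >     new SubstringComparator("needle"));
> > > filter.setFilterIfMissing(true);   // skip rows without the column
> > >
> > > Whether the Import utility can pick such a filter up through its
> > > import.filter.class / import.filter.args properties depends on the HBase
> > > version, so check the tool's usage output.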
> > >
> > >
> > > Thanks,
> > > -Nishan
> > >
> > > On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S <nishanth.2884@gmail.com>
> > > wrote:
> > >
> > > > Hey folks,
> > > >
> > > > I am evaluating loading an hbase table from parquet files, based on
> > > > some rules that would be applied to the parquet file records. Could
> > > > someone help me with what would be the best way to do this?
> > > >
> > > >
> > > > Thanks,
> > > > Nishan
> > > >
> > >
> >
> >
> >
> > --
> > Andrey.
> >
>



-- 
Andrey.
