hbase-user mailing list archives

From vignesh <vignesh...@gmail.com>
Subject RE: How spark writes to HBASE
Date Mon, 22 Jan 2018 16:57:12 GMT
So in cases like Spark, the write to the memstore will go over the network if
the executor runs on a different machine than the region server that is
responsible for taking the puts?

On Jan 22, 2018 22:24, "Dave Birdsall" <dave.birdsall@esgyn.com> wrote:

> There are some engines that will do this. Apache Trafodion for example
> will hash partition results to be inserted into a table in HBase so that
> the puts are done locally.
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Monday, January 22, 2018 8:47 AM
> To: user@hbase.apache.org
> Subject: Re: How spark writes to HBASE
>
> Which connector do you use to perform the write?
>
> bq. Or spark will wisely launch an executor on that machine
>
> I don't think that is the case. Multiple writes may be performed which
> would end up on different region servers. Spark won't provide the affinity
> described above.
>
> On Mon, Jan 22, 2018 at 7:19 AM, vignesh <vignesh093@gmail.com> wrote:
>
> > Hi,
> >
> > I have a Spark job which reads some timeseries data and pushes it to
> > HBase using the HBase client API. I am executing this Spark job on a
> > 10-node cluster. Say that when Spark first kicks off, it picks
> > machine1, machine2, machine3 as its executors. Now the job inserts
> > a row into HBase. Below is my understanding of what it does.
> >
> > Based on the row key, a particular region (from META) would be
> > chosen, and that row will be pushed to that RegionServer's memstore and
> > WAL; once the memstore is full it would be flushed to disk. Now
> > assume a particular row is being processed by an executor on
> > machine2, and the region server which handles the region to which the
> > put is to be made is on machine6. Will the data be transferred from
> > machine2 to machine6 over the network and then stored in the
> > memstore of machine6? Or will Spark wisely launch an executor on that
> > machine during the write (if dynamic allocation is turned on) and
> > push to it?
> >
> >
> > --
> > I.VIGNESH
> >
>
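
The routing discussed in this thread can be illustrated with a small
simulation. This is only a sketch, not HBase client code: the region
boundaries, the machine names, and the helpers locate_region_server,
put_is_remote and group_by_server are all hypothetical. It assumes each
region is identified by its start key and hosted by exactly one region
server, and that the client picks the region whose start key is the
greatest one not larger than the row key. The last helper groups rows by
target server, which is roughly the idea behind partitioning rows so that
puts can be done locally; it does not claim to be how any particular
engine implements it.

```python
import bisect
from collections import defaultdict

# Hypothetical region layout: (start_key, hosting_server), sorted by start key.
# The first region starts at the empty key, so every row key falls somewhere.
REGIONS = [
    (b"", "machine4"),
    (b"g", "machine6"),
    (b"p", "machine9"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region_server(row_key: bytes) -> str:
    """Return the server hosting the region that contains row_key."""
    # Index of the last region whose start key is <= row_key.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]


def put_is_remote(executor_host: str, row_key: bytes) -> bool:
    """True if a put issued from executor_host must cross the network."""
    return locate_region_server(row_key) != executor_host


def group_by_server(row_keys):
    """Group row keys by the server that would receive their puts."""
    groups = defaultdict(list)
    for key in row_keys:
        groups[locate_region_server(key)].append(key)
    return dict(groups)
```

With this layout, a row key such as b"host42" belongs to the region hosted
on machine6, so put_is_remote("machine2", b"host42") is True: the executor
on machine2 sends the put over the network, and nothing in the lookup
moves the executor closer to the data.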
