tajo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergio Esteves <sroeste...@gmail.com>
Subject Re: Hybrid Hash Join
Date Tue, 02 Jul 2013 04:14:18 GMT
Hi Jihoon,

HHJ would perform optimally if there are no overflowing buckets.

After the implementation, I was thinking of taking some measures (e.g.,
processing time, number of I/O operations) and comparing the performance of
HHJ with other join operators already implemented in TAJO.

If bucket overflow happens in the solution I described, the performance of
HHJ could be worse than the performance of other join algorithms due to the
extra I/O cost of rescanning. However, we could dispatch the overflowing
join tuples to another join operator. Some papers in the literature follow
such approach, e.g., the original HHJ paper suggests to attack overflowing
buckets (due to duplicated keys) with nested loops join.

Thanks.

On Tue, Jul 2, 2013 at 3:05 AM, Jihoon Son <ghoonson@gmail.com> wrote:

> Hi, Sergio
>
> I wonder the performance of your idea.
>
> Do you have any experiment plans?
>
> Thanks,
> Jihoon
>
> 2013/6/28 Sergio Esteves <sroesteves@gmail.com>
>
> > OK, since there are two different problems that can be separated, I
> decided
> > for now to put my efforts on the hybrid hash join operator.
> >
> > Doing so, I started a draft with some design choices (mostly for the
> > partition phase of the algorithm, since it is the most challenging part):
> > https://www.dropbox.com/s/menm6vb2hv3jv0n/HybridHashJoin_v0.pdf
> >
> > I checked some related work and did not find any better solution to deal
> > with overflowed buckets caused by duplicated join keys.
> >
> > Also, would it be possible to get histograms (like the ones described in
> > the document), value bounded or with hash ranges, through the stats
> > retrieved by the running subquery?
> >
> > Feedback regarding the solution described in Section 2.1 is appreciated.
> >
> > Thanks/Regards.
> >
> > On Tue, Jun 18, 2013 at 8:00 AM, Hyunsik Choi <hyunsik@apache.org>
> wrote:
> >
> > > Sergio,
> > >
> > > That's good idea. That is necessary work. However, now it would be
> better
> > > to adopt some of your approaches to the distributed plan part which
> has a
> > > global view of relations and running tasks.
> > >
> > > Actually, I've been also concerned with hash partition on intermediate
> > data
> > > between execution blocks using histogram. I believe that Tajo can
> handle
> > > most of skewed problems if the intermediate are evenly distributed
> across
> > > cluster nodes.
> > >
> > > Probably, the straightforward solution is as follows. In the current
> > > implementation, a running subquery collects some statistics from
> running
> > > query units (tasks). The statistics of each task includes hash keys on
> > > specified columns and the number of bytes for each key. By using the
> > > statistics, Tajo could build evenly distributed sets of hash keys. They
> > > could be used to create join tasks with evenly distributed loads; the
> > > creation of even load tasks is not implemented yet and is the future
> > plan.
> > > Like one of your approaches, we could store the statistics of joining
> > > tables into Tajo catalog. The statistics also would be reused for later
> > > query processing.
> > >
> > > Best regards,
> > > Hyunsik
> > >
> > >
> > >
> > > On Tue, Jun 18, 2013 at 1:17 PM, Sergio Esteves <sroesteves@gmail.com
> > > >wrote:
> > >
> > > > Hi,
> > > >
> > > > I have been thinking of the partition phase in hybrid hash join and
> > would
> > > > like to discuss it with the rest of the community.
> > > >
> > > > During the partition of data into buckets, a very important issue is
> > how
> > > to
> > > > deal with skewed buckets (many tuples with a certain same join key)
> > that
> > > > may cause hash table overflow, i.e., having partitions that do not
> fit
> > in
> > > > memory.
> > > >
> > > > It is hard to divide the hash space into buckets with different sizes
> > > > without knowing the join key distribution beforehand. Therefore, one
> > > > possible way is to partition the overflow buckets 1 level further
> (this
> > > > process may go on onto deeper levels). Other way that is sometimes
> > > > followed, is to do very small partitions of the data and then combine
> > > them
> > > > in accordance with their sizes (which can also fail if the number of
> > > > duplicates inside a bucket is too high).
> > > >
> > > > I have been discussing with my GSoC mentor, and we think the best and
> > > more
> > > > efficient solution would be to have an histogram of the distribution
> of
> > > the
> > > > join keys, so that we can partition more evenly the hash space
> through
> > > the
> > > > buckets. Such histogram can be mantained for every table that can be
> > used
> > > > with a join (i.e., that have a foreign key or that can be used as
> > foreign
> > > > key to other table). Hence, it is necessary to update the histograms
> > > every
> > > > time data is updated/deleted from the respective tables. Other way we
> > > were
> > > > discussing was to build the histogram on the first time hybrid hash
> > join
> > > > physical operator is called with a certain join query (i.e., during
> > query
> > > > execution). On the second time the same query is submitted, or a
> > similar
> > > > query containing the same inner relation, the histogram was already
> > there
> > > > and thus hybrid hash join could be executed more efficiently. This
> > latter
> > > > approach has the drawback of ignoring changes between consecutive
> join
> > > > calls. On the other hand, maintaining a histogram for every table and
> > > > updating it every time changes occur can be costly and introduce some
> > > > overhead.
> > > >
> > > > What do you guys think?
> > > >
> > > > Thanks.
> > > >
> > >
> >
>
>
>
> --
> Jihoon Son
>
> Database & Information Systems Group,
> Prof. Yon Dohn Chung Lab.
> Dept. of Computer Science & Engineering,
> Korea University
> 1, 5-ga, Anam-dong, Seongbuk-gu,
> Seoul, 136-713, Republic of Korea
>
> Tel : +82-2-3290-3580
> E-mail : jihoonson@korea.ac.kr
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message