drill-dev mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: [DISCUSS] Ideas to improve metadata cache read performance
Date Mon, 26 Oct 2015 21:42:45 GMT
My first thought is that we've gotten too generous in what we're storing
in the Parquet metadata file. Early implementations were very lean, and
the file seems far larger today. For example, early implementations
didn't keep statistics and ignored row groups (files, schema and block
locations only). If we need multiple levels of information, we may want
to stagger (or normalize) them in the file. We may also want to think
about the minimum that must be done in planning: we could do the file
pruning at execution time rather than funneling all of it through
planning (that makes stats harder, though).
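To make "stagger (or normalize)" concrete, here is a minimal sketch of
one hypothetical two-level layout (class and field names are made up,
not Drill's actual metadata classes): a lean summary that planning
always reads, with the bulky row-group statistics normalized into a
second level that is read only when needed.

  import java.util.List;

  // Hypothetical two-level cache layout (illustration only).
  class MetadataSummary {           // level 1: always read at planning time
    String schemaJson;              // table schema
    List<FileEntry> files;
  }

  class FileEntry {
    String path;
    String[] blockHosts;            // block locations for scan affinity
    long detailOffset;              // pointer into the level-2 detail section
  }

  class FileDetail {                // level 2: read lazily, e.g. at execution
    List<RowGroupStats> rowGroups;  // per-column min/max etc. live here
  }

  class RowGroupStats {
    long rowCount;
    // per-column min/max would go here
  }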

I also think we should be cautious about jumping to conclusions until
DRILL-3973 provides more insight.

In terms of caching, I'd be more inclined to rely on file system caching
and make sure serialization/deserialization is as efficient as possible as
opposed to implementing an application-level cache. (We already have enough
problems managing memory without having to figure out when we should drop a
metadata cache :D).
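
For what it's worth, a minimal sketch of the efficient-deserialization
idea using Jackson's streaming API instead of full data binding (the
"path" field name is an assumption about the cache layout, not the
actual schema); it walks tokens directly rather than building an
intermediate tree or bean graph:

  import com.fasterxml.jackson.core.JsonFactory;
  import com.fasterxml.jackson.core.JsonParser;
  import com.fasterxml.jackson.core.JsonToken;
  import java.io.File;
  import java.io.IOException;

  class CacheScan {
    // Stream the cache file token by token, pulling out only what
    // planning needs, with no intermediate object construction.
    static void scanPaths(File cacheFile) throws IOException {
      JsonFactory factory = new JsonFactory();
      try (JsonParser p = factory.createParser(cacheFile)) {
        while (p.nextToken() != null) {
          if (p.getCurrentToken() == JsonToken.FIELD_NAME
              && "path".equals(p.getCurrentName())) {
            p.nextToken();                 // advance to the value token
            String path = p.getText();     // hypothetical "path" field
            // ... collect path ...
          }
        }
      }
    }
  }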

As an aside, I've always liked this post, both for the entertainment and
for the thoughts on virtual memory:
https://www.varnish-cache.org/trac/wiki/ArchitectNotes


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Oct 26, 2015 at 2:25 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> One more thing: for workloads running queries over subsets of the same
> parquet files, we could consider maintaining an in-memory cache as well,
> assuming the metadata memory footprint per file is low and the parquet
> files are static, so we would rarely need to invalidate the cache.
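>
> A minimal sketch of what that could look like with Guava (the metadata
> type and footer-reading helper are hypothetical stand-ins), keyed on
> path plus modification time so that a rewritten file misses the cache
> on its own:
>
>   import com.google.common.cache.Cache;
>   import com.google.common.cache.CacheBuilder;
>
>   class FileMetadataCache {
>     // Bounded cache; key is "path@mtime" so a rewritten file is a miss.
>     private final Cache<String, FileMetadata> cache =
>         CacheBuilder.newBuilder().maximumSize(100_000).build();
>
>     FileMetadata get(String path, long mtime) throws Exception {
>       return cache.get(path + "@" + mtime, () -> readFooter(path));
>     }
>
>     FileMetadata readFooter(String path) {
>       // hypothetical: parse the parquet footer for this file
>       throw new UnsupportedOperationException("sketch only");
>     }
>   }
>
>   class FileMetadata { /* per-file metadata fields (elided) */ }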
>
> H+
>
> On Mon, Oct 26, 2015 at 2:10 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>
> > I am not familiar with the contents of the metadata stored, but if the
> > deserialization workload fits any of Afterburner's claimed improvement
> > points [1], it could well be worth trying, given that the claimed
> > throughput gain is substantial.
> >
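> > For reference, wiring Afterburner in is a one-liner on the
> > ObjectMapper (standard Jackson usage; the readValue target type below
> > is hypothetical):
> >
> >   import com.fasterxml.jackson.databind.ObjectMapper;
> >   import com.fasterxml.jackson.module.afterburner.AfterburnerModule;
> >
> >   ObjectMapper mapper = new ObjectMapper();
> >   // generates bytecode for (de)serializers instead of using reflection
> >   mapper.registerModule(new AfterburnerModule());
> >   // then read the cache file exactly as before, e.g.:
> >   // ParquetTableMetadata meta =
> >   //     mapper.readValue(cacheFile, ParquetTableMetadata.class);
> >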
> > It could also be a good idea to partition the cache over a number of
> > files for better parallelization, given that the number of cache files
> > generated is *significantly* less than the number of parquet files.
> > Maintaining global statistics seems like an improvement point, too.
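> >
> > A sketch of the partitioned read (the shard layout and the readShard
> > helper are assumptions, not existing code):
> >
> >   import java.io.File;
> >   import java.util.ArrayList;
> >   import java.util.List;
> >   import java.util.concurrent.ExecutorService;
> >   import java.util.concurrent.Executors;
> >   import java.util.concurrent.Future;
> >
> >   class ShardedCacheReader {
> >     // Deserialize N shard files concurrently, then merge the results.
> >     List<FileMetadata> readAll(List<File> shards) throws Exception {
> >       ExecutorService pool = Executors.newFixedThreadPool(16);
> >       try {
> >         List<Future<List<FileMetadata>>> futures = new ArrayList<>();
> >         for (File shard : shards) {
> >           futures.add(pool.submit(() -> readShard(shard)));
> >         }
> >         List<FileMetadata> all = new ArrayList<>();
> >         for (Future<List<FileMetadata>> f : futures) {
> >           all.addAll(f.get());            // merge in submission order
> >         }
> >         return all;
> >       } finally {
> >         pool.shutdown();
> >       }
> >     }
> >
> >     List<FileMetadata> readShard(File shard) {
> >       // hypothetical: deserialize one shard of the metadata cache
> >       throw new UnsupportedOperationException("sketch only");
> >     }
> >   }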
> >
> >
> > -H+
> >
> > 1: https://github.com/FasterXML/jackson-module-afterburner#what-is-optimized
> >
> >> On Sun, Oct 25, 2015 at 9:33 AM, Aman Sinha <amansinha@apache.org> wrote:
> >
> >> Forgot to include the link for Jackson's AfterBurner module:
> >>   https://github.com/FasterXML/jackson-module-afterburner
> >>
> >> On Sun, Oct 25, 2015 at 9:28 AM, Aman Sinha <amansinha@apache.org> wrote:
> >>
> >> > I was going to file an enhancement JIRA but thought I would discuss
> >> > here first:
> >> >
> >> > The parquet metadata cache file is a JSON file that contains a
> >> > subset of the metadata extracted from the parquet files.  The cache
> >> > file can get really large: a few GB for a few hundred thousand
> >> > files.  I have filed a separate JIRA, DRILL-3973, for profiling the
> >> > various aspects of planning, including metadata operations.  In the
> >> > meantime, the timestamps in the drillbit.log output indicate that a
> >> > large chunk of time is spent in creating the drill table to begin
> >> > with, which points to a bottleneck in reading the metadata.  (I can
> >> > provide performance numbers later, once we confirm through
> >> > profiling.)
> >> >
> >> > A few thoughts around improvements:
> >> >  - The Jackson deserialization of the JSON file is very slow.  Can
> >> > this be sped up?  For instance, the AfterBurner module of Jackson
> >> > claims to improve performance by 30-40% by avoiding the use of
> >> > reflection.
> >> >  - The cache file read is a single-threaded process.  If we were
> >> > reading directly from parquet files, we would use a default of 16
> >> > threads.  What can be done to parallelize the read?
> >> >  - Is there any operation that can be done one time during the
> >> > REFRESH METADATA command?  For instance, examining the min/max
> >> > values to determine whether a partition column is single-valued
> >> > could be eliminated if we did this computation during the REFRESH
> >> > METADATA command and stored the summary one time (see the sketch
> >> > after this list).
> >> >
> >> >  - A pertinent question: should the cache file be stored in a more
> >> > efficient format such as Parquet instead of JSON?
> >> >
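> >> > A sketch of that single-value precomputation (names are
> >> > hypothetical; assumes non-null min/max): during REFRESH METADATA,
> >> > fold min/max across all row groups once and persist the resulting
> >> > flag, so planning reads a boolean instead of rescanning per-file
> >> > statistics:
> >> >
> >> >   // run once per partition column during REFRESH METADATA
> >> >   static boolean isSingleValued(java.util.List<ColumnStats> stats) {
> >> >     if (stats.isEmpty()) {
> >> >       return false;
> >> >     }
> >> >     Object first = stats.get(0).min;
> >> >     for (ColumnStats s : stats) {
> >> >       // single-valued only if min == max == first everywhere
> >> >       if (!first.equals(s.min) || !first.equals(s.max)) {
> >> >         return false;
> >> >       }
> >> >     }
> >> >     return true;  // persist this flag in the cache summary
> >> >   }
> >> >
> >> >   static class ColumnStats { Object min; Object max; }  // stand-in
> >> >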
> >> > Aman
> >> >
> >> >
> >>
> >
> >
>
