asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Young-Seok Kim <>
Subject Re: Issue 868 in asterixdb: About prefix merge policy behavior
Date Mon, 18 May 2015 04:20:29 GMT
It will merge 13 components instead of 5.


On Sun, May 17, 2015 at 5:38 PM, Michael Carey <> wrote:

>  Moving this to the incubator dev list (just looking it over again).  Q -
> if when a merge finishes there's a bigger backlog of components - will it
> currently consider doing a more-ways merge?  (Instead of 5, if there are 13
> sitting there when the 5-merge finishes - will a 13-merge be initiated?)
> Just curious.  We do probably need to think about some sort of better flow
> control here - the storage gate should presumably slow down admissions if
> it can't keep up - have to ponder what that might mean.  (I have a better
> idea of what it could mean for feeds than for normal inserts.)  One could
> argue that an increasing backlog is a sign that we should be scaling out
> the number of partitions for the dataset (future work but important work
> :-)).
> Cheers,
> Mike
> On 4/15/15 2:33 PM, wrote:
> Status: New
> Owner: ----
> Labels: Type-Defect Priority-Medium
> New issue 868 by About prefix merge policy behavior
> I describe how current prefix merge policy works based on the observation
> from ingestion experiments.
> Also, the similar observation was observed by Sattam as well.
> The observed behavior seems a bit unexpected, so I post the observation
> here to consider better merge policy and/or better lsm index design
> regarding merge operations.
> The aqls used for the experiment are shown at the end of this writing.
> Prefix merge policy decides to merge disk components based on the
> following conditions
> 1.  Look at the candidate components for merging in oldest-first order.
> If one exists, identify the prefix of the sequence of all such components
> for which the sum of their sizes exceeds MaxMergableComponentSize.
> Schedule a merge of those components into a new component.
> 2.  If a merge from 1 doesn't happen, see if the set of candidate
> components for merging exceeds MaxToleranceComponentCnt.  If so, schedule a
> merge all of the current candidates into a new single component.
> Also, the prefix merge policy doesn't allow concurrent merge operations
> for a single index partition.
> In other words, if there is a scheduled or an on-going merge operation,
> even if the above conditions are met, the merge operation is not scheduled.
> Based on this merge policy, the following situation can occur.
> Suppose MaxToleranceCompCnt = 5 and 5 disk components were flushed to
> disk.
> When 5th disk component is flushed, the prefix merge policy schedules a
> merge operation to merge the 5 components.
> During the merge operation is scheduled and starts merging, concurrently
> ingested records generates more disk components.
> As long as a merge operation is not fast enough to catch up the speed of
> generating 5 disk components by incoming ingested records,
> the number of disk components increases as time goes.
> So, the slower merge operations are, the more disk components there will
> be as time goes.
> I also attached a result of a command, "ls -alR <directory of the
> asterixdb instance for an ingestion experiment>" which was executed after
> the ingestion is over.
> The attached file shows that for primary index (whose directory is
> FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk
> components, where each disk component consists of btree (the filename has
> suffix _b) and bloom filter (the filename has suffix_f) and
> MaxMergableComponentSize is set to 1GB.
> It also shows that for the secondary index (whose directory is
> FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more than
> 1400 components, where each disk component consist of a dictionary btree
> (suffix: _b), an inverted list (suffix: _i), a deleted-key btree (suffix:
> _d), and a bloom filter for the deleted-key btree (suffix: _f).
> Even if the ingestion was over, since our merge operation happens
> asynchronously, the merge operation continues and eventually merge all
> mergable disk components according to the describe merge policy.
> ------------------------------------------
> AQLs for the ingestion experiment
> ------------------------------------------
> drop dataverse STBench if exists;
> create dataverse STBench;
> use dataverse STBench;
> create type FsqCheckinTweetType as closed {
>     id: int64,
>     user_id: int64,
>     user_followers_count: int64,
>     text: string,
>     datetime: datetime,
>     coordinates: point,
>     url: string?
> }
> create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id
> /* this index type is only available kisskys/hilbertbtree branch. however,
> you can easily replace sif index to inverted keyword index on the text
> field and you will see similar behavior */
> create index sifCoordinate on FsqCheckinTweet(coordinates) type
> sif(-180.0, -90.0, 180.0, 90.0);
> /* create feed */
> create feed  TweetFeed
> using file_feed
> (("fs"="localfs"),
> ("path"=""),("format"="adm"),("type-name"="FsqCheckinTweetType"),("tuple-interval"="0"));
> /* connect feed */
> use dataverse STBench;
> set wait-for-completion-feed "true";
> connect feed TweetFeed to dataset FsqCheckinTweet;
> Attachments:
>     storage-layout.txt  574 KB
>  --
> You received this message because you are subscribed to the Google Groups
> "asterixdb-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> For more options, visit

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message