asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chen Li <>
Subject Re: Issue 868 in asterixdb: About prefix merge policy behavior
Date Mon, 18 May 2015 03:25:28 GMT
A general question is: if the speed of incoming records is higher than
the merge and disk-flush speed, what should we do?  Can we reject the
insertion requests?


On Sun, May 17, 2015 at 5:38 PM, Michael Carey <> wrote:
> Moving this to the incubator dev list (just looking it over again).  Q - if
> when a merge finishes there's a bigger backlog of components - will it
> currently consider doing a more-ways merge?  (Instead of 5, if there are 13
> sitting there when the 5-merge finishes - will a 13-merge be initiated?)
> Just curious.  We do probably need to think about some sort of better flow
> control here - the storage gate should presumably slow down admissions if it
> can't keep up - have to ponder what that might mean.  (I have a better idea
> of what it could mean for feeds than for normal inserts.)  One could argue
> that an increasing backlog is a sign that we should be scaling out the
> number of partitions for the dataset (future work but important work :-)).
> Cheers,
> Mike
> On 4/15/15 2:33 PM, wrote:
> Status: New
> Owner: ----
> Labels: Type-Defect Priority-Medium
> New issue 868 by About prefix merge policy behavior
> I describe how current prefix merge policy works based on the observation
> from ingestion experiments.
> Also, the similar observation was observed by Sattam as well.
> The observed behavior seems a bit unexpected, so I post the observation here
> to consider better merge policy and/or better lsm index design regarding
> merge operations.
> The aqls used for the experiment are shown at the end of this writing.
> Prefix merge policy decides to merge disk components based on the following
> conditions
> 1.  Look at the candidate components for merging in oldest-first order.  If
> one exists, identify the prefix of the sequence of all such components for
> which the sum of their sizes exceeds MaxMergableComponentSize.  Schedule a
> merge of those components into a new component.
> 2.  If a merge from 1 doesn't happen, see if the set of candidate components
> for merging exceeds MaxToleranceComponentCnt.  If so, schedule a merge all
> of the current candidates into a new single component.
> Also, the prefix merge policy doesn't allow concurrent merge operations for
> a single index partition.
> In other words, if there is a scheduled or an on-going merge operation, even
> if the above conditions are met, the merge operation is not scheduled.
> Based on this merge policy, the following situation can occur.
> Suppose MaxToleranceCompCnt = 5 and 5 disk components were flushed to disk.
> When 5th disk component is flushed, the prefix merge policy schedules a
> merge operation to merge the 5 components.
> During the merge operation is scheduled and starts merging, concurrently
> ingested records generates more disk components.
> As long as a merge operation is not fast enough to catch up the speed of
> generating 5 disk components by incoming ingested records,
> the number of disk components increases as time goes.
> So, the slower merge operations are, the more disk components there will be
> as time goes.
> I also attached a result of a command, "ls -alR <directory of the asterixdb
> instance for an ingestion experiment>" which was executed after the
> ingestion is over.
> The attached file shows that for primary index (whose directory is
> FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk
> components, where each disk component consists of btree (the filename has
> suffix _b) and bloom filter (the filename has suffix_f) and
> MaxMergableComponentSize is set to 1GB.
> It also shows that for the secondary index (whose directory is
> FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more than
> 1400 components, where each disk component consist of a dictionary btree
> (suffix: _b), an inverted list (suffix: _i), a deleted-key btree (suffix:
> _d), and a bloom filter for the deleted-key btree (suffix: _f).
> Even if the ingestion was over, since our merge operation happens
> asynchronously, the merge operation continues and eventually merge all
> mergable disk components according to the describe merge policy.
> ------------------------------------------
> AQLs for the ingestion experiment
> ------------------------------------------
> drop dataverse STBench if exists;
> create dataverse STBench;
> use dataverse STBench;
> create type FsqCheckinTweetType as closed {
>     id: int64,
>     user_id: int64,
>     user_followers_count: int64,
>     text: string,
>     datetime: datetime,
>     coordinates: point,
>     url: string?
> }
> create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id
> /* this index type is only available kisskys/hilbertbtree branch. however,
> you can easily replace sif index to inverted keyword index on the text field
> and you will see similar behavior */
> create index sifCoordinate on FsqCheckinTweet(coordinates) type sif(-180.0,
> -90.0, 180.0, 90.0);
> /* create feed */
> create feed  TweetFeed
> using file_feed
> (("fs"="localfs"),
> ("path"=""),("format"="adm"),("type-name"="FsqCheckinTweetType"),("tuple-interval"="0"));
> /* connect feed */
> use dataverse STBench;
> set wait-for-completion-feed "true";
> connect feed TweetFeed to dataset FsqCheckinTweet;
> Attachments:
>     storage-layout.txt  574 KB
> --
> You received this message because you are subscribed to the Google Groups
> "asterixdb-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> For more options, visit

View raw message