asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Carey <>
Subject Re: Issue 868 in asterixdb: About prefix merge policy behavior
Date Mon, 18 May 2015 00:38:05 GMT
Moving this to the incubator dev list (just looking it over again).  Q - 
if when a merge finishes there's a bigger backlog of components - will 
it currently consider doing a more-ways merge?  (Instead of 5, if there 
are 13 sitting there when the 5-merge finishes - will a 13-merge be 
initiated?)  Just curious. We do probably need to think about some sort 
of better flow control here - the storage gate should presumably slow 
down admissions if it can't keep up - have to ponder what that might 
mean.  (I have a better idea of what it could mean for feeds than for 
normal inserts.)  One could argue that an increasing backlog is a sign 
that we should be scaling out the number of partitions for the dataset 
(future work but important work :-)).


On 4/15/15 2:33 PM, wrote:
> Status: New
> Owner: ----
> Labels: Type-Defect Priority-Medium
> New issue 868 by About prefix merge policy behavior
> I describe how current prefix merge policy works based on the 
> observation from ingestion experiments.
> Also, the similar observation was observed by Sattam as well.
> The observed behavior seems a bit unexpected, so I post the 
> observation here to consider better merge policy and/or better lsm 
> index design regarding merge operations.
> The aqls used for the experiment are shown at the end of this writing.
> Prefix merge policy decides to merge disk components based on the 
> following conditions
> 1.  Look at the candidate components for merging in oldest-first 
> order.  If one exists, identify the prefix of the sequence of all such 
> components for which the sum of their sizes exceeds 
> MaxMergableComponentSize.  Schedule a merge of those components into a 
> new component.
> 2.  If a merge from 1 doesn't happen, see if the set of candidate 
> components for merging exceeds MaxToleranceComponentCnt.  If so, 
> schedule a merge all of the current candidates into a new single 
> component.
> Also, the prefix merge policy doesn't allow concurrent merge 
> operations for a single index partition.
> In other words, if there is a scheduled or an on-going merge 
> operation, even if the above conditions are met, the merge operation 
> is not scheduled.
> Based on this merge policy, the following situation can occur.
> Suppose MaxToleranceCompCnt = 5 and 5 disk components were flushed to 
> disk.
> When 5th disk component is flushed, the prefix merge policy schedules 
> a merge operation to merge the 5 components.
> During the merge operation is scheduled and starts merging, 
> concurrently ingested records generates more disk components.
> As long as a merge operation is not fast enough to catch up the speed 
> of generating 5 disk components by incoming ingested records,
> the number of disk components increases as time goes.
> So, the slower merge operations are, the more disk components there 
> will be as time goes.
> I also attached a result of a command, "ls -alR <directory of the 
> asterixdb instance for an ingestion experiment>" which was executed 
> after the ingestion is over.
> The attached file shows that for primary index (whose directory is 
> FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk 
> components, where each disk component consists of btree (the filename 
> has suffix _b) and bloom filter (the filename has suffix_f) and 
> MaxMergableComponentSize is set to 1GB.
> It also shows that for the secondary index (whose directory is 
> FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more 
> than 1400 components, where each disk component consist of a 
> dictionary btree (suffix: _b), an inverted list (suffix: _i), a 
> deleted-key btree (suffix: _d), and a bloom filter for the deleted-key 
> btree (suffix: _f).
> Even if the ingestion was over, since our merge operation happens 
> asynchronously, the merge operation continues and eventually merge all 
> mergable disk components according to the describe merge policy.
> ------------------------------------------
> AQLs for the ingestion experiment
> ------------------------------------------
> drop dataverse STBench if exists;
> create dataverse STBench;
> use dataverse STBench;
> create type FsqCheckinTweetType as closed {
>     id: int64,
>     user_id: int64,
>     user_followers_count: int64,
>     text: string,
>     datetime: datetime,
>     coordinates: point,
>     url: string?
> }
> create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id
> /* this index type is only available kisskys/hilbertbtree branch. 
> however, you can easily replace sif index to inverted keyword index on 
> the text field and you will see similar behavior */
> create index sifCoordinate on FsqCheckinTweet(coordinates) type 
> sif(-180.0, -90.0, 180.0, 90.0);
> /* create feed */
> create feed  TweetFeed
> using file_feed
> (("fs"="localfs"),
> ("path"=""),("format"="adm"),("type-name"="FsqCheckinTweetType"),("tuple-interval"="0"));

> /* connect feed */
> use dataverse STBench;
> set wait-for-completion-feed "true";
> connect feed TweetFeed to dataset FsqCheckinTweet;
> Attachments:
>     storage-layout.txt  574 KB

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message