directory-dev mailing list archives

From Emmanuel Lécharny
Subject Cache : there is some room for improvement...
Date Tue, 03 Dec 2013 06:48:46 GMT
Hi !

The last numbers I got are quite interesting, now that we are correctly
leveraging the caches (alias cache, ParentIdAndRdn cache aka PIAR cache,
entry cache). Still, the way we configure and initialize the caches is
far from perfect. I'll summarize here some findings I gathered during
the last few weeks.

1) Cache is critical to performance.
When we process a search, there are many places where we access the
backend (be it JDBM or Mavibot) and we would gain by not doing so. By
adding a cache for Aliases and ParentIdAndRdn, I was able to get a 25%
speed improvement (assuming the cache is hit every time). The same goes
for the entry cache : having all the entries loaded into the cache is a
major factor in speed.

So we need a big entry, aliases and ParentIdAndRdn cache, that's for sure.
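
To make the point concrete, here is a minimal sketch of a read-through LRU cache sitting in front of a backend lookup. It is only an illustration: the class name, the counter and the Function standing in for a JDBM or Mavibot read are all hypothetical, and this is neither the ApacheDS code nor the EhCache API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through LRU cache; not the ApacheDS or EhCache API. */
class ReadThroughCache<K, V> {
    // Stands in for a JDBM/Mavibot lookup (assumption for the example)
    private final Function<K, V> backend;
    private final Map<K, V> cache;
    private int backendReads = 0;

    ReadThroughCache(final int maxEntries, Function<K, V> backend) {
        this.backend = backend;
        // An access-ordered LinkedHashMap evicts the least recently used entry
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Cache miss: this is the expensive path we want to avoid
            value = backend.apply(key);
            backendReads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Number of times the backend was actually read. */
    int backendReads() {
        return backendReads;
    }
}
```

With a capacity smaller than the working set, every eviction turns into an extra backend read, which is why the cache sizes matter so much.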

2) The cache configuration is not perfect.
I discovered that the entry cache was initialized with a size of 1
entry... Obviously, that's a bit tight. But the problem is that
whatever configuration you set, it won't change !
So I fixed that (the ugly way).
The real problem is that the cache configuration and initialization is a
mess... We use a CacheService class (good thing !) which is not
initialized in some tests, so I had to check that the cache is not null
before using it in many parts of the code. This has to be fixed. The
various caches (aliases, entry, PIAR) aren't all initialized in
AbstractBTreePartition, for instance.
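
One possible way to get rid of the scattered null checks, sketched under assumptions: fall back to a cache that stores nothing when no cache was initialized (as in those tests). All the names below (SimpleCache, NoOpCache, MapCache, Caches) are hypothetical and are not the CacheService API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal cache contract; not the ApacheDS CacheService API. */
interface SimpleCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

/** Cache that stores nothing: every lookup is a miss. */
final class NoOpCache<K, V> implements SimpleCache<K, V> {
    public V get(K key) {
        return null;
    }

    public void put(K key, V value) {
        // Intentionally empty
    }
}

/** Trivial map-backed cache, used here only to show the non-null path. */
final class MapCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public V get(K key) {
        return map.get(key);
    }

    public void put(K key, V value) {
        map.put(key, value);
    }
}

final class Caches {
    /** Returns the configured cache, or a no-op cache when none was initialized. */
    static <K, V> SimpleCache<K, V> orNoOp(SimpleCache<K, V> configured) {
        return configured != null ? configured : new NoOpCache<K, V>();
    }
}
```

The calling code can then always call get() and put(), and simply sees misses when there is no real cache behind.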

We also have various cache configurations :
- partition cache
- index cache

It is not clear which parameter is used for which cache. We have to get
this fixed.

3) Backend cache and ApacheDS cache
The backend cache and the ADS cache are two different things. In
Mavibot, we cache Pages. In JDBM, we also cache Pages. In ADS, we cache
entries, aliases, etc. At the moment, the configuration doesn't make it
clear which cache is being set (although the index cacheSize parameter
is only used to set the backend cache size).

The thing is that in JDBM, each single index can have its own cache,
whereas the cache is global in Mavibot. In other words, we can't really
assume that configuring the backend cache is something generic.

Otherwise, we are using EhCache, and a dedicated configuration file for
it. It would be good not to have to manipulate this file at all, and
have the cache configuration all in ADS config.

Well, there is some room for improvement in this area.

4) Which cache should we favor ?
Backend page cache is useless if the ADS caches are loaded, except if we
are using indexes. That means we need both. The thing is that what is
expensive when browsing a BTree is not only fetching pages from the disk,
but also deserializing them. It would be good to keep the index pages
in memory (as we don't have any cache at the ADS level for indexes) and
not to cache the MasterTable (as we have an EntryCache) nor the RdnIndex
(for the same reason : we already have the PIAR cache). This requires
some information to be propagated to the backend cache (do *not* cache
this BTree, do cache this one...).
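
As a rough sketch of what propagating this information could look like, here is a per-BTree hint the backend could consult before keeping pages in memory. Everything here is an assumption made for illustration: the enum, the class, the BTree names ("master", "rdn") and the default policy do not exist in JDBM, Mavibot or ADS.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-BTree hint telling the backend whether to keep pages in memory. */
enum PageCachePolicy { CACHE, DO_NOT_CACHE }

final class BTreeCachePolicies {
    private final Map<String, PageCachePolicy> byName = new HashMap<>();

    /** Registers a hint for one BTree; the names are illustrative only. */
    BTreeCachePolicies set(String btreeName, PageCachePolicy policy) {
        byName.put(btreeName, policy);
        return this;
    }

    /** BTrees with no explicit hint default to CACHE (an assumption of this sketch). */
    boolean shouldCachePages(String btreeName) {
        return byName.getOrDefault(btreeName, PageCachePolicy.CACHE) == PageCachePolicy.CACHE;
    }

    /** The policy described above: skip the MasterTable and the RdnIndex. */
    static BTreeCachePolicies fromMessage() {
        return new BTreeCachePolicies()
            .set("master", PageCachePolicy.DO_NOT_CACHE) // already covered by the entry cache
            .set("rdn", PageCachePolicy.DO_NOT_CACHE);   // already covered by the PIAR cache
    }
}
```

The other indexes keep the default, so their pages stay in the backend cache.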

There is room for improvement here.

5) What if we have enough memory ?
90% of the raw search time is spent cloning entries. We *have* to
avoid cloning the entry if we want to get better performance. This is
what we should work on.
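
One direction that could be explored, sketched here under assumptions: copy the entry once when it goes into the cache, and hand out a shared read-only view afterwards, so that a search does not pay for a clone. The ImmutableEntry class below is hypothetical and much simpler than the real ADS Entry; it only shows the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable entry snapshot; not the ApacheDS Entry API. */
final class ImmutableEntry {
    private final String dn;
    private final Map<String, List<String>> attributes;

    /** Copies the attributes once, when the entry is put into the cache. */
    ImmutableEntry(String dn, Map<String, List<String>> source) {
        this.dn = dn;
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : source.entrySet()) {
            // Defensive copy of the values, wrapped so that readers cannot modify them
            copy.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        this.attributes = Collections.unmodifiableMap(copy);
    }

    String dn() {
        return dn;
    }

    /** Read-only view shared by all readers; no per-search copy is made. */
    Map<String, List<String>> attributes() {
        return attributes;
    }
}
```

Any code that needs to modify an entry would still have to make its own copy; the gain is only for the read path.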

Regardless, if we don't have enough memory, at the end of the day, the
server will hit the disk and we will get much lower performance (by at
least one order of magnitude). This is something to keep in mind when
doing perf tests : we are NOT testing the disk performance, we are
testing the server performance. Running a benchmark when there is not
enough memory to have cache loaded is a waste of time, as the impact of
disk reads is so huge it will hide any improvement we can make on the

Soooo : we need enough memory to run the server ! The problem is : how
much memory do we need ? This is the tricky part...

Emmanuel Lécharny
