samza-dev mailing list archives

From sriram <sriram....@gmail.com>
Subject Re: soft references for object caching in the key-value storage engine
Date Tue, 10 Sep 2013 18:52:18 GMT
1. Soft reference collection is largely JVM dependent. The question is
whether the task needs predictable performance or best effort. If a user
had two stateful tasks, one memory intensive and the other not so much,
they would have a hard time understanding the behavior of their tasks.

2. The JVM needs to do more work on GC. Without a bound on the memory,
depending on the heuristic used and the memory characteristics of the
tasks, GC could stall for a long time clearing all of the soft references.
My guess is we would end up tweaking -XX:SoftRefLRUPolicyMSPerMB per task
(see the small demo below).
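
A minimal, self-contained demo of that JVM dependence (class name, heap
size, and allocation sizes are arbitrary): it parks some data behind soft
references, applies allocation pressure, and reports how many references
the collector chose to clear.

    import java.lang.ref.SoftReference;
    import java.util.ArrayList;
    import java.util.List;

    // Run with different settings and compare how many references survive, e.g.:
    //   java -Xmx64m -XX:SoftRefLRUPolicyMSPerMB=0 SoftRefDemo
    //   java -Xmx64m -XX:SoftRefLRUPolicyMSPerMB=1000 SoftRefDemo
    public class SoftRefDemo {
        public static void main(String[] args) {
            // Park ~32 MB behind soft references.
            List<SoftReference<byte[]>> refs = new ArrayList<>();
            for (int i = 0; i < 32; i++) {
                refs.add(new SoftReference<>(new byte[1024 * 1024]));
            }
            // Apply strong-reference pressure so the collector has to decide
            // whether to clear the soft references or throw OutOfMemoryError.
            List<byte[]> pressure = new ArrayList<>();
            try {
                for (int i = 0; i < 48; i++) {
                    pressure.add(new byte[1024 * 1024]);
                }
            } catch (OutOfMemoryError e) {
                pressure.clear();
            }
            long cleared = refs.stream().filter(r -> r.get() == null).count();
            System.out.println("cleared " + cleared + " of " + refs.size()
                + " soft references");
        }
    }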

It is worth testing with soft references, but I would argue that we should
test with different types of tasks. If it works well for a large subset of
them, we have a win. If not, it is not much different from bounding the
cache size (sketched below).
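
For comparison, here is roughly what bounding the cache looks like on the
JVM: an access-ordered LinkedHashMap that evicts the least recently used
entry past a fixed entry count. The class name and the choice of an
entry-count (rather than byte-count) bound are illustrative, not the actual
Samza cache.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Bounded LRU cache: the maxEntries bound is exactly the number the user
    // has to guess against heap size, per-object overhead, and partition count.
    public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public BoundedLruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }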


On Tue, Sep 10, 2013 at 8:50 AM, Jay Kreps <jay.kreps@gmail.com> wrote:

> One idea I had was to use soft references for the object cache in the
> key-value store. Currently we use an LRU hashmap, but the drawback is that
> it needs to be carefully sized based on heap size and the number of
> partitions. It is a little hard to know when to add memory to the object
> cache vs the block cache. Plus, since the size depends on both the objects
> in the cache and the per-object overhead, it is pretty much impossible to
> calculate the worst-case memory usage of N objects and make this work
> properly with a given heap size.
>
> Another option would be to use soft references:
> http://docs.oracle.com/javase/7/docs/api/java/lang/ref/SoftReference.html
>
> Soft references will let you use all available heap space as a cache that
> gets gc'd only when the JVM actually needs the memory back. These are
> usually frowned upon for caches due to the unpredictability of the
> discard--basically the garbage collector has some heuristic by which it
> chooses what to discard (
> http://jeremymanson.blogspot.com/2009/07/how-hotspot-decides-to-clear_07.html
> ), roughly based on how much actual free memory it wants to maintain.
> This makes soft references a little dicey for latency sensitive services.
>
> But for Samza the caching is really about optimizing throughput, not
> reducing the latency of a particular lookup. So using the rest of the free
> memory in the heap for caching is actually attractive. It is true that the
> garbage collector might occasionally destroy our cache, but that is okay
> and possibly worth it for orders of magnitude more cache space.
>
> This does seem like the kind of thing that would have odd corner cases.
> Anyone have practical experience with these who can tell me why this is a
> bad idea?
>
> -Jay
>
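
For concreteness, a minimal sketch of the soft-reference variant described
above, assuming a plain ConcurrentHashMap stands in for the store's object
cache (the class and method names are illustrative, not the actual Samza
API):

    import java.lang.ref.SoftReference;
    import java.util.concurrent.ConcurrentHashMap;

    // Values stay cached for as long as the collector does not need the memory
    // and are dropped (best effort) when it does. A miss returns null so the
    // caller falls through to the block cache / on-disk store.
    public class SoftReferenceCache<K, V> {
        private final ConcurrentHashMap<K, SoftReference<V>> map =
            new ConcurrentHashMap<>();

        public void put(K key, V value) {
            map.put(key, new SoftReference<>(value));
        }

        public V get(K key) {
            SoftReference<V> ref = map.get(key);
            if (ref == null) {
                return null;
            }
            V value = ref.get();
            if (value == null) {
                map.remove(key); // the collector cleared it; drop the stale entry
            }
            return value;
        }
    }

A real version would also drain a java.lang.ref.ReferenceQueue to purge
entries whose referents have been cleared; otherwise the map itself keeps
growing even after the cached objects are gone.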
