samza-dev mailing list archives

From Navina Ramesh <nram...@linkedin.com.INVALID>
Subject Re: Required vs. optional methods for KeyValueStore
Date Wed, 29 Jul 2015 18:08:23 GMT
Ken,

range() and all() are called by tasks as needed, not by the framework.
However, if your store skips implementing them, it should be made clear
to users that these operations are not supported.
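One way to make this explicit is to implement the write path normally and have range() and all() throw UnsupportedOperationException with a descriptive message. This is a minimal sketch under simplifying assumptions. The SimpleKeyValueStore interface, the Entry class and NoScanStore are hypothetical, trimmed-down stand-ins. Samza's real KeyValueStore has more methods and returns a KeyValueIterator. The HashMap is only a placeholder for a backend such as embedded Solr.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical key/value pair, standing in for Samza's Entry class. */
final class Entry<K, V> {
    private final K key;
    private final V value;

    Entry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() { return key; }
    V getValue() { return value; }
}

/** Hypothetical, trimmed-down stand-in for Samza's KeyValueStore interface. */
interface SimpleKeyValueStore<K, V> {
    V get(K key);
    void put(K key, V value);
    void putAll(List<Entry<K, V>> entries);
    void delete(K key);
    Iterator<Entry<K, V>> range(K from, K to);
    Iterator<Entry<K, V>> all();
}

/**
 * Hypothetical store that supports point reads and writes (including putAll,
 * which changelog restore relies on) but rejects full scans up front.
 * The HashMap is only a placeholder for the real backend (e.g. embedded Solr).
 */
final class NoScanStore<K, V> implements SimpleKeyValueStore<K, V> {
    private final Map<K, V> backend = new HashMap<>();

    @Override
    public V get(K key) { return backend.get(key); }

    @Override
    public void put(K key, V value) { backend.put(key, value); }

    @Override
    public void putAll(List<Entry<K, V>> entries) {
        // Write each batched entry through the normal put path.
        for (Entry<K, V> e : entries) {
            put(e.getKey(), e.getValue());
        }
    }

    @Override
    public void delete(K key) { backend.remove(key); }

    @Override
    public Iterator<Entry<K, V>> range(K from, K to) {
        // Task-only operation: fail fast with a clear message instead of scanning.
        throw new UnsupportedOperationException("range() is not supported by NoScanStore");
    }

    @Override
    public Iterator<Entry<K, V>> all() {
        throw new UnsupportedOperationException("all() is not supported by NoScanStore");
    }
}
```

Failing fast like this gives a task that accidentally relies on a full scan an immediate, descriptive error. It does not get a silently slow or partial result.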

Also, you might have to update some tests in samza-test, which runs a fixed
set of tests against every type of key-value store supported by Samza.

I agree that we can add more javadocs indicating which methods are
mandatory while implementing a KV store.
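As a rough illustration of such Javadoc notes, the interface below labels putAll() as required for changelog support and all() as optional. It is paired with a simplified restore loop that replays changelog entries through putAll() in fixed-size batches. DocumentedKeyValueStore, Entry and ChangelogRestorer are hypothetical names, and the batching behaviour is an assumption made for illustration. This is not Samza's actual restore code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical key/value pair, standing in for Samza's Entry class. */
final class Entry<K, V> {
    private final K key;
    private final V value;

    Entry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() { return key; }
    V getValue() { return value; }
}

/** Hypothetical store contract with the required/optional split documented. */
interface DocumentedKeyValueStore<K, V> {
    /** Point lookup, called from task code. */
    V get(K key);

    /** Single write, called from task code. */
    void put(K key, V value);

    /**
     * REQUIRED for changelog support: the framework uses this to write
     * restored entries back into the store.
     */
    void putAll(List<Entry<K, V>> entries);

    /**
     * OPTIONAL: only called by task code that needs a full scan.
     * Stores that cannot iterate efficiently may throw
     * UnsupportedOperationException.
     */
    Iterator<Entry<K, V>> all();
}

/** Hypothetical restore loop; not Samza's actual implementation. */
final class ChangelogRestorer {
    private ChangelogRestorer() {}

    /** Replays changelog entries into the store via putAll(); returns the number of batches written. */
    static <K, V> int restore(Iterator<Entry<K, V>> changelog,
                              DocumentedKeyValueStore<K, V> store,
                              int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<Entry<K, V>> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (changelog.hasNext()) {
            batch.add(changelog.next());
            if (batch.size() == batchSize) {
                store.putAll(batch);
                batch = new ArrayList<>(batchSize); // fresh list in case the store keeps a reference
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            store.putAll(batch); // flush the final, partially filled batch
            batches++;
        }
        return batches;
    }
}
```

In this sketch, restore never touches all(). A store that throws from all() can therefore still be rebuilt from its changelog.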

Cheers!
Navina

On Wed, Jul 29, 2015 at 10:58 AM, Ken Krugler <kkrugler_lists@transpac.com>
wrote:

> Hi Navina,
>
> Thanks for confirming that putAll(list) is a required method, for
> supporting the changelog functionality.
>
> I'm hoping you or others can confirm that range() and all() are _not_ used
> by the Samza system - i.e. these are only used internally (as needed) by
> tasks.
>
> And if the above is true, then adding some Javadoc notes about which
> methods are required (used by the Samza system) for changelog support vs.
> optional (only used by task-specific code as needed) would be very helpful.
>
> Thanks!
>
> -- Ken
>
> > From: Navina Ramesh
> > Sent: July 29, 2015 10:38:45am PDT
> > To: dev@samza.apache.org
> > Subject: Re: Required vs. optional methods for KeyValueStore
> >
> > Hi Ken,
> >
> > We use putAll(list) when restoring from the changelog. So unless you
> > don't need changelog support for your store, implementing it is
> > required.
> >
> > I only have a high-level overview of what Solr is. Perhaps others on the
> > mailing list have experience with Solr and can provide more useful
> > information.
> >
> > Thanks!
> > Navina
> >
> > On Tue, Jul 28, 2015 at 5:30 PM, Ken Krugler <kkrugler_lists@transpac.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'm looking at using embedded Solr as the KeyValueStore, as that lets me
> >> extract ranked results from the state to publish as part of the task's
> >> operation.
> >>
> >> Some of the methods defined by KeyValueStore are problematic, though -
> >> specifically the range() and all() methods that return iterators.
> >>
> >> Iterating over lots of results in Solr, while more feasible with newer
> >> paging support, is still an abuse of its architecture :)
> >>
> >> So I'm wondering whether I need to support those methods, or whether
> >> they're only called internally by tasks (e.g. my task) and can thus be
> >> optional.
> >>
> >> I'm assuming that when state is being automatically restored from a
> >> changelog, the Samza system is calling putAll(list) repeatedly, but I
> >> haven't dug into those details. So that would be an example of a
> >> required method.
> >>
> >> Thanks,
> >>
> >> -- Ken
>
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>


-- 
Navina R.
