chukwa-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Graham <>
Subject Re: Cluster-specific Adaptors
Date Tue, 21 Sep 2010 19:55:10 GMT
Keeping state for a use case like this adds unnecessary complexity in
both the implementation of the client/server and the protocol. Also,
the protocol would be state-full in this case not because doing so
would make it easier to use protocol, but to work around the limits of
the existing protocol, which really could just be changed.

>  It's easy to know when the state can be discarded

I disagree. If I make two HTTP requests the underlying HTTP
client/browser could use one or more connections for example and it
puts an unnecessary burden on the client to track this. You'd instead
need to keep a token to identify each user to bind their state to,
which would need to be passed around, which is a pain. Then how does
he server and the client know when this state is expiring? What if the
server gets bounced? Race conditions? Etc. Yes, there are technical
solutions to all of these things, but they all come at a way more
complex cost to implement and use than I think we need to handle
per-adaptor tags.

On Tue, Sep 21, 2010 at 11:54 AM, Ariel Rabkin <> wrote:
> Why is it bad to keep some state per command session?  It's easy to
> know when the state can be discarded -- as soon as we're done reading
> the file or when the socket closes.
> I think it's fairly intuitive and readable; it's routine for e.g.,
> scripts to modify interpreter state.  In straight-line code, which is
> all we're ever going to have in the control protocol, this is very
> easy to reason about.
> --Ari
> On Tue, Sep 21, 2010 at 11:17 AM, Bill Graham <> wrote:
>> +1 on staying stateless.
>> I think the challenge we're facing is that we're trying to support a
>> syntax that is simple and readable and can be done with a single line
>> (i.e. for the initial_adaptors file, the telnet API, the command line,
>> etc), but the configs can potentially be not-so-simple.
>> For example, here's how you might configure the JMS adaptor which used
>> dependency injection. That's a lot for a single line and there's
>> nowhere to add new global configs in front of the adaptor specific
>> configs without breaking things.
>> add jms.JMSAdaptor jms-events
>> failover:(tcp://,tcp://
>> -q -s "id_type IN ('162')" -x
>> org.apache.hadoop.chukwa.datacollection.adaptor.
>> jms.JMSMessagePropertyTransformer -p
>> "event_time,id_type,id,srcurl,xref,xrq,title -r event_time,id_type,id"
>> 0
>> What if we were to adopt a few flags into the syntax:
>> add [name =] <adaptor_class_name> <datatype> [--tags <tags>]
>> [--adaptor-params <adaptor specific params>|--adaptor-config-file
>> <file>]
>> <initial offset>
>> The '--*' flags could be reserved. This would allow us to keep with a
>> one-line syntax where that approach works, but allow for expansion.
>> Also, if an adaptor config got to complex, those configs could be
>> specified in a file if needed.
>> On Mon, Sep 20, 2010 at 9:52 PM, Jerome Boulon <> wrote:
>>> Hi,
>>> If I had to implement this, I will add an extra parameter
>>> (?extraParams=xyz).
>>> The adaptorImp will be the only one responsible for parsing this adaptor’s
>>> specific info.
>>> I don’t think that we could/should add new complexity in the parsing.
>>> The same think should be done for getCurrentStatus(), a public result, that
>>> is the same for all adaptors in order to know if the adaptors is working or
>>> not and a private section that will give extra information.
>>> Also, moving to a json input should simplify everything.
>>> /Jerome.
>>> On 9/20/10 5:15 PM, "Bill Graham" <> wrote:
>>> I'd like to hear Ari's take on this, but this does feel a bit hacky to
>>> me. Plus, it would put the responsibility of parsing tags on each
>>> adaptor impl and would require a refactor of how each one currently
>>> parses args.
>>> Actually, we might be able to intercept the call to parseArgs in
>>> AbstractAdaptor and pull out the tags if they exist and pass the rest
>>> to the subclass, which would be none the wiser. Not the cleanest, but
>>> at lease not as intrusive on the adaptor implementations.
>>> Ari, also what about the getCurrentStatus() method? I'd think all the
>>> impls would somehow need to incorporate tags into that response as
>>> well, since AFAIR that's what's used to do Adaptor SerDe with the
>>> checkpoints file.
>>> On Mon, Sep 20, 2010 at 3:57 PM, Eric Yang <> wrote:
>>>> Hi Bill,
>>>> This might be hacky but it should be possible to have adaptor specific
>>>> params to include tags.  Ari, what do you think?
>>>> Regards,
>>>> Eric
>>>> On 9/20/10 2:58 PM, "Bill Graham" <> wrote:
>>>> Hi,
>>>> In CHUKWA-515 we discussed the possibility being able to add an
>>>> adaptor bound to a given cluster:
>>>> I can actually see this being useful, especially now that it's easier
>>>> to add/remove agents with the Adaptor REST API. Looking into the code
>>>> it doesn't seem like it would be that hard to do, but I want to make
>>>> sure I'm not overlooking anything.
>>>> It seems like we could support this with a few small changes:
>>>> - Add the concept of tags to the Adaptor interface.
>>>> - AbstractAdator would support a getTags method which would return the
>>>> union of tags set on the Adaptor and the default tags on the
>>>> DataFactory.
>>>> - Internal tag implementations on each would change to store tags in
>>>> maps, instead of concat'ed strings. This would allow for a "last in
>>>> wins" type of functionality so tags could be overriden. This assumes
>>>> of course that there should never be more than one of the same tag key
>>>> value, which I _assume_ is the case.
>>>> - The ChunkImpl constructor will call getTags on the agent, instead of
>>>> getDefaultTags the data factory.
>>>> The trickiest part as I see it is figuring out how to change the add
>>>> adaptor string syntax in ChukwaAgetn.processAddCommandE in a way that
>>>> both makes sense and doesn't break things. In it's current form it
>>>> doesn't have room for easy expansion except at the end of the line:
>>>> add [name =] <adaptor_class_name> <datatype> <adaptor specific
>>>> <initial offset>
>>>> Any thoughts or suggestions? There's also a potential gotcha with all
>>>> the impls of Adaptor.parseArgs either breaking or needing to
>>>> change....
>>>> thanks,
>>>> Bill
> --
> Ari Rabkin
> UC Berkeley Computer Science Department

View raw message