metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: [DISCUSS] Turning off indexing writers feature discussion
Date Thu, 12 Jan 2017 22:15:09 GMT
Hey Matt,

Thanks for the comment!
1. At the moment, we only have one index name, which defaults to the
sensor name but is entirely up to the user.  This is sensor specific,
so it'd be a separate config for each sensor.  If we want to build multiple
indices per sensor, we'd have to think carefully about how to do that, and
it would be a bigger undertaking.  I guess I can see the use, though
(redirecting messages to one index vs. another based on a predicate for a
given sensor).  Anyway, this isn't where I originally expected the
discussion to go, but it's an interesting point.
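
To make the multiple-indices idea concrete, here is a minimal Python sketch (not Metron code) of routing one sensor's messages to one index vs. another. It assumes a hypothetical `indexes` list shaped like Matt's proposal, with each entry pairing an index name with a predicate. Plain Python callables stand in for Stellar statements, and "first match wins, else fall back to the sensor name" is an assumed rule, not an agreed design.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]
Predicate = Callable[[Message], bool]


def route_index(message: Message, sensor: str,
                indexes: Optional[List[Dict[str, Any]]]) -> str:
    """Pick the index for a message (hypothetical semantics).

    `indexes` mimics the proposed list of {"index": name, "filter": predicate}
    entries; the first entry whose predicate holds wins. With no list, or no
    match, fall back to the sensor name, which is today's default index name.
    """
    for entry in indexes or []:
        if entry["filter"](message):
            return entry["index"]
    return sensor


# Hypothetical config: send denied squid requests to their own index.
squid_indexes = [
    {"index": "squid_denied", "filter": lambda m: m.get("action") == "TCP_DENIED"},
    {"index": "squid", "filter": lambda m: True},
]

print(route_index({"action": "TCP_DENIED"}, "squid", squid_indexes))  # squid_denied
print(route_index({"action": "TCP_MISS"}, "squid", squid_indexes))    # squid
print(route_index({"action": "TCP_MISS"}, "squid", None))             # squid
```

Whether a message should land in the first matching index or in every matching one is one of the questions that would make this the bigger undertaking.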

2. I hadn't thought through the implementation quite yet, but we don't
actually have a splitter bolt in that topology, just a spout that feeds
both the Elasticsearch writer and the HDFS writer.
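
Because the spout feeds each writer directly, one way to picture per-writer control is a gate in front of every writer rather than a splitter. A minimal Python sketch, assuming in-memory lists in place of Storm tuples and bolts, and hypothetical `gates` callables in place of the real writer configuration:

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Gate = Callable[[Message], bool]


def fan_out(messages: List[Message], gates: Dict[str, Gate]) -> Dict[str, List[Message]]:
    """Deliver every message to every writer, as the spout does today.

    There is no splitter: each writer applies its own (hypothetical) gate
    before writing, so the decision lives next to the writer.
    """
    written: Dict[str, List[Message]] = {name: [] for name in gates}
    for message in messages:
        for name, gate in gates.items():
            if gate(message):
                written[name].append(message)
    return written


gates = {
    "HDFS": lambda m: True,         # current behaviour: write everything
    "ES": lambda m: "field1" in m,  # hypothetical per-writer filter
}
print(fan_out([{"field1": "a"}, {"field2": "b"}], gates))
# {'HDFS': [{'field1': 'a'}, {'field2': 'b'}], 'ES': [{'field1': 'a'}]}
```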

On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <mattf@apache.org> wrote:

> Casey, good to have controls like this.  A couple of questions:
>
> 1. Regarding the "index" : "squid" name/value pair, is the index name
> expected to always be a sensor name?  Or is the given JSON structure
> subordinate to a sensor name in ZooKeeper?  Or can we build arbitrary
> indexes with this new specification, independent of sensor?  Should there
> actually be a list of "indexes", i.e.
> { "indexes" : [
>         {"index" : "name1",
>                 …
>         },
>         {"index" : "name2",
>                 …
>         } ]
> }
>
> 2. Would the filtering / writer selection logic take place in the indexing
> topology splitter bolt?  It seems like that would have the smallest impact
> on the current implementation, no?
>
> Sorry if these are already answered in PR-415; I haven't had time to
> review that one yet.
> Thanks,
> --Matt
>
>
> On 1/12/17, 12:55 PM, "Michael Miklavcic" <michael.miklavcic@gmail.com> wrote:
>
>     I like the flexibility and expressibility of the first option with
>     Stellar filters.
>
>     M
>
>     On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <cestella@gmail.com> wrote:
>
>     > As of METRON-652 <https://github.com/apache/incubator-metron/pull/415>,
>     > we will have decoupled the indexing configuration from the enrichment
>     > configuration.  As an immediate follow-up to that, I'd like to provide
>     > the ability to turn writers on and off via the configs.  I'd like to get
>     > some community feedback on how the functionality should work, if y'all
>     > are amenable. :)
>     >
>     >
>     > As of now, we have 3 possible writers which can be used in the indexing
>     > topology:
>     >
>     >    - Solr
>     >    - Elasticsearch
>     >    - HDFS
>     >
>     > HDFS is always used; either Elasticsearch or Solr is used, depending on
>     > how you start the indexing topology.
>     >
>     > A couple of proposals come to mind immediately:
>     >
>     > *Index Filtering*
>     >
>     > You would be able to specify a filter as defined by a Stellar statement
>     > (likely a reuse of the StellarFilter that exists in the Parsers), which
>     > would allow you to indicate on a message-by-message basis whether or not
>     > to write the message.
>     >
>     > The semantics of this would be as follows:
>     >
>     >    - Default (i.e. unspecified) is to pass everything through (hence
>     >    backwards compatible with the current default config).
>     >    - Messages for which the writer type's associated Stellar statement
>     >    evaluates to true will be written; all others will not.
>     >
>     >
>     > Sample indexing config which would write no messages to HDFS and would
>     > write to Elasticsearch only messages containing a field called "field1":
>     > {
>     >    "index" : "squid"
>     >   ,"batchSize" : 100
>     >   ,"filters" : {
>     >       "HDFS" : "false"
>     >      ,"ES" : "exists(field1)"
>     >    }
>     > }
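
The proposed filter semantics can be sketched in Python as below, under loud assumptions: `toy_eval` is a hypothetical stand-in that understands only `true`, `false`, and `exists(field)`; it is not Stellar and not the parsers' StellarFilter. `SOLR` is a hypothetical writer name used only to show the unspecified-filter default.

```python
import re
from typing import Any, Dict

Message = Dict[str, Any]

# Hypothetical mini-grammar: NOT Stellar, just enough for the sample config.
_EXISTS = re.compile(r"^\s*exists\(\s*(\w+)\s*\)\s*$", re.IGNORECASE)


def toy_eval(statement: str, message: Message) -> bool:
    """Stand-in for a Stellar predicate: true, false, or exists(field)."""
    text = statement.strip().lower()
    if text in ("true", "false"):
        return text == "true"
    match = _EXISTS.match(statement)
    if match:
        return match.group(1) in message
    raise ValueError(f"unsupported statement: {statement!r}")


def should_write(config: Dict[str, Any], writer: str, message: Message) -> bool:
    """Proposed semantics: no filter for this writer means write everything."""
    statement = config.get("filters", {}).get(writer)
    if statement is None:
        return True  # unspecified: pass everything through (backwards compatible)
    return toy_eval(statement, message)


config = {
    "index": "squid",
    "batchSize": 100,
    "filters": {"HDFS": "false", "ES": "exists(field1)"},
}
messages = [{"field1": "a"}, {"field2": "b"}]
for writer in ("HDFS", "ES", "SOLR"):  # "SOLR" is hypothetical and has no filter
    print(writer, [m for m in messages if should_write(config, writer, m)])
# HDFS []
# ES [{'field1': 'a'}]
# SOLR [{'field1': 'a'}, {'field2': 'b'}]
```

Returning `True` when a writer has no entry is what keeps existing configs backwards compatible.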
>     >
>     > *Index On/Off Switch*
>     >
>     > A simpler solution would be to just provide a list of the writers that
>     > should write messages.  The semantics would be as follows:
>     >
>     >    - If the list is unspecified, then the default is for every writer in
>     >    the indexing topology to write all messages.
>     >    - If the list is specified, then a writer will write all messages if
>     >    and only if it is named in the list.
>     >
>     > Sample indexing config which turns off HDFS and keeps Elasticsearch on:
>     > {
>     >    "index" : "squid"
>     >   ,"batchSize" : 100
>     >   ,"writers" : [ "ES" ]
>     > }
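
A minimal Python sketch of the on/off semantics, assuming writers are identified by the same short names used in the config (`HDFS`, `ES`) and that `enabled_writers` is a hypothetical helper, not an existing Metron function:

```python
from typing import Any, Dict, List, Optional

# Assumption: a topology started with Elasticsearch, so HDFS + ES are present.
TOPOLOGY_WRITERS = ["HDFS", "ES"]


def enabled_writers(config: Dict[str, Any], topology_writers: List[str]) -> List[str]:
    """Hypothetical helper for the proposed "writers" list.

    Unspecified list: every writer in the topology writes all messages.
    Specified list: a writer writes all messages iff it is named in the list.
    """
    selected: Optional[List[str]] = config.get("writers")
    if selected is None:
        return list(topology_writers)
    return [w for w in topology_writers if w in selected]


print(enabled_writers({"index": "squid", "batchSize": 100, "writers": ["ES"]}, TOPOLOGY_WRITERS))  # ['ES']
print(enabled_writers({"index": "squid", "batchSize": 100}, TOPOLOGY_WRITERS))  # ['HDFS', 'ES']
print(enabled_writers({"index": "squid", "writers": []}, TOPOLOGY_WRITERS))  # []
```

One consequence worth settling: under these rules an empty `"writers" : []` turns every writer off, which is different from omitting the key, and names not present in the topology are silently ignored.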
>     >
>     > Thanks in advance for the feedback!  Also, if you have any better ideas
>     > than the ones presented here, let me know too.
>     >
>     > Best,
>     >
>     > Casey
>     >
