metron-dev mailing list archives

From Nick Allen <n...@nickallen.org>
Subject Re: [DISCUSS] Batch Profiler Feature Branch
Date Wed, 26 Sep 2018 22:20:47 GMT
Thanks for the review.  With
https://github.com/apache/metron/pull/1209 complete,
I think the feature branch is ready to be merged.  Sounds like I have
Mike's support.  Anyone else have comments, concerns, questions?

On Tue, Sep 25, 2018 at 10:33 PM Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> I just made a couple minor comments on that PR, and I am in agreement about
> the readiness for merging with master. Good stuff Nick.
>
> On Fri, Sep 21, 2018 at 12:37 PM Nick Allen <nick@nickallen.org> wrote:
>
> > Here is a PR that adds the input time constraints to the Batch Profiler
> > (METRON-1787): https://github.com/apache/metron/pull/1209.
> >
> > It seems that the consensus is that this is probably the last feature we
> > need before merging the FB into master.  The other two can wait until after
> > the feature branch has been merged.  Let me know if you disagree.
> >
> > Thanks
> >
> >
> > On Thu, Sep 20, 2018 at 1:55 PM Nick Allen <nick@nickallen.org> wrote:
> >
> > > > Yeah, agreed.  Per use case 3, when deploying to production there really
> > > > wouldn't be a huge overlap like 3 months of already profiled data.  It's day
> > > > 1, the profile was just deployed around the same time as you are running
> > > > the Batch Profiler, so the overlap is in minutes, maybe hours.  But I can
> > > > definitely see the usefulness of the feature for re-runs, etc., as you have
> > > > described.
> > >
> > > > Based on this discussion, I created a few JIRAs.  Thanks all for the great
> > > > feedback and keep it coming.
> > >
> > > [1] METRON-1787 - Input Time Constraints for Batch Profiler
> > > [2] METRON-1788 - Fetch Profile Definitions from Zk for Batch Profiler
> > > [3] METRON-1789 - MPack Should Define Default Input Path for Batch
> > > Profiler
> > >
> > >
> > > --
> > > [1] https://issues.apache.org/jira/browse/METRON-1787
> > > [2] https://issues.apache.org/jira/browse/METRON-1788
> > > [3] https://issues.apache.org/jira/browse/METRON-1789
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Sep 20, 2018 at 1:34 PM Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > >> I think we might want to allow the flexibility to choose the date range
> > > >> then. I don't yet feel like I have a good enough understanding of all the
> > > >> ways in which users would want to seed to force them to run the batch job
> > > >> over all the data. It might also make it easier to deal with remediation,
> > > >> i.e., an error doesn't force you to re-run over the entire history. Same goes
> > > >> for testing out the profile seeding batch job in the first place.
> > >>
> > > >> On Thu, Sep 20, 2018 at 11:23 AM Nick Allen <nick@nickallen.org> wrote:
> > >>
> > >> > Assuming you have 9 months of data archived, yes.
> > >> >
> > >> > On Thu, Sep 20, 2018 at 1:22 PM Michael Miklavcic <
> > >> > michael.miklavcic@gmail.com> wrote:
> > >> >
> > > >> > > So in the case of 3 - if you had 6 months of data that hadn't been
> > > >> > > profiled and another 3 that had been profiled (9 months total data), in its
> > > >> > > current form the batch job runs over all 9 months?
> > >> > >
> > > >> > > On Thu, Sep 20, 2018 at 11:13 AM Nick Allen <nick@nickallen.org> wrote:
> > >> > >
> > > >> > > > > How do we establish "tm" from 1.1 above? Any concerns about overlap or
> > > >> > > > > gaps after the seeding is performed?
> > > >> > > >
> > > >> > > > Good point.  Right now, if the Streaming and Batch Profiler overlap, the
> > > >> > > > last write wins.  And presumably the outputs of the Streaming and Batch
> > > >> > > > Profiler are the same, so no worries, right? :)
> > > >> > > >
> > > >> > > > So it kind of works, but it is definitely not ideal for use case 3.  I
> > > >> > > > could add --begin and --end args to constrain the time frame over which the
> > > >> > > > Batch Profiler runs.  I do not have that in the feature branch.  It would
> > > >> > > > be easy enough to add though.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Sep 20, 2018 at 12:41 PM Michael Miklavcic <
> > >> > > > michael.miklavcic@gmail.com> wrote:
> > >> > > >
> > > >> > > > > Ok, makes sense. That's sort of what I was thinking as well, Nick. Pulling
> > > >> > > > > at this thread just a bit more...
> > > >> > > > >
> > > >> > > > >    1. I have an existing system that's been up a while, and I have added k
> > > >> > > > >    profiles - assume these are the first profiles I've created.
> > > >> > > > >       1. I would have t0 - tm (where m is the time when the profiles were
> > > >> > > > >       first installed) worth of data that has not been profiled yet.
> > > >> > > > >       2. The batch profiler process would be to take that exact profile
> > > >> > > > >       definition from ZK and run the batch loader with that from the CLI.
> > > >> > > > >       3. Profiles are now up to date from t0 - tCurrent
> > > >> > > > >    2. I've already done #1 above. Time goes by and now I want to add a new
> > > >> > > > >    profile.
> > > >> > > > >       1. Same first step above
> > > >> > > > >       2. I would run the batch loader with *only* that new profile
> > > >> > > > >       definition to seed?
> > > >> > > > >
> > > >> > > > > Forgive me if I missed this in PRs and discussion in the FB, but how do we
> > > >> > > > > establish "tm" from 1.1 above? Any concerns about overlap or gaps after the
> > > >> > > > > seeding is performed?
> > >> > > > >
> > > >> > > > > On Thu, Sep 20, 2018 at 10:26 AM Nick Allen <nick@nickallen.org> wrote:
> > >> > > > >
> > > >> > > > > > I think more often than not, you would want to load your profile definition
> > > >> > > > > > from a file.  This is why I considered the 'load from Zk' more of a
> > > >> > > > > > nice-to-have.
> > > >> > > > > >
> > > >> > > > > >    - In use cases 1 and 2, this would definitely be the case.  The profiles
> > > >> > > > > >    I am working with are speculative and I am using the batch profiler to
> > > >> > > > > >    determine if they are worth keeping.  In this case, my speculative
> > > >> > > > > >    profiles would not be in Zk (yet).
> > > >> > > > > >    - In use case 3, I could see it go either way.  It might be useful to
> > > >> > > > > >    load from Zk, but it certainly isn't a blocker.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > > So if the config does not correctly match the profiler config held in ZK
> > > >> > > > > > > and the user runs the batch seeding job, what happens?
> > > >> > > > > >
> > > >> > > > > > You would just get a profile that is slightly different over the entire
> > > >> > > > > > time span.  This is not a new risk.  If the user changes their Profile
> > > >> > > > > > definitions in Zk, the same thing would happen.
> > >> > > > > >
> > >> > > > > >
> > > >> > > > > > On Thu, Sep 20, 2018 at 12:15 PM Michael Miklavcic <
> > > >> > > > > > michael.miklavcic@gmail.com> wrote:
> > >> > > > > >
> > > >> > > > > > > I think I'm torn on this, specifically because it's batch and would
> > > >> > > > > > > generally be run as-needed. Justin, can you elaborate on your concerns
> > > >> > > > > > > there? This feels functionally very similar to our flat file loaders, which
> > > >> > > > > > > all have inputs for config from the CLI only. On the other hand, our flat
> > > >> > > > > > > file loaders are not typically seeding an existing structure. My concern
> > > >> > > > > > > about a local file profiler config stems from this stated goal:
> > > >> > > > > > > > The goal would be to enable “profile seeding” which allows profiles to be
> > > >> > > > > > > > populated from a time before the profile was created.
> > > >> > > > > > > So if the config does not correctly match the profiler config held in ZK
> > > >> > > > > > > and the user runs the batch seeding job, what happens?
> > > >> > > > > > > On Thu, Sep 20, 2018 at 10:06 AM Justin Leet <justinjleet@gmail.com> wrote:
> > >> > > > > > >
> > > >> > > > > > > > The profiler not being able to read from ZK feels like a fairly substantial,
> > > >> > > > > > > > if subtle, set of potential problems.  I'd like to see that in either
> > > >> > > > > > > > before merging or at least pretty soon after merging.  Is it a lot of work
> > > >> > > > > > > > to add that functionality based on where things are right now?
> > >> > > > > > > >
> > > >> > > > > > > > On Thu, Sep 20, 2018 at 9:59 AM Nick Allen <nick@nickallen.org> wrote:
> > >> > > > > > > >
> > > >> > > > > > > > > Here is another limitation that I just thought of. It can only read a
> > > >> > > > > > > > > profile definition from a file.  It probably also makes sense to add an
> > > >> > > > > > > > > option that allows it to read the current Profiler configuration from
> > > >> > > > > > > > > Zookeeper.
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Is it worth setting up a default config that pulls from the main indexing
> > > >> > > > > > > > > > output?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Yes, I think that makes sense.  We want the Batch Profiler to point to the
> > > >> > > > > > > > > right HDFS URL, no matter where/how Metron is deployed.  When Metron gets
> > > >> > > > > > > > > spun-up on a cluster, I should be able to just run the Batch Profiler
> > > >> > > > > > > > > without having to fuss with the input path.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > > >> > > > > > > > > On Thu, Sep 20, 2018 at 9:46 AM Justin Leet <justinjleet@gmail.com> wrote:
> > >> > > > > > > > >
> > > >> > > > > > > > > > Re:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > >  * You do not configure the Batch Profiler in Ambari.  It is configured
> > > >> > > > > > > > > > > and executed completely from the command-line.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Is it worth setting up a default config that pulls from the main indexing
> > > >> > > > > > > > > > output?  I'm a little on the fence about it, but it seems like making the
> > > >> > > > > > > > > > most common case more or less built-in would be nice.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Having said that, I do not consider that a requirement for merging the
> > > >> > > > > > > > > > feature branch.
> > >> > > > > > > > > >
> > > >> > > > > > > > > > On Wed, Sep 19, 2018 at 11:23 AM James Sirota <jsirota@apache.org> wrote:
> > >> > > > > > > > > >
> > > >> > > > > > > > > > > I think what you have outlined above is a good initial stab at the
> > > >> > > > > > > > > > > feature.  Manual install of Spark is not a big deal.  Configuring via
> > > >> > > > > > > > > > > command line while we mature this feature is ok as well.  Doesn't look like
> > > >> > > > > > > > > > > configuration steps are too hard.  I think you should merge.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > James
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > 19.09.2018, 08:15, "Nick
Allen" <
> nick@nickallen.org
> > >:
> > > >> > > > > > > > > > > > I would like to open a discussion to get the Batch Profiler feature branch
> > > >> > > > > > > > > > > > merged into master as part of METRON-1699 [1] Create Batch Profiler.  All
> > > >> > > > > > > > > > > > of the work that I had in mind for our first draft of the Batch Profiler
> > > >> > > > > > > > > > > > has been completed. Please take a look through what I have and let me know
> > > >> > > > > > > > > > > > if there are other features that you think are required *before* we merge.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Previous list discussions on this topic include [2] and [3].
> > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > (Q) What can I do with the feature branch?
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * With the Batch Profiler, you can backfill/seed profiles using archived
> > > >> > > > > > > > > > > > telemetry. This enables the following types of use cases.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       1. As a Security Data Scientist, I want to understand the historical
> > > >> > > > > > > > > > > > behaviors and trends of a profile that I have created so that I can
> > > >> > > > > > > > > > > > determine if I have created a feature set that has predictive value for
> > > >> > > > > > > > > > > > model building.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       2. As a Security Data Scientist, I want to understand the historical
> > > >> > > > > > > > > > > > behaviors and trends of a profile that I have created so that I can
> > > >> > > > > > > > > > > > determine if I have defined the profile correctly and created a feature
> > > >> > > > > > > > > > > > set that matches reality.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       3. As a Security Platform Engineer, I want to generate a profile
> > > >> > > > > > > > > > > > using archived telemetry when I deploy a new model to production so that
> > > >> > > > > > > > > > > > models depending on that profile can function on day 1.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * METRON-1699 [1] includes a more detailed description of the feature.
> > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > (Q) What work was completed?
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * The Batch Profiler runs on Spark and was implemented in Java to remain
> > > >> > > > > > > > > > > > consistent with our current Java-heavy code base.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * The Batch Profiler is executed from the command-line.  It can be
> > > >> > > > > > > > > > > > launched using a script or by calling `spark-submit`, which may be useful
> > > >> > > > > > > > > > > > for advanced users.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * Input telemetry can be consumed from multiple sources; for example HDFS
> > > >> > > > > > > > > > > > or the local file system.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * Input telemetry can be consumed in multiple formats; for example JSON
> > > >> > > > > > > > > > > > or ORC.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * The 'output' profile measurements are persisted in HBase and are
> > > >> > > > > > > > > > > > consistent with the Storm Profiler.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * It can be run on any underlying engine supported by Spark. I have
> > > >> > > > > > > > > > > > tested it both in 'local' mode and on a YARN cluster.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * It is installed automatically by the Metron MPack.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * A README was added that documents usage instructions.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * The existing Profiler code was refactored so that as much code as
> > > >> > > > > > > > > > > > possible is shared between the 3 Profiler ports; Storm, the Stellar REPL,
> > > >> > > > > > > > > > > > and Spark. For example, the logic which determines the timestamp of a
> > > >> > > > > > > > > > > > message was refactored so that it could be reused by all ports.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >       * metron-profiler-common:
The common
> > Profiler
> > >> > code
> > >> > > > > shared
> > >> > > > > > > > > amongst
> > >> > > > > > > > > > > > each port.
> > >> > > > > > > > > > > >       * metron-profiler-storm:
Profiler on Storm
> > >> > > > > > > > > > > >       * metron-profiler-spark:
Profiler on Spark
> > >> > > > > > > > > > > >       * metron-profiler-repl:
Profiler on the
> > >> Stellar
> > >> > > REPL
> > >> > > > > > > > > > > >       * metron-profiler-client:
The client code
> > for
> > >> > > > > retrieving
> > >> > > > > > > > > profile
> > >> > > > > > > > > > > > data; for example
PROFILE_GET.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >   * There are 3
separate RPM and DEB packages
> now
> > >> > created
> > >> > > > for
> > >> > > > > > the
> > >> > > > > > > > > > > Profiler.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >       * metron-profiler-storm-*.rpm
> > >> > > > > > > > > > > >       * metron-profiler-spark-*.rpm
> > >> > > > > > > > > > > >       * metron-profiler-repl-*.rpm
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >   * The Profiler
integration tests were enhanced
> > to
> > >> > > > leverage
> > >> > > > > > the
> > >> > > > > > > > > > Profiler
> > >> > > > > > > > > > > > Client logic to
validate the results.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >   * Review METRON-1699
[1] for a complete
> > >> break-down of
> > >> > > the
> > >> > > > > > tasks
> > >> > > > > > > > > that
> > >> > > > > > > > > > > have
> > >> > > > > > > > > > > > been completed on
the feature branch.
> > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > (Q) What limitations exist?
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * You must manually install Spark to use the Batch Profiler.  The Metron
> > > >> > > > > > > > > > > > MPack does not treat Spark as a Metron dependency and so does not install
> > > >> > > > > > > > > > > > it automatically.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * You do not configure the Batch Profiler in Ambari.  It is configured
> > > >> > > > > > > > > > > > and executed completely from the command-line.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >   * To run the Batch Profiler in 'Full Dev', you have to take the following
> > > >> > > > > > > > > > > > manual steps. Some of these are arguably limitations with how Ambari
> > > >> > > > > > > > > > > > installs Spark 2 in the version of HDP that we run.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       1. Install Spark 2 using Ambari.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       2. Tell Spark how to talk with HBase.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >         SPARK_HOME=/usr/hdp/current/spark2-client
> > > >> > > > > > > > > > > >         cp /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       3. Create the Spark History directory in HDFS.
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >         export HADOOP_USER_NAME=hdfs
> > > >> > > > > > > > > > > >         hdfs dfs -mkdir /spark2-history
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > >       4. Change the default input path to `hdfs://localhost:8020/...` to
> > > >> > > > > > > > > > > > match the port defined by HDP, instead of port 9000.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > [1]
> > >> https://issues.apache.org/jira/browse/METRON-1699
> > >> > > > > > > > > > > > [2]
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://lists.apache.org/thread.html/da81c1227ffda3a47eb2e5bb4d0b162dd6d36006241c4ba4b659587b@%3Cdev.metron.apache.org%3E
> > >> > > > > > > > > > > > [3]
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://lists.apache.org/thread.html/d28d18cc9358f5d9c276c7c304ff4ee601041fb47bfc97acb6825083@%3Cdev.metron.apache.org%3E
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > -------------------
> > >> > > > > > > > > > > Thank you,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > James Sirota
> > >> > > > > > > > > > > PMC- Apache Metron
> > > >> > > > > > > > > > > jsirota AT apache DOT org
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
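
The overlap behavior ("last write wins") and the proposed --begin and --end
constraint discussed in this thread can be illustrated with a small model. This
is not the Batch Profiler's implementation, which is Java on Spark and persists
to HBase. It is only a sketch under assumptions: the 15-minute period, the
message field names, a plain count as the profile value, a half-open
[begin, end) window, and a dict standing in for the profile store are all
hypothetical choices made for illustration.

```python
# Illustrative model only; names and semantics are assumptions, not Metron APIs.
PERIOD_MS = 15 * 60 * 1000  # assumed profile period duration


def in_window(timestamp, begin=None, end=None):
    """True if timestamp falls in the half-open window [begin, end)."""
    if begin is not None and timestamp < begin:
        return False
    if end is not None and timestamp >= end:
        return False
    return True


def batch_profile(messages, begin=None, end=None):
    """Count messages per (entity, period), honoring the time constraints."""
    counts = {}
    for message in messages:
        ts = message["timestamp"]
        if not in_window(ts, begin, end):
            continue
        key = (message["ip_src_addr"], ts // PERIOD_MS)
        counts[key] = counts.get(key, 0) + 1
    return counts


def write(store, measurements):
    """Write measurements to the store; an existing key is overwritten."""
    store.update(measurements)  # last write wins


archive = [
    {"ip_src_addr": "10.0.0.1", "timestamp": 1_000},
    {"ip_src_addr": "10.0.0.1", "timestamp": 2_000},
    {"ip_src_addr": "10.0.0.1", "timestamp": PERIOD_MS + 5},
]

# Hypothetical state: the streaming profiler already wrote period 1.
store = {("10.0.0.1", 1): 1}

# Seed only the history from before the profile was deployed.
seeded = batch_profile(archive, end=PERIOD_MS)
write(store, seeded)
```

Without a window, the batch run in this model also recomputes period 1 and the
write overwrites the streaming value. If both sides produce the same
measurement the overlap is harmless, which is the case described above; the
window simply avoids reprocessing data that has already been profiled.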
