spot-dev mailing list archives

From Michael Ridley <mrid...@cloudera.com>
Subject Re: [Discuss] - Future plans for Spot-ingest
Date Mon, 17 Apr 2017 18:44:28 GMT
Without having given it too terribly much thought, that seems like an OK
approach.

Michael

On Mon, Apr 17, 2017 at 2:33 PM, Nathanael Smith <nathanael@apache.org>
wrote:

> I think the question is rather whether we can write the data generically to
> HDFS as Parquet without the use of Hive/Impala.
>
> Today we write Parquet data using the Hive/MapReduce method.
> As part of the redesign I'd like to use libraries for this as opposed to a
> Hadoop dependency.
> I think it would be preferable to use the Python master to write the data
> into the format we want, then do normalization of the data in Spark
> Streaming.
> Any thoughts?
>
> - Nathanael
>
>
>
> > On Apr 17, 2017, at 11:08 AM, Michael Ridley <mridley@cloudera.com> wrote:
> >
> > I had thought that the plan was to write the data in Parquet in HDFS
> > ultimately.
> >
> > Michael
> >
> > On Sun, Apr 16, 2017 at 11:55 AM, kant kodali <kanth909@gmail.com> wrote:
> >
> >> Hi Mark,
> >>
> >> Thank you so much for hearing my argument. I definitely understand that
> >> you have a bunch of things to do. My only concern is that I hope it
> >> doesn't take too long to support other backends. For example, Kenneth
> >> gave the example of LAMP-stack apps that have not moved away from MySQL
> >> yet, which essentially means it could take a decade?
> >>
> >> I see that in the current architecture the results from Python
> >> multiprocessing or Spark Streaming are written back to HDFS. If so, can
> >> we write them in Parquet format, so that users are able to plug in any
> >> query engine? Again, I am not pushing you to do this right away; I am
> >> just seeing whether there is a way for me to get started in parallel. If
> >> that's not feasible, it's fine. I just wanted to share my 2 cents, and I
> >> am glad my argument was heard!
> >>
> >> Thanks much!
> >>
> >> On Fri, Apr 14, 2017 at 1:38 PM, Mark Grover <mark@apache.org> wrote:
> >>
> >>> Hi Kant,
> >>> Just wanted to make sure you don't feel like we are ignoring your
> >>> comment :-) I hear you about pluggability.
> >>>
> >>> The design can and should be pluggable, but the project has one stack it
> >>> ships with out of the box, one stack that's the default in the sense
> >>> that it's the most tested and so on. And, for us, that's our current
> >>> stack. If we were to take Apache Hive as an example, it shipped (and
> >>> ships) with MapReduce as the default execution engine. At some point,
> >>> Apache Tez came along and wanted Hive to run on Tez, so they made a bunch
> >>> of things pluggable to run Hive on Tez (instead of the only option up
> >>> until then: Hive-on-MR), and then Apache Spark came and reused some of
> >>> that pluggability and even added some more so Hive-on-Spark could become
> >>> a reality. In the same way, I don't think anyone here disagrees that
> >>> pluggability is a good thing, but it's hard to do pluggability right, and
> >>> at the right level, unless one has a clear use case in mind.
> >>>
> >>> As a project, we have many things to do, and I personally think the
> >>> biggest bang for the buck in making Spot a really solid, and the best,
> >>> cyber security solution isn't pluggability but the things we are working
> >>> on - a better user interface, a common/unified approach to storing and
> >>> modeling data, etc.
> >>>
> >>> Having said that, we are open: if it's important to you or someone else,
> >>> we'd be happy to receive and review those patches.
> >>>
> >>> Thanks!
> >>> Mark
> >>>
> >>> On Fri, Apr 14, 2017 at 10:14 AM, kant kodali <kanth909@gmail.com> wrote:
> >>>
> >>>> Thanks Ross! And yes, option C sounds good to me as well; however, I just
> >>>> think the distributed SQL query engine and the resource manager should be
> >>>> pluggable.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Apr 14, 2017 at 9:55 AM, Ross, Alan D <alan.d.ross@intel.com>
> >>>> wrote:
> >>>>
> >>>>> Option C is to use python on the front end of ingest pipeline and
> >>>>> spark/scala on the back end.
> >>>>>
> >>>>> Option A uses python workers on the backend
> >>>>>
> >>>>> Option B uses all scala.
> >>>>>
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: kant kodali [mailto:kanth909@gmail.com]
> >>>>> Sent: Friday, April 14, 2017 9:53 AM
> >>>>> To: dev@spot.incubator.apache.org
> >>>>> Subject: Re: [Discuss] - Future plans for Spot-ingest
> >>>>>
> >>>>> What is option C? Am I missing an email or something?
> >>>>>
> >>>>> On Fri, Apr 14, 2017 at 9:15 AM, Chokha Palayamkottai <
> >>>>> chokha@integralops.com> wrote:
> >>>>>
> >>>>>> +1 for Python 3.x
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 4/14/2017 11:59 AM, Austin Leahy wrote:
> >>>>>>
> >>>>>>> I think that C is the strong solution; getting the ingest really
> >>>>>>> strong is going to lower barriers to adoption. Doing it in Python
> >>>>>>> will open up the ingest portion of the project to many more
> >>>>>>> developers.
> >>>>>>>
> >>>>>>> Before it comes up, I would like to throw the following on the pile:
> >>>>>>> major Python projects (Django/Flask, others) are dropping 2.x support
> >>>>>>> in releases scheduled in the next 6 to 8 months. Hadoop projects in
> >>>>>>> general tend to lag in modern Python support, so let's please build
> >>>>>>> this in 3.5 so that we don't have to immediately expect a rebuild in
> >>>>>>> the pipeline.
> >>>>>>>
> >>>>>>> -Vote C
> >>>>>>>
> >>>>>>> Thanks Nate
> >>>>>>>
> >>>>>>> Austin
> >>>>>>>
> >>>>>>> On Fri, Apr 14, 2017 at 8:52 AM Alan Ross <alan@apache.org> wrote:
> >>>>>>>
> >>>>>>>> I really like option C because it gives a lot of flexibility for
> >>>>>>>> ingest (Python vs Scala) but still has the robust Spark Streaming
> >>>>>>>> backend for performance.
> >>>>>>>>
> >>>>>>>> Thanks for putting this together Nate.
> >>>>>>>>
> >>>>>>>> Alan
> >>>>>>>>
> >>>>>>>> On Fri, Apr 14, 2017 at 8:44 AM, Chokha Palayamkottai <chokha@integralops.com> wrote:
> >>>>>>>>
> >>>>>>>>> I agree. We should continue making the existing stack more mature at
> >>>>>>>>> this point. Maybe if we have enough community support we can add
> >>>>>>>>> additional datastores.
> >>>>>>>>>
> >>>>>>>>> Chokha.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 4/14/17 11:10 AM, kenneth@floss.cat wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Kant,
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> YARN is the standard scheduler in Hadoop. If you're using
> >>>>>>>>>> Hive+Spark, then sure you'll have YARN.
> >>>>>>>>>>
> >>>>>>>>>> I haven't seen any Hive on Mesos so far. As said, Spot is based on a
> >>>>>>>>>> quite standard Hadoop stack and I wouldn't switch too many pieces yet.
> >>>>>>>>>>
> >>>>>>>>>> In most open source projects you start by relying on a well-known
> >>>>>>>>>> stack and then you begin to support other DB backends once it's
> >>>>>>>>>> quite mature. Think of the loads of LAMP apps which haven't been
> >>>>>>>>>> ported away from MySQL yet.
> >>>>>>>>>>
> >>>>>>>>>> In any case, you'll need high-performance SQL + massive storage
> >>>>>>>>>> + machine learning + massive ingestion, and... ATM, that can only
> >>>>>>>>>> be provided by Hadoop.
> >>>>>>>>>>
> >>>>>>>>>> Regards!
> >>>>>>>>>>
> >>>>>>>>>> Kenneth
> >>>>>>>>>>
> >>>>>>>>>> On 2017-04-14 12:56, kant kodali wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Kenneth,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the response. I think you made a case for HDFS; however,
> >>>>>>>>>>> users may want to use S3 or some other FS, in which case they can
> >>>>>>>>>>> use Alluxio (hoping that no changes are needed within Spot, in which
> >>>>>>>>>>> case I can agree to that). For example, Netflix stores all their
> >>>>>>>>>>> data in S3.
> >>>>>>>>>>>
> >>>>>>>>>>> The distributed SQL query engine, I would say, should be pluggable
> >>>>>>>>>>> with whatever the user may want to use, and there are a bunch of
> >>>>>>>>>>> them out there. Sure, Impala is better than Hive, but what if users
> >>>>>>>>>>> are already using something else like Drill or Presto?
> >>>>>>>>>>>
> >>>>>>>>>>> Personally, I would not assume that users are willing to deploy all
> >>>>>>>>>>> of that and make their existing stack more complicated; at the very
> >>>>>>>>>>> least I would say it is an uphill battle. Things have been changing
> >>>>>>>>>>> rapidly in the big data space, so whatever we think is standard
> >>>>>>>>>>> won't be standard anymore. More importantly, there shouldn't be any
> >>>>>>>>>>> reason why we shouldn't be flexible, right?
> >>>>>>>>>>>
> >>>>>>>>>>> Also, I am not sure why only YARN. Why not make that more flexible
> >>>>>>>>>>> too, so users can pick Mesos or standalone?
> >>>>>>>>>>>
> >>>>>>>>>>> I think flexibility is key for wide adoption, rather than a tightly
> >>>>>>>>>>> coupled architecture.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Apr 14, 2017 at 3:12 AM, Kenneth Peiruza <kenneth@floss.cat> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> PS: you need a big data platform to be able to collect all those
> >>>>>>>>>>>> netflows and logs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Spot isn't intended for SMBs, that's clear. You also need loads of
> >>>>>>>>>>>> data to get ML working properly, and somewhere to run those
> >>>>>>>>>>>> algorithms. That is Hadoop.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Kenneth
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my Mi phone
> >>>>>>>>>>>> On Apr 14, 2017 4:04 AM, kant kodali <kanth909@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for starting this thread. Here is my feedback.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think the architecture is too complicated for wide adoption,
> >>>>>>>>>>>> since it requires installing the following:
> >>>>>>>>>>>>
> >>>>>>>>>>>> HDFS.
> >>>>>>>>>>>> HIVE.
> >>>>>>>>>>>> IMPALA.
> >>>>>>>>>>>> KAFKA.
> >>>>>>>>>>>> SPARK (YARN).
> >>>>>>>>>>>> YARN.
> >>>>>>>>>>>> Zookeeper.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Currently there are way too many dependencies, which discourages a
> >>>>>>>>>>>> lot of users from using it because they have to deploy all that
> >>>>>>>>>>>> required software. I think for wide adoption we should minimize
> >>>>>>>>>>>> the dependencies and have a more pluggable architecture. For
> >>>>>>>>>>>> example, I am not sure why both Hive and Impala are required. Why
> >>>>>>>>>>>> not just use Spark SQL, since it's already a dependency? Or users
> >>>>>>>>>>>> may want to use their own distributed query engine, such as Apache
> >>>>>>>>>>>> Drill or something else. We should be flexible enough to provide
> >>>>>>>>>>>> that option.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also, I see that HDFS is used so that collectors can receive file
> >>>>>>>>>>>> paths through Kafka and read the files. How big are these files?
> >>>>>>>>>>>> Do we really need HDFS for this? Why not provide more ways to send
> >>>>>>>>>>>> data, such as sending it directly through Kafka, or just leaving it
> >>>>>>>>>>>> up to the user to specify the file location as an argument to the
> >>>>>>>>>>>> collector process?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Finally, I learnt that generating NetFlow data requires specific
> >>>>>>>>>>>> hardware. This really means Apache Spot is not meant for everyone.
> >>>>>>>>>>>> I thought Apache Spot could be used to analyze the network traffic
> >>>>>>>>>>>> of any machine, but if it requires specific hardware then I think
> >>>>>>>>>>>> it is targeted at a specific group of people.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The real strength of Apache Spot should mainly be analyzing network
> >>>>>>>>>>>> traffic through ML.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Apr 13, 2017 at 4:28 PM, Segerlind, Nathan L <nathan.l.segerlind@intel.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks, Nate,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Nate.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Nate Smith [mailto:natedogs911@gmail.com]
> >>>>>>>>>>>>> Sent: Thursday, April 13, 2017 4:26 PM
> >>>>>>>>>>>>> To: user@spot.incubator.apache.org
> >>>>>>>>>>>>> Cc: dev@spot.incubator.apache.org; private@spot.incubator.apache.org
> >>>>>>>>>>>>> Subject: Re: [Discuss] - Future plans for Spot-ingest
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I was really hoping it came through OK. Oh well :) Here it is in
> >>>>>>>>>>>>> image form:
> >>>>>>>>>>>>> http://imgur.com/a/DUDsD
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 13, 2017, at 4:05 PM, Segerlind, Nathan L <nathan.l.segerlind@intel.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> The diagram became garbled in the text format.
> >>>>>>>>>>>>>> Could you resend it as a pdf?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Nate
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>> From: Nathanael Smith [mailto:nathanael@apache.org]
> >>>>>>>>>>>>>> Sent: Thursday, April 13, 2017 4:01 PM
> >>>>>>>>>>>>>> To: private@spot.incubator.apache.org; dev@spot.incubator.apache.org; user@spot.incubator.apache.org
> >>>>>>>>>>>>>> Subject: [Discuss] - Future plans for Spot-ingest
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> How would you like to see Spot-ingest change?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> A. Continue development on the Python Master/Worker, with a focus
> >>>>>>>>>>>>>>    on performance / error handling / logging.
> >>>>>>>>>>>>>> B. Develop a Scala-based ingest to be in line with the code base
> >>>>>>>>>>>>>>    from ingest, to ML, to OA (UI to continue being IPython/JS).
> >>>>>>>>>>>>>> C. Python ingest Worker with Scala-based Spark code for
> >>>>>>>>>>>>>>    normalization and input into the DB.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Including the high level diagram:
> >>>>>>>>>>>>>> [ASCII diagram garbled in the archived text. Boxes still legible:
> >>>>>>>>>>>>>> Master (A. Python, B. Scala, C. Python), annotated "Running on a
> >>>>>>>>>>>>>> worker node in the Hadoop cluster"; Worker (A. Python); Spark
> >>>>>>>>>>>>>> Streaming (B. Scala, C. Scala); Local FS (binary/text log files);
> >>>>>>>>>>>>>> hdfs; Hive / Impala (Parquet).]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please let me know your thoughts,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Nathanael
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> > --
> > Michael Ridley <mridley@cloudera.com>
> > office: (650) 352-1337
> > mobile: (571) 438-2420
> > Senior Solutions Architect
> > Cloudera, Inc.
>
>


-- 
Michael Ridley <mridley@cloudera.com>
office: (650) 352-1337
mobile: (571) 438-2420
Senior Solutions Architect
Cloudera, Inc.
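
As a rough illustration of the pluggability discussed in this thread (a default stack plus pluggable alternatives, with the Python master writing output through a library), here is a minimal sketch. It is not Spot code. All names (`register_writer`, `write_records`, the `csv` format key) are hypothetical, and CSV stands in for Parquet only so that the example needs nothing beyond the Python standard library. A Parquet writer built on a third-party library could presumably be registered the same way.

```python
import csv
import io
from typing import Callable, Dict, Iterable

# Hypothetical types for illustration only; not part of Spot.
Record = Dict[str, str]
Writer = Callable[[Iterable[Record], io.TextIOBase], int]

# Registry mapping an output format name to the function that writes it.
_WRITERS: Dict[str, Writer] = {}


def register_writer(name: str) -> Callable[[Writer], Writer]:
    """Decorator that registers a writer function under a format name."""
    def decorator(func: Writer) -> Writer:
        _WRITERS[name] = func
        return func
    return decorator


@register_writer("csv")
def write_csv(records: Iterable[Record], out: io.TextIOBase) -> int:
    """Default writer: emit records as CSV and return the row count."""
    rows = list(records)
    if not rows:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


def write_records(fmt: str, records: Iterable[Record], out: io.TextIOBase) -> int:
    """Dispatch to whichever writer is registered for the requested format."""
    try:
        writer = _WRITERS[fmt]
    except KeyError:
        raise ValueError(f"no writer registered for format {fmt!r}") from None
    return writer(records, out)
```

With this shape, the ingest code only calls `write_records(fmt, records, out)`, and supporting another output format means registering one more function rather than changing the callers.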
