samoa-dev mailing list archives

From Eduardo Costa <eduardoc...@gmail.com>
Subject Re: Arff files from Hadoop HDFS
Date Wed, 13 Apr 2016 18:36:36 GMT
Hi Gianmarco,

Yes, I tested with examples from storm-starter. But I'll check it again.

Best regards,
Eduardo.

2016-04-03 4:23 GMT-03:00 Gianmarco De Francisci Morales <gdfm@apache.org>:

> Hi Eduardo,
>
> It depends very much on the dataset and the cluster setup.
> In my setup, covType takes more or less the same time in local or cluster
> environment (up to parallelism 8).
>
> There are some inefficiencies with serialization that we are aware of, but
> it should not affect the performance to the point of slowing it down one
> order of magnitude.
> Have you validated your cluster setup?
>
> Cheers,
>
> -- Gianmarco
>
> On Mon, Mar 28, 2016 at 3:05 PM, Eduardo Costa <eduardocosi@gmail.com>
> wrote:
>
> > Hi Gianmarco,
> >
> > Yes, it helped me!
> > I put Storm, Hadoop, and SAMOA in the same cluster, and it worked! However,
> > I think the execution is too slow.
> > For the same task and the covtypeNorm.arff dataset, SAMOA in local mode
> > takes 18 seconds, but in cluster mode it takes several minutes. Is this
> > normal?
> >
> > Best regards,
> > Eduardo.
> >
> > 2016-03-13 4:48 GMT-03:00 Gianmarco De Francisci Morales <gdfm@apache.org>:
> >
> > > Hi Eduardo,
> > >
> > > As long as you can access the HDFS cluster from the machines composing the
> > > Storm cluster, there should be no problem.
> > > However, you need to figure out how to set the environment variables to
> > > point to the right installation of Hadoop (set the HADOOP_HOME variable).
> > > You just need to set your configuration files (e.g., hdfs-site.xml) to
> > > point to the correct Hadoop cluster.
> > >
> > > Hope it helps,
> > >
> > > -- Gianmarco
> > >
> > > On Sat, Mar 12, 2016 at 4:21 PM, Eduardo Costa <eduardocosi@gmail.com>
> > > wrote:
> > >
> > > > Hi, Gianmarco!
> > > > Thank you so much for your response!
> > > > Now I have another question: I run SAMOA (in cluster mode) on a different
> > > > cluster from the Hadoop cluster, because I run SAMOA on top of a Storm
> > > > cluster. Is there some way to read ARFF files from this remote Hadoop
> > > > cluster when running SAMOA on top of the Storm cluster?
> > > > Sorry for bothering you so much, but I need this to continue my master's
> > > > thesis in Brazil at the Federal University of the State of Rio de Janeiro
> > > > (UNIRIO). As previously mentioned, I'm trying to build a rudimentary
> > > > anomaly detection system using SAMOA, but I am new to SAMOA.
> > > >
> > > > Best regards,
> > > > Eduardo.
> > > >
> > > > 2016-03-06 8:59 GMT-03:00 Gianmarco De Francisci Morales <gdfm@apache.org>:
> > > >
> > > > > Hi Eduardo,
> > > > > Yes, it is possible to read ARFF files from HDFS.
> > > > > However, right now it is way more complicated than it should be, and
> > > > > it's not documented at all.
> > > > > Thanks for asking the question.
> > > > >
> > > > > I managed to do it with this command line:
> > > > >
> > > > > ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> > > > > "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s
> > > > > HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
> > > > >
> > > > > But I had to make a small modification to HDFSFileStreamSource to make
> > > > > it work, by adding this line after line 61:
> > > > >
> > > > >     config.set("fs.hdfs.impl",
> > > > >         org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> > > > >
> > > > > Things to notice:
> > > > > - We rely on HADOOP_HOME being set to your Hadoop installation. This
> > > > > should be made more robust.
> > > > > - I explicitly used org.apache.samoa.streams.ArffFileStream, as the
> > > > > normal ArffFileStream does not support HDFS (this is related to SAMOA-14
> > > > > <https://issues.apache.org/jira/browse/SAMOA-14>, and I plan to fix it
> > > > > asap).
> > > > > - I will add the snippet of code above in the same patch for SAMOA-14.
> > > > >
> > > > > Hope it helps,
> > > > >
> > > > > -- Gianmarco
> > > > >
> > > > > On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <eduardocosi@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Can I pass ARFF files from Hadoop HDFS to SAMOA through the "-s"
> > > > > > argument? If so, how do I do it?
> > > > > >
> > > > > > Best regards,
> > > > > > Eduardo.
> > > > > >
> > > > >
> > > >
> > >
> >
>
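
A note on the HADOOP_HOME point raised earlier in the thread: since the HDFS
source relies on HADOOP_HOME pointing at a Hadoop installation, it may help to
check that variable before submitting a job. The sketch below is not part of
SAMOA. The class name HadoopConfLocator and the method findHdfsSite are
hypothetical, and it assumes the usual layouts, where hdfs-site.xml sits under
etc/hadoop (Hadoop 2.x) or conf (Hadoop 1.x) inside the installation directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of SAMOA or Hadoop.
public class HadoopConfLocator {

    /** Returns the first hdfs-site.xml found under the given Hadoop home, if any. */
    public static Optional<Path> findHdfsSite(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return Optional.empty();
        }
        // Assumed layouts: etc/hadoop for Hadoop 2.x, conf for Hadoop 1.x.
        String[] candidates = {"etc/hadoop", "conf"};
        for (String dir : candidates) {
            Path p = Paths.get(hadoopHome, dir, "hdfs-site.xml");
            if (Files.isRegularFile(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        Optional<Path> site = findHdfsSite(hadoopHome);
        if (site.isPresent()) {
            System.out.println("Found HDFS configuration: " + site.get());
        } else {
            System.err.println("HADOOP_HOME is unset or has no hdfs-site.xml: " + hadoopHome);
            System.exit(1);
        }
    }
}
```

Running this on each machine that will read from HDFS shows which
hdfs-site.xml would be picked up, so you can confirm that it points to the
intended Hadoop cluster.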
