samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Arff files from Hadoop HDFS
Date Sun, 03 Apr 2016 07:23:44 GMT
Hi Eduardo,

It depends very much on the dataset and the cluster setup.
In my setup, covType takes more or less the same time in local or cluster
environment (up to parallelism 8).

There are some inefficiencies with serialization that we are aware of, but
they should not affect performance to the point of slowing it down by an
order of magnitude.
Have you validated your cluster setup?

Cheers,

-- Gianmarco

On Mon, Mar 28, 2016 at 3:05 PM, Eduardo Costa <eduardocosi@gmail.com>
wrote:

> Hi Gianmarco,
>
> Yes, it helped me!
> I put Storm, Hadoop and SAMOA in the same cluster, and it worked! However,
> I think the execution is too slow.
> For the same task and the covtypeNorm.arff dataset, SAMOA (local mode)
> takes 18 seconds. In cluster mode, it takes several minutes. Is this normal?
>
> Best regards,
> Eduardo.
>
> 2016-03-13 4:48 GMT-03:00 Gianmarco De Francisci Morales <gdfm@apache.org>:
>
> > Hi Eduardo,
> >
> > As long as you can access the HDFS cluster from the machines composing the
> > Storm cluster, there should be no problem.
> > However, you need to figure out how to set the environment variables to
> > point to the right installation of Hadoop (set the HADOOP_HOME variable).
> > You just need to set your configuration files (e.g., hdfs-site.xml) to
> > point to the correct Hadoop cluster.
> >
> > Hope it helps,
> >
> > -- Gianmarco
> >
> > On Sat, Mar 12, 2016 at 4:21 PM, Eduardo Costa <eduardocosi@gmail.com>
> > wrote:
> >
> > > Hi, Gianmarco!
> > > Thank you so much for the response!
> > > Now I have another question: I run SAMOA (in cluster mode) on a different
> > > machine (cluster) from the Hadoop cluster, because I run SAMOA on top of a
> > > Storm cluster. Is there some way to read ARFF files from this remote Hadoop
> > > cluster when running SAMOA on top of the Storm cluster?
> > > Sorry for bothering you so much, but I need this to continue my master's
> > > thesis in Brazil at the Federal University of the State of Rio de Janeiro
> > > (UNIRIO). As previously mentioned, I'm trying to build a rudimentary
> > > anomaly detection system using SAMOA, but I am a newcomer to SAMOA.
> > >
> > > Best regards,
> > > Eduardo.
> > >
> > > 2016-03-06 8:59 GMT-03:00 Gianmarco De Francisci Morales <gdfm@apache.org>:
> > >
> > > > Hi Eduardo,
> > > > Yes, it is possible to read ARFF files from HDFS.
> > > > However, right now it is way more complicated than it should be, and it's
> > > > not documented at all.
> > > > Thanks for asking the question.
> > > >
> > > > I managed to do it with this command line:
> > > >
> > > > ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> > > > "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s
> > > > HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
> > > >
> > > > But I had to do a small modification to HDFSFileStreamSource to make it
> > > > work, by adding this line after line 61
> > > >
> > > >     config.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> > > >
> > > > Things to notice:
> > > > - We rely on HADOOP_HOME being set to your hadoop installation. This should
> > > > be made more robust.
> > > > - I used explicitly org.apache.samoa.streams.ArffFileStream as the normal
> > > > ArffFileStream does not support HDFS (this is related to SAMOA-14
> > > > <https://issues.apache.org/jira/browse/SAMOA-14>, and I plan to fix it
> > > > asap).
> > > > - I will add the snippet of code above in the same patch for SAMOA-14
> > > >
> > > >
> > > > Hope it helps,
> > > >
> > > >
> > > >
> > > >
> > > > -- Gianmarco
> > > >
> > > > On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <eduardocosi@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Can I pass ARFF files from Hadoop HDFS to SAMOA via the "-s" argument?
> > > > > If so, how do I do it?
> > > > >
> > > > > Best regards,
> > > > > Eduardo.
> > > > >
> > > >
> > >
> >
>
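
Since the thread relies on HADOOP_HOME pointing at the right Hadoop
installation and on hdfs-site.xml being in place, a quick sanity check on each
machine may help when validating the setup. The following is only a minimal
sketch, not part of SAMOA: the class name HadoopEnvCheck is hypothetical, and
it assumes the configuration file lives under $HADOOP_HOME/etc/hadoop (Hadoop
2.x layout) or $HADOOP_HOME/conf (older layout). It does not contact the
cluster, so it cannot tell whether the file points to the correct one.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of SAMOA or Hadoop): basic local checks only.
class HadoopEnvCheck {

    // Returns a list of problems found; an empty list means the checks passed.
    static List<String> check(Map<String, String> env) {
        List<String> problems = new ArrayList<>();

        String home = env.get("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }

        Path homeDir = Paths.get(home);
        if (!Files.isDirectory(homeDir)) {
            problems.add("HADOOP_HOME is not a directory: " + home);
            return problems;
        }

        // Assumed locations of hdfs-site.xml; adjust for your installation.
        Path[] candidates = {
            homeDir.resolve("etc/hadoop/hdfs-site.xml"),
            homeDir.resolve("conf/hdfs-site.xml")
        };
        boolean found = false;
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                found = true;
            }
        }
        if (!found) {
            problems.add("hdfs-site.xml not found under " + home);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("Hadoop environment looks consistent");
        } else {
            for (String problem : problems) {
                System.out.println("Problem: " + problem);
            }
        }
    }
}
```

Running it on every node of the Storm cluster would at least show whether each
worker sees the same HADOOP_HOME and a configuration file before looking into
performance.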
