samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Arff files from Hadoop HDFS
Date Sun, 13 Mar 2016 07:48:19 GMT
Hi Eduardo,

As long as you can access the HDFS cluster from the machines composing the
Storm cluster, there should be no problem.
However, you need to set the environment on those machines so that SAMOA
finds the right installation of Hadoop (set the HADOOP_HOME variable), and
the configuration files in that installation (e.g., hdfs-site.xml) need to
point to the correct Hadoop cluster.
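
A quick preflight check along these lines can be run on each Storm machine
before submitting the topology. It is only a sketch: the class name
HadoopEnvCheck is made up for illustration, the etc/hadoop directory assumes
a Hadoop 2.x layout, and checking for core-site.xml (where fs.defaultFS
normally lives) in addition to hdfs-site.xml is an assumption about what the
HDFS client will need, not something taken from the SAMOA code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SAMOA: verifies that HADOOP_HOME points
// at an installation containing the client-side configuration files.
public class HadoopEnvCheck {
    // Files the HDFS client typically reads to locate the cluster (assumption).
    static final String[] EXPECTED = {"core-site.xml", "hdfs-site.xml"};

    // Assumption: Hadoop 2.x layout, with configuration under etc/hadoop.
    static Path confDir(String hadoopHome) {
        return Paths.get(hadoopHome, "etc", "hadoop");
    }

    // Returns the names of the expected files that are absent from confDir.
    static List<String> missingConfigFiles(Path confDir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isRegularFile(confDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        Path conf = confDir(home);
        List<String> missing = missingConfigFiles(conf);
        if (missing.isEmpty()) {
            System.out.println("Hadoop client configuration found in " + conf);
        } else {
            System.err.println("Missing in " + conf + ": " + missing);
        }
    }
}
```

The check only confirms that the files exist; whether they name the correct
NameNode still has to be verified by hand.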

Hope it helps,

-- Gianmarco

On Sat, Mar 12, 2016 at 4:21 PM, Eduardo Costa <eduardocosi@gmail.com>
wrote:

> Hi, Gianmarco!
> Thank you so much for your response!
> Now I have another question: I run SAMOA (in cluster mode) on a different
> cluster from the Hadoop cluster, because I run SAMOA on top of a Storm
> cluster. Is there some way to read ARFF files from this remote Hadoop
> cluster when running SAMOA on top of the Storm cluster?
> Sorry for bothering you so much, but I need this to continue my master's
> thesis in Brazil at the Federal University of the State of Rio de Janeiro
> (UNIRIO). As previously mentioned, I'm trying to build a rudimentary
> anomaly detection system using SAMOA, but I am new to SAMOA.
>
> Best regards,
> Eduardo.
>
> 2016-03-06 8:59 GMT-03:00 Gianmarco De Francisci Morales <gdfm@apache.org>:
>
> > Hi Eduardo,
> > Yes, it is possible to read ARFF files from HDFS.
> > However, right now it is way more complicated than it should be, and it's
> > not documented at all.
> > Thanks for asking the question.
> >
> > I managed to do it with this command line:
> >
> > ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> > "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s
> > HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
> >
> > But I had to make a small modification to HDFSFileStreamSource to get it
> > to work, by adding this line after line 61:
> >
> >     config.set("fs.hdfs.impl",
> >         org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> >
> > Things to notice:
> > - We rely on HADOOP_HOME being set to your Hadoop installation. This
> > should be made more robust.
> > - I explicitly used org.apache.samoa.streams.ArffFileStream as the normal
> > ArffFileStream does not support HDFS (this is related to SAMOA-14
> > <https://issues.apache.org/jira/browse/SAMOA-14>, and I plan to fix it
> > asap).
> > - I will add the snippet of code above in the same patch for SAMOA-14
> >
> >
> > Hope it helps,
> >
> >
> >
> >
> > -- Gianmarco
> >
> > On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <eduardocosi@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Can I pass ARFF files from Hadoop HDFS to SAMOA via the "-s" argument?
> > > If so, how do I do it?
> > >
> > > Best regards,
> > > Eduardo.
> > >
> >
>
