samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Arff files from Hadoop HDFS
Date Sun, 06 Mar 2016 11:59:38 GMT
Hi Eduardo,
Yes, it is possible to read ARFF files from HDFS.
However, right now it is way more complicated than it should be, and it's
not documented at all.
Thanks for asking the question.

I managed to do it with this command line:

    ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"

However, I had to make a small modification to HDFSFileStreamSource to get it
working, by adding this line after line 61:

    config.set("fs.hdfs.impl",
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
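For readers wondering why that line helps: roughly, Hadoop picks the FileSystem class for a URI scheme by first checking a "fs.<scheme>.impl" configuration key, and otherwise falling back to implementations registered through META-INF/services files on the classpath. A common cause of "No FileSystem for scheme: hdfs" is a single merged ("fat") jar in which the hdfs registration was lost while the service files were combined; the message above does not say this was the cause here. The sketch below is a simplified, hypothetical model of that lookup order using plain maps instead of Hadoop classes; FsSchemeResolver and its methods are illustrative names, not Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of how Hadoop chooses a FileSystem class
 * for a URI scheme. Not Hadoop's code: plain string maps stand in for
 * Configuration and for the ServiceLoader registry.
 */
public class FsSchemeResolver {

    /** Explicit settings, like what config.set("fs.hdfs.impl", ...) adds. */
    private final Map<String, String> config = new HashMap<>();

    /** Implementations "discovered" from META-INF/services files. */
    private final Map<String, String> serviceRegistry = new HashMap<>();

    public void set(String key, String value) {
        config.put(key, value);
    }

    public void registerService(String scheme, String className) {
        serviceRegistry.put(scheme, className);
    }

    /** An explicit "fs.<scheme>.impl" entry wins; otherwise use the registry. */
    public String resolve(String scheme) {
        String explicit = config.get("fs." + scheme + ".impl");
        if (explicit != null) {
            return explicit;
        }
        String discovered = serviceRegistry.get(scheme);
        if (discovered == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return discovered;
    }

    public static void main(String[] args) {
        FsSchemeResolver resolver = new FsSchemeResolver();
        // Simulate a merged jar whose service file kept only the "file" entry.
        resolver.registerService("file", "org.apache.hadoop.fs.LocalFileSystem");
        try {
            resolver.resolve("hdfs");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No FileSystem for scheme: hdfs
        }
        // The workaround from the message: name the implementation explicitly.
        resolver.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println(resolver.resolve("hdfs"));
    }
}
```

Setting the key explicitly sidesteps the classpath registry entirely, which is why a one-line config change can be enough.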

Things to notice:
- We rely on HADOOP_HOME being set to your Hadoop installation directory.
This should be made more robust.
- I explicitly used org.apache.samoa.streams.ArffFileStream, because the
regular ArffFileStream does not support HDFS (this is related to SAMOA-14
<https://issues.apache.org/jira/browse/SAMOA-14>, and I plan to fix it
as soon as possible).
- I will include the code snippet above in the same patch for SAMOA-14.
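On the first point, one possible way to make the lookup less fragile is to honour HADOOP_CONF_DIR (the standard Hadoop variable for the configuration directory) before falling back to $HADOOP_HOME/etc/hadoop, and to fail with a clear message when neither is set. This is a hypothetical sketch, not how HDFSFileStreamSource currently works; HadoopConfLocator and confDir are illustrative names, and the etc/hadoop layout assumes Hadoop 2.x or later. Passing the environment in as a Map keeps the logic testable.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper: locate the directory holding core-site.xml and
 * hdfs-site.xml. Prefers HADOOP_CONF_DIR, then HADOOP_HOME/etc/hadoop.
 */
public class HadoopConfLocator {

    public static Optional<Path> confDir(Map<String, String> env) {
        String confDir = env.get("HADOOP_CONF_DIR");
        if (confDir != null && !confDir.isEmpty()) {
            return Optional.of(Paths.get(confDir));
        }
        String home = env.get("HADOOP_HOME");
        if (home != null && !home.isEmpty()) {
            // Assumed Hadoop 2.x+ layout: $HADOOP_HOME/etc/hadoop
            return Optional.of(Paths.get(home, "etc", "hadoop"));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> dir = confDir(System.getenv());
        if (dir.isPresent()) {
            System.out.println("core-site.xml: " + dir.get().resolve("core-site.xml"));
            System.out.println("hdfs-site.xml: " + dir.get().resolve("hdfs-site.xml"));
        } else {
            // Fail loudly instead of silently falling back to defaults.
            System.err.println("Set HADOOP_CONF_DIR or HADOOP_HOME before reading from HDFS.");
        }
    }
}
```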


Hope it helps,

-- Gianmarco

On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <eduardocosi@gmail.com> wrote:

> Hi,
>
> Can I pass ARFF files from Hadoop HDFS to SAMOA via the "-s" argument? If
> so, how do I do it?
>
> Best regards,
> Eduardo.
>
