samoa-dev mailing list archives

From "Edi Bice (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SAMOA-58) Samoa AvroFileStream from HDFSFileStreamSource stops at end of first file
Date Thu, 18 Feb 2016 20:52:18 GMT

     [ https://issues.apache.org/jira/browse/SAMOA-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edi Bice updated SAMOA-58:
--------------------------
    Description: 
Samoa appears to be capable of streaming a collection of files as a single stream, effectively
concatenating the files. However, when using Samoa AvroFileStream with HDFSFileStreamSource,
the stream seems to stop at the end of the first file:

bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -i -1 -l (classifiers.ensemble.Bagging -s 100) -s (AvroFileStream -s HDFSFileStreamSource -f /tmp/order_and_feats_flat_avro/2016_02_18/ -c 1 -e binary) -f 10000"

2016-02-18 20:43:20,991 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:183) - last event is received!
2016-02-18 20:43:20,991 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:184) - total count: 262144

...

2016-02-18 20:43:20,993 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:191) - total evaluation time: 34 seconds for 262144 instances

bash-4.1$ hadoop fs -ls /tmp/order_and_feats_flat_avro/2016_02_18 | more
Found 70 items
-rw-r--r--   3 yarn hdfs  230855335 2016-02-18 16:01 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00001
-rw-r--r--   3 yarn hdfs  229800273 2016-02-18 16:04 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00002
...

  was:
It appears Samoa is capable of streaming a collection of files as a single stream effectively
concatenating the files. However using Samoa AvroFileStream from HDFSFileStreamSource seems
the stream stops at end of first file:

bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -i
-1 -l (classifiers.ensemble.Bagging -s 100) -s (AvroFileStream -s HDFSFileStreamSource -f
/tmp/order_and_feats_flat_avro/2016_02_18/ -c 1 -e binary) -f 10000"

2016-02-18 20:43:20,991 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:183)
- last event is received!
2016-02-18 20:43:20,991 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:184)
- total count: 262144

...

2016-02-18 20:43:20,993 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:191)
- total evaluation time: 34 seconds for 262144 instances

bash-4.1$ hadoop fs -ls /tmp/order_and_feats_flat_avro/2016_02_18 | more
Found 70 items
-rw-r--r--   3 yarn hdfs  230855335 2016-02-18 16:01 /tmp/order_and_feats_flat_a
vro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00001
-rw-r--r--   3 yarn hdfs  229800273 2016-02-18 16:04 /tmp/order_and_feats_flat_a
vro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00002


> Samoa AvroFileStream from HDFSFileStreamSource stops at end of first file
> -------------------------------------------------------------------------
>
>                 Key: SAMOA-58
>                 URL: https://issues.apache.org/jira/browse/SAMOA-58
>             Project: SAMOA
>          Issue Type: Bug
>          Components: SAMOA-Instances
>         Environment: RHEL 6.6, java 1.8.0_72
>            Reporter: Edi Bice
>
> Samoa appears to be capable of streaming a collection of files as a single stream, effectively
> concatenating the files. However, when using Samoa AvroFileStream with HDFSFileStreamSource,
> the stream seems to stop at the end of the first file:
> bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -i -1 -l (classifiers.ensemble.Bagging -s 100) -s (AvroFileStream -s HDFSFileStreamSource -f /tmp/order_and_feats_flat_avro/2016_02_18/ -c 1 -e binary) -f 10000"
> 2016-02-18 20:43:20,991 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:183) - last event is received!
> 2016-02-18 20:43:20,991 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:184) - total count: 262144
> ...
> 2016-02-18 20:43:20,993 [main] INFO  org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:191) - total evaluation time: 34 seconds for 262144 instances
> bash-4.1$ hadoop fs -ls /tmp/order_and_feats_flat_avro/2016_02_18 | more
> Found 70 items
> -rw-r--r--   3 yarn hdfs  230855335 2016-02-18 16:01 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00001
> -rw-r--r--   3 yarn hdfs  229800273 2016-02-18 16:04 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00002
> ...
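
For illustration only, the behaviour the report expects (many files read as one logical stream) can be sketched in plain Java. This is not SAMOA code and makes no claim about how AvroFileStream or HDFSFileStreamSource are implemented. The class ChainedRecords and its concat method are hypothetical names, and in-memory lists stand in for the per-file record readers. The point is only that reaching the end of one file should advance to the next file rather than end the stream.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of SAMOA: chains several per-file record
// sources into one logical stream of records.
class ChainedRecords {

    // Each element of 'files' stands in for one file's records.
    static <T> Iterator<T> concat(List<? extends Iterable<T>> files) {
        final Iterator<? extends Iterable<T>> fileIt = files.iterator();
        return new Iterator<T>() {
            // Records of the file currently being read; starts out empty.
            private Iterator<T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // At the end of the current file, move on to the next one
                // instead of ending the stream. Empty files are skipped.
                while (!current.hasNext() && fileIt.hasNext()) {
                    current = fileIt.next().iterator();
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        // Three stand-in "files"; the second one is empty.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("r1", "r2"),
                Collections.<String>emptyList(),
                Arrays.asList("r3"));
        Iterator<String> stream = concat(files);
        while (stream.hasNext()) {
            System.out.println(stream.next());
        }
    }
}
```

A reader that checks only the current file in hasNext(), without the loop over the remaining files, would end the stream after the first file, which matches the symptom described above. Whether that is the actual cause in SAMOA is not established by this report.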



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
