samoa-dev mailing list archives

From edi-bice <>
Subject [GitHub] incubator-samoa pull request: Patch for SAMOA-58 (Samoa AvroFileSt...
Date Mon, 22 Feb 2016 14:43:15 GMT
GitHub user edi-bice opened a pull request:

    Patch for SAMOA-58 (Samoa AvroFileStream from HDFSFileStreamSource stops at end of first

    FileStreamSource appeared to support multiple files, but my testing showed
otherwise: a Samoa AvroFileStream reading from HDFSFileStreamSource stops at the end of the first file. I had
to change AvroFileStream, ArffFileStream and their parent FileStream in order to make this
    See the following JIRA for additional detail:
    Additionally, I modified bin/samoa, pom.xml and SystemUtils, and added a resource,
to fix reading from HDFS on my cluster.
    A seemingly unrelated change is an explicit test for supported Avro types: fields of
unsupported types are now filtered out, instead of assuming that all non-nominal (non-enum) fields
are numeric and failing during reading.
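
The type filter described above could look roughly like the following sketch. It is not the code in this pull request: FieldType is a hypothetical stand-in for the Avro schema field types so that the example runs without the Avro library, and the class and method names are made up for illustration. The set of supported types (double, float, long, int and enum) is taken from the first commit message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SupportedFieldFilter {

    // Hypothetical stand-in for the type of a field in an Avro record schema.
    enum FieldType { DOUBLE, FLOAT, LONG, INT, ENUM, STRING, BYTES, BOOLEAN }

    // Numeric types that can be mapped to a numeric attribute.
    static boolean isNumeric(FieldType type) {
        return type == FieldType.DOUBLE || type == FieldType.FLOAT
                || type == FieldType.LONG || type == FieldType.INT;
    }

    // A field is kept only if it is numeric or nominal (enum).
    static boolean isSupported(FieldType type) {
        return isNumeric(type) || type == FieldType.ENUM;
    }

    // Returns the names of the supported fields, preserving schema order.
    // Unsupported fields are skipped up front instead of being treated as
    // numeric and failing later when their values are parsed.
    static List<String> supportedFields(Map<String, FieldType> schema) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, FieldType> field : schema.entrySet()) {
            if (isSupported(field.getValue())) {
                kept.add(field.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, FieldType> schema = new LinkedHashMap<>();
        schema.put("age", FieldType.INT);
        schema.put("name", FieldType.STRING);
        schema.put("score", FieldType.DOUBLE);
        schema.put("label", FieldType.ENUM);
        System.out.println(supportedFields(schema)); // prints [age, score, label]
    }
}
```

The point of filtering when the header is built is that an unsupported field (a string, for example) never becomes an attribute, so no record can fail on it at read time.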

You can merge this pull request into a Git repository by running:

    $ git pull master

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #48
commit 5cbbcfab94db47732ab44b3b9d752c45f02e2f30
Author: edi_bice <>
Date:   2016-02-17T15:45:07Z

    Only add fields of supported types (double, float, long, int and enum) rather than adding
and defaulting all non-enum to numeric and failing at value parse time

commit d5a055f5c5ff0c6787beaa03234375cdcbb89cb5
Author: edi_bice <>
Date:   2016-02-17T21:53:02Z

    until we change samza to produce files with .avro extension

commit ba73bb24d9477207e8dfd85fbf478be1e3877c7d
Author: edi_bice <>
Date:   2016-02-18T22:06:12Z

    A tentative solution to issue described in:

commit 29e0379949eb7847ea46bfe432d98d90dff993e9
Author: edi_bice <>
Date:   2016-02-19T16:55:03Z

    Issue described in was apparently more
complicated than what was expected in previous commit. While we did succeed in replacing the
first exhausted file stream with a new one, the loader was not changed and would return null.
This rework of AvroFileStream, FileStream and ArffFileStream hopefully cleans things up a
bit and allows multi-file streams of either (Avro or Arff) type.
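
The failure mode in this commit message (the file stream was replaced but the loader on top of it was not) can be illustrated with a minimal sketch. It is not the actual AvroFileStream or FileStream code: MultiFileLineStream is a hypothetical class, strings stand in for HDFS files, and a BufferedReader plays the role of the per-file loader. The key step is that the reader is rebuilt on every file switch rather than kept pointing at the exhausted file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MultiFileLineStream {

    // Hypothetical: each string stands in for the contents of one file.
    private final Iterator<String> sources;
    // Per-file reader (the "loader"); recreated whenever a new file is opened.
    private BufferedReader reader;

    MultiFileLineStream(List<String> fileContents) {
        this.sources = fileContents.iterator();
    }

    // Returns the next record, or null only once every file is exhausted.
    String next() {
        try {
            while (true) {
                if (reader != null) {
                    String line = reader.readLine();
                    if (line != null) {
                        return line;
                    }
                    // Current file is exhausted: drop its reader as well.
                    reader.close();
                    reader = null;
                }
                if (!sources.hasNext()) {
                    return null;
                }
                // Open the next file AND build a fresh reader on top of it.
                reader = new BufferedReader(new StringReader(sources.next()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drains the whole multi-file stream into a list.
    static List<String> readAll(List<String> fileContents) {
        MultiFileLineStream stream = new MultiFileLineStream(fileContents);
        List<String> records = new ArrayList<>();
        for (String r = stream.next(); r != null; r = stream.next()) {
            records.add(r);
        }
        return records;
    }

    public static void main(String[] args) {
        // The empty second "file" is skipped without ending the stream.
        System.out.println(readAll(Arrays.asList("a\nb", "", "c"))); // prints [a, b, c]
    }
}
```

Looping until a record is found or no files remain also covers empty files in the middle of the list, which would otherwise end the stream early.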

commit fe093240a248e26be84ded4d378acc1d5c81d599
Author: edi_bice <>
Date:   2016-01-25T17:02:22Z

    configure don't code

commit 99f04bb4396190e92af2a43e56d005cb502357ca
Author: Edi Bice <>
Date:   2016-02-22T14:25:43Z

    cherry-picked from faf branch - changes needed to be able to read from HDFS on a YARN
2.7.1 cluster


