beam-commits mailing list archives

From "Kenneth Knowles (JIRA)"
Subject [jira] [Created] (BEAM-2267) Final files for WordCount not appearing with Apex on YARN
Date Thu, 11 May 2017 22:40:04 GMT
Kenneth Knowles created BEAM-2267:

             Summary: Final files for WordCount not appearing with Apex on YARN
                 Key: BEAM-2267
             Project: Beam
          Issue Type: Bug
          Components: runner-apex
            Reporter: Kenneth Knowles
            Assignee: Thomas Weise

When I run WordCount with the Apex runner on a YARN cluster - specifically Dataproc, reading/writing
GCS - the word counts are all written to temporary files but they are never moved to their
final destination.

Hadoop version 2.7.3
Beam RC 2.0.0

Steps to repro:

1. Instantiate archetype (see below)
2. Build uber jar {{mvn --settings ../beamrc-settings.xml clean package -P apex-runner}}
3. SCP to master (or wherever you'd like to launch from)
4. {{java -cp word-count-beam-0.1.jar beamrc.WordCount --runner=ApexRunner --embeddedExecution=false --inputFile=gs://apache-beam-samples/shakespeare/winterstale-personae --output=SOMEWHERE}}
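To confirm the symptom after step 4, the object names under the output location can be split into final shards and everything else. The following is a minimal sketch, not part of the original report. It assumes the final files use Beam's default {{-SSSSS-of-NNNNN}} shard suffix and that the listing comes from some external command such as {{gsutil ls}}. The function name and the sample paths are hypothetical.

```python
import re

# Assumption: final output shards end with the default "-SSSSS-of-NNNNN" suffix.
FINAL_SHARD = re.compile(r"-\d{5}-of-\d{5}$")


def split_outputs(names, output_prefix):
    """Split listed object names into (final shards, everything else).

    names: iterable of full object names, e.g. lines printed by a listing command.
    output_prefix: the value that was passed as --output.
    """
    final, other = [], []
    for name in names:
        if name.startswith(output_prefix) and FINAL_SHARD.search(name):
            final.append(name)
        else:
            other.append(name)
    return final, other


if __name__ == "__main__":
    # Hypothetical listing: one temporary object and no final shards,
    # which is the situation described in this issue.
    listing = ["gs://some-bucket/.temp-beam-example/part-0"]
    final, other = split_outputs(listing, "gs://some-bucket/counts")
    print("final shards:", final)
    print("other objects:", other)
```

An empty list of final shards alongside a non-empty list of other objects would match the behavior reported here.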

Appendix: steps to instantiate RC archetype:

Build an RC-specific {{beamrc-settings.xml}}
          <!-- This id _must_ be "archetype" -->

And then instantiate like so:
mvn archetype:generate \
      --settings beamrc-settings.xml \
      -D archetypeCatalog=internal \
      -D archetypeGroupId=org.apache.beam \
      -D archetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -D archetypeVersion=2.0.0 \
      -D groupId=beamrc \
      -D artifactId=word-count-beam \
      -D version="0.1" \
      -D package=beamrc \
      -D interactiveMode=false

This message was sent by Atlassian JIRA
