Dear all,

I would like to run a simple Spark job on EMR with YARN.

My job is as follows:

import java.util.LinkedList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkTest {

    public void EMRRun() {
        // Note: hard-coding the master here duplicates the --master flag
        // I pass to spark-submit below.
        SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster");
        // Note: this also differs from the --executor-memory 512m flag below.
        sparkConf.set("spark.executor.memory", "13000m");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        System.out.println(ctx.appName());

        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < 10000; i++) {
            list.add(i);
        }

        // Distribute the list, then pull it back to the driver and print it.
        JavaRDD<Integer> listRDD = ctx.parallelize(list);
        List<Integer> results = listRDD.collect();

        for (Integer i : results) {
            System.out.println(i);
        }

        ctx.stop();
    }

    public static void main(String[] args) {
        SparkTest sp = new SparkTest();
        sp.EMRRun();
    }
}

On EMR I submit the job with spark-submit as follows:

./spark-submit --class com.collokia.ml.stackoverflow.usertags.browserhistory.sparkTestJava.SparkTest --master yarn-cluster --executor-memory 512m --num-executors 10 /home/hadoop/MLyBigData.jar
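As a side note for anyone reproducing this: the application ID I pass to `yarn logs` below can be pulled out of captured spark-submit output with grep. A small sketch — the sample log line here is illustrative, not my actual driver output:

```shell
# Extract the YARN application ID from captured spark-submit output.
# The echoed log line below is illustrative, not my actual output.
echo "14/12/09 20:10:01 INFO yarn.Client: Submitted application application_1418123020170_0032" \
  | grep -oE 'application_[0-9]+_[0-9]+'
# prints: application_1418123020170_0032
```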

After the job finished I tried to view the YARN logs, but I got this:
 yarn logs -applicationId application_1418123020170_0032
14/12/09 20:29:26 INFO client.RMProxy: Connecting to ResourceManager at /172.31.3.155:9022
Logs not available at /tmp/logs/hadoop/logs/application_1418123020170_0032
Log aggregation has not completed or is not enabled.

However, I had already modified yarn-site.xml as follows:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>30</value>
</property>
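For completeness: the /tmp/logs path in the error message above is the default aggregated-log location on HDFS, which is controlled by yarn.nodemanager.remote-app-log-dir. A sketch of that property with its default value — I have not overridden it myself:

```xml
<!-- Default HDFS location for aggregated container logs; shown as a
     reference sketch only, not a setting I have changed. -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
```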

I am using AMI version 3.2.3 with Spark 1.1.0 on Hadoop 2.4.

Any suggestions on how I can see the YARN logs?
Thanks,
Istvan