spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jakub Stransky <stransky...@gmail.com>
Subject Write RDD to Elasticsearch
Date Thu, 24 Mar 2016 10:09:58 GMT
Hi,

I am trying to write JavaPairRDD into elasticsearch 1.7 using spark 1.2.1
using elasticsearch-hadoop 2.0.2

JavaPairRDD<NullWritable, Text> output = ...
 final JobConf jc = new JobConf(output.context().hadoopConfiguration());
                jc.set("mapred.output.format.class",
"org.elasticsearch.hadoop.mr.EsOutputFormat");
                jc.setOutputCommitter(FileOutputCommitter.class);
                jc.set(ConfigurationOptions.ES_RESOURCE_WRITE,
"streaming-chck/checks");
                jc.set(ConfigurationOptions.ES_NODES, "localhost:9200");
                FileOutputFormat.setOutputPath(jc, new Path("-"));

                output.saveAsHadoopDataset(jc);

I am getting following error

16/03/19 13:54:51 ERROR Executor: Exception in task 3.0 in stage 14.0 (TID 279)
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found
unrecoverable error [Bad Request(400) -
[MapperParsingException[Malformed content, must start with an
object]]]; Bailing out..
        at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
        at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
        at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
        at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:189)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:293)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:280)
        at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1072)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1051)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)16/03/19 13:54:51
ERROR Executor: Exception in task 3.0 in stage 14.0 (TID 279)
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found
unrecoverable error [Bad Request(400) -
[MapperParsingException[Malformed content, must start with an
object]]]; Bailing out..
        at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
        at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
        at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
        at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:189)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:293)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:280)
        at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1072)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1051)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Seems that I don't have correct Writable here or missing some setting when
trying to write json serialized messages as Strings/Text.

Could anybody help?
Thanks
Jakub

Mime
View raw message