spark-user mailing list archives

From Mohamed Lrhazi <Mohamed.Lrh...@georgetown.edu>
Subject PySpark elasticsearch question
Date Tue, 09 Dec 2014 13:15:55 GMT
Hello,

Following a couple of tutorials, I can't seem to get PySpark to return any
"fields" from ES other than the document ID.

I tried like so:

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.resource": "en_2004/doc",
        "es.nodes": "rap-es2.uis",
        "es.query": "?fields=title,_source",
    })

es_rdd.take(1)

It always returns:

Out[13]: [(u'en_20040726_fbis_116728340038', {})]
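To make the symptom concrete: each element returned by take() is a
(document id, field dict) tuple, and the problem is that the dict is empty.
The sketch below uses a hypothetical helper, missing_fields (not part of
PySpark or elasticsearch-hadoop), that reports which requested fields are
absent from such pairs. It runs on plain Python data shaped like the output
above, so it can be tried without a cluster.

```python
from typing import Dict, Iterable, List, Tuple


# Hypothetical helper, not part of PySpark or elasticsearch-hadoop.
def missing_fields(
    rows: Iterable[Tuple[str, Dict[str, object]]],
    wanted: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each document id to the requested fields missing from its value dict."""
    wanted_list = list(wanted)
    report: Dict[str, List[str]] = {}
    for doc_id, doc in rows:
        # Collect every requested field name that is not a key of this document.
        absent = [name for name in wanted_list if name not in doc]
        if absent:
            report[doc_id] = absent
    return report


# Data shaped like the es_rdd.take(1) output above.
sample = [(u'en_20040726_fbis_116728340038', {})]
print(missing_fields(sample, ["title"]))
```

Against a live RDD the same check would be something like
missing_fields(es_rdd.take(5), ["title"]), assuming take() yields pairs of
the shape shown above.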

How does one get more fields?


Thanks,
Mohamed.
