spark-user mailing list archives

From Vikas Agarwal <vi...@infoobjects.com>
Subject Re: filtering a SchemaRDD
Date Sat, 15 Nov 2014 05:48:34 GMT
Hi, did you try using single quotes instead of double quotes around the
column name? I faced a similar situation with Apache Phoenix.

On Saturday, November 15, 2014, Daniel, Ronald (ELS-SDG) <
R.Daniel@elsevier.com> wrote:

>  Hi all,
>
>
>
> I have a SchemaRDD that is loaded from a file. Each Row contains 7 fields,
> one of which holds the text for a sentence from a document.
>
>
>
>   # Load sentence data table
>
>   sentenceRDD = sqlContext.parquetFile('s3n://some/path/thing')
>
>   sentenceRDD.take(3)
>
> Out[20]: [Row(annotID=118, annotSet=u'ge', annotType=u'sentence',
> endOffset=20194, pii=u'0094576587900440', startOffset=20062, text=u'Paper
> IAF-86-85 presented at the 37th Congress of the International Astronautical
> Federation, Innsbruck, Austria, 4-11 October 1986.'), Row(annotID=163,
> annotSet=u'ge', annotType=u'sentence', endOffset=20249,
> pii=u'0094576587900440', startOffset=20194, text=u"The landsat sensors:
> Eosat's plans for landsats 6 and 7"), Row(annotID=190, annotSet=u'ge',
> annotType=u'sentence', endOffset=20342, pii=u'0094576587900440',
> startOffset=20334, text=u'Abstract')]
>
>
>
> I have this registered as a table and can query it with SQL select
> statements. I would also like to filter the RDD using text operations like
> regexps that have greater capabilities than SQL's LIKE operator. However,
> the code below does not work. Instead I get a runtime error.
>
>
>
>     openProbsRDD = sentenceRDD.filter(lambda row: "remains unknown" in row["text"])
>
>     openProbsRDD.take(5)
>
> …
>
> TypeError: tuple indices must be integers, not str
>
> …
>
>
>
> If I use row[6] instead of row["text"] I get what I am looking for.
> However, finding the right numeric index could be a pain.
>
>
>
> Can I access the fields in a Row of a SchemaRDD by name, so that I can
> map, filter, etc. without a trial and error process of finding the right
> int for the fieldname?
>
>
>
> Thanks,
>
> Ron Daniel
>
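
On the by-name access asked about above: in Python, single and double quotes
produce the same string, so the quote style by itself probably will not change
the TypeError. The error message says the Row behaves like a tuple, and tuples
only accept integer indices. Below is a minimal sketch of accessing the field
as an attribute instead. It uses a namedtuple as a stand-in for the Row so it
runs without Spark; the sample sentences, the regexp, and the helper name
is_open_problem are made up for illustration, and it is an assumption that the
Row objects in your Spark version expose their fields as attributes in the
same way.

```python
import re
from collections import namedtuple

# Stand-in for a SchemaRDD Row (assumption: a tuple subclass whose fields
# are also readable as attributes). Field order matches the take(3) output.
Row = namedtuple(
    "Row",
    ["annotID", "annotSet", "annotType", "endOffset", "pii", "startOffset", "text"],
)

# Hypothetical sample rows, not taken from the real data set.
rows = [
    Row(118, "ge", "sentence", 20194, "0094576587900440", 20062,
        "The cause of the anomaly remains unknown."),
    Row(190, "ge", "sentence", 20342, "0094576587900440", 20334, "Abstract"),
]

# Hypothetical pattern: something SQL's LIKE cannot express in one clause.
pattern = re.compile(r"remains?\s+(unknown|unclear)", re.IGNORECASE)

def is_open_problem(row):
    # Access the field by name as an attribute; row["text"] would raise
    # TypeError because tuples only accept integer indices.
    return pattern.search(row.text) is not None

open_probs = [r for r in rows if is_open_problem(r)]

# If a numeric index is still needed, look it up by name instead of guessing.
text_idx = Row._fields.index("text")
```

If the attribute access assumption holds for your Row objects, the same
predicate could be passed straight to the RDD, for example
sentenceRDD.filter(is_open_problem).take(5).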


-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax
