spark-user mailing list archives

From "Daniel, Ronald (ELS-SDG)" <R.Dan...@elsevier.com>
Subject Accessing neighboring elements in an RDD
Date Wed, 03 Sep 2014 17:33:24 GMT
Hi all,

Assume I have read the lines of a text file into an RDD:

    textFile = sc.textFile("SomeArticle.txt")

Also assume that the sentence breaks in SomeArticle.txt were done by machine and have some
errors, such as the break at Fig. in the sample text below.

Index	Text
N	...as shown in Fig.
N+1	1.
N+2	The figure shows...

What I want is an RDD with:

N	... as shown in Fig. 1.
N+1	The figure shows...
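
To make the desired merge concrete, here is a minimal sequential sketch in plain Python (no Spark involved). The function name merge_false_breaks and the ABBREVIATIONS tuple are hypothetical, made up for illustration, and the rule "a line ending in a listed abbreviation absorbs the next line" is only an assumption about how the bad breaks could be detected.

```python
# Hypothetical list of abbreviations whose trailing period
# should not be treated as a sentence break.
ABBREVIATIONS = ("Fig.",)

def merge_false_breaks(lines):
    """Join each line ending in a known abbreviation with the line after it."""
    merged = []
    for line in lines:
        # If the previous output line ends in an abbreviation,
        # the break before this line was a mistake: glue them together.
        if merged and merged[-1].rstrip().endswith(ABBREVIATIONS):
            merged[-1] = merged[-1].rstrip() + " " + line.lstrip()
        else:
            merged.append(line)
    return merged

sample = ["...as shown in Fig.", "1.", "The figure shows..."]
print(merge_false_breaks(sample))
```

This walks the lines in order, which is exactly the sequential iteration I would like to avoid on an RDD.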

Is there some way a filter() can look at neighboring elements in an RDD? That way I could
examine adjacent lines in parallel and come up with a new RDD that may have a different
number of elements. Or do I just have to iterate through the RDD sequentially?

Thanks,
Ron

