spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Text
Date Fri, 27 Jan 2017 13:58:51 GMT
Sorry, the message was not complete: the key is the file position (the byte offset at which
each line starts), so if you sort by key, the lines will be in the same order as in the original file.
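
For reference, here is a minimal PySpark sketch of that approach. It assumes an already created SparkContext passed in as `sc` and a single input file (byte offsets restart in every file, so keys from several files would interleave). The function names, the `CORRECTIONS` table and the path arguments are hypothetical and only there for illustration; treat it as a sketch, not a tested recipe for your job.

```python
# Hypothetical correction table; replace with the real word fixes.
CORRECTIONS = {"teh": "the"}


def correct_line(line):
    """Replace whole words found in CORRECTIONS, leaving the rest untouched."""
    return " ".join(CORRECTIONS.get(word, word) for word in line.split(" "))


def rewrite_in_original_order(sc, input_path, output_path):
    """Correct every line of one text file and write the lines back in file order.

    `sc` is assumed to be an already created pyspark SparkContext.
    """
    # The old-API TextInputFormat yields (byte offset, line) pairs.
    records = sc.hadoopFile(
        input_path,
        "org.apache.hadoop.mapred.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )
    corrected = records.mapValues(correct_line)
    # Sorting by offset restores the original order before the keys are dropped.
    corrected.sortByKey().values().saveAsTextFile(output_path)
```

saveAsTextFile writes one part file per partition; after sortByKey, reading part-00000, part-00001, and so on in name order gives the lines in their original order.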

> On 27 Jan 2017, at 14:45, Jörn Franke <jornfranke@gmail.com> wrote:
> 
> I agree with the previous statements. You cannot expect any ordering guarantee. This
> means you need to ensure yourself that the output keeps the same order as the original file.
> Internally, Spark uses the Hadoop client libraries, even if you do not have Hadoop installed,
> because they are a flexible, transparent way to access many file systems, including the local
> one. In the case you mentioned, it is the TextInputFormat that returns a key and a value. The
> key i
> This means you can sort by the key.
> However, to access this key you must use the hadoopFile method of Spark together with
> the TextInputFormat.
> 
>> On 27 Jan 2017, at 10:44, Soheila S. <soheila518@gmail.com> wrote:
>> 
>> Hi All,
>> I read a test file using sparkContext.textFile(filename), assign it to an RDD,
>> process the RDD (replace some words), and finally write it to a text file using
>> rdd.saveAsTextFile(output).
>> Is there any way to be sure the order of the sentences will not be changed? I need
>> to have the same text with some corrected words.
>> 
>> thanks!
>> 
>> Soheila
