On Saturday, October 18, 2014, Aaron Davidson <ilikerps@gmail.com> wrote:
The "minPartitions" argument of textFile/hadoopFile cannot decrease the number of splits below the physical number of blocks/files. So if you have 3 HDFS blocks, asking for 2 minPartitions will still give you 3 partitions (hence the "min"). It can, however, convert a file with fewer HDFS blocks into more (so you could ask for and get 4 partitions), assuming the blocks are "splittable". HDFS blocks are usually splittable, but if the file is compressed with a non-splittable codec such as gzip, they would not be.
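
To make the arithmetic concrete, here is a rough sketch in plain Python (not Spark or Hadoop code) of how the split count comes out. It assumes the rule used by Hadoop's old-API FileInputFormat, where the split size is max(minSize, min(totalSize / minPartitions, blockSize)) and the last split may run about 10% over. The function name compute_splits and its parameters are made up for illustration.

```python
SPLIT_SLOP = 1.1  # assumed: the final split may be up to 10% larger than split_size


def compute_splits(file_sizes, min_partitions, block_size, splittable=True, min_size=1):
    """Return a list of (file_index, offset, length) splits.

    Simplified model: every file is cut independently, and the target
    split size is derived from the total size of all files.
    """
    total = sum(file_sizes)
    goal_size = total // max(min_partitions, 1)
    # The block size caps the split size, which is why asking for fewer
    # partitions than there are blocks has no effect.
    split_size = max(min_size, min(goal_size, block_size))

    splits = []
    for index, size in enumerate(file_sizes):
        if not splittable:
            # A non-splittable file always becomes exactly one split.
            if size:
                splits.append((index, 0, size))
            continue
        remaining = size
        while remaining / split_size > SPLIT_SLOP:
            splits.append((index, size - remaining, split_size))
            remaining -= split_size
        if remaining:
            splits.append((index, size - remaining, remaining))
    return splits


MB = 1024 * 1024
one_file_three_blocks = [3 * 128 * MB]

print(len(compute_splits(one_file_three_blocks, 2, 128 * MB)))  # 3: cannot go below the block count
print(len(compute_splits(one_file_three_blocks, 4, 128 * MB)))  # 4: smaller splits are allowed
print(len(compute_splits(one_file_three_blocks, 4, 128 * MB, splittable=False)))  # 1
```

With 3 blocks of 128 MB, asking for 2 gives a goal size of 192 MB, which the block size caps at 128 MB, so you still get 3 splits. Asking for 4 gives a goal size of 96 MB, which is under the block size, so you get 4.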

If you wish to combine splits from a larger file, you can use RDD#coalesce. With shuffle=false, this will simply concatenate partitions, but it does not provide any ordering guarantees (it uses an algorithm which attempts to coalesce co-located partitions, to maintain locality information).

coalesce() with shuffle=true causes all of the elements to be shuffled around randomly into new partitions, which is an expensive operation but guarantees uniformity of data distribution.
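
The difference between the two modes can be pictured with a toy model in plain Python. This is not Spark's implementation: the contiguous-range grouping in coalesce_no_shuffle and the round-robin placement in coalesce_with_shuffle are assumptions chosen for illustration, and both function names are hypothetical. In real Spark the no-shuffle grouping depends on locality, so which partitions end up together is not guaranteed.

```python
import random


def coalesce_no_shuffle(partitions, n):
    """Merge whole parent partitions into at most n groups.

    Assumed grouping rule: contiguous ranges of parent partitions.
    Elements never leave the partition they started in; partitions
    are only concatenated.
    """
    count = len(partitions)
    n = min(n, count)  # without a shuffle the partition count cannot grow
    merged = []
    for i in range(n):
        start = i * count // n
        end = (i + 1) * count // n
        merged.append([x for part in partitions[start:end] for x in part])
    return merged


def coalesce_with_shuffle(partitions, n, seed=0):
    """Redistribute individual elements across n new partitions.

    Assumed placement rule: each parent partition starts at a random
    bucket and deals its elements out round-robin, which evens out sizes.
    """
    rng = random.Random(seed)
    out = [[] for _ in range(n)]
    for part in partitions:
        position = rng.randrange(n)
        for x in part:
            out[position % n].append(x)
            position += 1
    return out


skewed = [list(range(10)), [], []]
print([len(p) for p in coalesce_no_shuffle(skewed, 2)])    # sizes stay skewed
print([len(p) for p in coalesce_with_shuffle(skewed, 2)])  # sizes are evened out
```

The no-shuffle version is cheap because no element moves between groups individually, but a skewed input stays skewed. The shuffle version touches every element, which is the expensive part, and in exchange the output sizes are balanced.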

On Sat, Oct 18, 2014 at 10:47 AM, Mayur Rustagi wrote:
Does it retain the order when it is pulling from the HDFS blocks? That is,
if file1 => a, b, c (partitions in order),
and I convert to a 2-partition read, will it map to ab, c or a, bc, or can it also be a, cb?

Mayur Rustagi
Ph: +1 (760) 203 3257

On Sat, Oct 18, 2014 at 9:09 AM, Ilya Ganelin <ilganeli@gmail.com> wrote:

Also - if you're doing a text file read you can pass the number of resulting partitions as the second argument.

On Oct 17, 2014 9:05 PM, "Larry Liu" <larryliu05@gmail.com> wrote: