The "minPartitions" argument of textFile/hadoopFile cannot decrease the number of splits below the physical number of blocks/files. So if you have 3 HDFS blocks, asking for 2 minPartitions will still give you 3 partitions (hence the "min"). It can, however, split a file with fewer HDFS blocks into more partitions (so you could ask for and get 4 partitions), assuming the file is "splittable". Uncompressed files on HDFS are usually splittable, but a file compressed with a non-splittable codec such as gzip would not be.
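To make the arithmetic concrete, here is a rough sketch in plain Python (no Spark needed) of how I understand the old-API Hadoop FileInputFormat to size its splits when textFile hands it minPartitions. The function names, the single-file assumption, and the default min_size of 1 byte are my own simplifications for illustration, not Spark or Hadoop API.

```python
# Simplified, single-file model of split counting. Assumption: this mirrors
# the rule splitSize = max(minSize, min(goalSize, blockSize)) with a 10%
# "slop" allowed on the last split. All names here are hypothetical.

SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def compute_split_size(goal_size, min_size, block_size):
    # Splits never grow beyond one block, no matter how few partitions
    # were requested; this is why minPartitions is only a minimum.
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, block_size, min_partitions, min_size=1, splittable=True):
    # A non-splittable file (e.g. gzip-compressed) becomes one partition.
    if not splittable:
        return 1
    goal_size = file_size // max(min_partitions, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
BLOCK = 128 * MB

three_blocks_ask_2 = count_splits(3 * BLOCK, BLOCK, min_partitions=2)
three_blocks_ask_4 = count_splits(3 * BLOCK, BLOCK, min_partitions=4)
unsplittable_ask_4 = count_splits(3 * BLOCK, BLOCK, min_partitions=4, splittable=False)
```

Under these assumptions, a 3-block file asked for 2 partitions still yields 3, asking for 4 yields 4, and a non-splittable file yields 1 regardless of what you ask for.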

If you wish to combine splits from a larger file, you can use RDD#coalesce. With shuffle=false, this will simply concatenate partitions, but it does not provide any ordering guarantees (it uses an algorithm which attempts to coalesce co-located partitions, to maintain locality information).

coalesce() with shuffle=true causes all of the elements to be shuffled around randomly into new partitions, which is an expensive operation but guarantees uniformity of data distribution.

Does it retain the order if it's pulling from the HDFS blocks? Meaning:
if file1 => a, b, c partitions in order,
and I read it as 2 partitions, will it map to ab, c or a, bc, or can it also be a, cb?

Also - if you're doing a text file read you can pass the number of resulting partitions as the second argument.

