I am using Spark 1.4.1 , in stand-alone mode, on a cluster of 3 nodes.

Using Spark sql and Hive Context, I am trying to run a simple scan query on an existing Hive table (which is an external table consisting of rows in text files stored in HDFS - it is NOT parquet, ORC or any other richer format). 

DataFrame res = hiveCtx.sql("SELECT * FROM lineitem WHERE L_LINENUMBER < 0");

What I observe is the performance of this full scan in Spark is not comparable with Hive (it is almost 4 times slower). Checking the resource usage, what I see is workers/executors do not do parallel scans but they scan on a per-node basis (first executors from the worker(s) on node 1 do reading from disk, while other two nodes are not doing I/O and just receive data from the first node and through network, then 2nd node does the scan and then the third one).
I also realized that if I load this data file directly from my spark context (using textFile() ) and run count() on that (not using spark sql) then I can get a better performance by increasing number of partitions. I am just trying to do the same thing (increasing number of partitions in the beginning) in Spark sql as:

var tab = sqlContext.read.table("lineitem");

but none of repartition() or coalesce() methods actually work - they do not return an error, but if I check

var p = tab.rdd.partitions.size;

before and after calling any of them, it returns the same number of partitions.

I am just wondering how I can change the number of partitions for a Hive external table, in Spark Sql.

Any help/suggestion would be appreciated.