I ran a dataset of 200 columns and 0.2M records in a cluster of 1 master 18 GB, 2 slaves 32 GB each, 16 cores/slave, took around 772 minutes for a very large ML tuning based job (training).

Now, my requirement is to run the same operation on 3M records. Any idea on how we should proceed? Should we go for a vertical scaling or a horizontal one? How should this problem be approached in a stepwise/systematic manner?

Thanks in advance.