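This looks like a feature engineering task, and I think VectorSlicer can help in your case. It takes a feature vector and outputs a new vector containing only the indices (or names) you select, so you can drop the ID entries before they reach the Random Forest. Please refer to http://spark.apache.org/docs/latest/ml-features.html#vectorslicer.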


I am performing regression using Random Forest. In my input vector, I want the algorithm to ignore certain columns/features both while training the regressor and while predicting. These are basically ID columns. I checked the documentation and could not find any information on this.

