The following code fails on the collect() call. If I skip the collect and keep working with a JavaRDD<Document>, it works fine, but I really would like to collect the results. At first I was getting an error mentioning JDI threads and an index being 0. Then it just started locking up. I'm running the Spark context locally on 8 cores.


long count = documents.filter(d -> d.getFeatures().size() > Parameters.MIN_CENTROID_FEATURES).count();
List<Document> sampledDocuments = documents.filter(d -> d.getFeatures().size() > Parameters.MIN_CENTROID_FEATURES)
        .sample(false, samplingFraction(count))
        .collect();
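
The samplingFraction helper is not shown above. As a minimal sketch of what such a helper might look like, assuming it is meant to keep the expected sample near a fixed target size (the class name SamplingSketch, the constant TARGET_SAMPLE_SIZE, and its value are hypothetical placeholders, not the actual implementation):

```java
// Hypothetical helper: an illustration only, not the original samplingFraction.
final class SamplingSketch {
    // Assumed target for the expected number of sampled documents (placeholder value).
    static final long TARGET_SAMPLE_SIZE = 10_000L;

    // Returns a fraction in [0.0, 1.0] such that count * fraction
    // does not exceed TARGET_SAMPLE_SIZE.
    static double samplingFraction(long count) {
        if (count <= 0) {
            return 0.0; // nothing to sample from
        }
        return Math.min(1.0, (double) TARGET_SAMPLE_SIZE / count);
    }
}
```

Note that sample(withReplacement, fraction) treats the fraction as a per-element probability, so the size of the collected list would only be approximately count * fraction, not exactly that.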