spark-user mailing list archives

From Ron Ayoub
Subject RE: collecting fails - requirements for collecting (clone, hashCode etc?)
Date Wed, 03 Dec 2014 14:01:30 GMT
I didn't realize that I do get a nice stack trace when not running in debug mode. Basically, I believe
Document has to be serializable.
But since the question has already been asked: are there other requirements for objects within
an RDD that I should be aware of? Serializable is very understandable. How about clone, hashCode, etc.?
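
For anyone hitting the same error, here is a minimal sketch of what "Document has to be serializable" could look like. The real Document class is not shown in this thread, so the nested Document below, its single features field, and the roundTrip helper are all hypothetical stand-ins. The sketch uses only plain Java serialization (no Spark dependency), which is the mechanism Spark's default serializer relies on when task results are sent back to the driver on collect.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerializableDocumentSketch {

    // Hypothetical stand-in for the poster's Document; the real fields are unknown.
    static class Document implements Serializable {
        private static final long serialVersionUID = 1L;

        // Every non-transient field must itself be serializable; ArrayList<String> is.
        private final ArrayList<String> features;

        Document(List<String> features) {
            this.features = new ArrayList<>(features);
        }

        List<String> getFeatures() {
            return features;
        }
    }

    // Hypothetical helper: writes the value with Java serialization and reads it back.
    // A class that does not implement Serializable fails here with an
    // IllegalStateException wrapping a NotSerializableException.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Value is not Java-serializable", e);
        }
    }

    public static void main(String[] args) {
        Document original = new Document(Arrays.asList("spark", "rdd", "collect"));
        Document copy = roundTrip(original);
        // The copy is a distinct object carrying the same features.
        System.out.println(copy.getFeatures());
    }
}
```

Running a round trip like this in a unit test is one cheap way to catch a non-serializable field before the job reaches collect.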

Subject: collecting fails - requirements for collecting (clone, hashCode etc?)
Date: Wed, 3 Dec 2014 07:48:53 -0600

The following code fails on the collect. If I skip the collect and keep a JavaRDD<Document>,
it works fine, but I really would like to collect. At first I was getting an error regarding
JDI threads and an index being 0. Then it just started locking up. I'm running the Spark context
locally on 8 cores.

	long count = documents
			.filter(d -> d.getFeatures().size() > Parameters.MIN_CENTROID_FEATURES)
			.count();
	List<Document> sampledDocuments = documents
			.filter(d -> d.getFeatures().size() > Parameters.MIN_CENTROID_FEATURES)
			.sample(false, samplingFraction(count))
			.collect();
