spark-user mailing list archives

From Amit Singh Hora <hora.a...@gmail.com>
Subject Spark Hbase job taking long time
Date Wed, 06 Aug 2014 12:54:35 GMT
Hi All,

I am trying to run a SQL query on HBase using a Spark job. So far I am able
to get the desired results, but as the data set size increases the Spark job
takes a long time.
I believe I am doing something wrong: after going through the documentation
and videos on Spark performance, I expected it to take no more than a
couple of seconds.

Please find the code snippet below.
The HBase table contains 10 lakh (1 million) rows.

JavaPairRDD<ImmutableBytesWritable, Result> pairRdd = ctx
				.newAPIHadoopRDD(conf, TableInputFormat.class,
						ImmutableBytesWritable.class,
						org.apache.hadoop.hbase.client.Result.class).cache();

JavaRDD<Person> people = pairRdd
				.map(new Function<Tuple2<ImmutableBytesWritable, Result>, Person>() {

					public Person call(Tuple2<ImmutableBytesWritable, Result> v1)
							throws Exception {
						System.out.println("comming");
						Person person = new Person();
						String key=Bytes.toString(v1._2.getRow());
						key=key.substring(0,key.lastIndexOf("_"));
						person.setCalling(Long.parseLong(key));
						person.setCalled(Bytes.toLong(v1._2.getValue(
								Bytes.toBytes("si"), Bytes.toBytes("called"))));
						person.setTime(Bytes.toLong(v1._2.getValue(
								Bytes.toBytes("si"), Bytes.toBytes("at"))));

						return person;
					}
				});
JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
		schemaPeople.registerAsTable("people");

		// SQL can be run over RDDs that have been registered as tables.
		JavaSchemaRDD teenagers = sqlCtx
				.sql("SELECT count(*) from people group by calling");
		teenagers.printSchema();
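
For reference, the row-key handling inside the map function can be exercised on its own, without Spark or HBase. This is only a minimal sketch: it assumes row keys look like "<calling number>_<suffix>", and the class name RowKeyParser, the method parseCalling, and the sample key are made up for illustration. Unlike the snippet above, it checks for a missing "_" first, because substring(0, -1) would otherwise throw a StringIndexOutOfBoundsException.

```java
// Hypothetical helper; the names and the sample key are illustrative only.
class RowKeyParser {

    // Assumed row-key layout: "<calling number>_<suffix>".
    // Returns the numeric part before the last underscore.
    static long parseCalling(String rowKey) {
        int sep = rowKey.lastIndexOf('_');
        if (sep < 0) {
            // Fail with a descriptive message instead of an index error.
            throw new IllegalArgumentException(
                    "Row key has no '_' separator: " + rowKey);
        }
        return Long.parseLong(rowKey.substring(0, sep));
    }

    public static void main(String[] args) {
        // Made-up sample key in the assumed layout.
        System.out.println(parseCalling("9876543210_1407329675"));
    }
}
```

A key with a non-numeric prefix still fails in Long.parseLong with a NumberFormatException, the same as in the original map function.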


I am running Spark using the start-all.sh script with 2 workers.

Any pointers would be a great help.
Regards,





