[ https://issues.apache.org/jira/browse/CRUNCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wills updated CRUNCH-586:
------------------------------
Attachment: CRUNCH-586.patch
Patch for this with an integration test. Note that this only fixes the read side at the moment;
there's still more work to do to figure out how to make the write-side HBaseTarget work correctly
with the SparkPipeline.
> SparkPipeline does not work with HBaseSourceTarget
> --------------------------------------------------
>
> Key: CRUNCH-586
> URL: https://issues.apache.org/jira/browse/CRUNCH-586
> Project: Crunch
> Issue Type: Bug
> Components: Spark
> Affects Versions: 0.13.0
> Reporter: Stefan De Smit
> Assignee: Josh Wills
> Attachments: CRUNCH-586.patch
>
>
> final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
> final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));
> returns an empty table, while the same code works with MRPipeline.
> The root cause is the combination of Spark's getJavaRDDLike method:
> source.configureSource(job, -1);
> Converter converter = source.getConverter();
> JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
>     job.getConfiguration(),
>     CrunchInputFormat.class,
>     converter.getKeyClass(),
>     converter.getValueClass());
> which always assumes CrunchInputFormat.class (and always passes inputId = -1),
> and HBase's configureSource method:
> if (inputId == -1) {
>     job.setMapperClass(CrunchMapper.class);
>     job.setInputFormatClass(inputBundle.getFormatClass());
>     inputBundle.configure(conf);
> } else {
>     Path dummy = new Path("/hbase/" + table);
>     CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
> }
> The easiest solution I see is to always call CrunchInputs.addInputPath in every source.
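The always-register idea above can be illustrated with a small, self-contained toy. This is plain Java, not Crunch's actual API: the ConfigureSourceSketch class and its in-memory input registry are hypothetical stand-ins for job configuration state, showing how routing both the inputId == -1 sentinel and explicit ids through one registration path lets the Spark and MR code see the same input.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT Crunch code): configureSource always registers the
// input in a shared registry, instead of branching on inputId == -1
// into a completely separate code path that Spark never reads.
public class ConfigureSourceSketch {
    // Hypothetical stand-in for the job configuration's input registry.
    static final Map<Integer, String> inputs = new HashMap<>();

    // Unified registration: map the -1 sentinel used by the Spark path
    // onto a default slot, so both callers land in the same registry.
    static void configureSource(String table, int inputId) {
        int slot = (inputId == -1) ? 0 : inputId;
        inputs.put(slot, "/hbase/" + table);
    }

    public static void main(String[] args) {
        configureSource("t1", -1); // Spark's getJavaRDDLike always passes -1
        configureSource("t2", 1);  // multi-input MR-style registration
        System.out.println(inputs.get(0));
        System.out.println(inputs.get(1));
    }
}
```

The point of the sketch is only the control flow: once every caller goes through the single registration path, a reader like CrunchInputFormat can recover the HBase input regardless of which pipeline configured it.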
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)