cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-342) hadoop integration
Date Tue, 12 Jan 2010 23:06:54 GMT


Jonathan Ellis commented on CASSANDRA-342:

To get around the problem that Hadoop tasks have to run in a separate JVM: what if we had Hadoop
operate on Cassandra snapshots?  For the kind of batch-oriented, non-latency-sensitive work
that Hadoop is a good fit for, that should be perfect: the Hadoop Task can open ColumnFamilyStore
objects on the snapshotted sstables without having to start a full server, which is nasty.
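
As a rough illustration of the snapshot idea, here is a minimal sketch of how a job might decide which snapshotted sstable files go to which Task. It does not touch Cassandra or Hadoop classes. The SnapshotSplits class and the groupByColumnFamily method are hypothetical names, and the "<ColumnFamily>-<generation>-Data.db" file naming is an assumption about the on-disk layout, not something stated in this comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups the data files found in a snapshot directory
// by column family, so each group could be handed to one Hadoop Task.
public class SnapshotSplits {

    // ASSUMPTION: sstable data files are named "<ColumnFamily>-<generation>-Data.db".
    // Index, filter, and temporary files do not match and are skipped.
    private static final Pattern DATA_FILE =
            Pattern.compile("^([^-]+)-(\\d+)-Data\\.db$");

    // Returns column family name -> data file names, sorted by column family.
    public static Map<String, List<String>> groupByColumnFamily(List<String> fileNames) {
        Map<String, List<String>> splits = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = DATA_FILE.matcher(name);
            if (!m.matches()) {
                continue; // not a finished data file
            }
            splits.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(name);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Illustrative file names only; a real job would list the snapshot directory.
        List<String> snapshot = List.of(
                "Standard1-1-Data.db",
                "Standard1-1-Index.db",
                "Standard1-2-Data.db",
                "Super1-7-Data.db");
        groupByColumnFamily(snapshot)
                .forEach((cf, files) -> System.out.println(cf + " -> " + files));
    }
}
```

Each Task would then open only the files in its own group, which is where the ColumnFamilyStore objects mentioned above would come in.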

Otherwise IMO we should patch Hadoop to allow Tasks to run in an existing JVM.  I'm surprised
HBase didn't do that: copying *all input* from one JVM to another is not an insignificant cost.
(You could take that approach with Cassandra too, using getRangeSlice from a StorageProxy started
with StorageService.initClient -- actually we would probably want to add an initLocalClient to
mean "I only plan to query the machine I am on" -- but that would be a case of working around
bad design instead of fixing it.)

> hadoop integration
> ------------------
>                 Key: CASSANDRA-342
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jeff Hodges
>         Attachments: 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch, 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch,
> Some discussion on -dev:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
