cassandra-commits mailing list archives

From "Jeremy Hanna (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CASSANDRA-1497) Add input support for Hadoop Streaming
Date Mon, 18 Oct 2010 21:12:29 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1497:
------------------------------------

    Attachment: 0001-An-updated-avro-based-input-streaming-solution.patch

Updated to use Avro.  Since Avro maps only support string keys, I used a list of key/value
pairs for the input values.  Because the column names are ordered, a binary search can pull
each column out of the results in O(log n).  If a slice predicate is used as well, the list
stays short, so it shouldn't be bad at all.

The current patch is mostly done, but there are still some errors with the streaming script.
The docs also need updating, and the Python side needs to use bisect or something similar to
do the binary search over the list of columns.  Currently it just iterates linearly until it
finds a match.
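
For reference, a minimal sketch of what the bisect-based lookup could look like on the Python
side.  This is not code from the patch: the function name find_column and the shape of the
data (a list of (name, value) tuples already sorted by column name) are assumptions made for
illustration only.

```python
from bisect import bisect_left


def find_column(columns, name):
    """Look up `name` in a list of (name, value) pairs sorted by name.

    Hypothetical helper: assumes `columns` is already ordered by column
    name, as the input records are described above. Returns None on a miss.
    """
    # A 1-tuple (name,) sorts just before any (name, value) pair with the
    # same name, so bisect_left lands on the first matching pair if present.
    i = bisect_left(columns, (name,))
    if i < len(columns) and columns[i][0] == name:
        return columns[i][1]
    return None


# Example data, invented for illustration.
columns = [("age", "34"), ("email", "a@example.com"), ("name", "alice")]
email = find_column(columns, "email")
missing = find_column(columns, "zip")
```

Probing with the 1-tuple avoids building a separate list of names on every lookup, which
would put the cost back at O(n).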

> Add input support for Hadoop Streaming
> --------------------------------------
>
>                 Key: CASSANDRA-1497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>             Fix For: 0.7.1
>
>         Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch
>
>
> related to CASSANDRA-1368 - create similar functionality for input streaming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

