chukwa-dev mailing list archives

From "Ari Rabkin (JIRA)" <>
Subject [jira] Commented: (CHUKWA-462) Store the cluster in the key for performance and easier customization on mappers
Date Wed, 10 Mar 2010 19:29:27 GMT


Ari Rabkin commented on CHUKWA-462:

I am okay with this, but I don't know that part of the code as well as Eric and Jerome do. I don't
feel comfortable committing it without giving them a chance to comment. Eric, can you confirm
that this should go in?

> Store the cluster in the key for performance and easier customization on mappers
> --------------------------------------------------------------------------------
>                 Key: CHUKWA-462
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: Data Processors
>            Reporter: Guille -bisho-
>         Attachments: cluster_in_ChukwaRecordKey.v3.diff
> Right now the Chukwa framework stores the destination cluster as a tag in the Chunk.
> The tags are then copied to each ChukwaRecord, and before a record is stored, the cluster
> is parsed out of its tags with a regular expression.
> - It's slow to apply a regular expression to each record.
> - It's harder to modify the destination cluster from the mapper; you have to tweak the
>   tags field.
> - Storing the cluster on every record takes unneeded space.
> The proposed patch:
> - Extracts the cluster from the chunk tags just once per chunk, which is much faster.
> - Stores the cluster in the key, so it's easy to recover.
> - Makes the cluster easy to tweak from the mapper: just alter it with
>   key.setClusterName(String clusterName).
> - Strips the cluster from the tags field of the resulting Chukwa records. If the tags
>   field is then empty, it skips setting the tags field in the record entirely.
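
The steps in the list above can be sketched roughly as follows. This is an illustration only, not the attached patch: the `ClusterTagSketch` class, its nested `Key` class, and the helper methods are hypothetical stand-ins, and the tag syntax (space-separated `name="value"` pairs such as `cluster="demo"`) is an assumption. Only the `setClusterName(String clusterName)` call is named in the issue description.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClusterTagSketch {
    // Assumed tag syntax: space-separated name="value" pairs, e.g. cluster="demo" rack="r1".
    private static final Pattern CLUSTER_TAG = Pattern.compile("\\bcluster=\"([^\"]*)\"");

    /** Hypothetical stand-in for a record key that carries the cluster name. */
    static final class Key {
        private String clusterName;
        void setClusterName(String clusterName) { this.clusterName = clusterName; }
        String getClusterName() { return clusterName; }
    }

    /** Returns the cluster named in the tags, or the fallback if there is none. */
    static String extractCluster(String tags, String fallback) {
        if (tags == null) {
            return fallback;
        }
        Matcher m = CLUSTER_TAG.matcher(tags);
        return m.find() ? m.group(1) : fallback;
    }

    /** Removes the cluster tag; returns null when no other tags remain. */
    static String stripCluster(String tags) {
        if (tags == null) {
            return null;
        }
        String rest = CLUSTER_TAG.matcher(tags).replaceAll("").trim().replaceAll("\\s+", " ");
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        String chunkTags = "cluster=\"demo\" rack=\"r1\"";

        // The regular expression runs once per chunk, not once per record.
        String cluster = extractCluster(chunkTags, "undefined");
        String recordTags = stripCluster(chunkTags);

        for (String line : new String[] {"first record", "second record"}) {
            Key key = new Key();
            key.setClusterName(cluster); // a mapper could override this later
            System.out.println(key.getClusterName() + " | " + recordTags + " | " + line);
        }
    }
}
```

Because `stripCluster` returns `null` when the cluster was the only tag, a caller can skip setting the tags field on the record altogether, which matches the last point in the list.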

