chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <>
Subject [jira] [Commented] (CHUKWA-734) Gora Storage System for Chuckwa Logs
Date Sun, 15 Feb 2015 08:07:11 GMT


Eric Yang commented on CHUKWA-734:

This would be really great to have.  My recommendation is to write a Gora writer class
that extends PipelineWriter.  A timestamp or time partition is the primary element of a
log file; however, it is not a good idea to use a monotonically increasing sequence as the
row key in HBase or any other Bigtable-style database, because consecutive writes all land
on the same region.  What would you recommend as the design for the primary key, and how
could it ensure that load is spread evenly across HBase region servers?  We have another
JIRA, CHUKWA-667, which discusses the design of the row key.  I am not satisfied with the
row key design that I outlined there.  Having Gora in the mix may enable some interesting
optimizations.
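
To make the row key concern concrete, here is a minimal sketch of one common workaround,
a salted key.  It is not the design from CHUKWA-667 and not anything Gora or Chukwa
provides: the class name SaltedRowKey, the bucket count of 16, and the key layout are all
assumptions made for illustration.  A one-byte bucket derived from the log source is placed
in front of the timestamp, so rows written at the same moment by different sources fall
into different key ranges.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Chukwa, Gora, or HBase.
final class SaltedRowKey {
    // Assumed bucket count; in practice this would be tuned to the cluster.
    static final int BUCKETS = 16;

    // Derive the salt from the source name so that one source always maps
    // to the same bucket and its rows stay contiguous and time-ordered.
    static byte bucketFor(String source) {
        return (byte) Math.floorMod(source.hashCode(), BUCKETS);
    }

    // Assumed layout: [1-byte bucket][8-byte big-endian timestamp][UTF-8 source]
    static byte[] rowKey(String source, long timestampMillis) {
        byte[] src = source.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + Long.BYTES + src.length)
                .put(bucketFor(source))
                .putLong(timestampMillis)
                .put(src)
                .array();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        for (String source : new String[] {"host-a/syslog", "host-b/syslog", "host-c/nutch"}) {
            byte[] key = rowKey(source, now);
            System.out.println(source + " -> bucket " + key[0] + ", key length " + key.length);
        }
    }
}
```

The trade-off is on the read side: a scan over a time range across all sources has to issue
one scan per bucket and merge the results, while a scan for a single source only needs the
bucket that source hashes to.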

> Gora Storage System for Chuckwa Logs
> ------------------------------------
>                 Key: CHUKWA-734
>                 URL:
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Collection
>    Affects Versions: 0.6.0
>            Reporter: Lewis John McGibbney
>             Fix For: 0.6.0
> I would like to build a Gora-backed log-to-datastore module for Chukwa. I am going to
> work on this today.
> Gora is an in-memory data modeling and storage abstraction.
> Gora powers the Apache Nutch 2.X software, which generates a bunch of log data. Having
> a Chukwa monitoring tool for Nutch would be grand.

This message was sent by Atlassian JIRA
