chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <>
Subject [jira] Commented: (CHUKWA-22) Need index for chukwa sequence files
Date Sun, 19 Jul 2009 03:29:08 GMT


Eric Yang commented on CHUKWA-22:

Building an index file would not be sufficient to serve Chukwa data straight from HDFS for
long-term operation.  The cost of keeping the index in memory will eventually require yet another
distributed system to manage the index files.  Instead of reinventing the wheel, Chukwa should
adopt a Bigtable-like solution such as HBase to manage the data regions.

The mapreduce-to-hbase example looks like exactly what Chukwa needs.  An HBase table schema
for Chukwa could look like this:

Table: SystemMetrics-[TimeType]
Column Family: cpu
Column Family: memory
Column Family: disk
Column Family: temperature
Column Family: network
Column Family: default
Column Family: log
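
A minimal sketch of the layout above as plain data, to make the one-table-per-time-type idea concrete.  It does not call HBase.  The names table_schemas, COLUMN_FAMILIES and TIME_TYPES are hypothetical, and the "1min"/"5min" suffixes are an assumption about how [TimeType] might be spelled:

```python
# Column families exactly as listed in the proposed schema.
COLUMN_FAMILIES = ["cpu", "memory", "disk", "temperature", "network", "default", "log"]

# Assumed spellings for [TimeType]; only 1-minute and 5-minute averages are named.
TIME_TYPES = ["1min", "5min"]


def table_schemas(prefix="SystemMetrics", time_types=TIME_TYPES):
    """Return {table name: list of column families}, one table per time type."""
    return {f"{prefix}-{time_type}": list(COLUMN_FAMILIES) for time_type in time_types}


schemas = table_schemas()
for name, families in schemas.items():
    print(name, families)
```

Every table shares the same column families; only the averaging window differs between tables.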

Each row represents a 1-minute average, a 5-minute average, etc.  This is determined by the
[TimeType] of the table.

Examples of columns could be: idle:hostname1, busy:hostname1, idle:hostname2, busy:hostname2

The log column family keeps the raw log entries for log viewing.
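
The row and column layout described above can be sketched in memory as follows.  This is an illustration only, under several assumptions: the row key is the start of the averaging window in epoch seconds, the idle/busy columns live in the cpu column family, and the helper names row_key and average_rows are hypothetical:

```python
from collections import defaultdict


def row_key(epoch_seconds, bucket_seconds):
    """Round a timestamp down to the start of its averaging window."""
    return epoch_seconds - (epoch_seconds % bucket_seconds)


def average_rows(samples, bucket_seconds, family="cpu"):
    """Average raw samples into rows keyed by time window.

    samples: iterable of (epoch_seconds, hostname, metric, value).
    Returns {row key: {family: {"metric:hostname": average}}}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (row, column) -> [sum, count]
    for timestamp, host, metric, value in samples:
        key = (row_key(timestamp, bucket_seconds), f"{metric}:{host}")
        totals[key][0] += value
        totals[key][1] += 1

    rows = {}
    for (row, column), (total, count) in totals.items():
        rows.setdefault(row, {}).setdefault(family, {})[column] = total / count
    return rows


# Made-up sample values, for illustration only.
samples = [
    (60, "hostname1", "idle", 80.0),
    (90, "hostname1", "idle", 90.0),
    (70, "hostname2", "busy", 30.0),
    (130, "hostname1", "idle", 70.0),
]
rows = average_rows(samples, bucket_seconds=60)
print(rows)
```

With a 60-second window, the two hostname1 idle samples at 60 and 90 seconds fall into the same row and are averaged, while the sample at 130 seconds starts a new row.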

> Need index for chukwa sequence files
> ------------------------------------
>                 Key: CHUKWA-22
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>         Environment: Redhat EL 5.1 and Java 6
>            Reporter: Eric Yang
>            Assignee: Eric Yang
> Chukwa has the ability to collect large volumes of data, but the lack of an index prevents
> the Chukwa front end from serving data straight from HDFS.  This JIRA is the placeholder for
> designing an indexing service for Chukwa.  The plan is to create the indexing service based
> on available software like Lucene or Katta.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
