crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-470) Add hdfs/yarn minicluster crunch pipeline
Date Thu, 11 Sep 2014 11:16:34 GMT


Gabriel Reid commented on CRUNCH-470:

Do you mean the addition of a new Pipeline implementation (in addition to MemPipeline, MRPipeline,
and SparkPipeline)? The MRPipeline implementation will already run on YARN as long as Crunch
is compiled for hadoop2, so there shouldn't be a new Pipeline impl needed for this.

On the other hand, if you're referring to testing pipelines on a pseudo-distributed mini cluster,
that is already possible. The HFileTargetIT integration test does exactly this: it spins up a
mini cluster (with HDFS, etc.) and runs the pipeline there.

> Add hdfs/yarn minicluster crunch pipeline
> -----------------------------------------
>                 Key: CRUNCH-470
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Rafal Wojdyla
>            Assignee: Josh Wills
>            Priority: Minor
> Crunch currently has two pipelines:
> * MemPipeline
> * MRPipeline
> MemPipeline is an in-memory pipeline based on local in-memory MapReduce mode.
> MRPipeline is a distributed pipeline based on distributed MapReduce.
> Using an HDFS/YARN minicluster, it's possible to better emulate a Hadoop cluster, and it could
> be a 'final test' before running on the cluster.

