spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: RDD equivalent of HBase Scan
Date Thu, 26 Mar 2015 13:54:59 GMT
The example in examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
reads from HBase through TableInputFormat.
TableInputFormat accepts the configuration parameter

  public static final String SCAN = "hbase.mapreduce.scan";

If that key is set, the Scan object is created from its String form:

    if (conf.get(SCAN) != null) {
      try {
        scan = TableMapReduceUtil.convertStringToScan(conf.get(SCAN));

You can use TableMapReduceUtil#convertScanToString() to serialize a Scan
that carries filter(s) and pass the resulting String to TableInputFormat
under the SCAN key.
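
Something along these lines should work from Spark. This is only a rough
sketch, not tested against a cluster: the table name "my_table", the row-key
prefix "user-" and the choice of PrefixFilter are made up for illustration,
and it assumes an HBase version in which convertScanToString() is public
(in some older releases it is package-private).

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.filter.PrefixFilter
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object FilteredHBaseScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FilteredHBaseScan"))

    // Build a Scan with a filter; PrefixFilter and "user-" are illustrative only.
    val scan = new Scan()
    scan.setFilter(new PrefixFilter(Bytes.toBytes("user-")))

    val conf = HBaseConfiguration.create()
    // "my_table" is a hypothetical table name.
    conf.set(TableInputFormat.INPUT_TABLE, "my_table")
    // Serialize the Scan under the "hbase.mapreduce.scan" key so that
    // TableInputFormat rebuilds it with convertStringToScan().
    // Assumes convertScanToString() is public in the HBase version in use.
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))

    // Same call HBaseTest.scala uses to obtain an RDD backed by TableInputFormat.
    val rdd = sc.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"rows passing the filter: ${rdd.count()}")
    sc.stop()
  }
}
```

The filter is applied by HBase on the region servers, so only the rows that
pass it are handed to the RDD.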

Cheers

On Thu, Mar 26, 2015 at 6:46 AM, Stuart Layton <stuart.layton@gmail.com>
wrote:

> HBase scans come with the ability to specify filters that make scans very
> fast and efficient (as they let you seek to the keys that pass the filter).
>
> Do RDDs or Spark DataFrames offer anything similar, or would I be required
> to use a NoSQL db like HBase to do something like this?
>
> --
> Stuart Layton
>
