Please see the following two constants defined in TableInputFormat:
/** Column Family to Scan */
public static final String SCAN_COLUMN_FAMILY =
"hbase.mapreduce.scan.column.family";
/** Space delimited list of columns and column families to scan. */
public static final String SCAN_COLUMNS = "hbase.mapreduce.scan.columns";
CellCounter accepts these parameters. You can play with CellCounter to see
how they work.
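For the Spark side, a minimal sketch of wiring these keys into newAPIHadoopRDD might look like the following. This assumes HBase and Spark are on the classpath; the key names are the real TableInputFormat constants, but the table name "mytable", the family/column values, and the batch size are made-up placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseScanExample {
    public static JavaPairRDD<ImmutableBytesWritable, Result> scanRdd(
            JavaSparkContext sc) {
        Configuration conf = HBaseConfiguration.create();
        // Table to read (placeholder name).
        conf.set(TableInputFormat.INPUT_TABLE, "mytable");
        // Restrict the scan to a single column family...
        conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf1");
        // ...or select specific columns across families (space delimited):
        // conf.set(TableInputFormat.SCAN_COLUMNS, "cf1:a cf2:b");
        // Cap the number of cells returned per Result; a wide row is then
        // split across multiple Results.
        conf.set(TableInputFormat.SCAN_BATCHSIZE, "100");

        return sc.newAPIHadoopRDD(conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
    }
}
```

Note that with batching enabled, one HBase row can show up as several (rowkey, Result) pairs in the RDD, so any per-row aggregation has to group by rowkey afterwards.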
FYI
On Mon, Jul 2, 2018 at 4:01 AM, revolutionisme <revolutionisme@gmail.com>
wrote:
> Hi,
>
> I am using HBase with Spark and as I have wide columns (> 10000) I wanted
> to
> use the "setBatch(num)" option to not read all the columns for a row but in
> batches.
>
> I can create a scan and set the batch size I want with
> TableInputFormat.SCAN_BATCHSIZE, but I am a bit confused how this would
> work
> with more than 1 column family.
>
> Any help is appreciated.
>
> PS: Also any documentation or inputs on newAPIHadoopRDD would be really
> appreciated as well.
>
> Thanks & Regards,
> Biplob
>
>
>
> --
> Sent from: http://apache-hbase.679495.n3.nabble.com/HBase-User-
> f4020416.html
>