spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Reading from Hbase using python
Date Wed, 12 Nov 2014 21:04:50 GMT
To my knowledge, Spark 1.1 is built against HBase 0.94.
To use HBase 0.98, you will need the change tracked in:
https://issues.apache.org/jira/browse/SPARK-1297

You can apply the patch and build Spark yourself.

Cheers

On Wed, Nov 12, 2014 at 12:57 PM, Alan Prando <alan@scanboo.com.br> wrote:

> Hi Ted! Thanks for answering...
>
> Maybe I didn't make myself clear... What I need is to read a table from HBase
> using Python in Spark.
> I'm using HBase 0.98 and Spark 1.1.
>
> My code is as follows:
> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py
> My problem is that, when a row has two (or more) qualifiers, this
> example returns just one qualifier.
>
> In fact, I've already found a similar question (
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-td18613.html#a18650),
> however, I have not yet been able to find a solution.
>
> Do you have any idea?
>
>
> 2014-11-12 18:26 GMT-02:00 Ted Yu <yuzhihong@gmail.com>:
>
>> Can you give us a bit more detail:
>>
>> which HBase release you're using.
>> whether you can reproduce this using the hbase shell.
>>
>> I did the following using hbase shell against 0.98.4:
>>
>> hbase(main):001:0> create 'test', 'f1'
>> 0 row(s) in 2.9140 seconds
>>
>> => Hbase::Table - test
>> hbase(main):002:0> put 'test', 'row1', 'f1:1', 'value1'
>> 0 row(s) in 0.1040 seconds
>>
>> hbase(main):003:0> put 'test', 'row1', 'f1:2', 'value2'
>> 0 row(s) in 0.0080 seconds
>>
>> hbase(main):004:0> scan 'test'
>> ROW                                      COLUMN+CELL
>>  row1                                    column=f1:1, timestamp=1415823887048, value=value1
>>  row1                                    column=f1:2, timestamp=1415823893857, value=value2
>>
>> Cheers
>>
>> On Wed, Nov 12, 2014 at 11:32 AM, Alan Prando <alan@scanboo.com.br>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to read an HBase table using this example from GitHub (
>>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py),
>>> however, I have two qualifiers in a column family.
>>>
>>> Ex.:
>>>
>>>  ROW   COLUMN+CELL
>>>  row1  column=f1:1, timestamp=1401883411986, value=value1
>>>  row1  column=f1:2, timestamp=1401883415212, value=value2
>>>  row2  column=f1:1, timestamp=1401883417858, value=value3
>>>  row3  column=f1:1, timestamp=1401883420805, value=value4
>>>
>>> When I run the code hbase_inputformat.py, the following loop prints row1
>>> just once:
>>>
>>> output = hbase_rdd.collect()
>>> for (k, v) in output:
>>>     print (k, v)
>>>
>>> Am I doing anything wrong?
>>>
>>> Thanks in advance.
>>>
>>
>>
>
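
A note on why row1 shows up only once in the loop quoted above: the hbase shell prints one line per cell, whereas HBase's TableInputFormat hands Spark one (row key, Result) record per row, so a row with two qualifiers is still a single record. Which cells survive then depends on the value converter in use. The sketch below is plain Python with no Spark or HBase dependency; it only models that behaviour using the table from this thread. The function names (group_by_row, first_value_only, all_cells) are hypothetical and are not part of Spark or HBase, and the assumption that the example's converter keeps only the first cell's value is a guess that should be checked against the converter source in your Spark build.

```python
from collections import defaultdict

# Cells as the hbase shell scan prints them: one entry per (row, column) cell.
# The data is the table quoted earlier in this thread.
cells = [
    ("row1", "f1:1", "value1"),
    ("row1", "f1:2", "value2"),
    ("row2", "f1:1", "value3"),
    ("row3", "f1:1", "value4"),
]


def group_by_row(cells):
    """Group cells into one record per row key, keeping cell order."""
    rows = defaultdict(list)
    for row_key, column, value in cells:
        rows[row_key].append((column, value))
    return dict(rows)


def first_value_only(rows):
    """Hypothetical converter: keep only the first cell's value of each row."""
    return [(row_key, row_cells[0][1]) for row_key, row_cells in rows.items()]


def all_cells(rows):
    """Hypothetical converter: keep every (column, value) pair of each row."""
    return [(row_key, list(row_cells)) for row_key, row_cells in rows.items()]


rows = group_by_row(cells)

# One record per row, and the second qualifier of row1 is lost.
for k, v in first_value_only(rows):
    print((k, v))

# Still one record per row, but both qualifiers of row1 are kept.
for k, v in all_cells(rows):
    print((k, v))
```

If the converter is indeed the cause, the fix would be on the converter side (returning all cells of the Result) rather than in the Python loop itself.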
