hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: habse schema design and retrieving values through REST interface
Date Wed, 16 Mar 2011 23:44:33 GMT
Thank you Andrew.
St.Ack

On Wed, Mar 16, 2011 at 3:12 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>  This facility is not exposed in the REST API at the moment
>> (not that I know of -- please someone correct me if I'm
>> wrong).
>
> Wrong. :-)
>
> See ScannerModel in the rest package: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/model/ScannerModel.html
>
> ScannerModel#setBatch
>
>   - Andy
>
>
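
For anyone finding this in the archive, here is a rough sketch of what Andy's pointer amounts to over plain HTTP. It assumes the REST gateway accepts a `<Scanner>` XML document on `PUT /<table>/scanner`, that `batch` is an attribute of that document (as ScannerModel#setBatch suggests), and that row keys in the XML are base64-encoded. The function names, base URL and table name are made up for illustration; please check them against your HBase version.

```python
import base64
import urllib.request
from xml.sax.saxutils import quoteattr


def b64(text):
    """Base64-encode a string, as the REST XML representation is assumed to expect."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_scanner_xml(batch, start_row=None, end_row=None):
    """Build the <Scanner> request body; batch caps the cells returned per fetch."""
    attrs = ["batch=" + quoteattr(str(batch))]
    if start_row is not None:
        attrs.append("startRow=" + quoteattr(b64(start_row)))
    if end_row is not None:
        # The end row is exclusive, as with Scan in the Java API.
        attrs.append("endRow=" + quoteattr(b64(end_row)))
    return "<Scanner " + " ".join(attrs) + "/>"


def create_scanner(base_url, table, scanner_xml):
    """PUT the scanner definition; the scanner URL is assumed to come back in Location."""
    request = urllib.request.Request(
        base_url + "/" + table + "/scanner",
        data=scanner_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml", "Accept": "text/xml"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Location"]
```

To scan only the row 'united states' in batches of 500 cells, something like `build_scanner_xml(500, "united states", "united states\x00")` should do, followed by `create_scanner("http://localhost:8080", "tablename", xml)`. Each GET on the returned URL should then yield at most 500 cells, so the whole row never has to be composed in RAM at once.
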
>
> --- On Wed, 3/16/11, Stack <stack@duboce.net> wrote:
>
>> From: Stack <stack@duboce.net>
>> Subject: Re: habse schema design and retrieving values through REST interface
>> To: user@hbase.apache.org
>> Date: Wednesday, March 16, 2011, 10:47 AM
>> You can limit the return when scanning from the java api; see
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setBatch(int)
>> This facility is not exposed in the REST API at the moment (not that
>> I know of -- please someone correct me if I'm wrong).   So, yes, wide
>> rows, if thousands of elements of some size, since they need to be
>> composed all in RAM, could bring on an OOME if the composed size >
>> available heap.
>>
>> St.Ack
>>
>>
>> On Wed, Mar 16, 2011 at 2:41 AM, sreejith P. K. <sreejithpk@nesote.com> wrote:
>> > With this schema, if I can limit the column family to a particular range,
>> > I can manage everything else (for example, select the first n columns of
>> > a column family).
>> >
>> > Sreejith
>> >
>> >
>> > On Wed, Mar 16, 2011 at 12:33 PM, sreejith P. K. <sreejithpk@nesote.com> wrote:
>> >
>> >> @ Jean-Daniel,
>> >>
>> >> As I said, each row key contains thousands of column family values (maybe
>> >> I am wrong with the schema design). I started REST and tried to cURL
>> >> http://localhost/tablename/rowname. It seems it works only with a limited
>> >> amount of data (maybe I can limit the cURL output). How can I limit the
>> >> column values for a particular row?
>> >> Suppose I have two thousand urls under a keyword and I need to fetch the
>> >> urls but limit the result to five hundred. How is that possible?
>> >>
>> >> @ tsuna,
>> >>
>> >> It seems http://www.elasticsearch.org/ is using CouchDB, right?
>> >>
>> >>
>> >> On Tue, Mar 15, 2011 at 11:32 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >>
>> >>> Can you tell why it's not able to get the bigger rows? Why would you
>> >>> try another schema if you don't even know what's going on right now?
>> >>> If you have the same issue with the new schema, you're back to square
>> >>> one right?
>> >>>
>> >>> Looking at the logs should give you some hints.
>> >>>
>> >>> J-D
>> >>>
>> >>> On Tue, Mar 15, 2011 at 10:19 AM, sreejith P. K. <sreejithpk@nesote.com> wrote:
>> >>> > Hello experts,
>> >>> >
>> >>> > I have a scenario as follows.
>> >>> > I need to maintain a huge table for a 'web crawler' project in HBase.
>> >>> > Basically it contains thousands of keywords, and for each keyword I
>> >>> > need to maintain a list of urls (which will again number in the
>> >>> > thousands). Corresponding to each url, I need to store a number, which
>> >>> > represents the priority value the keyword holds.
>> >>> > Let me explain a bit. Suppose I have a keyword 'united states'; I need
>> >>> > to store about ten thousand urls corresponding to that keyword. Each
>> >>> > keyword will be holding a priority value which is an integer. Again, I
>> >>> > have thousands of keywords like that. The unusual thing about this is
>> >>> > that I need to do the project in PHP.
>> >>> >
>> >>> > I have configured a hadoop-hbase cluster consisting of three machines.
>> >>> > My plan was to design the schema by taking the keyword as the 'row
>> >>> > key'. The urls I will keep as column family. The schema looked fine at
>> >>> > first. I have done a lot of research on how to retrieve the url list
>> >>> > if I know the keyword. Anyway, I managed a way out by preg-matching
>> >>> > the XML output from the url http://localhost:8080/tablename/rowkey
>> >>> > (I used the REST interface). It works fine if the url list has a
>> >>> > limited number of urls. When it comes to thousands, it seems I cannot
>> >>> > fetch the XML data itself!
>> >>> > Now I am in a do-or-die situation. Please correct me if my schema
>> >>> > design needs any changes (I do believe it should change!) and please
>> >>> > help me retrieve the column family values (urls) corresponding to each
>> >>> > row key in an efficient way. Please guide me on how I can do the same
>> >>> > using the PHP-REST interface.
>> >>> > Thanks in advance.
>> >>> >
>> >>> > Sreejith
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Sreejith PK
>> >> Nesote Technologies (P) Ltd
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Sreejith PK
>> > Nesote Technologies (P) Ltd
>> >
>>
>
>
>
>
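
On the preg-matching mentioned in the original post: the XML that the REST gateway returns can be parsed properly instead. The sketch below assumes the response is a `<CellSet>` of `<Row>` elements whose `<Cell>` children carry a base64-encoded `column` attribute and base64-encoded text, with no XML namespace, and that values were written as text. The function name and the sample column names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def first_n_cells(cellset_xml, n):
    """Return up to n (column, value) pairs decoded from a CellSet XML document."""
    root = ET.fromstring(cellset_xml)
    cells = []
    for cell in root.iter("Cell"):
        if len(cells) >= n:
            break
        # Column names and values are assumed to be base64-encoded text.
        column = base64.b64decode(cell.get("column")).decode("utf-8")
        value = base64.b64decode(cell.text or "").decode("utf-8")
        cells.append((column, value))
    return cells
```

Calling `first_n_cells(response_body, 500)` would keep the first five hundred urls of a row. Note that this only trims the result on the client: a plain GET of the row still makes the server compose the entire row, so for very wide rows it does not avoid the OOME risk described above; a scanner with a batch size does.
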
