hbase-user mailing list archives

From Jonathan Gray <jl...@streamy.com>
Subject Re: Adding/Removing regionservers
Date Thu, 02 Jul 2009 20:24:30 GMT
Sounds about right.  You seem to have a good grip on things.

0.20 will work with millions of columns in a row, but currently there is 
no way to return the massive row in segments.  If the data is big 
enough, you'll have memory allocation issues.  Scanners are still a 
safer way to go until we have intra-row scanning: 
https://issues.apache.org/jira/browse/HBASE-1537
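
To make the trade-off concrete, here is a minimal sketch in plain Java. It does not use the HBase client API: a TreeMap stands in for a table sorted by row key, and the class name TallTableSketch and its methods put and scanIds are hypothetical names chosen for illustration. It assumes the tall TYPE|VALUE|ID row key layout discussed in this thread and that IDs never contain the '|' character. The point is that a range scan over a key prefix hands back one small row at a time, instead of materializing one massive row.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for a table sorted by row key (hypothetical, not the HBase API). */
public class TallTableSketch {
    // Row key TYPE|VALUE|ID -> cell value; TreeMap keeps keys sorted, like a sorted table.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    /** Stores one cell under the tall row key TYPE|VALUE|ID. */
    public void put(String type, String value, String id, String cell) {
        rows.put(type + "|" + value + "|" + id, cell);
    }

    /** Walks the IDs stored under a TYPE|VALUE prefix, one row at a time, in key order. */
    public Iterator<String> scanIds(String typeVal) {
        final String start = typeVal + "|"; // inclusive start row
        final String stop = typeVal + "}";  // exclusive stop row: '}' sorts right after '|'
        final Iterator<String> keys =
                rows.subMap(start, true, stop, false).keySet().iterator();
        return new Iterator<String>() {
            @Override public boolean hasNext() {
                return keys.hasNext();
            }
            @Override public String next() {
                // Strip the TYPE|VALUE| prefix so only the ID remains.
                return keys.next().substring(start.length());
            }
        };
    }

    public static void main(String[] args) {
        TallTableSketch table = new TallTableSketch();
        table.put("color", "red", "42", "x");
        table.put("color", "red", "7", "x");
        table.put("color", "blue", "1", "x");
        // Only the rows under color|red are visited; keys sort as strings, not as numbers.
        for (Iterator<String> it = table.scanIds("color|red"); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

With the wide layout (row key TYPE|VALUE, one column per ID) the same lookup is a single fetch, but the whole row comes back at once, which is where the memory concern above applies.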

JG

llpind wrote:
> Thanks for the tips.
> 
> Yeah, that is the model we had before. The problem is that we can potentially
> have millions of IDs for a given TYPE|VAL.
> 
> We are considering something like:
> Row Key: TYPE|VALUE|ID
> column: link:TYPE|VALUE
> 
> This is only because an ID may never have more than a few TYPE|VAL results in
> this current dataset, which would also eliminate the need to go to a second
> table.
> 
> Thanks for the help.  
> 
> 
> Jonathan Gray-2 wrote:
>> Well, you're trying to do a join.  How much data is actually in TableB?
>> You might consider denormalizing so that you don't have to query TableB
>> because the data you need is already in TableA.
>>
>> You could use a Get (single trip) for the inner loop rather than a 
>> Scanner (which requires multiple round-trips).  You could even use a Get 
>> for the outer loop by making your table wide instead of tall.
>>
>> Row Key:  TYPE|VALUE
>> Column: link:ID
>>
>> And you have a column for each ID within that TYPE|VALUE row.
>>
>> Also, don't forget to close your scanners if you do use scanners.
>>
>> JG
>>
>>
>> llpind wrote:
>>> Assume a schema like so:  
>>>
>>> TableA======================
>>> Row Key:  TYPE|VALUE|ID
>>> Column:  link:ID  (irrelevant)
>>> TableB======================
>>> Row Key: ID
>>> Column: typeval:TYPE|VALUE
>>> ===========================
>>>
>>>
>>>
>>> I need to iterate over TableA using a Scanner to get all IDs for a given
>>> TYPE|VALUE, then for each ID I need to get from TableB the TYPE|VALUEs
>>> it’s tied to (a many-to-many relationship).
>>> Assume I have the TYPE|VALUEs in a List and need to process
>>> this data.  I have done something like this:
>>>
>>>
>>>
>>> for (String typeVal : list) {
>>>
>>>   // Give me all IDs for the matching TYPE|VAL.
>>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
>>>       Bytes.toBytes(typeVal + "|A"));
>>>   ResultScanner s1 = tblA.getScanner(tblAScan);
>>>
>>>   for (Result tblARow = s1.next(); tblARow != null; tblARow = s1.next()) {
>>>
>>>     // TableA row key is TYPE|VALUE|ID, so the ID is the last field.
>>>     String rowKey = Bytes.toString(tblARow.getRow());
>>>     String id = rowKey.substring(rowKey.lastIndexOf('|') + 1);
>>>
>>>     // IDs are all numeric, so [id, id + " ") covers only the row for this ID.
>>>     Scan tblBScan = new Scan(Bytes.toBytes(id), Bytes.toBytes(id + " "));
>>>     ResultScanner s2 = tblB.getScanner(tblBScan);
>>>     // Only care about column data here, since ID is the row key.
>>>     List<KeyValue> results = s2.next().list();
>>>
>>>     for (KeyValue kv : results) {
>>>       // do stuff
>>>       kv.getValue();
>>>     }
>>>
>>>   }
>>>
>>> }
>>>
>>
> 
