hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: Issues with import from 0.92 into 0.98
Date Wed, 27 May 2015 17:54:06 GMT
It looks like the hbase shell (beginning with 0.96) parses column
names as FAMILY:QUALIFIER[:FORMATTER] due to work from HBASE-6592.
As a result, the shell basically doesn't support specifying any
columns (for gets/puts/scans/etc) that include a colon in the
qualifier.  I filed HBASE-13788.

For your case, I suspect the data was properly imported, but when you
tried to scan for "x:twitter:username" it instead scanned for
"x:twitter" and found nothing.
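To make the parsing concrete, here is a simplified Ruby model (the shell itself runs on JRuby) of how a FAMILY:QUALIFIER[:FORMATTER] spec gets split. It is an illustration under stated assumptions, not the shell's actual code: `parse_shell_column` is a hypothetical helper, and it assumes that the first colon separates family from qualifier and that any further colon in the qualifier is read as the start of a FORMATTER.

```ruby
# Hypothetical model of the 0.96+ shell parsing FAMILY:QUALIFIER[:FORMATTER].
# Not the actual HBase shell source; for illustration only.
def parse_shell_column(spec)
  family, rest = spec.split(':', 2)
  return { family: family, qualifier: nil, formatter: nil } if rest.nil?

  # A second colon is taken as the start of a FORMATTER, so everything
  # after it is dropped from the qualifier.
  qualifier, formatter = rest.split(':', 2)
  { family: family, qualifier: qualifier, formatter: formatter }
end

# The cell as it really exists in the table: the qualifier is "twitter:username".
stored = { 'x:twitter:username' => 'BERITA & INFORMASI!' }

col = parse_shell_column('x:twitter:username')
lookup_key = "#{col[:family]}:#{col[:qualifier]}"

p col                          # family "x", qualifier "twitter", formatter "username"
puts stored.key?(lookup_key)   # prints false: the lookup is for x:twitter
```

Under this model the quoted experiment below (a put to 'x:twitter:username' that shows up as x:twitter) has the same cause: the ":username" part never reaches the qualifier.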


P.S. Here's some related help text from the shell.

Besides the default 'toStringBinary' format, 'get' also supports custom
formatting by column. A user can define a FORMATTER by adding it to the
column name in the get specification. The FORMATTER can be stipulated:

 1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g.,
    toInt, toString)
 2. or as a custom class followed by a method name, e.g.
    c(org.apache.hadoop.hbase.util.Bytes).toInt

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
  hbase> get 't1', 'r1', {COLUMN => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }

Note that you can specify a FORMATTER by column only (cf:qualifier).
You cannot specify a FORMATTER for all columns of a column family.
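As a rough illustration of what a formatter such as toInt does to a cell value: Bytes.toInt reads four bytes as a big-endian signed 32-bit integer. The Ruby sketch below mimics that with `String#unpack1`; the helper name `to_int_like_bytes` is hypothetical, and unlike the real method this sketch simply rejects anything that is not exactly four bytes.

```ruby
# Hypothetical sketch of a 'cf:qualifier:toInt' formatter applied to a
# raw cell value: decode 4 bytes as a big-endian signed 32-bit integer,
# the same byte order org.apache.hadoop.hbase.util.Bytes.toInt uses.
def to_int_like_bytes(value)
  unless value.bytesize == 4
    raise ArgumentError, "expected 4 bytes, got #{value.bytesize}"
  end

  value.unpack1('l>') # 'l>' = signed 32-bit, big-endian
end

# Bytes.toBytes(42) in Java produces the four bytes 00 00 00 2A.
raw = [0, 0, 0, 42].pack('C*')
puts to_int_like_bytes(raw)                                  # prints 42
puts to_int_like_bytes([0xFF, 0xFF, 0xFF, 0xFF].pack('C*'))  # prints -1
```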

On Wed, May 27, 2015 at 10:23 AM,  <apache@borkbork.net> wrote:
> On Wed, May 27, 2015, at 11:35 AM, Dave Latham wrote:
>> Sounds like quite a puzzle.
>> You mentioned that you can read data written through manual Puts from
>> the shell - but not data from the Import.  There must be something
>> different about the data itself once it's in the table.  Can you
>> compare a row that was imported to a row that was manually written -
>> or show them to us?
> Hmph, I may have spoken too soon. I know I tested this at one point and
> it worked, but now I'm getting different results:
> On the new cluster, I created a duplicate test table:
> hbase(main):043:0> create 'content3', {NAME => 'x', BLOOMFILTER => 'NONE',
>   MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
>   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> Then I pull some data from the imported table:
> hbase(main):045:0> scan 'content', {LIMIT=>1,
>   STARTROW=>'A:9223370612089311807:twtr:57013379'}
> ROW                                  COLUMN+CELL
> ....
> A:9223370612089311807:twtr:570133798827921408
> column=x:twitter:username, timestamp=1424775595345, value=BERITA &
> Then put it:
> hbase(main):046:0> put 'content3', 'A:9223370612089311807:twtr:570133798827921408',
>   'x:twitter:username', 'BERITA & INFORMASI!'
> But then when I query it, I see that I've lost the column qualifier
> ":username":
> hbase(main):046:0> scan 'content3'
> ROW                                  COLUMN+CELL
>  A:9223370612089311807:twtr:570133798827921408 column=x:twitter,
>  timestamp=1432745301788, value=BERITA & INFORMASI!
> Even though I'm missing one of the qualifiers, I can at least filter on
> columns in this sample table.
> So now I'm even more baffled :(
> Z
