hbase-user mailing list archives

From Gökhan Çapan <gkhn...@gmail.com>
Subject Re: Determine in which row a column exists
Date Fri, 10 Dec 2010 12:08:04 GMT
Claudio,
You are suggesting that I flip the data, right?
If I understand your suggestion correctly, the problem then becomes
retrieving the products that a cluster contains.

Actually, the example table in Google's Bigtable paper is similar, and so is
their example code for retrieving data:
Scanner scanner(T);
ScanStream *stream;
stream = scanner.FetchColumnFamily("anchor");
stream->SetReturnAllVersions();
scanner.Lookup("com.cnn.www");
for (; !stream->Done(); stream->Next()) {
  printf("%s %s %lld %s\n",
         scanner.RowName(),
         stream->ColumnName(),
         stream->MicroTimestamp(),
         stream->Value());
}

I was just thinking of making this scan faster within the existing table,
maybe with a Bloom filter.
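
To make the Bloom filter idea concrete, here is a minimal sketch in plain Java. It is not HBase's own Bloom filter code: the class name QualifierBloomFilter, the sizing parameters and the hash scheme are all made up for illustration. The assumption is one filter per row over its column qualifiers, so that a reader could skip any row for which mightContain returns false. A Bloom filter never gives a false negative but can give false positives, so a hit still has to be checked against the real data.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative Bloom filter over column qualifiers (hypothetical, not HBase code). */
final class QualifierBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    QualifierBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Records a qualifier such as "24659517". */
    void add(String qualifier) {
        byte[] data = qualifier.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x01000193) | 1; // force odd so probes spread out
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    /** Returns false only if the qualifier was definitely never added. */
    boolean mightContain(String qualifier) {
        byte[] data = qualifier.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x01000193) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    /** FNV-1a style 32-bit hash with a caller-supplied seed. */
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}
```

Whether HBase can use such a filter to prune a full-table scan on one column is a separate question that this sketch does not answer.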

A second table, with productId as the row key and the cluster ids as columns,
would be the next option.
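
A rough sketch of what the two-table option would mean, using in-memory maps as stand-ins for the HBase tables. ClusterProductIndex and its methods are hypothetical names, and the layout of the second table (row = productId, one column per cluster id) is my reading of the thin table Claudio describes. Every write goes to both layouts, so a lookup in either direction touches a single row.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for two tables kept in sync (hypothetical, not HBase API). */
final class ClusterProductIndex {
    // Existing "clusters" layout: row = clusterId, qualifiers = productIds.
    private final Map<String, NavigableSet<String>> productsByCluster = new TreeMap<>();
    // Assumed second table: row = productId, qualifiers = clusterIds.
    private final Map<String, NavigableSet<String>> clustersByProduct = new TreeMap<>();

    /** Writes one membership to both layouts. */
    void put(String clusterId, String productId) {
        productsByCluster.computeIfAbsent(clusterId, k -> new TreeSet<>()).add(productId);
        clustersByProduct.computeIfAbsent(productId, k -> new TreeSet<>()).add(clusterId);
    }

    /** Single-row read on the original layout. */
    NavigableSet<String> productsOf(String clusterId) {
        return productsByCluster.getOrDefault(clusterId, Collections.emptyNavigableSet());
    }

    /** Single-row read on the flipped layout; replaces the full-table scan. */
    NavigableSet<String> clustersOf(String productId) {
        return clustersByProduct.getOrDefault(productId, Collections.emptyNavigableSet());
    }
}
```

The cost is a second write per membership and the need to keep the two tables consistent when clusters are recomputed.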

Thank you.


On Fri, Dec 10, 2010 at 1:52 PM, Claudio Martella <
claudio.martella@tis.bz.it> wrote:

> What about a thin table? rowkey:productid columname:clusterid?
>
>
> On 12/10/10 10:52 AM, Gökhan Çapan wrote:
> > Hi,
> >
> > We have the output of a clustering algorithm in an HBase table which has
> > the following structure:
> >
> > {NAME => 'clusters', FAMILIES => [{NAME => 'products',
> >  COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
> >  BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >
> > Row ids are cluster ids.
> > Column qualifiers in the products column family are the product ids.
> >
> > An example row is:
> >  1-1000936175-1879240683-185 column=products:21840054, timestamp=1291817353183, value=\x00\x00\x00\x01
> >  1-1000936175-1879240683-185 column=products:23194179, timestamp=1291817353183, value=\x00\x00\x00\x01
> >  1-1000936175-1879240683-185 column=products:23585765, timestamp=1291817353183, value=\x00\x00\x00\x01
> >  1-1000936175-1879240683-185 column=products:24544087, timestamp=1291817353183, value=\x00\x00\x00\x01
> >
> >
> >
> > When we want to determine which clusters a product belongs to, we
> > scan the whole table, restricted to that product's column,
> >
> > e.g.
> >
> > Scan s = new Scan();
> > s.addColumn(Bytes.toBytes("products"), Bytes.toBytes("24659517"));
> > ResultScanner scanner = table.getScanner(s);
> >
> > I am not sure this is the best way; it is slow. Could you suggest a
> > faster way to find such rows?
> > Is there a secondary index implementation that we can add to a column
> > family after the data has been loaded into the table?
> >
>
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax  +39 0471 068 129
> claudio.martella@tis.bz.it http://www.tis.bz.it
>
>
>
>


-- 
Gokhan
