hbase-user mailing list archives

From Jürgen Jakobitsch <jakobits...@punkt.at>
Subject Re: hbase mapreduce scan
Date Tue, 06 Apr 2010 17:59:33 GMT
hi, thanks for your input,

i was asking with respect to running sparql queries over hbase tables.

i have read that yahoo and others use hbase or bigtable for their
search results, so i'm thinking about how to apply a sparql query
 - which is nothing more than a normal query - to hbase.

openrdf's sail-api provides a complete sparql engine with everything
you need (i have implemented a super fast lucene triple store).
more or less the only thing you need to implement is the
getStatements(Resource subject, URI predicate, Value object)
method, which returns a CloseableIterator (also from openrdf).
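
a minimal sketch of that idea, under loud assumptions: a TreeMap stands in for the hbase table (both keep rows sorted by key), the row key layout "subject TAB predicate TAB object" is made up for illustration, and TripleIterator is a hypothetical stand-in for the openrdf iterator type, not its real api. bound leading terms become a key prefix so only that key range is scanned, and the remaining terms are filtered on the client side.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Sketch only: a TreeMap stands in for an HBase table (both keep rows sorted by key). */
class TripleScanSketch {

    /** Hypothetical stand-in for the closeable iterator the SAIL method would return. */
    interface TripleIterator extends Iterator<String[]>, Closeable {
        @Override
        void close(); // narrowed: no checked exception in this sketch
    }

    /** Assumed row key layout: subject TAB predicate TAB object. */
    static String rowKey(String s, String p, String o) {
        return s + "\t" + p + "\t" + o;
    }

    /** A null argument is a wildcard, as in a SPARQL triple pattern. */
    static TripleIterator getStatements(SortedMap<String, String> table,
                                        final String s, final String p, final String o) {
        // Bound leading terms become a key prefix, so only that key range is scanned.
        String prefix = "";
        if (s != null) {
            prefix = s + "\t";
            if (p != null) {
                prefix += p + "\t";
            }
        }
        SortedMap<String, String> range = prefix.isEmpty()
                ? table
                : table.subMap(prefix, prefix + Character.MAX_VALUE);
        final Iterator<String> keys = range.keySet().iterator();

        return new TripleIterator() {
            private String[] pending; // next matching triple, or null if not looked up yet
            private boolean closed;

            @Override
            public boolean hasNext() {
                while (pending == null && !closed && keys.hasNext()) {
                    String[] t = keys.next().split("\t", 3);
                    // Terms not covered by the prefix are filtered on the client side.
                    if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                        pending = t;
                    }
                }
                return pending != null;
            }

            @Override
            public String[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String[] t = pending;
                pending = null;
                return t;
            }

            @Override
            public void close() {
                closed = true; // a real implementation would release its scanner here
                pending = null;
            }
        };
    }

    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    /** Drains and closes an iterator, returning each triple as one space-separated string. */
    static List<String> collect(TripleIterator it) {
        List<String> out = new ArrayList<>();
        try {
            while (it.hasNext()) {
                out.add(String.join(" ", it.next()));
            }
        } finally {
            it.close();
        }
        return out;
    }

    /** Three made-up triples, used only to exercise the sketch. */
    static SortedMap<String, String> sampleTable() {
        SortedMap<String, String> table = new TreeMap<>();
        table.put(rowKey("ex:alice", "foaf:knows", "ex:bob"), "");
        table.put(rowKey("ex:alice", "foaf:name", "Alice"), "");
        table.put(rowKey("ex:bob", "foaf:knows", "ex:carol"), "");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(collect(getStatements(sampleTable(), "ex:alice", null, null)));
    }
}
```

with this key layout only patterns that bind the subject get a narrow range; a pattern like (null, predicate, null) falls back to a full scan with filtering, which is why a real store would probably need more than one key ordering.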

to come to the point: if it's true that yahoo and others use hbase or bigtable
for their search results, what is the best way to retrieve such query results?

Export.java in the mapreduce package also does a scan with a tablemapper (or something similar)
but exports the outcome to a file.

i don't really mind if the result is a file that i read back in and turn into
an openrdf CloseableIterator in some way (same for a temp table) - i'm just
looking for the fastest way to retrieve data from an htable.
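
for the file variant, a rough sketch of reading results back lazily. it assumes a plain text file with one tab-separated triple per line, which is an invented format for illustration and not necessarily what Export.java writes; FileTripleIterator is a hypothetical name, not an openrdf or hbase class.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Sketch only: streams triples from a text file, one tab-separated triple per line (assumed format). */
class FileTripleIterator implements Iterator<String[]>, Closeable {
    private final BufferedReader reader;
    private String[] pending; // next parsed triple, or null if not read yet

    FileTripleIterator(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
    }

    @Override
    public boolean hasNext() {
        try {
            String line;
            // Read ahead one line at a time so the whole file never sits in memory.
            while (pending == null && (line = reader.readLine()) != null) {
                String[] t = line.split("\t", 3);
                if (t.length == 3) {
                    pending = t; // lines without three fields are skipped
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pending != null;
    }

    @Override
    public String[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String[] t = pending;
        pending = null;
        return t;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

since the class implements Closeable it can be used in a try-with-resources block, so the file handle is released even if the consumer stops early.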

i realize there is a common interest in hbase and rdf. i'll put together
an hbase sail impl, put it on sourceforge and run a challenge for the getStatements method,
the fastest wins...

wkr turnguard.com/turnguard


----- Original Message -----
From: "Jean-Daniel Cryans" <jdcryans@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Tuesday, April 6, 2010 6:35:38 PM
Subject: Re: hbase mapreduce scan

Or put it in MySQL, or in S3, or...or... so my point was that you need
a recipient that transcends the JVMs ;)

So it is doable and pretty normal to output in tables the result of
MRs that map other tables, we have dozens of those here at
StumbleUpon. But if it fits in a single HashMap in a single JVM, my
guess is that the output is very small hence this is an operation done
for live clients and not suitable for MR.

J-D

On Tue, Apr 6, 2010 at 4:34 AM, Michael Segel <michael_segel@hotmail.com> wrote:
>
>
> J-D,
>
> There's an alternative...
>
> He could write a M/R that takes the input from a scan(), does something, reduce()s and
> then outputs the reduced set back to hbase in the form of a temp table
> (even an in-memory temp table), and then at the end pulls the data out into a hash table?
>
> In theory this should be possible, but I haven't had time to play with in memory tables....
>
> No?
>
>
> Thx
>
> -Mike
>
>> Date: Mon, 5 Apr 2010 09:57:02 -0700
>> Subject: Re: hbase mapreduce scan
>> From: jdcryans@apache.org
>> To: hbase-user@hadoop.apache.org
>>
>> You want to put the result in a HashMap? MapReduce is a batch
>> processing framework that runs multiple parallel JVMs over a cluster
>> of machines so I don't see how you could simply output in a HashMap
>> (unless you don't mind outputting on disk, then reading it back into a
>> HashMap).
>>
>> So I will guess that you want to do a live query against HBase, here
>> MR won't help you since that is meant for bulk processing which
>> usually takes more than a minute.
>>
>> What you want to use is a Scan, using HTable. The unit tests have tons
>> of examples on how to use a scanner, look in the
>> org.apache.hadoop.hbase.client package, you will find what you need.
>> The main client package also contains some examples
>> http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/package-summary.html
>>
>> J-D
>>
>> On Sun, Apr 4, 2010 at 11:18 AM, Jürgen Jakobitsch <jakobitschj@punkt.at> wrote:
>> > hi,
>> >
>> > i'm totally new to hbase and mapreduce and could really use some
>> > pointers in the right direction for the following situation.
>> >
>> > i managed to run a basic mapreduce example - analogous to Export.java
>> > in the hbase.mapreduce package.
>> >
>> > what i need to achieve is the following :
>> >
>> > do a map/reduce scan on a hbase table and put the results
>> > into a HashMap.
>> >
>> > could someone point me to an example.
>> >
>> > any help really appreciated
>> >
>> > wkr turnguard.com/turnguard
>> >
>> > --
>> > punkt. netServices
>> > ______________________________
>> > Jürgen Jakobitsch
>> > Codeography
>> >
>> > Lerchenfelder Gürtel 43 Top 5/2
>> > A - 1160 Wien
>> > Tel.: 01 / 897 41 22 - 29
>> > Fax: 01 / 897 41 22 - 22
>> >
>> > netServices http://www.punkt.at
>> >
>> >
>
