hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Using SPARQL against HBase
Date Mon, 05 Apr 2010 10:21:14 GMT
1. We want a SPARQL query engine over it that can answer queries in real
time, with performance comparable to other systems out there. And since
HBase will be the storage layer, we want it to scale well. The biggest
triple store I'm aware of holds 100 billion triples; HBase can certainly
store more than that.

2. We want to enable large-scale processing as well, leveraging Hadoop
(maybe? I read about this on Cloudera's blog), and perhaps something
like Pregel.

These things are fluid, and the first step would be to spec out the
features we want to build in; your thoughts on that would be useful.

What do you know about Google's work with linked data and Bigtable? Give
us some insights there...

-ak


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Mon, Apr 5, 2010 at 2:51 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> Well, the structure should fit the purpose, but I don't know what you
> are trying to do (e.g., a SPARQL adapter? large-scale RDF processing
> and storage?)
>
> On Mon, Apr 5, 2010 at 3:14 PM, Amandeep Khurana <amansk@gmail.com> wrote:
> > Edward,
> >
> > I think for now we'll start with modeling how to store triples such
> > that we can run real-time SPARQL queries on them, and then later look
> > at the Pregel model and how we can leverage it for bulk processing.
> > The Bigtable data model doesn't lend itself directly to storing
> > triples such that fast querying is possible. Do you have any idea how
> > Google stores linked data in Bigtable? We can build on it from there.
> >
> > -ak
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
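One common answer to the fast-querying problem raised above is to store each triple redundantly under several orderings (SPO, POS, OSP), so that any pattern with at least one bound term becomes a row-key prefix scan over Bigtable's sorted rows. A minimal sketch of such row-key construction; the three-index layout, table roles, and separator byte are illustrative assumptions, not a schema from this thread:

```java
// Sketch: build row keys for three index tables (SPO, POS, OSP) so that
// any triple pattern with at least one bound term maps to a prefix scan.
// The separator is an assumption; a real design needs a byte that cannot
// occur inside encoded RDF terms.
public class TripleRowKeys {
    static final String SEP = "\u0001";

    // Row key for the subject-predicate-object ordered index table.
    static String spoKey(String s, String p, String o) {
        return s + SEP + p + SEP + o;
    }

    // Row key for the predicate-object-subject ordered index table.
    static String posKey(String s, String p, String o) {
        return p + SEP + o + SEP + s;
    }

    // Row key for the object-subject-predicate ordered index table.
    static String ospKey(String s, String p, String o) {
        return o + SEP + s + SEP + p;
    }

    public static void main(String[] args) {
        String s = "http://example.org/alice";
        String p = "http://xmlns.com/foaf/0.1/knows";
        String o = "http://example.org/bob";
        // Each triple is written three times, once per index ordering.
        System.out.println("SPO: " + spoKey(s, p, o));
        System.out.println("POS: " + posKey(s, p, o));
        System.out.println("OSP: " + ospKey(s, p, o));
    }
}
```

The write amplification (3x) buys the ability to answer any single triple pattern with one ordered scan, which is what "fast querying" over a Bigtable-style store usually comes down to.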
> >
> > On Sun, Apr 4, 2010 at 10:50 PM, Edward J. Yoon
> > <edwardyoon@apache.org> wrote:
> >
> >> Hi, I'm a proposer/sponsor of the HEART project.
> >>
> >> I have no doubt that RDF can be stored in HBase, because Google also
> >> stores linked data in their Bigtable.
> >>
> >> However, if you want to focus on large-scale (distributed)
> >> processing, I would recommend reading about the Google Pregel project
> >> (Google's graph computing framework), because SPARQL is basically a
> >> graph query language for RDF graph data.
> >>
> >> On Fri, Apr 2, 2010 at 7:09 AM, Jürgen Jakobitsch
> >> <jakobitschj@punkt.at> wrote:
> >> > hi again,
> >> >
> >> > i'm definitely interested.
> >> >
> >> > you probably heard of the heart project, but there's hardly anything
> >> > going on, so i think it's well worth the effort.
> >> >
> >> > for your discussion days i'd recommend taking a look at the openrdf
> >> > sail api
> >> >
> >> > @http://www.openrdf.org/doc/sesame2/system/
> >> >
> >> > the point is that there is already everything you need, like the
> >> > query engine and the like.
> >> > to make it clear: for a beginning, a quad store is close to perfect,
> >> > because it actually comes down to implementing the getStatements
> >> > method as accurately as possible.
> >> >
> >> > the query engine does the same, by parsing the sparql query and
> >> > using the getStatements method.
> >> >
> >> > now this method simply has five arguments:
> >> >
> >> > subject, predicate, object, includeInferred and contexts. subject,
> >> > predicate and object can be null, includeInferred can be ignored for
> >> > starting, and contexts can also be null for a starter, or an array
> >> > of uris.
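The contract described above, where any of subject/predicate/object may be null, maps naturally onto a multi-index layout: which terms are bound determines which index a backing store would scan. A hedged sketch of that dispatch step only; the index names and priority order are one plausible design, not the Sail API or any existing implementation:

```java
// Sketch of the index-selection step behind a getStatements-style lookup:
// the pattern of bound (non-null) arguments decides which index ordering
// gives a usable row-key prefix. Context/includeInferred are ignored here,
// mirroring the "ignore for starting" advice in the thread.
public class IndexChooser {
    // Returns the name of the hypothetical index table to scan.
    static String chooseIndex(String s, String p, String o) {
        if (s != null) return "SPO"; // subject bound: prefix scan on SPO
        if (p != null) return "POS"; // predicate bound: prefix scan on POS
        if (o != null) return "OSP"; // object bound: prefix scan on OSP
        return "SPO";                // nothing bound: full scan of any index
    }

    public static void main(String[] args) {
        System.out.println(chooseIndex("ex:alice", null, null));
        System.out.println(chooseIndex(null, "foaf:knows", "ex:bob"));
        System.out.println(chooseIndex(null, null, "ex:bob"));
    }
}
```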
> >> >
> >> > also note that the sail api is quite commonly used (virtuoso,
> >> > openrdf sesame, neo4j, bigdata, even oracle has an old version;
> >> > we'll be having implementations for talis and 4store in the coming
> >> > weeks, and of course my quad store "tuqs").
> >> >
> >> > if you find the way to retrieve the triples (quads) from hbase, i
> >> > could implement a sail store in a day - et voila ...
> >> >
> >> > anyway, it would be nice if you kept me informed .. i'd really like
> >> > to contribute...
> >> >
> >> > wkr www.turnguard.com
> >> >
> >> >
> >> > ----- Original Message -----
> >> > From: "Amandeep Khurana" <amansk@gmail.com>
> >> > To: hbase-user@hadoop.apache.org
> >> > Sent: Thursday, April 1, 2010 11:45:00 PM
> >> > Subject: Re: Using SPARQL against HBase
> >> >
> >> > Andrew and I just had a chat about exploring how we can leverage HBase
> >> for a
> >> > scalable RDF store and we'll be looking at it in more detail over the
> >> next
> >> > few days. Is anyone of you interested in helping out? We are going to
> be
> >> > looking at what all is required to build a triple store + query engine
> on
> >> > HBase and how HBase can be used as is or remodeled to fit the problem.
> >> > Depending on what we find out, we'll decide on taking the project
> further
> >> > and committing efforts towards it.
> >> >
> >> > -Amandeep
> >> >
> >> >
> >> > Amandeep Khurana
> >> > Computer Science Graduate Student
> >> > University of California, Santa Cruz
> >> >
> >> >
> >> > On Thu, Apr 1, 2010 at 1:12 PM, Jürgen Jakobitsch
> >> > <jakobitschj@punkt.at> wrote:
> >> >
> >> >> hi,
> >> >>
> >> >> this sounds very interesting to me, i'm currently fiddling
> >> >> around with a suitable row and column setup for triples.
> >> >>
> >> >> i'm about to implement openrdf's sail api for hbase (i just did
> >> >> a lucene quad store implementation which is super fast and scales
> >> >> to a couple of hundred million triples (http://turnguard.com/tuqs))
> >> >> but i'm in my first days of hbase encounters, so my experience
> >> >> in row/column design is limited.
> >> >>
> >> >> from my point of view the problem is to really efficiently store,
> >> >> besides the triples themselves, the contexts (named graphs) and
> >> >> languages of literals.
> >> >>
> >> >> by the way : i just did a small tablemanager (in beta) that lets
> >> >> you create htables -> from <- rdf (see
> >> >> http://sourceforge.net/projects/hbasetablemgr/)
> >> >>
> >> >> i'd be really happy to contribute on the rdf and sparql side,
> >> >> but certainly could need some help on the hbase table design side.
> >> >>
> >> >> wkr www.turnguard.com/turnguard
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message -----
> >> >> From: "Raffi Basmajian" <rbasmajian@oppenheimerfunds.com>
> >> >> To: hbase-user@hadoop.apache.org, apurtell@apache.org
> >> >> Sent: Thursday, April 1, 2010 9:45:59 PM
> >> >> Subject: RE: Using SPARQL against HBase
> >> >>
> >> >>
> >> >> This is an interesting article from a few guys over at
> >> >> BBN/Raytheon. Storing triples in flat files, they used a custom
> >> >> algorithm, detailed in the article, to iterate over the WHERE
> >> >> clause of a SPARQL query and reduce the map into the desired
> >> >> result.
> >> >>
> >> >> This is very similar to what I need to do; the only difference is
> >> >> that our data is stored in HBase tables, not as triples in flat
> >> >> files.
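The core step of an approach like the one described, iterating the WHERE clause's triple patterns against the stored triples, can be sketched in miniature over an in-memory list. This is an illustration of the general pattern-matching technique only, not the BBN/Raytheon algorithm; the names and in-memory store are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Miniature sketch: match one SPARQL-style triple pattern (null = wildcard)
// against a collection of triples. A WHERE clause is a sequence of such
// patterns, each narrowing the candidate bindings.
public class PatternMatch {
    record Triple(String s, String p, String o) {}

    static List<Triple> match(List<Triple> data, String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : data) {
            // A null position matches anything, like an unbound variable.
            if ((s == null || t.s().equals(s))
                    && (p == null || t.p().equals(p))
                    && (o == null || t.o().equals(o))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("alice", "knows", "bob"),
                new Triple("bob", "knows", "carol"),
                new Triple("alice", "age", "30"));
        // Pattern ?x knows ?y matches the first two triples.
        System.out.println(match(data, null, "knows", null).size()); // 2
    }
}
```

Over HBase rather than flat files, the linear scan would be replaced by a scan of whichever index table matches the bound positions of each pattern.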
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: Amandeep Khurana [mailto:amansk@gmail.com]
> >> >> Sent: Wednesday, March 31, 2010 3:30 PM
> >> >> To: hbase-user@hadoop.apache.org; apurtell@apache.org
> >> >> Subject: Re: Using SPARQL against HBase
> >> >>
> >> >> Why do you need to build an in-memory graph that you would want to
> >> >> read/write to? You could store the graph in HBase directly. As
> >> >> pointed out, HBase might not be the best suited for SPARQL queries,
> >> >> but it's not impossible to do. Using the triples, you can form a
> >> >> graph that can be represented in HBase as an adjacency list. I've
> >> >> stored graphs with 16-17M nodes, which was equivalent to about 600M
> >> >> triples. And this was on a small cluster; it could certainly scale
> >> >> well beyond 16M graph nodes.
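One way to read the adjacency-list idea mentioned here: one row per node, one column per outgoing edge, with the predicate as the cell value, so a node's whole neighborhood is a single-row read. A sketch using plain maps to stand in for HBase rows; this layout is one plausible interpretation, not a schema given in the thread:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an RDF graph as an adjacency list, modeled as
// row key -> (column qualifier -> value), the shape an HBase table with
// one row per subject and one column per outgoing edge might take.
public class AdjacencyList {
    // rowKey = source node; column qualifier = target node; value = predicate.
    static final Map<String, Map<String, String>> table = new HashMap<>();

    static void putTriple(String s, String p, String o) {
        table.computeIfAbsent(s, k -> new HashMap<>()).put(o, p);
    }

    // All outgoing edges of a node come back from a single-row lookup,
    // which is the cheap operation in a Bigtable-style store.
    static Map<String, String> neighbors(String s) {
        return table.getOrDefault(s, Map.of());
    }

    public static void main(String[] args) {
        putTriple("alice", "knows", "bob");
        putTriple("alice", "worksAt", "ucsc");
        System.out.println(neighbors("alice").size()); // 2
    }
}
```

The trade-off relative to the SPO/POS/OSP index layout: traversals from a known node are one row read, but object-bound or predicate-bound patterns would still need a separate reverse index.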
> >> >>
> >> >> In case you are interested in working on SPARQL over HBase, we could
> >> >> collaborate on it...
> >> >>
> >> >> -ak
> >> >>
> >> >>
> >> >> Amandeep Khurana
> >> >> Computer Science Graduate Student
> >> >> University of California, Santa Cruz
> >> >>
> >> >>
> >> >> On Wed, Mar 31, 2010 at 11:56 AM, Andrew Purtell
> >> >> <apurtell@apache.org>wrote:
> >> >>
> >> >> > Hi Raffi,
> >> >> >
> >> >> > To read up on fundamentals I suggest Google's BigTable paper:
> >> >> > http://labs.google.com/papers/bigtable.html
> >> >> >
> >> >> > Detail on how HBase implements the BigTable architecture within
> >> >> > the Hadoop ecosystem can be found here:
> >> >> >
> >> >> >  http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
> >> >> >  http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> >> >> >  http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
> >> >> >
> >> >> > Hope that helps,
> >> >> >
> >> >> >   - Andy
> >> >> >
> >> >> > > From: Basmajian, Raffi <rbasmajian@oppenheimerfunds.com>
> >> >> > > Subject: RE: Using SPARQL against HBase
> >> >> > > To: hbase-user@hadoop.apache.org, apurtell@apache.org
> >> >> > > Date: Wednesday, March 31, 2010, 11:42 AM
> >> >> > > If HBase can't respond to SPARQL-like queries, then what type
> >> >> > > of query language can it respond to? In a traditional RDBMS
> >> >> > > database one would use SQL; so what is the counterpart query
> >> >> > > language with HBase?
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >>
> >> >>
> >> >>
> >> >> --
> >> >> punkt. netServices
> >> >> ______________________________
> >> >> Jürgen Jakobitsch
> >> >> Codeography
> >> >>
> >> >> Lerchenfelder Gürtel 43 Top 5/2
> >> >> A - 1160 Wien
> >> >> Tel.: 01 / 897 41 22 - 29
> >> >> Fax: 01 / 897 41 22 - 22
> >> >>
> >> >> netServices http://www.punkt.at
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon @ NHN, corp.
> >> edwardyoon@apache.org
> >> http://blog.udanax.org
> >>
> >
>
>
>
>
