giraph-dev mailing list archives

From "Benjamin Heitmann (Commented) (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-170) Workflow for loading RDF graph data into Giraph
Date Thu, 19 Apr 2012 11:22:41 GMT


Benjamin Heitmann commented on GIRAPH-170:

Regarding GIRAPH-141: I don't think that Giraph needs true multigraph support in order to
use RDF data.

If I have "subject1 predicate1 object1" and "subject1 predicate1 object2", then there will
be a total of three vertices and two edges, without any conflict. If the same triple
"subject1 predicate1 object1" occurs two or more times, then the RDF Semantics document states
that all of these triples refer to the same two vertices and the same edge between them in the
RDF graph. So again, there is no need for a multigraph.
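
To make the counting above concrete, here is a minimal sketch in plain Java (no Giraph
classes involved). The class and method names are made up for illustration; the only thing
it relies on is that an RDF graph is a set of triples, so a repeated triple adds nothing.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Giraph: counts vertices and edges of a triple set.
class RdfTripleSetSketch {

    /** Distinct edges: a triple is stored once, no matter how often it occurs in the input. */
    static Set<List<String>> edges(String[][] triples) {
        Set<List<String>> edges = new LinkedHashSet<>();
        for (String[] t : triples) {
            // List equality is element-wise, so identical triples collapse into one entry.
            edges.add(Arrays.asList(t[0], t[1], t[2]));
        }
        return edges;
    }

    /** Distinct vertices: every subject and every object of the distinct edges. */
    static Set<String> vertices(Set<List<String>> edges) {
        Set<String> vertices = new LinkedHashSet<>();
        for (List<String> e : edges) {
            vertices.add(e.get(0)); // subject
            vertices.add(e.get(2)); // object
        }
        return vertices;
    }

    public static void main(String[] args) {
        String[][] triples = {
            {"subject1", "predicate1", "object1"},
            {"subject1", "predicate1", "object2"},
            {"subject1", "predicate1", "object1"}, // duplicate of the first triple
        };
        Set<List<String>> edges = edges(triples);
        Set<String> vertices = vertices(edges);
        System.out.println(vertices.size() + " vertices, " + edges.size() + " edges");
    }
}
```

With the three input lines above this reports three vertices and two edges, matching the
description.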

If we introduce literals into the mix, then the same reasoning applies, as long as each
literal is represented by its own Giraph vertex.
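
As a rough illustration of treating literals as vertices, the sketch below splits one
N-Triples line into its three terms and keeps the literal's quoted form as the vertex id.
This is an assumption for the example, not my actual loader: the class name is invented,
escapes inside literals are not handled, and using the term text as the id means that equal
literals end up sharing one vertex.

```java
// Hypothetical helper, not part of Giraph: simplified N-Triples line splitting.
class NTriplesLineSketch {

    /** Splits "subject predicate object ." into its three terms. */
    static String[] parse(String line) {
        String s = line.trim();
        if (!s.endsWith(".")) {
            throw new IllegalArgumentException("missing terminating '.': " + line);
        }
        s = s.substring(0, s.length() - 1).trim();
        // Limit of 3: a literal in object position may itself contain whitespace.
        String[] parts = s.split("\\s+", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected three terms: " + line);
        }
        return parts;
    }

    /** In N-Triples, literals start with a double quote; URIs start with '<'. */
    static boolean isLiteral(String term) {
        return term.startsWith("\"");
    }

    public static void main(String[] args) {
        String[] t = parse("<http://example.org/s> <http://example.org/name> \"Alice Smith\" .");
        // The object term becomes the id of an ordinary vertex, literal or not.
        System.out.println("object vertex id: " + t[2] + ", literal: " + isLiteral(t[2]));
    }
}
```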

I am not sure if I missed anything, but multigraphs don't seem to be the issue here, neither
in theory nor for my already working code.

A more important issue is the ability to retrieve and modify an already created node from
inside the TextVertexInputFormat class (as explained above).
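
One possible way around this, in line with the pre-processing option mentioned in the issue
description, would be to group all triples by subject before Giraph reads them, so that each
input line already describes one complete vertex. The sketch below only illustrates that
idea; the class name and the tab-separated line layout are assumptions, not an existing
Giraph input format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-processing step, not part of Giraph.
class GroupBySubjectSketch {

    /** Collects the (predicate, object) pairs of every subject into one list. */
    static Map<String, List<String[]>> groupBySubject(List<String[]> triples) {
        Map<String, List<String[]>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>())
                     .add(new String[] {t[1], t[2]});
        }
        return bySubject;
    }

    /** Assumed layout: subject, then predicate/object pairs, all separated by tabs. */
    static String toLine(String subject, List<String[]> outEdges) {
        StringBuilder sb = new StringBuilder(subject);
        for (String[] e : outEdges) {
            sb.append('\t').append(e[0]).append('\t').append(e[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {"subject1", "predicate1", "object1"},
            new String[] {"subject2", "predicate1", "object1"},
            new String[] {"subject1", "predicate1", "object2"});
        // One line per subject, so no vertex has to be revisited while loading.
        groupBySubject(triples).forEach((s, edges) -> System.out.println(toLine(s, edges)));
    }
}
```

Vertices that only ever appear as objects get no line of their own here; they would need
extra handling, which this sketch leaves out.
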
> Workflow for loading RDF graph data into Giraph
> -----------------------------------------------
>                 Key: GIRAPH-170
>                 URL:
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Dan Brickley
>            Priority: Minor
> W3C RDF provides a family of Web standards for exchanging graph-based data. RDF uses
> sets of simple binary relationships, labeling nodes and links with Web identifiers (URIs).
> Many public datasets are available as RDF, including the "Linked Data" cloud (see
> ). Many such datasets are listed at
> RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple line-oriented
> format is N-Triples. A format aligned with RDF's SPARQL query language is Turtle. Apache Jena
> and Any23 provide software to handle all these;
> This JIRA leaves open the strategy for loading RDF data into Giraph. There are various
> possibilities, including exploitation of intermediate Hadoop-friendly stores, or pre-processing
> with e.g. Pig-based tools into a more Giraph-friendly form, or writing custom loaders. Even
> a HOWTO document or implementor notes here would be an advance on the current state of the
> art. The BluePrints Graph API (Gremlin etc.) has also been aligned with various RDF datasources.
> Related topics: multigraphs touches
> on the issue (since we can't currently easily represent fully general RDF graphs since two
> nodes might be connected by more than one typed edge). Even without multigraphs it ought to
> be possible to bring RDF-sourced data into Giraph, e.g. perhaps some app is only interested
> in say the Movies + People subset of a big RDF collection.
> From Avery in email: "a helper VertexInputFormat (and maybe VertexOutputFormat) would
> certainly [despite GIRAPH-141] still help"

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:

