tinkerpop-dev mailing list archives

From Daniel Kuppitz ...@gremlin.guru>
Subject Re: [DISCUSS] Bulk Loading
Date Thu, 07 Jun 2018 16:32:35 GMT
IMO it would be best if graph providers implemented a GraphOutputFormat
for their graph implementation. This way we could use
BulkDumperVertexProgram [1,2], which depends only on an InputFormat and an
OutputFormat and thus can be seen as a kind of [Copy|Clone]VertexProgram.
If that's not an option, then graph providers could still create their own
VP that is optimized to handle transactions, id assignments, etc. properly
in the underlying graph DB implementation.
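
As a rough illustration of that idea (this is not TinkerPop code; the names
copy_graph, read_source, write_target and the tuple-based vertex record are
all made up for the sketch), a copy step that knows only about a reader and
a writer could look like this in Python:

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape for this sketch:
# (vertex id, properties, outgoing edges as (label, target id) pairs).
Vertex = Tuple[str, Dict[str, object], List[Tuple[str, str]]]


def copy_graph(read: Callable[[], Iterable[Vertex]],
               write: Callable[[Vertex], None]) -> int:
    """Pass every vertex from the reader to the writer unchanged; return the count."""
    count = 0
    for vertex in read():
        write(vertex)
        count += 1
    return count


# Stand-in for an input format: yields vertices from an in-memory list.
source: List[Vertex] = [
    ("1", {"name": "marko"}, [("knows", "2")]),
    ("2", {"name": "vadas"}, []),
]


def read_source() -> Iterator[Vertex]:
    return iter(source)


# Stand-in for an output format: a provider-specific writer would handle
# id assignment and transactions here instead of filling a dict.
target: Dict[str, Vertex] = {}


def write_target(vertex: Vertex) -> None:
    target[vertex[0]] = vertex


copied = copy_graph(read_source, write_target)
```

Swapping write_target for a provider-specific writer changes where the data
lands without touching copy_graph, which is roughly the role a provider's
GraphOutputFormat would play.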



On Thu, Jun 7, 2018 at 8:53 AM, Stephen Mallette <spmallette@gmail.com> wrote:

> TinkerPop tries to generalize various aspects of graph computing and does a
> pretty good job of doing so, but every so often we try to generalize
> something and it just doesn't work the way we'd like. Indexing was one such
> casualty, if you need an example to consider, but I think that our attempt
> at bulk loading is falling into that area as well, specifically:
> BulkLoaderVertexProgram (BLVP):
> http://tinkerpop.apache.org/docs/current/reference/#bulkloadervertexprogram
> What I'm seeing is that graph providers are offering their own bulk loading
> tools which are inevitably faster and/or easier to use than BLVP. Here are
> some examples:
> CosmosDB: https://github.com/Microsoft/Microsoft.Azure.Graphs.BulkImport
> Neptune: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
> Neo4j: https://neo4j.com/blog/bulk-data-import-neo4j-3-0/
> DSE Graph:
> https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/dgl/dglOverview.html
> JanusGraph: https://docs.janusgraph.org/0.2.0/bulk-loading.html
> I suppose there are others, but hopefully those examples convey the point.
> Of those I mentioned, perhaps the JanusGraph one is a bit of a stretch, as
> its documentation references hadoop-gremlin, which I presume means BLVP.
> Maybe someone from the JanusGraph project can comment a bit further.
> In addition to graph providers having their own approaches to bulk loading,
> I tend to find that BLVP is always a question mark for users. They tend to
> have problems getting it working right and we really haven't done much to
> make it easier to use.
> So, given all that, would it be a bad idea to get TinkerPop out of the
> business of trying to generalize bulk loading? If we did, that would be one
> less feature to support and we could arguably give users a better
> experience by pointing them to the bulk loader of their graph of
> choice. I suppose that the downside to taking this stance would be that
> graph providers that don't provide bulk loaders couldn't rely on TinkerPop
> anymore for this need (JanusGraph? others?). Finally, users would not have
> a single general way to bulk load to any graph implementation. Perhaps
> there is a way to do that without BLVP in place?
