tinkerpop-dev mailing list archives

From Stephen Mallette <spmalle...@gmail.com>
Subject [DISCUSS] Bulk Loading
Date Thu, 07 Jun 2018 15:53:41 GMT
TinkerPop tries to generalize various aspects of graph computing and does a
pretty good job of doing so, but every so often we try to generalize
something and it just doesn't work the way we'd like. Indexing was one such
casualty, if you need an example to consider, but I think that our attempt
at bulk loading, specifically BulkLoaderVertexProgram (BLVP), is falling
into that area as well:

http://tinkerpop.apache.org/docs/current/reference/#bulkloadervertexprogram

What I'm seeing is that graph providers are offering their own bulk loading
tools which are inevitably faster and/or easier to use than BLVP. Here are
some examples:

CosmosDB: https://github.com/Microsoft/Microsoft.Azure.Graphs.BulkImport
Neptune: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
Neo4j: https://neo4j.com/blog/bulk-data-import-neo4j-3-0/
DSE Graph: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/dgl/dglOverview.html
JanusGraph: https://docs.janusgraph.org/0.2.0/bulk-loading.html

I suppose there are others, but hopefully those examples convey the point.
Of those I mentioned, perhaps the JanusGraph one is a bit of a stretch as
its documentation references hadoop-gremlin, which I presume means BLVP.
Maybe someone on JanusGraph can comment a bit further.

In addition to graph providers having their own approaches to bulk loading,
I tend to find that BLVP is always a question mark for users. They tend to
have problems getting it working right and we really haven't done much to
improve its usage.

So, given all that, would it be a bad idea to get TinkerPop out of the
business of trying to generalize bulk loading? If we did, that would be one
less feature to support and we could arguably recommend to users a better
experience by instructing them to use the bulk loader of their graph of
choice. I suppose that the downside to taking this stance would be that
graph providers that don't provide bulk loaders couldn't rely on TinkerPop
anymore for this need (JanusGraph? others?). Finally, users would not have
a single general way to bulk load to any graph implementation. Perhaps
there is a way to do that without BLVP in place?
