tinkerpop-dev mailing list archives

From "Stephen Mallette (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TINKERPOP-2352) Gremlin Python driver default pool size makes Gremlin keep-alive difficult
Date Mon, 23 Mar 2020 11:41:00 GMT

    [ https://issues.apache.org/jira/browse/TINKERPOP-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064737#comment-17064737
] 

Stephen Mallette commented on TINKERPOP-2352:
---------------------------------------------

Thanks for your thoughts on this one. I don't think there's a problem with changing the default
pool size to what you suggest, as long as we have tests that continue to validate the behavior
of the larger pool size somehow. A pull request would be great, especially one that included
some more documentation of the type you describe. Of course, I think a nicer pull request
would be one that solves the keep-alive problem more generally, as described on TINKERPOP-1886.
Fixing that would be a much more robust solution, and if you have the opportunity to help there
it would be appreciated. If you could solve that, then perhaps we should just close this ticket
in favor of that one and continue our discussion there.
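The failure mode described in the issue below can be sketched with a toy pool. This is purely illustrative: the class names are hypothetical and the FIFO checkout/return behaviour is an assumption about how a fixed-size pool behaves, not gremlin_python's actual code. The point it shows is that a probe can succeed on one pooled connection while the very next traversal is handed a different, already-closed one.

```python
import queue

class ToyConnection:
    """Stands in for one websocket connection; `alive` is False once the
    server's idle timeout has closed it."""
    def __init__(self, name, alive):
        self.name, self.alive = name, alive

    def submit(self, _query):
        if not self.alive:
            raise ConnectionError(self.name + " was closed by the server")
        return "ok"

class ToyClient:
    """Checks a connection out of a FIFO queue per request and returns it
    afterwards (an assumption for illustration, not the driver's real code)."""
    def __init__(self, connections):
        self._pool = queue.Queue()
        for c in connections:
            self._pool.put(c)

    def submit(self, query):
        conn = self._pool.get()
        try:
            return conn.submit(query)
        finally:
            self._pool.put(conn)

# Only the most recently used connection survived the server's idle timeout.
conns = [ToyConnection("conn-0", alive=True)] + [
    ToyConnection("conn-%d" % i, alive=False) for i in range(1, 4)]
client = ToyClient(conns)

client.submit("g.V().limit(1).count()")      # the probe hits conn-0 and passes
try:
    client.submit("g.V().has('name', 'x')")  # the real traversal hits conn-1
except ConnectionError as e:
    print(e)                                 # ...and fails despite the probe
```

With a pool of size 1 the probe and the following traversal necessarily use the same connection, which is why the reporter's check-then-reconnect pattern becomes reliable there.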

> Gremlin Python driver default pool size makes Gremlin keep-alive difficult
> --------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2352
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2352
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 3.3.5, 3.4.5
>         Environment: AWS Lambda, Python 3.7 runtime, AWS Neptune.
> (AWS Lambda functions can remain in memory and thus hold connections open for many minutes
between invocations)
>            Reporter: Mark Br...e
>            Priority: Major
>
> I'm working with a Gremlin database that (like many) terminates connections if they don't
execute any transactions within a timeout period.  When we want to run a traversal we first
check our `GraphTraversalSource` by running `g.V().limit(1).count().next()`, and if that raises
an exception we know we need to reconnect before running the actual traversal.
> We've been very confused that this hasn't worked as expected: we intermittently see traversals
fail with `WebSocketClosed` or other connection-related errors immediately after the "connection
test" passes. 
> I've (finally) found that the cause of this inconsistency is the default pool size in `gremlin_python.driver.client.Client`
being 4.  This means there's no visibility outside the `Client` of which connection in the
pool is tested and/or used, and in fact no way for the application (`GraphTraversalSource`)
to run keep-alive type traversals reliably.  Any time an application passes in a pool size
of `None` or a number > 1, there'll be no way to make sure that each and every connection
in the pool actually sends keep-alive traversals to the remote, _except_ in the case of a
single-threaded application, where a tight loop could issue `pool_size` of them.  In that
latter case, as the application is single-threaded, a `pool_size` above 1 won't provide
much benefit anyway.
> I've raised this as a bug because I think a default `pool_size` of 1 would give much
more predictable behaviour, and in the specific case of the Python driver it is probably more
appropriate, because Python applications tend to run single-threaded by default, with multi-threading
carefully added when performance requires it.  Perhaps this is more of a wish, but as the behaviour
of the default option is quite confusing, it feels more like a bug.  If it would help, I'm
happy to raise a PR with some updated function header comments or updated documentation
about multi-threaded / multi-async-loop usage of gremlin-python.
> (This is my first issue here, apologies if it has some fields wrong.)
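The single-threaded caveat above can be checked with a small model. Assuming connections are checked out of and returned to a FIFO queue (again an assumption for illustration, not gremlin_python's actual internals), a tight loop of `pool_size` probes rotates through and refreshes every connection, but only while nothing else is checking connections out concurrently:

```python
import queue

class Conn:
    """One pooled connection; a successful request would reset the
    server's idle timer for it."""
    def __init__(self, i):
        self.i = i

    def ping(self):
        pass  # stands in for sending a keep-alive traversal

class Pool:
    """FIFO checkout/return, one connection per request (a sketch)."""
    def __init__(self, n):
        self._q = queue.Queue()
        for i in range(n):
            self._q.put(Conn(i))

    def submit(self):
        c = self._q.get()   # single-threaded: nothing else holds a connection
        c.ping()
        self._q.put(c)
        return c.i

POOL_SIZE = 4
pool = Pool(POOL_SIZE)

# A single probe refreshes only one of the four connections...
assert pool.submit() == 0

# ...but a tight loop of POOL_SIZE probes rotates through the whole queue,
# so every connection receives a keep-alive.
touched = {pool.submit() for _ in range(POOL_SIZE)}
assert touched == {0, 1, 2, 3}
```

With `pool_size=1` the set above collapses to a single connection, so one probe is already a complete keep-alive pass; that is what makes the smaller default behave predictably.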



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
