lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joel Bernstein (JIRA)" <>
Subject [jira] [Commented] (SOLR-8888) Add shortestPath Streaming Expression
Date Tue, 29 Mar 2016 19:03:25 GMT


Joel Bernstein commented on SOLR-8888:

I think this almost ready to commit. As [~dpgove] points out it's not a generic solution for
graph traversals, but it's a useful recipe that probably deserves a top level expression.
As more generic approaches are developed we can always swap out the implementation to use
generic solutions.

The next step I believe would be to implement a nodes() expression, which would use the same
parallel joining technique used in this ticket. But instead of finding the shortest path it
will simply iterate the nodes and track the traversal. This could be wrapped by other streams
to operate over.

> Add shortestPath Streaming Expression
> -------------------------------------
>                 Key: SOLR-8888
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>         Attachments: SOLR-8888.patch, SOLR-8888.patch, SOLR-8888.patch, SOLR-8888.patch,
> This ticket is to implement a distributed shortest path graph traversal as a Streaming
> Expression syntax:
> {code}
> shortestPath(collection, 
>                      from="", 
>                      to="",
>                      edge="from=to",
>                      threads="6",
>                      partitionSize="300", 
>                      fq="limiting query", 
>                      maxDepth="4")
> {code}
> The expression above performs a *breadth first search* to find the shortest path in an
unweighted, directed graph. The search starts from the node  and searches
for the node, traversing the *edges* by iteratively joining the *from* and
*to* columns. Each level in the traversal is implemented as a *parallel partitioned* nested
loop join across the entire *collection*. The *threads* parameter controls the number of threads
performing the join at each level. The *partitionSize* controls the of number of nodes in
each join partition. *maxDepth* controls the number of levels to traverse. *fq* is a limiting
query applied to each level in the traversal.
> Future implementations can add more capabilities such as weighted traversals.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message