lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-8925) Add gatherNodes Streaming Expression to support breadth first traversals
Date Sat, 09 Apr 2016 01:07:25 GMT

     [ https://issues.apache.org/jira/browse/SOLR-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-8925:
---------------------------------
    Description: 
The gatherNodes Streaming Expression is a flexible, general-purpose breadth-first graph traversal.
It uses the same parallel join under the covers as SOLR-8888, but it is much more general
and can be used for a wide range of use cases.

Sample syntax:

{code}
gatherNodes(friends,
            gatherNodes(friends,
                        search(articles, q="body:(queryA)", fl="author"),
                        walk="author->user",
                        gather="friend"),
            walk="friend->user",
            gather="friend",
            scatter="branches, leaves")
{code}


The expression above is evaluated as follows:

1) The inner search() expression is evaluated on the *articles* collection, emitting a stream
of Tuples with the author field populated.
2) The inner gatherNodes() expression reads the Tuples from the search() stream and traverses
to the *friends* collection by performing a distributed join between the articles.author field
and the friends.user field. It gathers the value of the *friend* field during the join.
3) The inner gatherNodes() expression then emits the *friend* Tuples. By default, gatherNodes
emits only the leaves, which in this case are the *friend* Tuples.
4) The outer gatherNodes() expression reads the *friend* Tuples and traverses the *friends*
collection again, this time joining the *friend* values emitted in step 3 to the friends.user
field. This collects the friends of friends.
5) The outer gatherNodes() expression emits the entire graph that was collected, as controlled
by the "scatter" parameter. In the example, the *root* nodes are the authors, the *branches*
are the authors' friends and the *leaves* are the friends of friends.
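To make the five evaluation steps concrete, here is a small in-memory Python sketch of the same two-hop traversal. It is not Solr code and does not show how gatherNodes is implemented: the collections are plain lists of dicts, `search` takes a Python predicate in place of a Lucene query, and `Traversal`, `search`, `gather_nodes` and `emit` are hypothetical names. Two behaviours are assumptions of this sketch only: nodes already visited at an earlier level are not visited again, and the root level is emitted as part of the branches.

```python
from dataclasses import dataclass


@dataclass
class Traversal:
    """Hypothetical record of a breadth-first traversal (not a Solr class).

    levels[0] holds the root nodes and each later entry holds the nodes gathered
    by one hop, as (field_name, values) with values de-duplicated in discovery order.
    """
    levels: list
    scatter: tuple = ("leaves",)

    def emit(self):
        """Return node tuples for the levels selected by scatter."""
        last = len(self.levels) - 1
        out = []
        for depth, (field_name, values) in enumerate(self.levels):
            # Assumption: every level before the last, including the root, is a branch.
            kind = "leaves" if depth == last else "branches"
            if kind in self.scatter:
                out.extend({"node": v, "field": field_name, "level": depth} for v in values)
        return out


def search(collection, predicate, fl):
    """Stand-in for search(): a Python predicate replaces the Lucene query."""
    return [{fl: doc[fl]} for doc in collection if predicate(doc)]


def gather_nodes(collection, source, walk, gather, scatter="leaves"):
    """One breadth-first hop: join the frontier to `collection` and gather new nodes."""
    src, dst = (part.strip() for part in walk.split("->"))
    if isinstance(source, Traversal):
        # Nested call: continue from the leaves of the inner traversal (step 4).
        levels = list(source.levels)
        field_name, frontier = levels[-1]
        if field_name != src:
            raise ValueError(f"walk starts at {src!r} but the frontier is {field_name!r}")
    else:
        # Plain tuple stream (e.g. from search()): its src values become the roots.
        frontier = list(dict.fromkeys(t[src] for t in source))
        levels = [(src, frontier)]

    # Assumption of this sketch: nodes seen at any earlier level are not revisited.
    visited = {v for _, values in levels for v in values}
    frontier_set = set(frontier)
    gathered = []
    for doc in collection:
        # The join: keep documents whose dst field matches a frontier node.
        if doc.get(dst) in frontier_set and doc[gather] not in visited:
            visited.add(doc[gather])
            gathered.append(doc[gather])
    levels.append((gather, gathered))
    return Traversal(levels, tuple(s.strip() for s in scatter.split(",")))


# Made-up data standing in for the articles and friends collections.
ARTICLES = [
    {"author": "alice", "body": "graph traversal"},
    {"author": "bob", "body": "graph databases"},
    {"author": "carol", "body": "search relevance"},
]
FRIENDS = [
    {"user": "alice", "friend": "carol"},
    {"user": "alice", "friend": "dave"},
    {"user": "bob", "friend": "erin"},
    {"user": "carol", "friend": "frank"},
    {"user": "dave", "friend": "alice"},
    {"user": "erin", "friend": "carol"},
]

inner = gather_nodes(FRIENDS, search(ARTICLES, lambda d: "graph" in d["body"], "author"),
                     walk="author->user", gather="friend")
outer = gather_nodes(FRIENDS, inner, walk="friend->user", gather="friend",
                     scatter="branches, leaves")
print([t["node"] for t in inner.emit()])  # ['carol', 'dave', 'erin']
print([t["node"] for t in outer.emit()])  # ['alice', 'bob', 'carol', 'dave', 'erin', 'frank']
```

In this data, dave's friend alice and erin's friend carol are not gathered on the second hop because the sketch treats them as already visited; only frank is a new leaf.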

This traversal is fully distributed and can span multiple collections.

Like all streaming expressions, gatherNodes can be combined with other streaming expressions.
For example, the following expression uses a hash join to intersect the networks of friends
rooted at the authors found by two different queries:

{code}
hashInnerJoin(
    gatherNodes(friends,
                gatherNodes(friends,
                            search(articles, q="body:(queryA)", fl="author"),
                            walk="author->user",
                            gather="friend"),
                walk="friend->user",
                gather="friend",
                scatter="branches, leaves"),
    gatherNodes(friends,
                gatherNodes(friends,
                            search(articles, q="body:(queryB)", fl="author"),
                            walk="author->user",
                            gather="friend"),
                walk="friend->user",
                gather="friend",
                scatter="branches, leaves"),
    on="friend"
)
{code}
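The intersection works because an inner hash join keeps only tuples whose join key appears on both sides. The Python sketch below illustrates that generic technique; it is not Solr's implementation of the hashInnerJoin expression. `hash_inner_join` is a hypothetical name, tuples are plain dicts, and the two friend lists are made-up data standing in for the output of the two gatherNodes() expressions.

```python
from collections import defaultdict


def hash_inner_join(left, right, on):
    """Inner hash join of two lists of dict tuples on the field `on`."""
    # Build phase: index the right-hand tuples by their join key.
    table = defaultdict(list)
    for rt in right:
        table[rt[on]].append(rt)
    # Probe phase: stream the left-hand tuples and emit one merged tuple per match.
    out = []
    for lt in left:
        for rt in table.get(lt[on], ()):
            merged = dict(rt)
            merged.update(lt)  # left-hand values win on field-name clashes (a choice made here)
            out.append(merged)
    return out


# Made-up friend networks for queryA and queryB.
network_a = [{"friend": "carol"}, {"friend": "dave"}, {"friend": "frank"}]
network_b = [{"friend": "dave"}, {"friend": "erin"}, {"friend": "frank"}]

common = hash_inner_join(network_a, network_b, on="friend")
print([t["friend"] for t in common])  # ['dave', 'frank']
```

Only one side is held in memory (the hash table), while the other is streamed past it, which is why a hash join suits the case where one of the two networks is small enough to buffer.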


> Add gatherNodes Streaming Expression to support breadth first traversals
> ------------------------------------------------------------------------
>
>                 Key: SOLR-8925
>                 URL: https://issues.apache.org/jira/browse/SOLR-8925
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 6.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

