lucene-dev mailing list archives

From "Ishan Chattopadhyaya (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-7090) Cross collection join
Date Mon, 09 Feb 2015 11:08:34 GMT

     [ https://issues.apache.org/jira/browse/SOLR-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya updated SOLR-7090:
---------------------------------------
    Attachment: SOLR-7090.patch

Here's an implementation of this using a value source, backed by a per-core cache.

Here's how to use it:

Add this to the <query> section of solrconfig.xml:

    <cache name="join"
           class="solr.LRUCache"
           size="4096"
           initialSize="1024"
           autowarmCount="1024"
           regenerator="org.apache.solr.util.SolrPluginUtils$IdentityRegenerator"
           />
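To make the cache parameters above concrete, here is a minimal Java sketch of a size-bounded LRU map with identity-style autowarming. It is NOT Solr's LRUCache: the class LruJoinCache and its warmFrom method are hypothetical names for illustration. It assumes "size" caps the number of entries (least recently used evicted first), "initialSize" is only the initial capacity, and autowarming with an identity regenerator copies the autowarmCount most recently used entries into the new searcher's cache unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a size-bounded LRU cache (not Solr code). */
class LruJoinCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruJoinCache(int initialSize, int maxSize) {
        // accessOrder=true: iteration runs from least to most recently used.
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the "size" limit is exceeded.
        return size() > maxSize;
    }

    /**
     * Copies the autowarmCount most recently used entries of an old cache
     * into this one unchanged (what an identity regenerator amounts to).
     */
    void warmFrom(LruJoinCache<K, V> old, int autowarmCount) {
        // Iterating entrySet() does not count as an access, so order is kept.
        List<Map.Entry<K, V>> entries = new ArrayList<>(old.entrySet());
        int start = Math.max(0, entries.size() - autowarmCount);
        // Insert oldest-first so the recency order carries over.
        for (Map.Entry<K, V> e : entries.subList(start, entries.size())) {
            put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        LruJoinCache<String, Float> cache = new LruJoinCache<>(2, 3);
        cache.put("a", 1f);
        cache.put("b", 2f);
        cache.put("c", 3f);
        cache.get("a");                       // "a" becomes most recently used
        cache.put("d", 4f);                   // evicts "b", the least recently used
        System.out.println(cache.keySet());   // [c, a, d]

        LruJoinCache<String, Float> warmed = new LruJoinCache<>(2, 3);
        warmed.warmFrom(cache, 2);
        System.out.println(warmed.keySet());  // [a, d]
    }
}
```

With size="4096" and autowarmCount="1024", the same logic would keep at most 4096 entries and carry the 1024 most recently used ones across a searcher reopen.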

At query time, the "coljoin" function can be used:
coljoin(fromCollection,fromKey,fromVal,toKey)

fromCollection: the name of the secondary ("from") collection to join from
fromKey: the name of the foreign-key field in the "from" collection to join on
fromVal: the name of the field in the "from" collection whose value is returned
toKey: the name of the key field in the primary collection to join on
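As I read the parameter list, coljoin yields, for each primary document, the fromVal of the "from" document whose fromKey equals the primary document's toKey. The plain-Java model below sketches those semantics only; it uses no Solr classes. The field names (product_id, popularity, id, title), the "last document wins" rule for duplicate keys, and the default returned when nothing matches are all assumptions, not taken from the patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of coljoin(fromCollection,fromKey,fromVal,toKey). */
class ColJoinSketch {

    /** Builds fromKey -> fromVal from the "from" collection's documents. */
    static Map<String, Float> buildJoinMap(List<Map<String, Object>> fromDocs,
                                           String fromKey, String fromVal) {
        Map<String, Float> joinMap = new HashMap<>();
        for (Map<String, Object> doc : fromDocs) {
            Object key = doc.get(fromKey);
            Object val = doc.get(fromVal);
            if (key != null && val instanceof Number) {
                // Assumption: if several docs share a key, the last one wins.
                joinMap.put(key.toString(), ((Number) val).floatValue());
            }
        }
        return joinMap;
    }

    /** Per-document value: look up the primary doc's toKey in the join map. */
    static float colJoin(Map<String, Object> primaryDoc, String toKey,
                         Map<String, Float> joinMap, float missing) {
        Object key = primaryDoc.get(toKey);
        if (key == null) {
            return missing; // assumption: no key means the default value
        }
        return joinMap.getOrDefault(key.toString(), missing);
    }

    public static void main(String[] args) {
        // Hypothetical "from" collection: frequently updated signals.
        List<Map<String, Object>> signals = List.of(
            Map.of("product_id", "p1", "popularity", 0.9f),
            Map.of("product_id", "p2", "popularity", 0.2f));
        Map<String, Float> joinMap = buildJoinMap(signals, "product_id", "popularity");

        // Hypothetical primary documents referring to the same key via "id".
        Map<String, Object> doc1 = Map.of("id", "p1", "title", "phone");
        Map<String, Object> doc3 = Map.of("id", "p3", "title", "case");
        System.out.println(colJoin(doc1, "id", joinMap, 0f)); // 0.9
        System.out.println(colJoin(doc3, "id", joinMap, 0f)); // 0.0
    }
}
```

In this model the call coljoin(signals,product_id,popularity,id) would correspond to buildJoinMap(signals, "product_id", "popularity") followed by colJoin(doc, "id", ...) for each primary document.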

Implementation details:
All values from the secondary collection are fetched at the primary collection's cores and
cached in an LRU "join" cache. An executor thread runs continuously in the background and
refreshes the cache (by fetching the values again from the secondary collection) at a fixed
interval (2000 ms in this patch).
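The refresh scheme described above can be sketched with a single-threaded ScheduledExecutorService. This is an illustration under assumptions, not the patch's code: RefreshingJoinCache is a hypothetical name, the Supplier stands in for the request that fetches all values from the secondary collection, and choosing scheduleWithFixedDelay (rather than a fixed rate) plus swapping in an immutable snapshot are my design choices.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical join cache refreshed in the background at a fixed interval. */
class RefreshingJoinCache implements AutoCloseable {
    // Immutable snapshot, replaced wholesale on each refresh.
    private volatile Map<String, Float> snapshot = Map.of();
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "join-cache-refresher");
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });

    RefreshingJoinCache(Supplier<Map<String, Float>> fetchAll, long intervalMs) {
        Runnable refresh = () -> {
            try {
                // Build a complete copy, then publish it with one volatile
                // write so readers never see a half-updated map.
                snapshot = Map.copyOf(fetchAll.get());
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; letting the exception
                // escape would cancel all future scheduled runs.
            }
        };
        refresh.run(); // initial synchronous load
        executor.scheduleWithFixedDelay(refresh, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Returns the cached value for a join key, or null if absent. */
    Float get(String key) {
        return snapshot.get(key);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger fetches = new AtomicInteger();
        // Hypothetical fetcher: each call simulates fresh data from the secondary collection.
        Supplier<Map<String, Float>> fetchAll =
            () -> Map.of("p1", (float) fetches.incrementAndGet());
        try (RefreshingJoinCache cache = new RefreshingJoinCache(fetchAll, 2000)) {
            System.out.println(cache.get("p1")); // 1.0 (initial load)
            Thread.sleep(2500);
            System.out.println(cache.get("p1")); // a later value once a refresh has run
        }
    }
}
```

A fixed delay means a slow fetch from the secondary collection simply pushes the next refresh back instead of letting refreshes pile up.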

> Cross collection join
> ---------------------
>
>                 Key: SOLR-7090
>                 URL: https://issues.apache.org/jira/browse/SOLR-7090
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>             Fix For: 5.1
>
>         Attachments: SOLR-7090.patch
>
>
> Although SOLR-4905 supports joins across collections in Cloud mode, there are limitations:
> (i) the secondary collection must be replicated at each node where the primary collection
> has a replica, and (ii) the secondary collection must have a single shard.
> This issue explores ideas/possibilities of cross collection joins, even across nodes.
> This will be helpful for users who wish to maintain boosts or signals in a secondary, more
> frequently updated collection, and perform a query-time join of these boosts/signals with
> results from the primary collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

