lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-4465) Configurable Collectors
Date Mon, 08 Jul 2013 14:53:50 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702039#comment-13702039
] 

Joel Bernstein edited comment on SOLR-4465 at 7/8/13 2:52 PM:
--------------------------------------------------------------

Otis,

The implementation in this ticket is a proof of concept (POC) to explore how pluggable collectors
could be used. I think, though, that the best mechanism for expanding collector functionality
is expanded use of PostFilters.

In order to make this approach viable two things need to be done. First, grouping needs to
be revamped so that it plays nicely with the PostFilter framework. Second, in a distributed
environment we need a way to merge the output from PostFilters. 

Here are three tickets that are likely to come out of these requirements:   

1) Create a field collapsing PostFilter. This will involve a small change to the PostFilter
API, so it might best be done in Solr 5. This PostFilter will handle only the collapsing part
of the grouping functionality.

2) Add a Grouping search component to handle the rest of the grouping functionality. This
component will work with the collapsed docList generated by the field collapsing PostFilter.
Breaking up the grouping functionality like this should make it more flexible and easier to
maintain.

3) Add a search component that allows for pluggable merging of output from shards. This would
allow aggregating PostFilters to be developed and used with distributed search. It would also
likely allow custom ranking collectors to be inserted through the PostFilter mechanism.
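
To make the "collapsing part" in item 1 concrete, here is a minimal sketch in plain Java. It assumes collapsing means keeping only the highest-scoring document per group value. The names CollapseSketch, Hit and collapse are hypothetical and are not part of Solr or of the patch; a real PostFilter would work against Lucene's collector API rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Solr's PostFilter API.
final class CollapseSketch {

    // Stand-in for a scored hit; not a Solr or Lucene type.
    static final class Hit {
        final int docId;
        final String groupValue;
        final float score;

        Hit(int docId, String groupValue, float score) {
            this.docId = docId;
            this.groupValue = groupValue;
            this.score = score;
        }
    }

    // Keeps one hit per group value: the one with the highest score.
    // On a tie the hit seen first is kept. Groups come back in the
    // order in which they were first encountered.
    static List<Hit> collapse(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit hit : hits) {
            best.merge(hit.groupValue, hit,
                    (kept, candidate) -> candidate.score > kept.score ? candidate : kept);
        }
        return new ArrayList<>(best.values());
    }
}
```

A grouping component as described in item 2 could then operate on the collapsed list instead of the full result set.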






                
> Configurable Collectors
> -----------------------
>
>                 Key: SOLR-4465
>                 URL: https://issues.apache.org/jira/browse/SOLR-4465
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 4.1
>            Reporter: Joel Bernstein
>             Fix For: 4.4
>
>         Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch was generated
and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new collectorFactory
element. For example:
> <collectorFactory name="default" class="solr.CollectorFactory"/>
> <collectorFactory name="sum" class="solr.SumCollectorFactory"/>
> The elements above define two collector factories. The first one is the "default" collectorFactory.
The class attribute points to org.apache.solr.handler.component.CollectorFactory, which implements
logic that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default CollectorFactory and
at a minimum override the getCollector method to return your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs collector.
For example:
> cl.docs=<name>
> The above param points to a named collectorFactory in the solrconfig.xml to construct
the collector. The docs collectorFactories must return a collector that extends the TopDocsCollector
base class. Docs collectors are responsible for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For example:
> cl.docs={! sort=mycustomesort}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default collectorFactory
to create the docs collector.
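
The fallback rules above can be sketched in plain Java as follows. This is an illustration, not code from the patch: the class and method names are hypothetical, and treating any value of cl other than "true" as "off" is an assumption.

```java
import java.util.Map;

// Hypothetical illustration of the cl / cl.docs fallback rules.
final class CollectorParamsSketch {

    static final String DEFAULT_FACTORY = "default";

    // Returns the name of the collectorFactory that would build the
    // docs collector for the given request parameters.
    static String docsFactoryName(Map<String, String> params) {
        // Assumption: only the literal value "true" turns the feature on.
        if (!"true".equals(params.get("cl"))) {
            return DEFAULT_FACTORY;
        }
        String spec = params.get("cl.docs");
        if (spec == null || spec.isEmpty()) {
            return DEFAULT_FACTORY;
        }
        // Drop a leading {! ... } local params block, if there is one.
        int close = spec.startsWith("{!") ? spec.indexOf('}') : -1;
        return close >= 0 ? spec.substring(close + 1) : spec;
    }
}
```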
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the "cl.analytic"
parameter. Analytic collectors are designed to collect something else besides the doclist.
Typically this would be some type of custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs collectors,
"sum" points to a named collectorFactory in the solrconfig.xml. You can specify any number
of analytic collectors by adding additional cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend DelegatingCollector.

> A sample analytic collector is provided in the patch through the org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups by a field
and sums a column of floats. The sum collector is not designed to be a fully functional sum
function but to be a proof of concept for pluggable analytics through delegating collectors.
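
As a rough picture of what such a collector does, the sketch below groups by one field, sums a float column, and then passes each document on to the next collector in the chain. It is not the SumCollectorFactory code from the patch. DocCollector is a hypothetical one-method interface standing in for Lucene's collector API, and the field values are plain arrays indexed by document id.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a delegating group-by-sum collector.
final class SumSketch {

    // Minimal stand-in for a collector; Lucene's real API is richer.
    interface DocCollector {
        void collect(int doc);
    }

    static final class GroupSumCollector implements DocCollector {
        private final String[] groupBy;   // group value for each doc id
        private final float[] column;     // value to sum for each doc id
        private final DocCollector delegate;
        private final Map<String, Float> sums = new TreeMap<>();

        GroupSumCollector(String[] groupBy, float[] column, DocCollector delegate) {
            this.groupBy = groupBy;
            this.column = column;
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            // Accumulate the analytic, then forward the doc so the
            // collector that builds the doclist still sees it.
            sums.merge(groupBy[doc], column[doc], Float::sum);
            delegate.collect(doc);
        }

        Map<String, Float> sums() {
            return sums;
        }
    }
}
```

The point of the delegation is that the analytic is computed as a side effect while the doclist is still collected by whatever collector sits at the end of the chain.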
> You can send parameters to analytic collectors with solr local param syntax.
> For example:
> cl.analytic={! id=1 groupby=field1 column=field2}sum
> The "id" parameter is mandatory for analytic collectors and is used to identify the output
from the collector. In this example the "groupby" and "column" params tell the sum collector
which field to group by and sum.
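
To show how a value like the one above splits into a factory name and its parameters, here is a simplified parser in plain Java. It is only a sketch: it handles whitespace-separated key=value pairs with optional single quotes and nothing else, so it is not a substitute for Solr's own local params parsing. The class name and the parseAnalytic helper, which enforces the mandatory "id", are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified parser for values such as
// {! id=1 groupby=field1 column=field2}sum
final class LocalParamsSketch {

    final Map<String, String> params = new LinkedHashMap<>();
    final String value;   // what follows the closing brace, e.g. "sum"

    private LocalParamsSketch(String value) {
        this.value = value;
    }

    static LocalParamsSketch parse(String spec) {
        String s = spec.trim();
        if (!s.startsWith("{!")) {
            return new LocalParamsSketch(s);   // no local params at all
        }
        int close = s.indexOf('}');
        if (close < 0) {
            throw new IllegalArgumentException("unterminated local params: " + spec);
        }
        LocalParamsSketch out = new LocalParamsSketch(s.substring(close + 1));
        for (String pair : s.substring(2, close).trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            out.params.put(pair.substring(0, eq), stripQuotes(pair.substring(eq + 1)));
        }
        return out;
    }

    // Analytic collectors need an "id" to label their output.
    static LocalParamsSketch parseAnalytic(String spec) {
        LocalParamsSketch parsed = parse(spec);
        if (!parsed.params.containsKey("id")) {
            throw new IllegalArgumentException("cl.analytic requires an id: " + spec);
        }
        return parsed;
    }

    private static String stripQuotes(String v) {
        boolean quoted = v.length() >= 2 && v.startsWith("'") && v.endsWith("'");
        return quoted ? v.substring(1, v.length() - 1) : v;
    }
}
```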
> Analytic collectors are passed a reference to the ResponseBuilder and can place maps
with analytic output directly into the SolrQueryResponse with the add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the outgoing
response. The response will include a list named cl.analytic.<id>, where id is specified
in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates the results
from each of the shards during distributed search. The "default" CollectorFactory implements
the default merge logic for merging documents from each shard. If you define a different docs
collector you can override the default merge method to merge documents in accordance with
how they are collected at the shard level.
> With analytic collectors, you'll need to override the merge method to merge the analytic
output from the shards. An example of how this works is provided in the SumCollectorFactory.
> Each collectorFactory that is specified in the HTTP parameters will have its merge
method applied by the Solr aggregator node.
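
For a sum analytic, the merge step amounts to adding up the per-group totals reported by each shard. The sketch below shows that idea in plain Java. It is not the merge() method from the patch, whose signature is not shown in this description; MergeSketch and mergeSums are hypothetical names, and each shard's output is assumed to be a map from group value to partial sum.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of merging per-shard sum output.
final class MergeSketch {

    // Adds the partial sums from every shard, group by group.
    // A group that appears on only one shard keeps that shard's value.
    static Map<String, Float> mergeSums(List<Map<String, Float>> shardOutputs) {
        Map<String, Float> merged = new TreeMap<>();
        for (Map<String, Float> shard : shardOutputs) {
            for (Map.Entry<String, Float> entry : shard.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Float::sum);
            }
        }
        return merged;
    }
}
```

Sums merge cleanly because addition is associative. An analytic such as an average would need each shard to report a sum and a count so the aggregator can combine them correctly.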
> *Testing the Patch With Sample Data*
> 1) Apply patch to Solr 4.1
> 2) Load sample data
> 3) Send the http command:
> http://localhost:8983/solr/select?q=*:*&cl=true&facet=true&facet.field=manu_id_s&cl.analytic=%7B!+id=%271%27+groupby=%27manu_id_s%27+column=%27price%27%7Dsum
>   
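
The cl.analytic value in the URL above has to be URL-encoded because of the braces, spaces and quotes. A small sketch of building such a request string with the JDK's URLEncoder follows. RequestUrlSketch and analyticQuery are hypothetical names, and the facet parameters from the sample command are left out for brevity. URLEncoder also encodes "!", "=" and ":", so its output looks different from the hand-written URL above but decodes to the same parameter values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the sample request URL.
final class RequestUrlSketch {

    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Builds <baseUrl>?q=*:*&cl=true&cl.analytic={! id='..' groupby='..' column='..'}<factory>
    // with the parameter values URL-encoded.
    static String analyticQuery(String baseUrl, String id, String groupBy,
                                String column, String factory) {
        String spec = "{! id='" + id + "' groupby='" + groupBy
                + "' column='" + column + "'}" + factory;
        return baseUrl + "?q=" + encode("*:*") + "&cl=true&cl.analytic=" + encode(spec);
    }
}
```

For the sample data this would be called as analyticQuery("http://localhost:8983/solr/select", "1", "manu_id_s", "price", "sum").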


