lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5244) Full Search Result Export
Date Tue, 17 Sep 2013 21:22:08 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769988#comment-13769988 ]

Mark Miller commented on SOLR-5244:
-----------------------------------

bq. I think scoring and ranking will be difficult because the priority queues will take up
too much memory and be too slow.

I agree that it will be more difficult and certainly a slower operation, but if you are looking
to export an entire results list, 'slow' is very relative and use case dependent.

My main interest in this is in 1 - it's a pretty common want: using search for sub-selection
that can be processed by something else.

I think it would be great if that sub-selection could come out ranked though - I think that
is also valuable for 1 - and while the other system could somehow rank, it would have to
duplicate the Lucene logic to do it as well. It would be nice to just be able to dump either
way and make your decision based on use case and speed requirements. It's obviously going to
be much slower though. And you would have to deal with huge result sets and limited RAM of course.
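
To put rough numbers on the RAM concern, here is a back-of-the-envelope sketch. It is not taken from the patch: the class and method names are made up for illustration, and it assumes a ranked dump has to hold one (int doc id, float score) pair per hit, ignoring all object overhead, while an unranked dump needs only one bit per document in the index.

```java
public class ExportMemoryEstimate {
    // Bytes for a bit set holding one bit per document in the index.
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Bytes to hold an (int docId, float score) pair for every hit,
    // ignoring object and queue overhead (an optimistic lower bound).
    static long scoredHitBytes(long hits) {
        return hits * (Integer.BYTES + Float.BYTES);
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L; // hypothetical index size
        long hits = 100_000_000L;   // hypothetical: every document matches
        System.out.println("bit set bytes:     " + bitSetBytes(maxDoc));
        System.out.println("scored hits bytes: " + scoredHitBytes(hits));
    }
}
```

With those assumed sizes the bit set is about 12.5 MB and the scored hits are about 800 MB, so a ranked dump is still possible, but it has to be sized for.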
                
> Full Search Result Export
> -------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets without
> scoring or ranking documents. This would allow external systems to perform rapid bulk imports
> from Solr. It also provides a possible platform for exporting results to support distributed
> join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: a post filter that gathers a BitSet with document results
> and does not delegate to ranking collectors. Instead it puts the BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and prints the entire
> result as a binary stream. A header is provided at the beginning of the stream so external
> clients can self-configure.
> Note:
> These two components will be sufficient for a non-distributed environment.
> For distributed export a new request handler will need to be developed.
> After applying the patch and building the dist or example, you can register the components
> through the following changes to solrconfig.xml:
> Register export contrib libraries:
> <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
>  
> Register the "export" queryParser with the following line:
>  
> <queryParser name="export" class="org.apache.solr.export.ExportQParserPlugin"/>
>  
> Register the "xbin" writer:
>  
> <queryResponseWriter name="xbin" class="org.apache.solr.export.BinaryExportWriter"/>
>  
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> Initial patch supports export of four data-types:
> 1) Single value trie int, long and float
> 2) Binary doc values.
> The numerics are currently exported from the FieldCache and the Binary doc values can
> be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored fields are
> not used for the export.
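
For readers following the thread, the two-phase design in the quoted description (gather matches into a BitSet, then iterate it and stream values out) can be sketched with plain java.util.BitSet. This is not the code from SOLR-5244.patch and uses no Solr or Lucene classes. The class and method names, the int array standing in for per-document field values, and the one-int hit-count header are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

public class BitSetExportSketch {
    // Phase 1 (stand-in for the post filter): mark each matching doc id
    // as a set bit. No scores are kept and no priority queue is built.
    static BitSet collect(int maxDoc, int[] matchingDocIds) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : matchingDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    // Phase 2 (stand-in for the writer): emit a hypothetical header holding
    // the hit count, then one int value per matching doc in doc id order.
    static byte[] export(BitSet bits, int[] fieldValues) {
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES * (1 + bits.cardinality()));
        out.putInt(bits.cardinality());
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            out.putInt(fieldValues[doc]);
        }
        return out.array();
    }

    public static void main(String[] args) {
        int[] values = {10, 11, 12, 13, 14, 15, 16, 17}; // value for doc ids 0..7
        BitSet bits = collect(values.length, new int[] {5, 1, 3});
        System.out.println(export(bits, values).length + " bytes exported");
    }
}
```

Because the BitSet is walked in doc id order, the output order is index order rather than relevance order, which matches the "without scoring or ranking" goal stated in the description.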

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

