lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: paging vs streaming. spawn from (Processing a lot of results in Solr)
Date Sun, 28 Jul 2013 05:28:43 GMT
On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley <yonik@lucidworks.com> wrote:

>
> Which part is problematic... the creation of the DocList (the search),
>
Literally, DocList is a copy of TopDocs. Creating TopDocs is not the search
but the ranking.
The ranking cost is log(rows+start), on top of the numFound that the search
itself takes.
Interestingly, we still pay that log() even if we ask to collect docs as-is,
sorted by _docid_.
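
To make the cost concrete, here is a minimal sketch of heap-based paging. It
is not Lucene code: the class name HeapPagingSketch, the method page(), and
the plain float[] of scores indexed by doc id are all assumptions made for
illustration. It only shows the shape of the problem: a heap of rows+start
entries, with each heap update costing log(rows+start).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration only; not the Lucene/Solr implementation.
final class HeapPagingSketch {

    /** Returns the doc ids of the page [start, start+rows), best score first. */
    static int[] page(float[] scores, int start, int rows) {
        int capacity = start + rows;              // the heap holds rows+start entries
        if (capacity <= 0) {
            return new int[0];
        }
        // "Worst first" ordering: lowest score on top, ties broken so that
        // the higher doc id counts as worse.
        Comparator<Integer> worstFirst = (a, b) ->
                scores[a] != scores[b]
                        ? Float.compare(scores[a], scores[b])
                        : Integer.compare(b, a);
        PriorityQueue<Integer> heap = new PriorityQueue<>(capacity, worstFirst);

        // The "search": every matching doc (numFound of them) is visited once.
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < capacity) {
                heap.offer(doc);                  // O(log(rows+start))
            } else if (worstFirst.compare(doc, heap.peek()) > 0) {
                heap.poll();                      // evict the current worst
                heap.offer(doc);                  // O(log(rows+start))
            }
        }

        // Drain worst-first into the tail of the array to get best-first order.
        int size = heap.size();
        int[] ranked = new int[size];
        for (int i = size - 1; i >= 0; i--) {
            ranked[i] = heap.poll();
        }
        // Only the last `rows` entries are returned; `start` entries are discarded.
        return Arrays.copyOfRange(ranked, Math.min(start, size), size);
    }
}
```

In this sketch, page(scores, 1_000_000, 10) keeps 1,000,010 entries in the
heap just to return 10 of them, which is the overhead described above.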


> or it's memory requirements (an int per doc)?
>
TopXxxCollector, as well as the XxxComparators, allocates the same [rows+start].

It's clear that once we have deep paging, we only need to handle heaps of
size rows (without start).
That's fairly OK if we use Solr as a site navigation engine, but it's
'sub-optimal' for data analytics use cases, where we need something like
SELECT * FROM ... in an RDBMS. In that case, any memory allocation on an index
with billions of docs is a bummer. That's why I'm asking about removing the
heap-based collector/comparator.
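
What I have in mind looks roughly like the sketch below. Again, this is not
an existing Lucene or Solr API: the class StreamingSketch, the method
streamAll(), and the boolean[] standing in for the set of matching docs are
hypothetical. The point is only that when docs are emitted as-is in doc id
order, each hit can be handed to the consumer immediately, with no heap and
no per-result allocation.

```java
import java.util.function.IntConsumer;

// Hypothetical illustration only; not the Lucene/Solr implementation.
final class StreamingSketch {

    /**
     * Pushes every matching doc id to the sink in index order and returns
     * numFound. Memory use is constant regardless of how many docs match.
     */
    static int streamAll(boolean[] matches, IntConsumer sink) {
        int numFound = 0;
        for (int doc = 0; doc < matches.length; doc++) {
            if (matches[doc]) {
                sink.accept(doc);   // emit right away; nothing is buffered or ranked
                numFound++;
            }
        }
        return numFound;
    }
}
```

The caller decides what to do with each doc id (write it to the response,
feed it to an aggregation, and so on), much like a cursor over
SELECT * FROM ... in an RDBMS.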


> -Yonik
> http://lucidworks.com
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
