lucene-solr-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: a bug of solr distributed search
Date Mon, 26 Jul 2010 02:27:15 GMT
Where is the link to this patch?

2010/7/24 Yonik Seeley <yonik@lucidimagination.com>:
> On Fri, Jul 23, 2010 at 2:23 PM, MitchK <mitch91@web.de> wrote:
>> Why don't we send the output of TermsComponent from every node in the
>> cluster to a Hadoop instance?
>> Since TermsComponent does the map part of the map-reduce concept, Hadoop
>> only needs to do the reduce step. Maybe we do not even need Hadoop for this.
>> After reducing, every node in the cluster gets the current values to compute
>> the idf.
>> We can store this information in a HashMap-based SolrCache (or something
>> like that) to provide constant-time access. To keep the values up to date,
>> we can repeat the process every x minutes.
>
> There's already a patch in JIRA that does distributed IDF.
> Hadoop wouldn't be the right tool for that anyway... it's for batch
> oriented systems, not low-latency queries.
>
>> If we had that, it would not matter whether we use doc_X from shard_A or
>> shard_B, since both copies would get the same score.
>
> That only works if the docs are exactly the same - they may not be.
>
> -Yonik
> http://www.lucidimagination.com
>
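
For readers following the thread, here is a minimal sketch of the idea being discussed: each shard reports its document count and per-term document frequencies, the values are summed, and every shard then computes idf from the merged totals. It is not the JIRA patch mentioned above and not Solr code. The function names, the shard numbers, and the data layout are all hypothetical, and the idf formula is assumed to be the classic Lucene one, 1 + ln(numDocs / (docFreq + 1)).

```python
import math
from collections import Counter


def merge_shard_stats(shard_stats):
    """Sum document counts and per-term document frequencies over shards.

    shard_stats is a list of (num_docs, {term: doc_freq}) pairs, one pair
    per shard. This layout is an assumption made for illustration only.
    """
    total_docs = 0
    global_df = Counter()
    for num_docs, doc_freqs in shard_stats:
        total_docs += num_docs
        global_df.update(doc_freqs)  # adds counts term by term
    return total_docs, dict(global_df)


def idf(doc_freq, num_docs):
    """Classic TF-IDF style idf (assumed formula, see lead-in)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics: "solr" is rare on shard A and common on shard B.
shard_a = (1000, {"solr": 10, "hadoop": 500})
shard_b = (200, {"solr": 150})

total_docs, global_df = merge_shard_stats([shard_a, shard_b])

local_idf_a = idf(shard_a[1]["solr"], shard_a[0])
local_idf_b = idf(shard_b[1]["solr"], shard_b[0])
global_idf = idf(global_df["solr"], total_docs)

print(f"local idf on shard A: {local_idf_a:.3f}")
print(f"local idf on shard B: {local_idf_b:.3f}")
print(f"idf from merged stats: {global_idf:.3f}")
```

With local statistics the same term gets a different idf on each shard, so an identical document would score differently depending on where it lives. With merged statistics the idf is the same everywhere. As noted above, this only makes the scores equal if the copies of the document are themselves identical.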
