lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Wikia search goes live today
Date Tue, 08 Jan 2008 22:24:01 GMT
Ryan McKinley wrote:
> Andrzej Bialecki wrote:
>> Lukas Vlcek wrote:
>>> So starring will be accommodated only during the indexing phase. Does
>>> it mean it will be a pretty static value, not a dynamically changing
>>> variable... correct?
>>> In other words, if I add my stars to some document, it won't affect
>>> the scoring immediately, but only after the indexing cycle. Correct?
>>
>> (I'm not involved in Wikia development). There are some ways to go 
>> about it even in the pure Lucene-land, so that the updates are fast 
>> without reindexing the main content. Hint: ParallelReader.
>>
> 
> in solr (1.3-dev) you can have an external value source with a function 
> query...

True, although a function query tends to bring more overhead ...

While we're on the subject of complex scoring - I read an interesting 
paper (I don't have a link now), which discussed so-called bucketed 
scoring. The idea is that if your basic scoring is good enough to ensure 
that the top-N results are highly relevant, then you can split these 
results into buckets of k documents (let's say 10 ;) ), and within each 
bucket apply an arbitrary re-ranking function, which is then very 
inexpensive to perform because of the limited number of documents.

Example: you have a large corpus of web pages, and you want home pages 
to appear first, even if they score somewhat lower - and it doesn't pay 
off to modify the base scoring, because of overfitting, i.e. the scoring 
would be good for home pages but poor for other relevant documents.
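
A rough sketch of how the bucketed re-ranking could look in plain Java, 
for illustration only. It is not taken from the paper and uses no Lucene 
API; the Hit class, its isHomePage flag, the rerank() method and the 
HOME_PAGES_FIRST comparator are all hypothetical names. It assumes the 
hits arrive already sorted by the base score, best first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class BucketedRerank {

    /** A scored hit. isHomePage is a hypothetical per-document flag. */
    static final class Hit {
        final String url;
        final float score;
        final boolean isHomePage;

        Hit(String url, float score, boolean isHomePage) {
            this.url = url;
            this.score = score;
            this.isHomePage = isHomePage;
        }
    }

    /**
     * Home pages sort before other pages. Hits that compare equal keep
     * their base-score order, because List.sort is stable.
     */
    static final Comparator<Hit> HOME_PAGES_FIRST =
            Comparator.comparing((Hit h) -> !h.isHomePage);

    /**
     * Splits hits (already sorted by base score, best first) into
     * consecutive buckets of bucketSize and re-orders each bucket with
     * the given comparator. A hit never leaves its bucket.
     */
    static List<Hit> rerank(List<Hit> hits, int bucketSize,
                            Comparator<Hit> withinBucket) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        List<Hit> out = new ArrayList<>(hits.size());
        for (int start = 0; start < hits.size(); start += bucketSize) {
            int end = Math.min(start + bucketSize, hits.size());
            List<Hit> bucket = new ArrayList<>(hits.subList(start, end));
            bucket.sort(withinBucket);
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up hits, sorted by base score.
        List<Hit> hits = Arrays.asList(
                new Hit("a.example/page", 0.9f, false),
                new Hit("a.example/", 0.8f, true),
                new Hit("b.example/page", 0.7f, false),
                new Hit("b.example/", 0.6f, true));
        for (Hit h : rerank(hits, 2, HOME_PAGES_FIRST)) {
            System.out.println(h.url + " " + h.score);
        }
    }
}
```

With a bucket size of 2, each home page moves ahead of the other page in 
its own bucket, but a home page in the second bucket never jumps into 
the first one. The base ranking still decides which bucket a document 
lands in, and the extra cost is one small sort per bucket.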


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



