lucene-openrelevance-dev mailing list archives

From Andrzej Bialecki <...@getopt.org>
Subject Re: Getting Started
Date Fri, 31 Jul 2009 23:23:20 GMT
Grant Ingersoll wrote:
> OK, so how do we get this started?  Seems like there are a lot of 
> collections out there we could use.  Also, we can crawl.  Seems the 
> tricky part is getting judgments.

I think we should first establish what kind of relevance judgments we 
want to collect:

1. given a corpus and a query, define an ordered list of the top-N 
documents that are relevant to the query. This is our baseline. Getting 
this sort of information is very time-consuming and subjective.

2. given a corpus, a query, and a list of top-N results obtained from a 
real search, define which results are relevant and how they should be 
ordered. The reviewed list of top-N results then becomes the initial 
approximation of our baseline. Calculate a distance metric between the 
real and the reviewed results, and adjust the ranking to minimize this 
distance (a rough sketch of such a metric follows below).
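As an illustration, one simple choice of distance metric would be the 
Kendall tau distance: the number of document pairs whose relative order 
differs between the ranking the search impl actually returned and the 
reviewer-corrected ranking. A minimal sketch (class name and doc ids are 
made up, not existing OpenRelevance code):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: Kendall tau distance between the ranking a search impl
 *  actually returned and the reviewer-corrected ranking of the same docs. */
public class RankDistance {

  /** Counts discordant pairs, i.e. pairs of documents whose relative
   *  order differs between the two rankings; 0 means identical order. */
  public static int kendallTauDistance(List<String> actual, List<String> reviewed) {
    // position of each doc id in the reviewed (corrected) ranking
    Map<String,Integer> reviewedPos = new HashMap<String,Integer>();
    for (int i = 0; i < reviewed.size(); i++) {
      reviewedPos.put(reviewed.get(i), i);
    }
    int discordant = 0;
    for (int i = 0; i < actual.size(); i++) {
      for (int j = i + 1; j < actual.size(); j++) {
        Integer pi = reviewedPos.get(actual.get(i));
        Integer pj = reviewedPos.get(actual.get(j));
        if (pi != null && pj != null && pi > pj) {
          discordant++;   // this pair is ordered differently in the review
        }
      }
    }
    return discordant;
  }

  public static void main(String[] args) {
    List<String> actual   = Arrays.asList("doc3", "doc1", "doc7", "doc2");
    List<String> reviewed = Arrays.asList("doc1", "doc3", "doc2", "doc7");
    // prints 2: (doc3,doc1) and (doc7,doc2) are swapped relative to the review
    System.out.println("discordant pairs = " + kendallTauDistance(actual, reviewed));
  }
}

Normalizing by the number of comparable pairs would give a value in 
[0,1] that can be compared across queries; tuning ranking parameters 
then amounts to minimizing this value over the set of reviewed queries.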

The second scenario could be handled by a webapp, which could provide 
the following areas of functionality:

* corpus selection and browsing

* searching using the selected search impl and its ranking parameters, 
and storing tuples of <corpus, impl, query, results> (a rough sketch of 
such a tuple follows after this list)

* review of the results (marking relevant / non-relevant, reordering), 
and saving of tuples <corpus, impl, query, reviewed results>

* calculation of distance metrics.

* adjustment of ranking parameters for a given search implementation.
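For the stored tuples, something as simple as the following would do as 
a starting point (field names are only illustrative, not a proposal for 
an actual schema):

import java.util.List;

/** Illustrative only: one judged search run as stored by the webapp. */
public class JudgedRun {
  String corpusId;              // which corpus was searched
  String searchImpl;            // search implementation, incl. ranking parameters
  String query;                 // the query string
  List<String> results;         // doc ids in the order returned by the impl
  List<String> reviewedResults; // doc ids after reviewer marking / reordering
}

Keeping both the raw and the reviewed orderings in the same record makes 
the distance calculation a per-query operation, and lets us recompute it 
whenever the ranking parameters of the impl change.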

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

