lucene-dev mailing list archives

From Stephen Green <Stephen.Gr...@sun.com>
Subject Re: Open Source Relevance
Date Fri, 23 May 2008 13:58:16 GMT

On May 21, 2008, at 10:23 PM, Grant Ingersoll wrote:

>
> On May 21, 2008, at 8:26 AM, Stephen Green wrote:
>
>> Grant Ingersoll wrote:
>>
>>> Cool, hadn't seen that.
>>
>> Hi folks.  Long time lurker (in RSS), first time mailer.  I just  
>> wanted to say that (obviously) I think this is a great idea and we  
>> should try to push it a little further along.  I posted a bit more  
>> about it in my blog this morning:
>>
>> http://blogs.sun.com/searchguy/entry/open_source_trec_trecmentum
>>
>> The practical upshot:  I'd be more than happy to participate in  
>> this and to try to get data sources and queries from Sun or  
>> elsewhere.  I'd also be up for trying to find some place to host  
>> the collections and maybe even try to figure out some way that we  
>> could get computing resources to run the evaluations.  No  
>> guarantees on that (I'm sure a Sun Lawyer's ears are burning  
>> somewhere right now, just for me having said that!), but I'm  
>> willing to tilt at that windmill.
>
> I don't think we want to be in the collection business.  It is a lot  
> of work and a serious amount of legal issues.  I am just proposing  
> we come up w/ questions and judgments for already existing, freely  
> available collections.  There are plenty of them out there, we just  
> need some scripts, etc. to make it easy for people to download like  
> we do already with Wikipedia.

The problem I see in relying on collections that are held  
elsewhere is that they could go away at any time and there goes all  
our investment in creating evaluations.  I'm willing to take a crack  
at the folks here to see if we could get permission (and lawyer  
approval?) for hosting some collections.

Wikipedia's a pretty easy one to start with, then the OpenSolaris  
mailing lists (probably just as easy:  we already host them and I know  
some of the folks involved), then maybe a blog crawl and a small Web  
crawl (anyone got a Nutch going anywhere?)

I'm pretty sure that we could do an evaluation wiki on wikis.sun.com.   
I like the idea you gave in your blog of having to submit source code  
for the runs if you want to put up your results.  This is indeed one  
of the most aggravating things about implementing search algorithms  
described in papers and it would definitely drive everyone forward.

>> TREC had a huge impact on the academic and commercial IR  
>> communities and I think an OSTREC (see, it's already got a cool  
>> acronym!) could benefit all of us (it would give us bragging rights  
>> if nothing else :-)
>
>
> Cool name, don't care much about bragging rights, just want to spur  
> on further improvements in scoring, etc.

OK, OSTREC it is.  I'll start talking to my management (being in the  
Labs makes this a little easier) and I'll try not to brag too much if  
you (all) won't!

Steve
-- 
Stephen Green                      //   Stephen.Green@sun.com
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692



