lucene-java-user mailing list archives

From "Akmal Sarhan"
Subject Re: New Lucene-powered Website
Date Fri, 28 Nov 2003 08:26:11 GMT
nice and fast ;-)

would be interesting though to know how you implemented the "summarizer".

----- Original Message ----- 
From: "Ulrich Mayring" <>
To: <>
Sent: Thursday, November 27, 2003 12:29 PM
Subject: New Lucene-powered Website

> Hello,
> we (DENIC) are the world's second largest domain registry (.de-zone has 
> almost 6.9 million domains) and are using Lucene to index and search our 
> website in a high-traffic scenario. Most of our web pages are available 
> in English in addition to our native language German. If you want to try 
> our Lucene-based search engine, please start here:
> Use the input field on the page to search our website. Don't use the 
> input field at the top right; that is only for searching domains in our 
> domain database and has nothing to do with Lucene.
> The indexes for German and English are separate, so you should find only 
> English pages from that page.
> A somewhat interesting feature is the summarizer: on the results page 
> you'll get a short summary of each page. These are not hand-written 
> blurbs, rather they are generated automatically from the HTML pages at 
> indexing time. I'd be especially interested in improvement suggestions 
> in this area.
> Naturally, the automatically generated texts don't have the same quality 
> as hand-written ones. But they're better than nothing and in my eyes 
> more useful than Google-style excerpts. How many times has it happened 
> to you that the Google excerpt doesn't really tell you anything, because 
> it's totally out of context? Summaries tell you what the whole page is 
> about, regardless of the context within which your search terms may 
> appear. After reading the summary you should (hopefully) be able to 
> decide whether the page contains the info you're looking for. Comments 
> welcome!
> We're using the Snowball stemmers/analyzers for German and English, 
> custom stopword lists, and the HTML parser from the SourceForge 
> htmlparser project. Apart from that, it's vanilla Lucene.
> cheers,
> Ulrich
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
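
The thread does not say how the summarizer described above actually works, so what follows is only a minimal sketch of one common way to build page summaries at indexing time: strip the HTML, split the text into sentences, score each sentence by how frequent its non-stopword terms are across the page, and keep the top few in their original order. It is plain Python rather than Lucene code, it uses regular expressions instead of the htmlparser project, and it does no stemming. The names STOPWORDS, html_to_text and summarize are hypothetical, and the stopword list is a made-up English sample, not the custom list mentioned in the post.

```python
import re
from collections import Counter
from html import unescape

# Hypothetical sample stopword list; the post mentions custom lists but does not show them.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for",
    "on", "that", "this", "it", "with", "as", "be", "by", "at", "from",
}


def html_to_text(html):
    """Drop script/style blocks and tags, decode entities, collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def summarize(html, max_sentences=2):
    """Return up to max_sentences of the highest-scoring sentences, in page order."""
    text = html_to_text(html)
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    tokenized = [
        [w for w in re.findall(r"\w+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    # Term frequencies over the whole page, stopwords excluded.
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        # Average page-wide frequency of the sentence's terms.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

In an indexing pipeline the result of summarize(page_html) would be stored alongside the indexed page so that the results page can display it without re-reading the HTML. Because the score depends only on the page and not on the query, the summary stays the same for every search, which matches the "what the whole page is about" behaviour described in the message.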
