lucene-solr-user mailing list archives

From André Schild <>
Subject Spellchecker for multiple sites (and languages?)
Date Mon, 26 Nov 2012 14:52:28 GMT

We are long-time Nutch users (since 0.7) and have now made the big jump
from Nutch 0.9 to 1.5 and Solr 4.0.

We use it to index several websites and then provide site-specific
search for each of them.

Currently we index all the sites and store them in one Solr instance.
The different sites are separated via the host field in Solr; this works.

Importantly, each site can contain text in multiple languages
(for example en, de, fr, cn, etc.).
We separate them via the lang field (this works fine).

We now wish to integrate the spellchecker to provide "Did you
mean..." functionality.
This only partly works, since the word list always spans all sites
and all languages.
We would need a word list/spellchecker (based on the content field)
that is separate for each site and language.
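To make the setup concrete, here is a minimal sketch of the kind of request described above, assuming the field names host and lang from this message and a hypothetical endpoint (http://localhost:8983/solr/collection1/select) whose handler has the spellcheck component attached. The helper names quote_term and build_site_query are illustrative only. The fq filters restrict the returned documents to one site and one language, but they do not, by themselves, change the dictionary the suggestions are drawn from, which is exactly the problem above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, core and handler to your installation.
# Assumes the handler has the spellcheck component configured.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def quote_term(value: str) -> str:
    """Wrap a field value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_site_query(user_query: str, host: str, lang: str) -> str:
    """Build a request URL restricted to one site and one language.

    The two fq parameters narrow the result set to documents whose host
    and lang fields match. The spellcheck suggestions still come from the
    spellchecker's dictionary, which is not filtered by fq.
    """
    params = [
        ("q", user_query),
        ("fq", f"host:{quote_term(host)}"),  # site separation (host field)
        ("fq", f"lang:{quote_term(lang)}"),  # language separation (lang field)
        ("spellcheck", "true"),
        ("spellcheck.q", user_query),
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)
```

For example, build_site_query("zurich hotel", "www.example.ch", "de") yields a URL whose results are limited to that host and language, while any "Did you mean..." suggestions may still come from other sites or languages.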

What would be a clean way to solve this requirement?

If we created a Solr instance per site, we would at least get the
word list separated by site, but we would still have the problem of
separating it by language.

Any ideas/hints?

With best regards

Aarboard AG    Phone: +41 32 332 97 14
Egliweg 10     Fax:   +41 32 332 97 15
2560 Nidau
