lucene-solr-user mailing list archives

From Nicole Lacoste <niki.laco...@gmail.com>
Subject Re: What are the best practices on Multiple Language support in Solr Cloud ?
Date Fri, 02 May 2014 14:06:38 GMT
Hi Shamik,

I don't have an answer for you, just a couple of comments.

Why not use dynamic field definitions in the schema? As you say, most of
your fields are not analysed, so you can just add a language tag (_en, _fr,
_de, ...) to the field name when you index or query.  Then you can add
languages as you need without having to touch the schema.  For fields that
you do analyse (stop words or synonyms), you'll have to explicitly define a
field type for them.  My experience with docs that are in two or three main
languages is that the choice between single core and multi-core has not
been that critical; sharding and replication made a bigger difference for
us.  You could put English in one core and everything else in another.
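
To make the suffix idea concrete, here is a minimal sketch of how a client
might build field names before sending documents or queries to Solr. It
assumes the schema already declares matching dynamic field patterns (for
example *_en, *_fr); the function names, the "language" field, and the
sample field names are made up for illustration and are not part of any
Solr API.

```python
def localize_fields(doc_id, lang, fields):
    """Return a document dict whose field names carry a language suffix.

    Assumes the schema has a dynamic field pattern for each suffix
    (e.g. "*_fr"); "id" and "language" are hypothetical field names.
    """
    doc = {"id": doc_id, "language": lang}
    for name, value in fields.items():
        # title + fr -> title_fr, contents + fr -> contents_fr, ...
        doc[f"{name}_{lang}"] = value
    return doc


def localized_query(field, lang, term):
    """Build a simple phrase query against the language-specific field."""
    return f'{field}_{lang}:"{term}"'


doc = localize_fields("42", "fr", {"title": "Bonjour"})
query = localized_query("title", "de", "Haus")
```

Adding a language then only means using a new suffix on the client side,
provided a dynamic field pattern in the schema matches it.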

What we tried to do was just index everything to the same field, that is,
French and English both getting indexed to the contents or title field (we
have our own tokenizer and filter chain, so we did actually analyse them
differently), but we ran into lots of problems with tf-idf, so I'd advise
against doing that. The motivation was that we wanted multilingual results.
Terry's approach here is much better and, as you thought, addresses the
multilingual requirement, but I still don't think it totally addresses the
tf-idf problem. So if you don't need multilingual results, don't go that
route.
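
One plausible way the tf-idf problem can show up, sketched with made-up
numbers: when two languages share a field, document frequencies are counted
over the whole mixed corpus, so a word that is very common in the smaller
language still looks rare overall. The formula below is the textbook
log(N / df), not Lucene's exact similarity, and the corpus sizes are purely
illustrative.

```python
import math


def idf(num_docs, doc_freq):
    """Textbook inverse document frequency (a simplification, not Lucene's)."""
    return math.log(num_docs / doc_freq)


# Hypothetical corpus: 10 French documents, 90 English documents.
french_docs = 10
english_docs = 90
# A very common French word that occurs in every French document
# and in no English document.
docs_with_term = 10

# French indexed on its own: the word carries no weight, as expected.
idf_separate = idf(french_docs, docs_with_term)

# French and English in one field: the same word now looks rare.
idf_mixed = idf(french_docs + english_docs, docs_with_term)

print(f"separate field: {idf_separate:.2f}, mixed field: {idf_mixed:.2f}")
```

In the mixed field the common French word gets a much higher weight than it
would in a French-only field, which skews ranking for French queries.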

I am curious to see what other people think.

Niki
