lucene-solr-user mailing list archives

From Saïd Radhouani <r.steve....@gmail.com>
Subject Multilingual - Search against the appropriate field
Date Thu, 01 Jul 2010 10:26:23 GMT
Hi,

I know this topic has been discussed many times in the (distant) past, but I wonder whether
there are newer best practices or trends.

In my application, I'm dealing with documents in different languages. Each document is monolingual;
it has some fields containing free text and a set of fields that do not require any text analysis.
For the free text, we need to apply a specific analysis based on the language of the document.

I'm for the use of a single index for all the documents instead of one index per language
(any objection?). Thus, in schema.xml, I need to declare a separate field for each language
(text_fr, text_en, etc.), each with its own appropriate analysis. Then, during the indexing,
I need to assign the free text content of each document to the appropriate field. Thus, for
each document, only one of the freetext fields would be populated.
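
To make the indexing step concrete, here is a minimal sketch in Python of how the free text
could be routed to the field matching the document's language. The function name, the "id"
and "lang" fields, and the set of supported languages are assumptions for illustration only;
they are not part of any existing schema.

```python
# Assumption: schema.xml declares one text_<lang> field per language listed here.
SUPPORTED_LANGS = {"fr", "en"}


def build_solr_doc(doc_id, lang, free_text, **other_fields):
    """Build a document dict in which only the matching text_<lang> field is populated.

    The "id" and "lang" field names are hypothetical; other_fields stands for the
    fields that do not require any text analysis.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError("no text field declared for language: %r" % lang)
    doc = dict(other_fields)
    doc["id"] = doc_id
    doc["lang"] = lang                 # stored so the language can be filtered on later
    doc["text_" + lang] = free_text    # e.g. text_fr for a French document
    return doc


french_doc = build_solr_doc("doc-1", "fr", "Le cheval blanc", category="animals")
```

With this layout, french_doc carries text_fr but no text_en key, which matches the idea that
only one of the free-text fields is populated per document.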

My question is, at search time, what is the best solution to search against the appropriate
field?

I know that with dismax, we can define in "qf" the set of fields we want to search against,
e.g., <str name="qf">text_fr text_en</str>

With this solution, does Solr choose the appropriate analysis for the query? That is, if a query
is compared to a document having English free text (text_en is populated), does Solr analyze
the query as if it were in English?

One problem with this approach is that each query will be compared to all the available documents,
i.e., a query in English would be compared to a document in French. As far as I know, if we know
the query language, this problem can be avoided, either by searching against the appropriate
field (e.g., text_fr:query), or by using a filter to select only those documents having English
text. Am I correct? Or is there a better solution?
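
To make the two options concrete, here is a minimal sketch in Python of the request parameters
I have in mind when the query language is known. The helper names and the "lang" field used in
the filter are assumptions for illustration (such a field would have to be declared in the
schema and populated at indexing time), and the user query is not escaped here.

```python
from urllib.parse import urlencode


def fielded_query_params(user_query, lang):
    """Option 1: search only the free-text field of the known language."""
    # Produces e.g. q=text_fr:(cheval); special characters are NOT escaped in this sketch.
    return {"q": "text_%s:(%s)" % (lang, user_query)}


def filtered_dismax_params(user_query, lang):
    """Option 2: dismax over all language fields, restricted by a filter query."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "text_fr text_en",   # the fields listed in "qf" above
        "fq": "lang:" + lang,      # hypothetical field holding the document language
    }


# Query strings that would be appended to the select handler URL.
option_1 = urlencode(fielded_query_params("cheval", "fr"))
option_2 = urlencode(filtered_dismax_params("horse", "en"))
```

In the second variant, the filter only restricts which documents are matched; whether the
query text itself is analyzed appropriately for each field in "qf" is exactly my question above.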

Thanks,
-Saïd

