lucene-solr-user mailing list archives

From "Eli K" <system....@gmail.com>
Subject Re: multi-language searching with Solr
Date Mon, 05 May 2008 22:02:57 GMT
I searched the Solr list but not as much the Lucene list.  I will look
again to see if there is something there that might work with Solr.  I
would rather leverage Solr, but if I have no choice I will do this using
Lucene only.

Thanks,

Eli

On Mon, May 5, 2008 at 4:58 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> You might want to bounce over to the Lucene user's list and search
>  for language. This topic has arisen many times and there's some good
>  discussion. And have you searched the solr users list for "language"? I
>  know it's turned up here as well.
>
>  Best
>  Erick
>
>
>
>  On Mon, May 5, 2008 at 4:28 PM, Eli K <system.out@gmail.com> wrote:
>
>  > Wouldn't this impact both indexing and search performance and the size
>  > of the index?
>  > It is also probable that I will have more than one free text field
>  > later on and with at least 20 languages this approach does not seem
>  > very manageable.  Are there other options for making this work with
>  > stemming?
>  >
>  > Thanks,
>  >
>  > Eli
>  >
>  >
>  > On Mon, May 5, 2008 at 3:41 PM, Binkley, Peter
>  > <Peter.Binkley@ualberta.ca> wrote:
>  > > I think you would have to declare a separate field for each language
>  > >  (freetext_en, freetext_fr, etc.), each with its own appropriate
>  > >  stemming. Your ingestion process would have to assign the free text
>  > >  content for each document to the appropriate field; so, for each
>  > >  document, only one of the freetext fields would be populated. At search
>  > >  time, you would either search against the appropriate field if you know
>  > >  the search language, or search across them with "freetext_fr:query OR
>  > >  freetext_en:query OR ...". That way your query will be interpreted by
>  > >  each language field using that language's stemming rules.
>  > >
>  > >  Other options for combining indexes, such as copyfield or dynamic
>  > >  fields (see http://wiki.apache.org/solr/SchemaXml), would lead to a
>  > >  single field type and therefore a single type of stemming. You could
>  > >  always use copyfield to create an unstemmed common index, if you don't
>  > >  care about stemming when you search across languages (since you're
>  > >  likely to get odd results when a query in one language is stemmed
>  > >  according to the rules of another language).
>  > >
>  > >  Peter
>  > >
>  > >
>  > >
>  > >  -----Original Message-----
>  > >  From: Eli K [mailto:system.out@gmail.com]
>  > >  Sent: Monday, May 05, 2008 8:27 AM
>  > >  To: solr-user@lucene.apache.org
>  > >  Subject: multi-language searching with Solr
>  > >
>  > >  Hello folks,
>  > >
>  > >  Let me start by saying that I am new to Lucene and Solr.
>  > >
>  > >  I am in the process of designing a search back-end for a system that
>  > >  receives 20k documents a day and needs to keep them available for 30
>  > >  days.  The documents should be searchable on a free text field and on
>  > >  about 8 other fields.
>  > >
>  > >  One of my requirements is to index and search documents in multiple
>  > >  languages.  I would like to have the ability to stem and provide the
>  > >  advanced search features that are based on it.  This will only affect
>  > >  the free text field because the rest of the fields are in English.
>  > >
>  > >  I can find out the language of the document before indexing and I might
>  > >  be able to provide the language to search on.  I also need to have the
>  > >  ability to search across all indexed languages (there will be 20 in
>  > >  total).
>  > >
>  > >  Given these requirements do you think this is doable with Solr?  A
>  > >  major limiting factor is that I need to stick to the 1.2 GA version
>  > >  and I cannot utilize the multi-core features in the 1.3 trunk.
>  > >
>  > >  I considered writing my own analyzer that will call the appropriate
>  > >  Lucene analyzer for the given language but I did not see any way for it
>  > >  to access the field that specifies the language of the document.
>  > >
>  > >  Thanks,
>  > >
>  > >  Eli
>  > >
>  > >  p.s. I am looking for an experienced Lucene/Solr consultant to help
>  > >  with the design of this system.
>  > >
>  > >
>  >
>
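
For illustration, the per-language field approach Peter describes in the thread could be sketched roughly as below. This is only a sketch in Python, not Solr configuration or anything posted to the list: the helper names (field_for, make_document, build_query), the "id" field, and the "de" language code are hypothetical, and the per-language stemming itself would still have to be declared on each freetext_* field in the schema. Query terms are not escaped here.

```python
# Hypothetical language codes; the thread mentions 20 languages but only
# names English (en) and French (fr). "de" is added purely as an example.
LANGUAGES = ["en", "fr", "de"]


def field_for(lang):
    """Return the per-language free-text field name, e.g. freetext_en."""
    if lang not in LANGUAGES:
        raise ValueError("unsupported language: %s" % lang)
    return "freetext_" + lang


def make_document(doc_id, text, lang):
    """Populate only the free-text field matching the document's language."""
    return {"id": doc_id, field_for(lang): text}


def build_query(terms, lang=None):
    """Search one field if the language is known, else OR across all fields."""
    langs = [lang] if lang else LANGUAGES
    # field:(terms) applies the whole group of terms to that one field.
    clauses = ["%s:(%s)" % (field_for(l), terms) for l in langs]
    return " OR ".join(clauses)
```

With these assumptions, build_query("chat", "fr") targets only freetext_fr, while build_query("chat") produces one OR clause per configured language, so each field interprets the query with its own analysis chain. The query string grows with the number of languages, which is the manageability concern Eli raises.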
