lucene-solr-user mailing list archives

From Manish Bafna <manish.bafna...@gmail.com>
Subject Re: Stemming and other tokenizers
Date Mon, 12 Sep 2011 09:37:58 GMT
What if a single document has multiple languages?

On Mon, Sep 12, 2011 at 2:23 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:

> Hi
>
> Everybody else uses a dedicated field per language, so why can't you?
> Please explain your use case, and perhaps we can better understand
> what you're trying to do and help.
> Do you always know the query language in advance?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 12. sep. 2011, at 08:28, Patrick Sauts wrote:
>
> > I can't create one field per language, that is the problem, but I'll dig
> > into it following your indications.
> > I'll let you know what I come up with.
> >
> > Patrick.
> >
> > 2011/9/11 Jan Høydahl <jan.asf@cominvent.com>
> >
> >> Hi,
> >>
> >> You'll not be able to detect language and change stemmer on the same
> >> field in one go. You need to create one fieldType in your schema per
> >> language you want to use, and then use LanguageIdentification (SOLR-1979)
> >> to do the magic of detecting language and renaming the field. If you set
> >> langid.override=false, langid.map=true and populate your "language" field
> >> with the known language, you will probably get the desired effect.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
> >>
> >>> Hello,
> >>>
> >>>
> >>>
> >>> I want to implement some kind of AutoStemming that will detect the
> >>> language of a field based on a tag at the start of the field, like #en#.
> >>> My field is stored on disc, but I don't want this tag to be stored. Is
> >>> there a way to avoid storing it?
> >>>
> >>> As I understand it, all the filters and tokenizers act only on the
> >>> indexed field and not the stored one.
> >>>
> >>> Am I wrong?
> >>>
> >>> Is it possible to write such a filter?
> >>>
> >>>
> >>>
> >>> Patrick.
> >>>
> >>
> >>
>
>
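
The idea discussed in the quoted thread (a leading tag such as #en# that
selects the language, plus one field per language) can be sketched as a
client-side step that runs before the document is sent to Solr. This is only
an illustration, not Solr's LanguageIdentification processor: the function
name route_field, the SUPPORTED set, the "general" fallback and field names
such as text_en are all assumptions made up for the example.

```python
import re

# Hypothetical tag format taken from the thread: "#en# some text".
TAG_RE = re.compile(r"^#([a-z]{2})#\s*")

# Assumed set of languages that have their own fieldType in the schema.
SUPPORTED = {"en", "fr", "de"}


def route_field(name, value, fallback="general"):
    """Strip a leading #xx# tag and return (mapped_field_name, clean_value).

    The tag is removed from the value, so it is neither indexed nor stored.
    Untagged or unsupported values go to the fallback field.
    """
    match = TAG_RE.match(value)
    if not match:
        return f"{name}_{fallback}", value
    lang = match.group(1) if match.group(1) in SUPPORTED else fallback
    return f"{name}_{lang}", value[match.end():]


# Example: rewrite a document before indexing it.
doc = {"id": "1", "text": "#en# my field is stored on disc"}
field, clean = route_field("text", doc.pop("text"))
doc[field] = clean
print(doc)
```

Because the tag is stripped before the document reaches Solr, the stored
value never contains it, and each text_<lang> field can use the fieldType
with the matching stemmer. A document that mixes several languages would
still need to be split or handled by a language-neutral field.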
