From: André Widhani
To: "solr-user@lucene.apache.org"
Subject: AW: Stemmer German2
Date: Wed, 7 Nov 2012 16:07:24 +0000

No, you need to restart Solr to pick up the changes to the schema, and you need to re-index the existing documents.

Regards,
André

________________________________________
From: Andreas Niekler [aniekler@informatik.uni-leipzig.de]
Sent: Wednesday, 7 November 2012 16:40
To: solr-user@lucene.apache.org
Subject: Re: Stemmer German2

Hello,

thanks for the advice. If I now change the schema so that my lowercase
factory comes before the stemmer, does the index update itself after the
change? How could I achieve this? I stored all values within the index.

Thanks

Andreas

On 07.11.2012 10:47, André Widhani wrote:
> Do you use the LowerCaseFilterFactory filter in your analysis chain?
> You will probably want to add it and, if you already have, make sure it is _before_ the stemming filter so you get consistent results regardless of lower- or uppercase spelling.
>
> You can protect words from stemming by adding a KeywordMarkerFilterFactory filter before the stemmer; the protected words are listed in a text file. This filter should be placed after the lowercase filter so you can use lowercase terms in the file.
>
> Some stemmer classes like SnowballPorterFilterFactory also allow you to pass a "protected" attribute (again pointing to a file).
>
> All of this is on the Solr wiki (AnalyzersTokenizersTokenFilters, LanguageAnalysis) if you need more details.
>
> Regards,
> André
>
> ________________________________________
> From: Andreas Niekler [aniekler@informatik.uni-leipzig.de]
> Sent: Wednesday, 7 November 2012 10:02
> To: solr-user@lucene.apache.org
> Subject: Stemmer German2
>
> Dear List,
>
> I am seeing unwanted behavior with the German2 stemmer. For example, the
> river Elbe:
>
> If I input elbe, the word gets reduced to elb.
> If I input Elbe, everything is OK and elbe is stored in the index.
>
> If I now query for elbe or Elbe, I of course get different results,
> so users cannot use Elbe and elbe interchangeably to get the same results.
>
> Can I add an exception list to the stemmer? Otherwise we will have a
> very hard time explaining to some users why this is happening for some words.
>
> Thank you
>
> Andreas
>
> --
> Andreas Niekler, Dipl. Ing. (FH)
> NLP Group | Department of Computer Science
> University of Leipzig
> Johannisgasse 26 | 04103 Leipzig
>
> mail: aniekler@informatik.uni-leipzig.de
>

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniekler@informatik.uni-leipzig.de
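
For reference, the filter ordering described in this thread might look roughly like the following in schema.xml. This is only a sketch: the field type name "text_de", the tokenizer choice, and the file name "protwords.txt" are assumptions for illustration and do not come from the thread. It also assumes the German2 stemmer is configured through SnowballPorterFilterFactory.

```xml
<!-- Hypothetical field type: the name "text_de", the tokenizer, and the
     file name "protwords.txt" are assumptions, not taken from the thread. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 1. Lowercase first, so "Elbe" and "elbe" reach the stemmer as the same token. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 2. Mark protected words; entries in the file are lowercase, e.g. "elbe". -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- 3. Stem last; tokens marked as keywords are left unchanged. -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

As noted in the reply above, a change like this only takes effect after Solr is restarted and the existing documents are re-indexed.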