lucene-solr-user mailing list archives

From Christian Zambrano <czamb...@gmail.com>
Subject Re: Problems with WordDelimiterFilterFactory
Date Thu, 08 Oct 2009 22:45:48 GMT
Bern,

The only way that could be happening is if you are not using the field 
type you described in your original e-mail. The TokenFilter 
WordDelimiterFilterFactory should take care of the hyphen.
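
For the separate "--" parse error discussed below, the client-side cleanup
Patrick suggests could look roughly like this. It is only a minimal Python
sketch: collapse_minus_runs is a made-up helper name, not a Solr API, and it
assumes the cleanup runs on the raw query string before it is sent to Solr.

```python
import re

# Hypothetical helper for illustration; not part of Solr or any client library.
# Matches any run of two or more consecutive minus characters.
_MINUS_RUN = re.compile(r"-{2,}")


def collapse_minus_runs(query: str) -> str:
    """Replace each run of two or more '-' characters with a single '-'."""
    return _MINUS_RUN.sub("-", query)


if __name__ == "__main__":
    # The query string from the error log quoted below.
    print(collapse_minus_runs("(Asia -- Civilization AND status_i:(2))"))
```

This only addresses the ParseException. As Bern reports below, the query with
a single hyphen parses but still does not return the expected record.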

On 10/08/2009 05:30 PM, Bernadette Houghton wrote:
> Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error,
> but still doesn't find the right record. I see from marklo's analysis page that solr is still
> parsing it with a hyphen. Changing this part of our schema.xml -
>
>          <filter class="solr.PatternReplaceFilterFactory"
>                  pattern="([^a-z])" replacement="" replace="all"
>          />
>
> To
>
>          <filter class="solr.PatternReplaceFilterFactory"
>                  pattern="([^a-z])" replacement=" " replace="all"
>          />
>
> i.e. replacing non-alpha chars with a space, looks like it may handle that aspect.
>
> Regards
> Bern
>
> -----Original Message-----
> From: Patrick Jungermann [mailto:patrick.jungermann@googlemail.com]
> Sent: Friday, 9 October 2009 9:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Problems with WordDelimiterFilterFactory
>
> Hi Bern,
>
> the problem is the character sequence "--". A query is not allowed to
> contain two or more consecutive minus characters. Remove one minus
> character and the query will be parsed without problems.
>
> Because of this parsing problem, I'd recommend cleaning up the query
> before submitting it to the Solr server, replacing each sequence of
> minus characters with a single one.
>
>
> Regards, Patrick
>
>
>
> Bernadette Houghton schrieb:
>    
>> Sorry, the last line was truncated -
>>
>> HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7.
>> Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
>>
>> -----Original Message-----
>> From: Bernadette Houghton [mailto:bernadette.houghton@deakin.edu.au]
>> Sent: Friday, 9 October 2009 8:22 AM
>> To: 'solr-user@lucene.apache.org'
>> Subject: RE: Problems with WordDelimiterFilterFactory
>>
>> Here's the query and the error -
>>
>> Oct 09 08:20:17  [debug] [196] Solr query string:    (Asia -- Civilization AND status_i:(2))
>> Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
>> Oct 09 08:20:17  [error] Error on searching: "400" Status: org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND status_i:(2)) ': Encount
>>
>> Bern
>>
>> -----Original Message-----
>> From: Christian Zambrano [mailto:czambran@gmail.com]
>> Sent: Thursday, 8 October 2009 12:48 PM
>> To: solr-user@lucene.apache.org
>> Cc: solr-user@lucene.apache.org
>> Subject: Re: Problems with WordDelimiterFilterFactory
>>
>> Bern,
>>
>> I am interested in the Solr query. In other words, the query that your
>> system sends to Solr.
>>
>> Thanks,
>>
>>
>> Christian
>>
>> On Oct 7, 2009, at 5:56 PM, Bernadette Houghton <bernadette.houghton@deakin.edu.au> wrote:
>>
>>      
>>> Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:30000601
>>>
>>> Either scroll down and click one of the "television broadcasting --
>>> asia" links, or type it in the Quick Search box.
>>>
>>>
>>> TIA
>>>
>>> bern
>>>
>>> -----Original Message-----
>>> From: Christian Zambrano [mailto:czambran@gmail.com]
>>> Sent: Thursday, 8 October 2009 9:43 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Problems with WordDelimiterFilterFactory
>>>
>>> Could you please provide the exact URL of a query where you are
>>> experiencing this problem?
>>> e.g. (not URL encoded): q=fieldName:"hot and cold: temperatures"
>>>
>>> On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
>>>        
>>>> We are having some issues with our solr parent application not
>>>> retrieving records as expected.
>>>>
>>>> For example, if the input query includes a colon (e.g. hot and
>>>> cold: temperatures), the relevant record (which contains a colon in
>>>> the same place) does not get retrieved; if the input query does not
>>>> include the colon, all is fine.  Ditto if the user searches for a
>>>> query containing hyphens, e.g. "asia - civilization", although with
>>>> the qualifier that something like "asia-civilization" (no spaces
>>>> either side of the hyphen) works fine, whereas "asia -
>>>> civilization" (spaces either side of hyphen) doesn't work.
>>>>
>>>> Our schema.xml contains the following -
>>>>
>>>>      <fieldType name="text" class="solr.TextField"
>>>> positionIncrementGap="100">
>>>>        <analyzer type="index">
>>>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>          <!-- in this example, we will only use synonyms at query time
>>>>          <filter class="solr.SynonymFilterFactory"
>>>> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>>          -->
>>>>                                  <filter
>>>> class="solr.ISOLatin1AccentFilterFactory"/>
>>>>          <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt"/>
>>>>          <filter class="solr.WordDelimiterFilterFactory"
>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>> catenateNumbers="1" catenateAll="0"/>
>>>>          <filter class="solr.LowerCaseFilterFactory"/>
>>>>          <filter class="solr.EnglishPorterFilterFactory"
>>>> protected="protwords.txt"/>
>>>>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>        </analyzer>
>>>>        <analyzer type="query">
>>>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>                                  <filter
>>>> class="solr.ISOLatin1AccentFilterFactory"/>
>>>>          <filter class="solr.SynonymFilterFactory"
>>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>>          <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt"/>
>>>>          <filter class="solr.WordDelimiterFilterFactory"
>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>> catenateNumbers="0" catenateAll="0"/>
>>>>          <filter class="solr.LowerCaseFilterFactory"/>
>>>>          <filter class="solr.EnglishPorterFilterFactory"
>>>> protected="protwords.txt"/>
>>>>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>        </analyzer>
>>>>      </fieldType>
>>>>
>>>> Bernadette Houghton, Library Business Applications Developer
>>>> Deakin University Geelong Victoria 3217 Australia.
>>>> Phone: 03 5227 8230 International: +61 3 5227 8230
>>>> Fax: 03 5227 8000 International: +61 3 5227 8000
>>>> MSN: bern_houghton@hotmail.com
>>>> Email: bernadette.houghton@deakin.edu.au <mailto:bernadette.houghton@deakin.edu.au>
>>>> Website: http://www.deakin.edu.au <http://www.deakin.edu.au/>
>>>> Deakin University CRICOS Provider Code 00113B (Vic)
>>>>
>>>> Important Notice: The contents of this email are intended solely
>>>> for the named addressee and are confidential; any unauthorised use,
>>>> reproduction or storage of the contents is expressly prohibited. If
>>>> you have received this email in error, please delete it and any
>>>> attachments immediately and advise the sender by return email or
>>>> telephone.
>>>> Deakin University does not warrant that this email and any
>>>> attachments are error or virus free
>>>>
>>>>
>>>>
>>>>          
>    
