lucene-solr-user mailing list archives

From William Bell <billnb...@gmail.com>
Subject Re: Question about Nested Span Near Query
Date Wed, 02 Mar 2011 02:05:08 GMT
I am not 100% sure, but why did you not use the standard config for "text"?

    <fieldType name="text" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>


You are using:

    <fieldtype name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"
luceneMatchVersion="LUCENE_29" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <!--
        <filter class="solr.StopFilterFactory" luceneMatchVersion="LUCENE_29"/>
        <filter class="solr.EnglishPorterFilterFactory"/>
        -->
      </analyzer>
    </fieldtype>


Can you try a more standard approach?

solr.WhitespaceTokenizerFactory
solr.LowerCaseFilterFactory

??
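
Something like the following, as a minimal sketch only. The field type name "text_ws_lower" is a placeholder that does not appear in your schema, and positionIncrementGap="100" is simply copied from the standard "text" config above. The field would need to be reindexed after the change. With no stop filter in the chain there are no position gaps, which may help rule analysis in or out as the cause.

```xml
<!-- Sketch: "text_ws_lower" is a hypothetical name, not taken from the attached schema -->
<fieldType name="text_ws_lower" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer with no type attribute is applied at both index and query time -->
  <analyzer>
    <!-- Split on whitespace only, so every word keeps its own position -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase tokens so matching is case insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```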

Thanks.


On Mon, Feb 28, 2011 at 2:38 AM, Ahsan |qbal <ahsan.iqbal023@gmail.com> wrote:
> Hi Bill
> Any update..
>
> On Thu, Feb 24, 2011 at 8:58 PM, Ahsan |qbal <ahsan.iqbal023@gmail.com>
> wrote:
>>
>> Hi
>> schema and document are attached.
>>
>> On Thu, Feb 24, 2011 at 8:24 PM, Bill Bell <billnbell@gmail.com> wrote:
>>>
>>> Send schema and document in XML format and I'll look at it
>>>
>>> Bill Bell
>>> Sent from mobile
>>>
>>>
>>> On Feb 24, 2011, at 7:26 AM, "Ahsan |qbal" <ahsan.iqbal023@gmail.com>
>>> wrote:
>>>
>>> > Hi
>>> >
>>> > To narrow down the issue I indexed a single document with one of the
>>> > sample
>>> > queries (given below) which was giving issue.
>>> >
>>> > *"evaluation of loan and lease portfolios for purposes of assessing the
>>> > adequacy of" *
>>> >
>>> > Now when i Perform a search query (*TextContents:"evaluation of loan
>>> > and
>>> > lease portfolios for purposes of assessing the adequacy of"*) the
>>> > parsed
>>> > query is
>>> >
>>> >
>>> > *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
>>> > Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0,
>>> > true),
>>> > Contents:lease], 0, true), Contents:portfolios], 0, true),
>>> > Contents:for], 0,
>>> > true), Contents:purposes], 0, true), Contents:of], 0, true),
>>> > Contents:assessing], 0, true), Contents:the], 0, true),
>>> > Contents:adequacy],
>>> > 0, true), Contents:of], 0, true)*
>>> >
>>> > and search is not successful.
>>> >
>>> > If I remove '*evaluation*' from start OR *'assessing the adequacy of*'
>>> > from
>>> > end it works fine. Issue seems to come on relatively long phrases but I
>>> > have
>>> > not been able to find a pattern and its really mind boggling coz I
>>> > thought
>>> > this issue might be due to large position list but this is a single
>>> > document
>>> > with one phrase. So its definitely not related to size of index.
>>> >
>>> > Any ideas whats going on??
>>> >
>>> > On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal
>>> > <ahsan.iqbal023@gmail.com>wrote:
>>> >
>>> >> Hi
>>> >>
>>> >> It didn't search.. (means no results found even results exist) one
>>> >> observation is that it works well even in the long phrases but when
>>> >> the long
>>> >> phrases contain stop words and same stop word exist two or more time
>>> >> in the
>>> >> phrase then, solr can't search with query parsed in this way.
>>> >>
>>> >>
>>> >> On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic <
>>> >> otis_gospodnetic@yahoo.com> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> What do you mean by "this doesn't work fine"?  Does it not work
>>> >>> correctly
>>> >>> or is
>>> >>> it slow or ...
>>> >>>
>>> >>> I was going to suggest you look at Surround QP, but it looks like you
>>> >>> already did that.  Wouldn't it be better to get Surround QP to work?
>>> >>>
>>> >>> Otis
>>> >>> ----
>>> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> >>> Lucene ecosystem search :: http://search-lucene.com/
>>> >>>
>>> >>>
>>> >>>
>>> >>> ----- Original Message ----
>>> >>>> From: Ahsan |qbal <ahsan.iqbal023@gmail.com>
>>> >>>> To: solr-user@lucene.apache.org
>>> >>>> Sent: Tue, February 22, 2011 10:59:26 AM
>>> >>>> Subject: Question about Nested Span Near Query
>>> >>>>
>>> >>>> Hi All
>>> >>>>
>>> >>>> I had a requirement to implement queries that involves phrase proximity.
>>> >>>> like user should be able to search "ab cd" w/5 "de fg", both phrases as
>>> >>>> whole should be with in 5 words of each other. For this I implement a
>>> >>>> query parser that make use of nested span queries, so above query would
>>> >>>> be parsed as
>>> >>>>
>>> >>>> spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
>>> >>>> spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
>>> >>>>
>>> >>>> Queries like this seems to work really good when phrases are small but
>>> >>>> when phrases are large this doesn't work fine. Now my question, Is there
>>> >>>> any limitation of SpanNearQuery. that we cannot handle large phrases in
>>> >>>> this way?
>>> >>>>
>>> >>>> please help
>>> >>>>
>>> >>>> Regards
>>> >>>> Ahsan
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>
>
>
