lucene-solr-user mailing list archives

From Tomás Fernández Löbbe <tomasflo...@gmail.com>
Subject Re: Edgengram
Date Tue, 31 May 2011 16:24:27 GMT
...or also use the LowerCaseTokenizerFactory at query time for consistency,
but not the edge ngram filter.
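
A minimal sketch of that variant, assuming the same field type as in the quoted message below. The only change is the query-time tokenizer; the edge n-gram filter stays on the index side only:

```xml
<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
  <analyzer type="query">
    <!-- same tokenizer as at index time, but no edge n-gram filter -->
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
  </analyzer>
</fieldType>
```

Note that LowerCaseTokenizerFactory splits at non-letter characters, so a query value containing digits or punctuation would yield several tokens. For a letters-only value such as "abcdefg" it produces a single lowercased token.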

2011/5/31 Tomás Fernández Löbbe <tomasflobbe@gmail.com>

> Hi Brian, I'm not sure I understand what you are trying to achieve. You
> want the term query "abcdefg" to have an idf of 1 instead of 7? I think using
> the KeywordTokenizerFactory at query time should work. It would be
> something like:
>
> <fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
>   <analyzer type="index">
>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
> </fieldType>
>
> this way, at query time "abcdefg" won't be turned to "a ab abc abcd abcde
> abcdef abcdefg". At index time it will.
>
> Regards,
> Tomás
>
>
> On Tue, May 31, 2011 at 1:07 PM, Brian Lamb <brian.lamb@journalexperts.com
> > wrote:
>
>> <fieldType name="edgengram" class="solr.TextField"
>> positionIncrementGap="1000">
>>   <analyzer>
>>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> maxGramSize="25" side="front" />
>>   </analyzer>
>> </fieldType>
>>
>> I believe I used that link when I initially set up the field and it worked
>> great (and I'm still using it in other places). In this particular example,
>> however, it does not appear to be practical for me. I mentioned that I have
>> a similarity class that returns 1 for the idf and in the case of an
>> edgengram, it returns 1 * length of the search string.
>>
>> Thanks,
>>
>> Brian Lamb
>>
>> On Tue, May 31, 2011 at 11:34 AM, bmdakshinamurthy@gmail.com <
>> bmdakshinamurthy@gmail.com> wrote:
>>
>> > Can you specify the analyzer you are using for your queries?
>> >
>> > Maybe you could use a KeywordAnalyzer for your queries so you don't end up
>> > matching parts of your query.
>> >
>> > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>> > This should help you.
>> >
>> > On Tue, May 31, 2011 at 8:24 PM, Brian Lamb
>> > <brian.lamb@journalexperts.com>wrote:
>> >
>> > > In this particular case, I will be doing a solr search based on user
>> > > preferences. So I will not be depending on the user to type "abcdefg".
>> > > That will be automatically generated based on user selections.
>> > >
>> > > The contents of the field do not contain spaces and since I am creating
>> > > the search parameters, case isn't important either.
>> > >
>> > > Thanks,
>> > >
>> > > Brian Lamb
>> > >
>> > > On Tue, May 31, 2011 at 9:44 AM, Erick Erickson <
>> erickerickson@gmail.com
>> > > >wrote:
>> > >
>> > > > That'll work for your case, although be aware that string types
>> > > > aren't analyzed at all, so case matters, as do spaces etc.
>> > > >
>> > > > What is the use-case here? If you explain it a bit there might be
>> > > > better answers....
>> > > >
>> > > > Best
>> > > > Erick
>> > > >
>> > > > On Fri, May 27, 2011 at 9:17 AM, Brian Lamb
>> > > > <brian.lamb@journalexperts.com> wrote:
>> > > > > For this, I ended up just changing it to string and using
>> > > > > "abcdefg*" to match. That seems to work so far.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Brian Lamb
>> > > > >
>> > > > > On Wed, May 25, 2011 at 4:53 PM, Brian Lamb
>> > > > > <brian.lamb@journalexperts.com>wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> I'm running into some confusion with the way edgengram works.
>> > > > >> I have the field set up as:
>> > > > >>
>> > > > >> <fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
>> > > > >>   <analyzer>
>> > > > >>     <tokenizer class="solr.LowerCaseTokenizerFactory" />
>> > > > >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
>> > > > >>   </analyzer>
>> > > > >> </fieldType>
>> > > > >>
>> > > > >> I've also set up my own similarity class that returns 1 as the
>> > > > >> idf score. What I've found this does is if I match a string
>> > > > >> "abcdefg" against a field containing "abcdefghijklmnop", then
>> > > > >> the idf will score that as a 7:
>> > > > >>
>> > > > >> 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)
>> > > > >>
>> > > > >> I get why that's happening, but is there a way to avoid that?
>> > > > >> Do I need to do a new field type to achieve the desired effect?
>> > > > >>
>> > > > >> Thanks,
>> > > > >>
>> > > > >> Brian Lamb
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks and Regards,
>> > DakshinaMurthy BM
>> >
>>
>
>
