lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Stemming with SOLR
Date Fri, 16 Dec 2016 02:46:06 GMT
If you need a full-fidelity solution that takes care of the many
edge cases, it could be worth looking at commercial solutions.

http://www.basistech.com/ has one, including a free-tier SaaS plan.

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 15 December 2016 at 21:28, Lasitha Wattaladeniya <wattale@gmail.com> wrote:
> Hi all,
>
> Thanks for the replies,
>
> @eric, ahmet: since those stemmers are logical stemmers, they won't work on
> irregular words such as caught, ran and so on, so they won't work in our case.
>
> @susheel: Yes, I thought about it, but the problem is that the documents we
> index are fairly large texts, so copying them into duplicate fields with
> copyField will affect both index time (we have jobs that index data
> periodically) and query time. I wonder why there isn't a proper solution
> to this.
>
> Regards,
> Lasitha
>
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>
> On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar <susheel2777@gmail.com>
> wrote:
>
>> We did an extensive comparison of Snowball, KStem and Hunspell in the past,
>> and there are cases where one of them works better than the others, and
>> vice versa. You can use all three by defining 3 different fields
>> (fieldTypes) and searching across all of them at query time.
>>
>> For the cases where none of them works (e.g. wolves, wolf), use
>> StemmerOverrideFilterFactory.
>>
>> HTH.
>>
>> Thanks,
>> Susheel
>>
>> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>> wrote:
>>
>> > Hi,
>> >
>> > KStemFilter returns legitimate English words, please use it.
>> >
>> > Ahmet
>> >
>> >
>> >
>> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
>> > wattale@gmail.com> wrote:
>> > Hello devs,
>> >
>> > I'm trying to develop an indexing and querying flow that converts words
>> > to their original form (lemmatization). I have been doing a bit of
>> > research lately, but the information on the internet is very limited. I
>> > tried using HunspellStemFilterFactory, but it doesn't convert a word to
>> > its original form; instead it gives suggestions for some words (Hunspell
>> > works correctly for some English words, but for others it gives multiple
>> > suggestions or none; I used the en_us.dic provided by OpenOffice).
>> >
>> > I know this is a generic problem in search, so is there anyone who can
>> > point me in the right direction or to some information? :)
>> >
>> > Best regards,
>> > Lasitha Wattaladeniya
>> > Software Engineer
>> >
>> > Mobile : +6593896893
>> > Blog : techreadme.blogspot.com
>> >
>>
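
Susheel's suggestion of putting a stemmer override in front of a stemmer
could look roughly like the field type below. This is only a sketch: the
field type name text_en_override and the file name stemdict.txt are
made-up placeholders, and the choice of KStem as the fallback stemmer is an
assumption, not something the thread settled on. The override dictionary is
a plain-text file with one tab-separated pair per line (word, then the stem
to emit), so irregular forms such as caught, ran and wolves would each need
their own entry.

```xml
<!-- Sketch only: "text_en_override" and "stemdict.txt" are illustrative names -->
<fieldType name="text_en_override" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Replaces words listed in the dictionary with the given stem and
         marks them as keywords so the stemmer below leaves them alone -->
    <filter class="solr.StemmerOverrideFilterFactory"
            dictionary="stemdict.txt" ignoreCase="true"/>
    <!-- Fallback stemmer for every word not covered by the dictionary -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analyzer runs at index and query time, this keeps a single
field rather than three copies, at the cost of maintaining the dictionary by
hand.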
