mahout-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Replacement for DefaultAnalyzer
Date Mon, 11 May 2015 21:12:45 GMT
I found Mike's blog post regarding Lucene 4.X from a while ago [0].
In the 'Other Changes' section Mike states "Analyzers must always
provide a reusable token stream, by implementing the
Analyzer.createComponents method (reusableTokenStream has been removed and
tokenStream is now final, in Analyzer)."
This provides a good bit more context, so I'm going to continue down the
createComponents route with the aim of implementing the newer Lucene 4.X
API.
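
For reference, here is a rough sketch of where I expect this to end up,
assuming Lucene 4.6.x on the classpath. SketchAnalyzer and tokenize are
hypothetical names of mine, and the StandardTokenizer / LowerCaseFilter /
StopFilter chain is only an assumption for illustration, not necessarily
what DefaultAnalyzer did. Corrections welcome:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical analyzer written against the Lucene 4.x API (assumed 4.6.x).
final class SketchAnalyzer extends Analyzer {

  // In 4.x this is the only hook to override; tokenStream() is final and
  // reuses the components returned here.
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_46, reader);
    TokenStream result = new LowerCaseFilter(Version.LUCENE_46, source);
    result = new StopFilter(Version.LUCENE_46, result, StandardAnalyzer.STOP_WORDS_SET);
    return new TokenStreamComponents(source, result);
  }

  // Caller side: tokenStream() replaces the old reusableTokenStream() call.
  static List<String> tokenize(Analyzer analyzer, String field, String text)
      throws IOException {
    List<String> terms = new ArrayList<String>();
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    try {
      CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
      stream.reset();                 // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(termAtt.toString());
      }
      stream.end();                   // signal end of stream before closing
    } finally {
      stream.close();                 // lets the analyzer reuse its components
    }
    return terms;
  }
}
```

On the mapper side the only change should then be swapping
analyzer.reusableTokenStream(...) for analyzer.tokenStream(...) and making
sure reset(), end() and close() are called around the incrementToken() loop.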
In the meantime, if you get any updates or have a code sample, it would be
very much appreciated.
Thanks
Lewis

[0]
http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html

On Mon, May 11, 2015 at 2:03 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Suneel,
>
> On Sat, May 9, 2015 at 11:21 AM, Suneel Marthi <smarthi@apache.org> wrote:
>
>> Mahout 0.9 and 0.10.0 are using Lucene 4.6.1. There's been a change in the
>> TokenStream workflow in Lucene post-Lucene 4.5.
>>
>
> Yes I know that after looking into the codebase. Thanks for clarifying!
>
>
>>
>> What exactly are u trying to do and where is it u r stuck now? It would
>> help if u posted a code snippet or something.
>>
>>
> In particular I am working on the following implementation [0] which uses
> the following code
>
> TokenStream stream = analyzer.reusableTokenStream(key.toString(), new
> StringReader(sContent.toString()));
>
> Note that the analyzer object is instantiated as a DefaultAnalyzer [1].
> Since the analyzer.reusableTokenStream API is deprecated, as you've noted,
> I am wondering what the suggested API semantics are to achieve the desired
> upgrade.
> Thanks in advance again for any input.
> Lewis
>
> [0]
> https://github.com/DigitalPebble/behemoth/blob/master/mahout/src/main/java/com/digitalpebble/behemoth/mahout/LuceneTokenizerMapper.java#L52-L53
> [1]
> http://svn.apache.org/repos/asf/mahout/tags/mahout-0.7/core/src/main/java/org/apache/mahout/vectorizer/DefaultAnalyzer.java
>
>
>



-- 
*Lewis*
