lucene-java-user mailing list archives

From Michael Sokolov <soko...@ifactory.com>
Subject Re: adding attributes to TokenStream
Date Tue, 01 Jan 2013 23:48:22 GMT
Sure ... The per-document term frequency is maintained in the index to
enable relevance scoring.  You can pull it out using a TermDocs, which
enumerates <document, frequency> pairs for a term.  Sorry, I don't have
example code handy for this.

-Mike
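
For readers following the thread, the idea can be sketched without Lucene at all. The block below is a plain-Java model, not Lucene code: PostingsModel, addDocument and frequencies are hypothetical names, and it assumes each flagged token is also indexed under a marker term such as "__ENTITY__" (an assumption, not something stated in the thread). It only illustrates the kind of (document, frequency) pairs that a TermDocs enumerates for one term.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of per-term postings; NOT the Lucene API.
// All names here are hypothetical and chosen for illustration only.
public class PostingsModel {
    // term -> (docId -> number of occurrences of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // "Index" one document by counting each of its tokens.
    public void addDocument(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Return (docId -> frequency) pairs for a term, in docId order.
    // An unknown term yields an empty map.
    public Map<Integer, Integer> frequencies(String term) {
        return postings.getOrDefault(term, new TreeMap<>());
    }

    public static void main(String[] args) {
        PostingsModel index = new PostingsModel();
        // Assumption: every entity token was also indexed as the marker "__ENTITY__".
        index.addDocument(0, Arrays.asList("paris", "__ENTITY__", "is", "nice"));
        index.addDocument(1, Arrays.asList("ibm", "__ENTITY__", "bought", "lotus", "__ENTITY__"));
        // Walk the pairs for the marker term, one line per document.
        for (Map.Entry<Integer, Integer> e : index.frequencies("__ENTITY__").entrySet()) {
            System.out.println("doc=" + e.getKey() + " entities=" + e.getValue());
        }
    }
}
```

Whether a marker term fits a given index depends on how the flagged tokens are analyzed and stored, so treat this as a way to picture the data rather than a recipe.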


On 1/1/2013 4:24 PM, Itai Peleg wrote:
> That worked great :) thanks a lot for the quick reply!
>
> I have another question - after I "flagged" all my special tokens (in my
> case, the ones that are entities) is there an elegant way of counting how
> many of them I have in a document? I found an ugly way to do that, but I'm
> sure there's a better one.
>
> Thanks in advance,
> Itai
>
>
> 2012/12/31 Michael Sokolov <sokolov@ifactory.com>
>
>> On 12/31/2012 11:39 AM, Itai Peleg wrote:
>>
>>> Hi all,
>>>
>>> Can someone please post a simple example showing how to add additional
>>> attributes to tokens in a TokenStream (inside incrementToken, for example)?
>>>
>>> I'm working on entity extraction and want to flag specific tokens as
>>> entities, but I'm having problems.
>>>
>>> Thanks in advance,
>>> Itai
>>>
>> Here's a simple example of a filter that adds an attribute saying
>> whether a token is "the":
>>
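>> import java.io.IOException;
>> import org.apache.lucene.analysis.TokenFilter;
>> import org.apache.lucene.analysis.TokenStream;
>> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>>
>> // YourAttribute is a custom Attribute interface (with a matching
>> // YourAttributeImpl class) that you define yourself.
>> final class YourTokenStream extends TokenFilter {
>>    private final YourAttribute att;
>>    private final CharTermAttribute term;
>>
>>    public YourTokenStream (TokenStream upstream) {
>>       super (upstream);  // TokenFilter keeps the upstream stream in "input"
>>       att = addAttribute (YourAttribute.class);
>>       term = addAttribute (CharTermAttribute.class);
>>    }
>>
>>    @Override
>>    public boolean incrementToken () throws IOException {
>>      if (!input.incrementToken()) {
>>        return false;  // upstream is exhausted
>>      }
>>      // buffer() may hold stale chars past length(), so compare via toString()
>>      att.setIsAnEnglishArticle ("the".equals (term.toString()));
>>      return true;  // pass every token through, flagged or not
>>    }
>> }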
>>
>>
>>
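
One detail in the filter above is worth illustrating: CharTermAttribute.buffer() returns the attribute's internal char array, which is reused between tokens and can be longer than the current term, so only the first length() characters are valid. The block below is a plain-Java sketch of that effect with no Lucene classes involved; BufferDemo, wholeBuffer and validPart are hypothetical names used only for this illustration.

```java
// Plain-Java illustration of a reused character buffer; no Lucene classes involved.
// The class and method names are hypothetical.
public class BufferDemo {
    // Build a String from the whole array, ignoring how many chars are valid.
    public static String wholeBuffer(char[] buffer) {
        return new String(buffer);
    }

    // Build a String from only the first `length` chars, the valid part of the term.
    public static String validPart(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        char[] buffer = new char[8];
        // First token: "there" (5 chars) is copied into the buffer.
        "there".getChars(0, 5, buffer, 0);
        // Next token: "the" (3 chars) overwrites the start; "re" is left behind.
        "the".getChars(0, 3, buffer, 0);
        int length = 3;

        System.out.println("the".equals(wholeBuffer(buffer)));        // false
        System.out.println("the".equals(validPart(buffer, length)));  // true
    }
}
```

Comparing against the whole array fails even though the current token is "the", which is why a comparison should be limited to the valid length (or go through toString() on the attribute).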



