lucene-java-user mailing list archives

From "T. Kuro Kurosaka" <>
Subject Re: Lucene 4 - POS and Syntactic Tagging
Date Tue, 10 Apr 2012 06:57:43 GMT
Please disregard this suggestion. It is a bad idea. Almost every text 
contains a verb, a noun, etc., so searching a field that holds only POS 
tags would match nearly every document.  Maybe the parallel field should 
instead hold the lemma (dictionary form) and the part-of-speech tag 
combined into a single token, like "like_verb" or "lemming_propernoun"?
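
A rough sketch of how such combined tokens might be built before indexing. It is plain Java with no Lucene calls; the class and method names (LemmaPosTokens, Tagged, combine, toFieldTokens) are made up for illustration, the lemma and tag are assumed to come from some external tagger, and lowercasing both parts is my assumption, not something Lucene requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LemmaPosTokens {

    /** One token as produced by a hypothetical external tagger: lemma plus POS tag. */
    public static final class Tagged {
        final String lemma;
        final String pos;

        public Tagged(String lemma, String pos) {
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    /** Joins lemma and tag with an underscore, lowercased (an assumption), e.g. "like_verb". */
    public static String combine(String lemma, String pos) {
        return lemma.toLowerCase(Locale.ROOT) + "_" + pos.toLowerCase(Locale.ROOT);
    }

    /** Converts a tagged sentence into the token strings for the parallel field. */
    public static List<String> toFieldTokens(List<Tagged> tagged) {
        List<String> out = new ArrayList<>();
        for (Tagged t : tagged) {
            out.add(combine(t.lemma, t.pos));
        }
        return out;
    }
}
```

The resulting strings could be joined with spaces and indexed in the parallel field (say "text_pos") with a whitespace-style analyzer, so that a query like text_pos:like_verb matches only documents where "like" is used as a verb.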

On 4/9/12 1:10 PM, T. Kuro Kurosaka wrote:
> If you want to search on part-of-speech tag, I'd just make a parallel 
> field ("text_pos" for the field "text", for example) and search on 
> that field (text_pos:noun).
> Kuro
> On 3/14/12 9:37 AM, Mark McGuire wrote:
>> I'm working on a project where I need to tag both the part of speech 
>> and other syntactic information on tokens so that this information is 
>> searchable.  I have read the threads on the mailing list regarding 
>> part of speech tagging here 
>> and the many responses to similar questions.  To me, inserting 0 
>> increment tokens seems rather clunky, especially when TypeAttributes 
>> appear to be what one would want to use.  Does Lucene do anything 
>> extra when the Type is set to or not set to its default, "word"?  Is 
>> it possible to write a search that uses multiple attributes from 
>> TokenAttributes (i.e., a search for a CharTermAttribute of "dog" 
>> followed by a TypeAttribute of verb)?
>> Also if I were to use 0 increment tokens for tagging, would data like 
>> document length or sumTotalTermFreq be different from a document 
>> indexed without these tags?  How would I counteract these differences 
>> if any occur?
>> Thanks,
>> Mark McGuire
