lucene-java-user mailing list archives

From "T. Kuro Kurosaka" <>
Subject Re: Lucene 4 - POS and Syntactic Tagging
Date Mon, 09 Apr 2012 20:10:02 GMT
If you want to search on a part-of-speech tag, I'd just make a parallel 
field ("text_pos" for the field "text", for example) and search on that 
field (text_pos:noun).
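
To make that concrete, here is a rough sketch of how the two field values could be built from already-tagged tokens. It is plain Java with no Lucene calls; the class name, the helper methods, and the sample tokens and tags are made up for illustration, and it assumes both fields are later tokenized on whitespace so that the tag at position i in "text_pos" lines up with the word at position i in "text".

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: builds the values for a
// "text" field and a parallel "text_pos" field from {term, tag} pairs.
class ParallelPosFields {

    // Joins column `col` of every {term, tag} pair with single spaces.
    static String joinColumn(String[][] tagged, int col) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String[] pair : tagged) {
            joiner.add(pair[col]);
        }
        return joiner.toString();
    }

    // Value for the ordinary "text" field: the terms themselves.
    static String textField(String[][] tagged) {
        return joinColumn(tagged, 0);
    }

    // Value for the parallel "text_pos" field: one tag per term,
    // in the same order, so token positions stay aligned.
    static String posField(String[][] tagged) {
        return joinColumn(tagged, 1);
    }

    public static void main(String[] args) {
        // Sample input; the tagger output format is an assumption.
        String[][] tagged = {
            {"the", "det"}, {"dog", "noun"}, {"barks", "verb"}
        };
        System.out.println("text:     " + textField(tagged));
        System.out.println("text_pos: " + posField(tagged));
    }
}
```

Each string would then be added to the document as its own indexed field. Note that a query like text_pos:noun only tells you the document contains a noun somewhere; it does not by itself tie the tag to a particular word in "text".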


On 3/14/12 9:37 AM, Mark McGuire wrote:
> I'm working on a project where I need to tag both the part of speech 
> and other syntactic information on tokens so that this information is 
> searchable.  I have read the threads on the mailing list regarding 
> part of speech tagging here 
> <>

> and the many responses to similar questions.  To me, inserting 0 
> increment tokens seems rather clunky, especially when TypeAttributes 
> appear to be what one would want to use.  Does Lucene do anything 
> extra when the Type is set to or not set to its default, "word"?  Is 
> it possible to write a search that uses multiple attributes from 
> TokenAttributes (i.e., a search that searches for CharTermAttribute "dog" 
> followed by a TypeAttribute of verb)?
> Also if I were to use 0 increment tokens for tagging, would data like 
> document length or sumTotalTermFreq be different from a document 
> indexed without these tags?  How would I counteract these differences 
> if any occur?
> Thanks,
> Mark McGuire
