lucene-dev mailing list archives

From Andrzej Bialecki
Subject Omit positions but not TF
Date Sun, 08 Nov 2009 00:47:21 GMT

During one of the discussions at ApacheCon it occurred to me that it
would be useful to have an option to discard positional information but
still keep the term frequency. Position-dependent queries wouldn't work
then, but all other queries would work fine and we would still get the
right scoring.

I believe it should be possible to do this without changing the file
format if we used a negative term frequency for terms without positions:
we would have to check for that condition in SegmentTermDocs, change
the flags there and flip the sign of docFreq. Eventually we may want
to add a separate flag for this and bump the format version.
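
To make the sign trick concrete, here is a rough sketch of what I have
in mind. It does not touch any Lucene classes; SignedFreqSketch, encode
and decode are made-up names for illustration only, and it says nothing
about how the value would actually be serialized in the freq file. It
only shows the idea: the writer negates the frequency when positions are
omitted, and the reader treats a negative value as "no positions" and
flips the sign back.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
public final class SignedFreqSketch {

    /** Decoded view of one stored frequency value. */
    public static final class Decoded {
        public final int freq;              // real term frequency, always >= 1
        public final boolean hasPositions;  // false when positions were discarded

        Decoded(int freq, boolean hasPositions) {
            this.freq = freq;
            this.hasPositions = hasPositions;
        }
    }

    /** Writer side: negate the frequency when positions are omitted. */
    public static int encode(int freq, boolean omitPositions) {
        if (freq < 1) {
            throw new IllegalArgumentException("freq must be >= 1, got " + freq);
        }
        return omitPositions ? -freq : freq;
    }

    /** Reader side: a negative value means "no positions"; flip the sign back. */
    public static Decoded decode(int stored) {
        if (stored == 0 || stored == Integer.MIN_VALUE) {
            throw new IllegalArgumentException("not a valid stored freq: " + stored);
        }
        return stored < 0
            ? new Decoded(-stored, false)
            : new Decoded(stored, true);
    }

    public static void main(String[] args) {
        Decoded d = decode(encode(3, true));
        System.out.println(d.freq + " " + d.hasPositions); // prints "3 false"
    }
}
```

Scoring only needs the absolute value, so queries that don't depend on
positions would see the same frequency either way; only the
position-dependent ones would need to check the flag and bail out.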

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
Contact: info at sigram dot com
