lucene-solr-user mailing list archives

From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: WordDelimiterFilter loses position increments of tokens
Date Wed, 05 Jul 2006 14:27:49 GMT
On 7/5/06, Yonik Seeley <yseeley@gmail.com> wrote:
> So fixing the first token at the end of next() and also at the other
> exit point (line 276) is probably the easiest fix.

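Something like this, I suppose (save the incoming token's position increment, then restore it on the first token returned at both exit points):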

Index: src/java/org/apache/solr/analysis/WordDelimiterFilter.java
===================================================================
--- src/java/org/apache/solr/analysis/WordDelimiterFilter.java	(revision 417024)
+++ src/java/org/apache/solr/analysis/WordDelimiterFilter.java	(working copy)
@@ -170,6 +170,7 @@
     // Would it actually be faster to check for the common form
     // of isLetter() isLower()*, and then backtrack if it doesn't match?

+    int origPosOffset;
     while(true) {
       Token t = input.next();
       if (t == null) return null;
@@ -180,6 +181,8 @@
       int end=s.length();
       if (end==0) continue;

+      origPosOffset = t.getPositionIncrement();
+
       // Avoid calling charType more than once for each char (basically
       // avoid any backtracking).
       // makes code slightly more difficult, but faster.
@@ -273,6 +276,7 @@
             // optimization... if this is the only token,
             // return it immediately.
             if (queue.size()==0) {
+              newtok.setPositionIncrement(origPosOffset);
               return newtok;
             }

@@ -376,7 +380,9 @@
     // System.out.println("##########AFTER COMBINATIONS:"+ str(queue));

     queuePos=1;
-    return queue.get(0);
+    Token tok = queue.get(0);
+    tok.setPositionIncrement(origPosOffset);
+    return tok;
   }
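
The idea behind the patch can be sketched outside of Solr. This is only an illustration under stated assumptions: SketchToken and splitPreservingIncrement are hypothetical names, not the Lucene Token class or the WordDelimiterFilter API, and splitting on '-' stands in for the real delimiter logic. It shows the first emitted subword inheriting the original token's position increment, while later subwords keep the default of 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token: just text plus a position increment.
final class SketchToken {
    final String text;
    int positionIncrement;

    SketchToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }
}

final class PositionIncrementSketch {

    // Splits the input on '-' (a simplification of the real delimiter rules)
    // and copies the input's position increment onto the first subword.
    static List<SketchToken> splitPreservingIncrement(SketchToken in) {
        // Remember the increment before the original token is discarded.
        int origPosInc = in.positionIncrement;

        List<SketchToken> out = new ArrayList<SketchToken>();
        for (String part : in.text.split("-")) {
            if (part.isEmpty()) continue;       // skip empty pieces, e.g. from "--"
            out.add(new SketchToken(part, 1));  // subwords default to increment 1
        }

        // Restore the original increment on the first token that is returned,
        // so a gap left by an upstream filter (e.g. removed stop words) survives.
        if (!out.isEmpty()) {
            out.get(0).positionIncrement = origPosInc;
        }
        return out;
    }

    public static void main(String[] args) {
        // An increment of 3 models two positions skipped before this token.
        SketchToken in = new SketchToken("wi-fi", 3);
        for (SketchToken t : splitPreservingIncrement(in)) {
            System.out.println(t.text + " posInc=" + t.positionIncrement);
        }
    }
}
```

Without the restore step, the first subword would come out with increment 1 and the gap would be lost, which is the behavior reported in this thread.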

-Yonik
