lucene-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder
Date Fri, 30 Dec 2016 17:26:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788036#comment-15788036 ]

ASF GitHub Bot commented on LUCENE-7603:
----------------------------------------

Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/129#discussion_r94243010
  
    --- Diff: lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java ---
    @@ -210,85 +199,41 @@ private void finish() {
        */
       private void finish(int maxDeterminizedStates) {
         Automaton automaton = builder.finish();
    -
    -    // System.out.println("before det:\n" + automaton.toDot());
    -
    -    Transition t = new Transition();
    -
    -    // TODO: should we add "eps back to initial node" for all states,
    -    // and det that?  then we don't need to revisit initial node at
    -    // every position?  but automaton could blow up?  And, this makes it
    -    // harder to skip useless positions at search time?
    -
    -    if (anyTermID != -1) {
    -
    -      // Make sure there are no leading or trailing ANY:
    -      int count = automaton.initTransition(0, t);
    -      for (int i = 0; i < count; i++) {
    -        automaton.getNextTransition(t);
    -        if (anyTermID >= t.min && anyTermID <= t.max) {
    -          throw new IllegalStateException("automaton cannot lead with an ANY transition");
    -        }
    -      }
    -
    -      int numStates = automaton.getNumStates();
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          if (automaton.isAccept(t.dest) && anyTermID >= t.min && anyTermID <= t.max) {
    -            throw new IllegalStateException("automaton cannot end with an ANY transition");
    -          }
    -        }
    -      }
    -
    -      int termCount = termToID.size();
    -
    -      // We have to carefully translate these transitions so automaton
    -      // realizes they also match all other terms:
    -      Automaton newAutomaton = new Automaton();
    -      for (int i = 0; i < numStates; i++) {
    -        newAutomaton.createState();
    -        newAutomaton.setAccept(i, automaton.isAccept(i));
    -      }
    -
    -      for (int i = 0; i < numStates; i++) {
    -        count = automaton.initTransition(i, t);
    -        for (int j = 0; j < count; j++) {
    -          automaton.getNextTransition(t);
    -          int min, max;
    -          if (t.min <= anyTermID && anyTermID <= t.max) {
    -            // Match any term
    -            min = 0;
    -            max = termCount - 1;
    -          } else {
    -            min = t.min;
    -            max = t.max;
    -          }
    -          newAutomaton.addTransition(t.source, t.dest, min, max);
    -        }
    -      }
    -      newAutomaton.finishState();
    -      automaton = newAutomaton;
    -    }
    -
         det = Operations.removeDeadStates(Operations.determinize(automaton, maxDeterminizedStates));
       }
     
    -  private int getTermID(BytesRef term) {
    -    Integer id = termToID.get(term);
    -    if (id == null) {
    -      id = termToID.size();
    -      if (term != null) {
    -        term = BytesRef.deepCopyOf(term);
    -      }
    -      termToID.put(term, id);
    +  /**
    +   * Gets an integer id for a given term.
    +   *
    +   * If there is no position gaps for this token then we can reuse the id for the same term if it appeared at another
    +   * position without a gap.  If we have a position gap generate a new id so we can keep track of the position
    +   * increment.
    +   */
    +  private int getTermID(int incr, int prevIncr, BytesRef term) {
    +    assert term != null;
    +    boolean isStackedGap = incr == 0 && prevIncr > 1;
    +    boolean hasGap = incr > 1;
    +    term = BytesRef.deepCopyOf(term);
    --- End diff --
    
    The deepCopyOf is only needed if you generate a new ID, not for an existing one.  
    
    BTW... have you seen BytesRefHash?  I think re-using that could minimize the code here to deal with this stuff.
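
A minimal sketch of the copy-only-on-insert pattern suggested in the comment above, written against plain JDK types so it runs without Lucene on the classpath. `TermIdTable`, its `copies` counter, and the `ByteBuffer` keys are hypothetical stand-ins for the `termToID` map and `BytesRef`; this is not the code in the pull request, and it leaves out the position-increment handling entirely.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the term-to-id map; not the pull request's code. */
final class TermIdTable {
  // ByteBuffer compares and hashes by its remaining bytes, so it can act as a map key.
  private final Map<ByteBuffer, Integer> termToID = new HashMap<>();

  /** Counts how many defensive copies were made (for illustration only). */
  int copies = 0;

  /** Returns the id for the term in bytes[offset, offset + length), assigning one if needed. */
  int getTermID(byte[] bytes, int offset, int length) {
    // Look up with a cheap view over the caller's (possibly reused) buffer: no copy yet.
    ByteBuffer view = ByteBuffer.wrap(bytes, offset, length);
    Integer id = termToID.get(view);
    if (id == null) {
      // Only a newly seen term needs its own copy, because the map keeps the key around.
      byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
      copies++;
      id = termToID.size();
      termToID.put(ByteBuffer.wrap(copy), id);
    }
    return id;
  }
}
```

Looking up a term that is already present allocates only the small view object and never copies the bytes. If I read its Javadoc right, Lucene's `BytesRefHash.add(BytesRef)` covers the same ground: it copies the bytes into its own pool and returns a non-negative id for a new term, or `-(id + 1)` for one that is already present.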


> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-7603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7603
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, core/search
>            Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can use multi-term synonyms at query time.  A "graph token stream" will be created, which is nothing more than using the position length attribute on stacked tokens to indicate how many positions a token should span.  Currently the position length attribute on tokens is ignored during query parsing.  This issue will add support for handling these graph token streams inside the QueryBuilder utility class used by query parsers.
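
To make the position length idea in the description concrete, here is a rough sketch that enumerates the term sequences encoded by a small token graph. `TokenGraphPaths` and its `Token` record are hypothetical and use only JDK types; the real work in Lucene goes through an automaton, which this does not attempt to reproduce. The sketch assumes the first token has a position increment of 1 and that there are no position gaps.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a graph token stream; not Lucene's implementation. */
final class TokenGraphPaths {

  /** A token as a term plus its position increment and position length. */
  static final class Token {
    final String term;
    final int posInc;
    final int posLen;

    Token(String term, int posInc, int posLen) {
      this.term = term;
      this.posInc = posInc;
      this.posLen = posLen;
    }
  }

  /** Returns every term sequence that leads from the first position to the last. */
  static List<String> paths(List<Token> tokens) {
    int n = tokens.size();
    int[] starts = new int[n];
    int[] ends = new int[n];
    int pos = -1;
    int last = 0;
    for (int i = 0; i < n; i++) {
      Token t = tokens.get(i);
      pos += t.posInc;              // an increment of 0 stacks a token on the previous one
      starts[i] = pos;
      ends[i] = pos + t.posLen;     // the position length says how many positions it spans
      last = Math.max(last, ends[i]);
    }
    List<String> out = new ArrayList<>();
    walk(tokens, starts, ends, 0, last, "", out);
    return out;
  }

  private static void walk(List<Token> tokens, int[] starts, int[] ends,
                           int at, int last, String prefix, List<String> out) {
    if (at == last) {
      out.add(prefix);
      return;
    }
    for (int i = 0; i < tokens.size(); i++) {
      if (starts[i] == at) {
        String next = prefix.isEmpty() ? tokens.get(i).term : prefix + " " + tokens.get(i).term;
        walk(tokens, starts, ends, ends[i], last, next, out);
      }
    }
  }
}
```

For the tokens `wi` (increment 1, length 1), `wifi` (increment 0, length 2), `fi` (increment 1, length 1) and `network` (increment 1, length 1), `paths` returns `wi fi network` and `wifi network`. If the position length were ignored, `wifi` would look like a one-position alternative to `wi` alone, which is the problem the description refers to.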



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

