lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6697) Use 1D KD tree for alternative to postings based numeric range filters
Date Fri, 14 Aug 2015 13:33:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696999#comment-14696999
] 

Steve Rowe commented on LUCENE-6697:
------------------------------------

Not sure if this warrants reopen or a new issue given the 5.3 release in process, so just
leaving a comment:

My Jenkins found a failing seed on branch_5x w/Java8 for {{TestRangeTree.testAllEqual()}}
that reproduces rarely ([http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/1288/])
- beasting triggered it reliably for me once or twice on the first round with 10 dups, on
the same box:

{noformat}
 [beaster] Beast round: 1
   [junit4] Duplicate suite name used with XML reports: org.apache.lucene.rangetree.TestRangeTree.
This may confuse tools that process XML reports. Set 'ignoreDuplicateSuites' to true to skip
this message.
  [beaster] Executing 10 suites with 10 JVMs.
  [beaster] 
  [beaster] Started J0 PID(12439@localhost).
  [beaster] Started J2 PID(12438@localhost).
  [beaster] Started J5 PID(12398@localhost).
  [beaster] Started J6 PID(12409@localhost).
  [beaster] Started J8 PID(12414@localhost).
  [beaster] Started J9 PID(12437@localhost).
  [beaster] Started J4 PID(12402@localhost).
  [beaster] Started J3 PID(12430@localhost).
  [beaster] Started J7 PID(12400@localhost).
  [beaster] Started J1 PID(12397@localhost).
  [beaster]   2> Aug 14, 2015 9:24:01 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
uncaughtException
  [beaster]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestRangeTree]
  [beaster]   2> java.lang.AssertionError: T0: iter=2 id=11552 docID=11478 value=-4284574795410342043
(range: -7914835514817223401 TO -3409576459183210390) expected true but got: false deleted?=false
query=NumericRangeTreeQuery:field=sn_value:[-7914835514817223401 TO -3409576459183210390]
  [beaster]   2>        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
  [beaster]   2>        at org.junit.Assert.fail(Assert.java:93)
  [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
  [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
  [beaster]   2> 
  [beaster]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testAllEqual
-Dtests.seed=A757E5A079F28A1B -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Canada/Eastern
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
  [beaster] [09:23:57.007] ERROR   6.34s J5  | TestRangeTree.testAllEqual <<<
  [beaster]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError:
Captured an uncaught exception in thread: Thread[id=22, name=T0, state=RUNNABLE, group=TGRP-TestRangeTree]
  [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B:75B730E5DA2C13B5]:0)
  [beaster]    > Caused by: java.lang.AssertionError: T0: iter=2 id=11552 docID=11478 value=-4284574795410342043
(range: -7914835514817223401 TO -3409576459183210390) expected true but got: false deleted?=false
query=NumericRangeTreeQuery:field=sn_value:[-7914835514817223401 TO -3409576459183210390]
  [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
  [beaster]    >        at org.junit.Assert.fail(Assert.java:93)
  [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
  [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
  [beaster]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity,
locale=de_GR, timezone=Canada/Eastern
  [beaster]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=471275456,total=522715136
  [beaster]   2> NOTE: All tests run in this JVM: [TestRangeTree]
  [beaster]   2> Aug 14, 2015 9:24:02 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
uncaughtException
  [beaster]   2> WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestRangeTree]
  [beaster]   2> java.lang.AssertionError: T2: iter=3 id=10830 docID=10756 value=-4284574795410342043
(range: -6351025883581717437 TO 7103712746369706722) expected true but got: false deleted?=false
query=NumericRangeTreeQuery:field=sn_value:{-6351025883581717437 TO 7103712746369706722]
  [beaster]   2>        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
  [beaster]   2>        at org.junit.Assert.fail(Assert.java:93)
  [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
  [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
  [beaster]   2> 
  [beaster]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testAllEqual
-Dtests.seed=A757E5A079F28A1B -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Canada/Eastern
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
  [beaster] [09:23:57.229] ERROR   7.52s J6  | TestRangeTree.testAllEqual <<<
  [beaster]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError:
Captured an uncaught exception in thread: Thread[id=24, name=T2, state=RUNNABLE, group=TGRP-TestRangeTree]
  [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B:75B730E5DA2C13B5]:0)
  [beaster]    > Caused by: java.lang.AssertionError: T2: iter=3 id=10830 docID=10756 value=-4284574795410342043
(range: -6351025883581717437 TO 7103712746369706722) expected true but got: false deleted?=false
query=NumericRangeTreeQuery:field=sn_value:{-6351025883581717437 TO 7103712746369706722]
  [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
  [beaster]    >        at org.junit.Assert.fail(Assert.java:93)
  [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
  [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
  [beaster]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity,
locale=de_GR, timezone=Canada/Eastern
  [beaster]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=462995928,total=522715136
  [beaster]   2> NOTE: All tests run in this JVM: [TestRangeTree]
  [beaster] 
  [beaster] Tests with failures:
  [beaster]   - org.apache.lucene.rangetree.TestRangeTree.testAllEqual
  [beaster]   - org.apache.lucene.rangetree.TestRangeTree.testAllEqual
{noformat}

> Use 1D KD tree for alternative to postings based numeric range filters
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-6697
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6697
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6697.patch, LUCENE-6697.patch, LUCENE-6697.patch
>
>
> Today Lucene uses postings to index a numeric value at multiple
> precision levels for fast range searching.  It's somewhat costly: each
> numeric value is indexed with multiple terms (4 terms by default)
> ... I think a dedicated 1D BKD tree should be more compact and perform
> better.
> It should also easily generalize beyond 64 bits to arbitrary byte[],
> e.g. for LUCENE-5596, but I haven't explored that here.
> A 1D BKD tree just sorts all values, and then indexes adjacent leaf
> blocks of size 512-1024 (by default) values per block, and their
> docIDs, into a fully balanced binary tree.  Building the range filter
> is then just a recursive walk through this tree.
> It's the same structure we use for 2D lat/lon BKD tree, just with 1D
> instead.  I implemented it as a DocValuesFormat that also writes the
> numeric tree on the side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message