lucene-java-user mailing list archives

From 小鱼儿 <>
Subject Re: Need suggestions on implementing a custom query (offload R-tree filter to fully in-memory) on Lucene-8.3
Date Wed, 04 Dec 2019 09:21:17 GMT
Hi Adrien,

As for my native implementation, which combines an inverted index with an
R-tree distance query (the index data is fully loaded into memory): I first
filter with a bounding box and then apply a precise "contains" check, so
both are "distance queries" (or, as I call them, "point nearby queries").
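
A minimal sketch of that two-step filter, using only the JDK. The class and
method names are hypothetical, a linear scan stands in for the R-tree lookup,
a spherical haversine distance stands in for the exact "contains" check, and
boxes that cross the date line are not handled:

```java
import java.util.BitSet;

/** Hypothetical sketch: bounding-box prefilter, then an exact distance check. */
class GeoFilterSketch {
    static final double EARTH_RADIUS_M = 6_371_008.8; // mean Earth radius (assumed sphere)

    /** Great-circle distance in meters between two lat/lon points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns a BitSet with bit i set when point i lies within radiusMeters of the center. */
    static BitSet withinRadius(double[] lats, double[] lons,
                               double centerLat, double centerLon, double radiusMeters) {
        // Step 1: bounding box around the circle (no date-line wrapping in this sketch).
        double angular = radiusMeters / EARTH_RADIUS_M; // radius as an angle in radians
        double latDelta = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(centerLat));
        double lonDelta = ratio >= 1.0 ? 180.0 : Math.toDegrees(Math.asin(ratio));

        BitSet hits = new BitSet(lats.length);
        for (int doc = 0; doc < lats.length; doc++) {
            boolean inBox = Math.abs(lats[doc] - centerLat) <= latDelta
                    && Math.abs(lons[doc] - centerLon) <= lonDelta;
            // Step 2: the exact distance check runs only for points inside the box.
            if (inBox && haversineMeters(centerLat, centerLon, lats[doc], lons[doc]) <= radiusMeters) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] lats = {39.9042, 39.95, 40.5};
        double[] lons = {116.4074, 116.45, 116.4};
        System.out.println(withinRadius(lats, lons, 39.9042, 116.4074, 10_000.0));
    }
}
```

The box rejects most points with four comparisons, so the trigonometry runs
only for the candidates that survive it.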

I have implemented this as a custom Lucene Query, which filters the POIs
within a 10 km distance range and converts them to a Lucene BitSetIterator.
I tested its performance: it is back to 20 ms/1000 QPS, and on a retest it
improved to 15 ms/1400 QPS (I don't know why). The initial performance with
Lucene's BKD index was only 150 ms/130 QPS, so this is a big win!
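
A rough sketch of the unscored merge step, with java.util.BitSet standing in
for Lucene's own bit set classes (this is not the Lucene API, and the class
and method names are hypothetical): the text matches and the geo matches are
each held as a bit set of doc IDs and intersected without any scoring.

```java
import java.util.BitSet;

/** Hypothetical sketch: intersect two doc ID sets without scoring. */
class BitSetMergeSketch {
    /** Returns the doc IDs present in both sets; the inputs are left unmodified. */
    static BitSet intersect(BitSet textMatches, BitSet geoMatches) {
        BitSet result = (BitSet) textMatches.clone();
        result.and(geoMatches); // bitwise AND keeps only docs matching both filters
        return result;
    }

    /** Iterates the set bits in increasing doc ID order, as a doc ID iterator would. */
    static int[] docIds(BitSet bits) {
        int[] ids = new int[bits.cardinality()];
        int i = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            ids[i++] = doc;
        }
        return ids;
    }

    public static void main(String[] args) {
        BitSet text = new BitSet();
        text.set(1); text.set(4); text.set(7);
        BitSet geo = new BitSet();
        geo.set(4); geo.set(7); geo.set(9);
        System.out.println(java.util.Arrays.toString(docIds(intersect(text, geo))));
    }
}
```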

NOTE: I first subclassed IndexSearcher and overrode the so-called
"low-level" *search(Query query, Collector results)* method, thinking
Lucene would pass my custom Query object in. Well, I was wrong. But the
custom Query subclass approach finally works!

But the question remains: why is the performance of the BKD-backed
LatLonPoint.newDistanceQuery so bad? The index data for my 18,000 POIs is
only ~7 MB on disk, so presumably it sits in a single Lucene "segment"? When
it is all loaded into memory using the mmap codec, is the BKD index naively
scanning all POI locations? This is only a guess.

BTW, the text-only query averages 10 ms/2000 QPS, at the same level in both
my native in-memory inverted index and Lucene's index.

On Wed, Dec 4, 2019 at 4:14 PM Adrien Grand <> wrote:

> Are you sure you are comparing apples to apples? The first paragraph
> mentions a range filter, which would be LatLonPoint#newBoxQuery, but
> then you mentioned LatLonPoint#newDistanceQuery, which is
> significantly more costly due to the need to compute distances.
> If you plan to combine text queries with your geo queries, I'd also
> advise to index both with LatLonPoint and LatLonDocValuesField, and
> then use IndexOrDocValuesQuery at query time. Typically something like
> this:
> ```
> Query textQuery = ...;
> Query latLonPointQuery = LatLonPoint.newBoxQuery("poi", www, xxx, yyy, zzz);
> Query latLonDocValuesQuery =
>     LatLonDocValuesField.newSlowBoxQuery("poi", www, xxx, yyy, zzz);
> Query poiQuery = new IndexOrDocValuesQuery(latLonPointQuery, latLonDocValuesQuery);
> Query query = new BooleanQuery.Builder()
>     .add(textQuery, Occur.MUST)
>     .add(poiQuery, Occur.FILTER)
>     .build();
> ```
> On Wed, Dec 4, 2019 at 5:31 AM 小鱼儿 <> wrote:
> >
> > Background: I need to implement document indexing and search for
> > POIs (points of interest) in an LBS scenario. A POI has a name, an
> > address, and a location (LatLonPoint), and I want to combine a text
> > query with a geo-spatial 2D range filter.
> >
> > The problem is this: when I first built a native in-memory index, which
> > uses a simple BitSet as the DocIDSet type and the STRTree class from the
> > famous JTS lib, I got 20 ms/1000 QPS with 18,000 POIs on my laptop
> > (Windows 7 x64, using the mmap codec). But when I use Lucene-8.3 to
> > implement the same functionality (with LatLonPoint.newDistanceQuery,
> > which seems to use the default BKD tree index), I only get
> > 150 ms/130 QPS, which is a very bad degradation.
> >
> > So my idea is: can I write a custom filter query that builds a fully
> > in-memory R-tree index to boost the 2D spatial range filter performance?
> > I need access to Lucene's internal DocIDSet class so I can do a fast
> > merge with no scoring needed. I hope this will improve query performance.
> >
> > Any suggestions?
> --
> Adrien
