lucene-dev mailing list archives

From "Jeff Wartes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8944) Improve geospatial garbage generation
Date Tue, 05 Apr 2016 19:15:25 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226948#comment-15226948 ]

Jeff Wartes commented on SOLR-8944:
-----------------------------------

I hadn't refreshed and didn't see this comment before I added mine, but thanks for the info;
I appreciate the references and context. I'll take a look at what using DocIdSetBuilder would involve.

I should also mention, though, that DocIdSetBuilder will be the third hardcoded magic fraction
of maxDoc I've come across while investigating allocations over the last week. It might be worth
considering whether the gyrations around avoiding the creation of these BitSets are more or less
complicated than managing a pool would be.
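To make the trade-off behind such a fraction concrete: a dense bitset costs maxDoc / 8 bytes no matter how many documents match, while a sparse int[] buffer costs 4 bytes per matching document, so the two break even at roughly maxDoc / 32 documents. For the index described in the issue (86M docs/core, 56k hits at the 99.9th percentile) that break-even point is about 2.7M documents, far above the observed hit counts. The following is a minimal, hypothetical Java sketch of the general pattern: buffer doc ids sparsely and only allocate a dense bitset once the count crosses a hardcoded fraction of maxDoc. The class name, the assumed `maxDoc >>> 7` threshold, and the growth policy are illustrative assumptions; this is not Lucene's DocIdSetBuilder.

```java
import java.util.Arrays;

// Hypothetical sketch of the "hardcoded fraction of maxDoc" pattern: doc ids are
// buffered in an int[] and a dense long[] bitset is only allocated once the
// buffer grows past maxDoc >>> THRESHOLD_SHIFT. The shift value is an assumption.
public class AdaptiveDocSetBuilder {
    static final int THRESHOLD_SHIFT = 7; // assumed: upgrade past maxDoc / 128 docs

    private final int maxDoc;
    private final int threshold;
    private int[] buffer = new int[16]; // sparse phase: 4 bytes per doc
    private int count;
    private long[] dense;               // dense phase: ~maxDoc / 8 bytes, null until needed

    public AdaptiveDocSetBuilder(int maxDoc) {
        this.maxDoc = maxDoc;
        this.threshold = maxDoc >>> THRESHOLD_SHIFT;
    }

    public void add(int doc) {
        if (doc < 0 || doc >= maxDoc) {
            throw new IndexOutOfBoundsException("doc " + doc + " outside [0, " + maxDoc + ")");
        }
        if (dense != null) {
            dense[doc >>> 6] |= 1L << doc; // Java masks the shift count to its low 6 bits
            return;
        }
        if (count == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[count++] = doc;
        if (count > threshold) {
            upgradeToDense();
        }
    }

    // The magic-fraction decision: past the threshold, pay for the full bitset once.
    private void upgradeToDense() {
        dense = new long[((maxDoc - 1) >>> 6) + 1]; // maxDoc >= 1 here because add() validated a doc
        for (int i = 0; i < count; i++) {
            dense[buffer[i] >>> 6] |= 1L << buffer[i];
        }
        buffer = null; // the sparse buffer becomes garbage
    }

    public boolean isDense() {
        return dense != null;
    }

    public boolean contains(int doc) {
        if (dense != null) {
            return (dense[doc >>> 6] & (1L << doc)) != 0;
        }
        for (int i = 0; i < count; i++) {
            if (buffer[i] == doc) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AdaptiveDocSetBuilder b = new AdaptiveDocSetBuilder(86_000_000);
        for (int k = 0; k < 56_000; k++) {
            b.add(k * 1535);
        }
        System.out.println(b.isDense()); // false: 56,000 is below 86,000,000 >>> 7 = 671,875
    }
}
```

Note that the upgrade itself still allocates a full-size bitset and then discards the buffer, so under these assumptions the pattern only reduces garbage for queries that stay below the threshold.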

> Improve geospatial garbage generation
> -------------------------------------
>
>                 Key: SOLR-8944
>                 URL: https://issues.apache.org/jira/browse/SOLR-8944
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>              Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing some analysis of JVM garbage sources in my Solr index (Solr 5.4,
86M docs/core, 99.9th-percentile hit count of 56k with my query corpus).
> After applying SOLR-8922, I find that my biggest source of garbage, by a literal order of magnitude
(by size), is the long[] allocated by FixedBitSet. From the backtraces, it appears that the biggest
source of FixedBitSet creation in my case (by two orders of magnitude) is my use of queries
that involve geospatial filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent object pools: FixedBitSet size is allocated based on maxDoc, which presumably
changes less frequently than queries are issued. If an existing FixedBitSet were not available
from a pool, the worst case (create a new one) would be no worse than the current behavior.
The complication would be enforcing when the object is returned to the pool, but it
looks like there are already some lifecycle hooks for this.
> 2. I note that a SparseFixedBitSet class already exists, and it puts considerable
effort into allocating smaller chunks only as necessary. Is it not usable for this purpose?
How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little more data around
the current choices before choosing an approach.
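The size of the garbage described in the issue follows directly from maxDoc: a dense bitset with one bit per document needs ceil(maxDoc / 64) 64-bit words. The hypothetical helper below does that arithmetic for 86M documents. It assumes the backing long[] is sized to maxDoc and ignores the array header; since these bitsets are built per segment, it also assumes the per-segment maxDoc values add up to roughly the 86M docs/core figure. It only does arithmetic and does not call Lucene.

```java
import java.util.Locale;

// Hypothetical helper: payload size of a dense bitset with one bit per document.
// Assumes the backing long[] holds ceil(numBits / 64) words and ignores the array header.
public class BitSetFootprint {

    // Ceiling of numBits / 64, computed in long arithmetic to avoid overflow.
    public static long words(long numBits) {
        return (numBits + 63) / 64;
    }

    // Bytes in the long[] payload (8 bytes per word).
    public static long payloadBytes(long numBits) {
        return words(numBits) * Long.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 86_000_000L; // docs/core from the issue description
        long bytes = payloadBytes(maxDoc);
        System.out.println("words: " + words(maxDoc)); // 1343750
        System.out.println("bytes: " + bytes);         // 10750000
        System.out.println(String.format(Locale.ROOT, "MiB:   %.2f", bytes / (1024.0 * 1024.0))); // 10.25
    }
}
```

With 56k hits at the 99.9th percentile, fewer than 1 bit in 1,500 is set, so under these assumptions almost all of those ~10 MB per query are zero words that the JVM allocates and then has to collect.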
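The object-pool option in the issue description could look roughly like the hypothetical sketch below, which pools raw long[] word arrays as a stand-in for FixedBitSet storage. It follows the worst-case argument in the description: an empty pool falls back to allocating a new array, exactly as without pooling. The class name, the per-size cap, and keying by word count are assumptions. Deciding when release() is safe to call (the lifecycle question the description raises) is left to the caller, and a returned array must be zeroed before reuse, or stale bits would leak into the next query's results.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pool of bitset word arrays (long[]) keyed by word count.
// Stands in for pooling FixedBitSet instances; not a Lucene or Solr API.
public class BitSetWordPool {
    private final int maxPooledPerSize;
    private final ConcurrentHashMap<Integer, ArrayBlockingQueue<long[]>> pools =
        new ConcurrentHashMap<>();

    public BitSetWordPool(int maxPooledPerSize) {
        if (maxPooledPerSize < 1) {
            throw new IllegalArgumentException("maxPooledPerSize must be >= 1");
        }
        this.maxPooledPerSize = maxPooledPerSize;
    }

    // Words needed for numBits bits: 0 for 0 bits, otherwise ceil(numBits / 64).
    static int words(int numBits) {
        return numBits == 0 ? 0 : ((numBits - 1) >>> 6) + 1;
    }

    private ArrayBlockingQueue<long[]> queueFor(int numWords) {
        return pools.computeIfAbsent(numWords, k -> new ArrayBlockingQueue<long[]>(maxPooledPerSize));
    }

    // Returns a zeroed array for numBits bits; allocates when the pool is empty,
    // which costs the same as not pooling at all.
    public long[] borrow(int numBits) {
        if (numBits < 0) {
            throw new IllegalArgumentException("numBits must be >= 0");
        }
        int numWords = words(numBits);
        long[] pooled = queueFor(numWords).poll();
        return pooled != null ? pooled : new long[numWords];
    }

    // Caller must guarantee nothing still reads the array. It is zeroed here so the
    // next borrower never sees stale bits; if the pool is full, it is simply dropped.
    public void release(long[] bits) {
        Arrays.fill(bits, 0L);
        queueFor(bits.length).offer(bits);
    }

    public static void main(String[] args) {
        BitSetWordPool pool = new BitSetWordPool(4);
        long[] first = pool.borrow(1000);  // 16 words, newly allocated
        first[0] = 0xFFL;
        pool.release(first);
        long[] second = pool.borrow(1000); // reuses the same array, cleared
        System.out.println((first == second) + " " + second[0]); // true 0
    }
}
```

Keying by word count means that when maxDoc changes (for example after a merge), arrays of the old size simply stop being requested; a real pool would also need some way to evict them.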
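For the SparseFixedBitSet question, the reason a sparse layout can help is visible in the issue's own numbers: with 56k matches among 86M documents, only a small fraction of the 64-bit words in a dense bitset are ever non-zero. The hypothetical sketch below splits the bit space into 4096-bit blocks, keeps one index long per block recording which of its 64 words are non-zero, and stores only those words, packed in order. It is written in the spirit of SparseFixedBitSet but is not its implementation; in particular, it grows a block by exactly one word per new non-zero word, which is simple but slow.

```java
// Hypothetical sparse bitset in the spirit of (but not copied from) Lucene's
// SparseFixedBitSet: 4096-bit blocks, one index word per block marking which
// of its 64 words are non-zero, and only those non-zero words stored.
public class SparseBits {
    private final int length;
    private final long[] indices; // per block: bit w set => word w is non-zero
    private final long[][] words; // per block: the non-zero words, in ascending order

    public SparseBits(int length) {
        this.length = length;
        int blocks = (int) (((long) length + 4095) >>> 12);
        this.indices = new long[blocks];
        this.words = new long[blocks][];
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("bit " + i + " outside [0, " + length + ")");
        }
    }

    public void set(int i) {
        checkIndex(i);
        int block = i >>> 12;
        long wordMask = 1L << ((i >>> 6) & 63);          // which word within the block
        long index = indices[block];
        int pos = Long.bitCount(index & (wordMask - 1)); // packed position of that word
        if ((index & wordMask) != 0) {
            words[block][pos] |= 1L << i; // word already stored: just set the bit
            return;
        }
        // Insert a new non-zero word at pos, shifting later words right by one.
        long[] old = words[block];
        int n = Long.bitCount(index);
        long[] grown = new long[n + 1];
        if (old != null) {
            System.arraycopy(old, 0, grown, 0, pos);
            System.arraycopy(old, pos, grown, pos + 1, n - pos);
        }
        grown[pos] = 1L << i;
        words[block] = grown;
        indices[block] = index | wordMask;
    }

    public boolean get(int i) {
        checkIndex(i);
        int block = i >>> 12;
        long wordMask = 1L << ((i >>> 6) & 63);
        long index = indices[block];
        if ((index & wordMask) == 0) {
            return false; // word never written
        }
        int pos = Long.bitCount(index & (wordMask - 1));
        return (words[block][pos] & (1L << i)) != 0;
    }

    // Number of 64-bit payload words actually allocated (excludes index and headers).
    public long allocatedWords() {
        long total = 0;
        for (long[] w : words) {
            if (w != null) {
                total += w.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        SparseBits bits = new SparseBits(86_000_000);
        for (int k = 0; k < 56_000; k++) {
            bits.set(k * 1535); // stride > 64, so every hit lands in its own word
        }
        System.out.println(bits.allocatedWords()); // 56000 (vs 1343750 for a dense bitset)
        System.out.println(bits.get(1535) + " " + bits.get(1536)); // true false
    }
}
```

The cost shows up on the access path: every set and get does a popcount to locate the packed word, inserting a new non-zero word copies the block's array, and the per-block index and array headers add their own overhead. Whether that beats a pooled dense bitset would depend on hit counts and on how the result is iterated, which is the performance question the description asks.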



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


