lucene-solr-user mailing list archives

From Simon Willnauer <>
Subject Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Date Thu, 02 Aug 2012 06:07:38 GMT
On Thu, Aug 2, 2012 at 7:53 AM, roz dev <> wrote:
> Thanks Robert for these inputs.
> Since we do not really need the Snowball analyzer for this field, we will
> not use it for now. If this still does not address our issue, we will tweak
> the thread pool as per eks dev's suggestion. I am a bit hesitant to make this
> change yet, as we would be reducing the thread pool, which could adversely
> impact our throughput.
> If the Snowball filter is being optimized for Solr 4 beta, that would be
> great for us. If you have already filed a JIRA for this, please let me
> know; I would like to follow it.

AFAIK Robert already created an issue here:
and it seems fixed. Given the massive commit last night, it's already
committed and backported, so it will be in 4.0-BETA.

> Thanks again
> Saroj
> On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <> wrote:
>> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <> wrote:
>> > Hi All
>> >
>> > I am using Solr 4 from trunk with Tomcat 6. I am noticing that
>> > when we are indexing lots of data with 16 concurrent threads, the heap
>> > grows continuously. It remains high, and ultimately most of the objects
>> > end up being moved to Old Gen. Eventually, Old Gen also fills up and we
>> > start running into excessive GC problems.
>> Hi: I don't claim to know anything about how Tomcat manages threads,
>> but really you shouldn't have all these objects.
>> In general, Snowball stemmers should be reused per-thread-per-field.
>> But if you have a lot of fields*threads, especially if there really is
>> high thread churn on Tomcat, then this could be bad with Snowball:
>> see eks dev's comment on
>> I think it would be useful to see if you can tune Tomcat's thread pool
>> as he describes.
>> Separately: Snowball stemmers are currently really RAM-expensive for
>> stupid reasons.
>> Each one creates a ton of Among objects, e.g. an EnglishStemmer today
>> is about 8KB.
>> I'll regenerate these and open a JIRA issue, as the Snowball code
>> generator in their svn was improved
>> recently and each one now takes about 64 bytes instead (the Amongs
>> are static and reused).
>> Still, this won't really "solve your problem", because the analysis
>> chain could have other heavy parts
>> in initialization, but it seems good to fix.
>> As a workaround until then, you can also just use the "good old
>> PorterStemmer" (PorterStemFilterFactory in Solr).
>> It's not exactly the same as using Snowball(English), but it's pretty
>> close and also much faster.
>> --
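
The per-instance versus static Among tables discussed above can be illustrated with a small stand-alone sketch. This is not the Snowball or Lucene code: the class names, the suffix list, and the figure of 10 fields in the usage note are hypothetical, chosen only to show why sharing one static table makes each stemmer instance cheap, and how per-thread-per-field reuse multiplies whatever each instance costs.

```java
public class StemmerFootprintSketch {

    /** Hypothetical stand-in for a Snowball "Among" entry: a suffix plus a result code. */
    static final class Entry {
        final String suffix;
        final int result;

        Entry(String suffix, int result) {
            this.suffix = suffix;
            this.result = result;
        }
    }

    // Illustrative suffixes only; real stemmers carry far larger tables.
    static final String[] SUFFIXES = {"sses", "ies", "ss", "s"};

    /** Allocates a fresh table of entries on every call. */
    static Entry[] buildTable() {
        Entry[] table = new Entry[SUFFIXES.length];
        for (int i = 0; i < SUFFIXES.length; i++) {
            table[i] = new Entry(SUFFIXES[i], i);
        }
        return table;
    }

    /** Layout where every stemmer instance allocates its own copy of the table. */
    static final class PerInstanceStemmer {
        final Entry[] table = buildTable();
    }

    /** Layout where the table is built once per class and shared by all instances. */
    static final class SharedTableStemmer {
        static final Entry[] TABLE = buildTable();

        Entry[] table() {
            return TABLE;
        }
    }

    /**
     * Rough estimate of live stemmer memory when each (thread, field) pair
     * holds one cached stemmer. Ignores thread churn, which adds garbage on top.
     */
    static long estimateBytes(int threads, int fields, long bytesPerStemmer) {
        return (long) threads * fields * bytesPerStemmer;
    }
}
```

Under these assumptions, 16 indexing threads and a hypothetical 10 stemmed fields give estimateBytes(16, 10, 8 * 1024) = 1310720 bytes of live stemmers at about 8KB each, versus estimateBytes(16, 10, 64) = 10240 bytes at about 64 bytes each. High thread churn would add discarded copies on top of either figure.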
