lucene-solr-user mailing list archives

From Tim Allison <talli...@apache.org>
Subject Re: Memory Leak in 7.3 to 7.4
Date Mon, 06 Aug 2018 14:58:51 GMT
+1 to Shawn's and Erick's points about isolating Tika in a separate jvm.

Yes, please do let us know at user@tika.apache.org. We might be able to
help out, and you, in turn, can help the community figure out what's
going on; see e.g. https://issues.apache.org/jira/browse/TIKA-2703
On Sun, Aug 5, 2018 at 1:22 PM Shawn Heisey <apache@elyograg.org> wrote:
>
> On 8/2/2018 5:30 AM, Thomas Scheffler wrote:
> > my final verdict is the upgrade to Tika 1.17. If I downgrade the libraries just
> > for tika back to 1.16 and keep the rest of SOLR 7.4.0 the heap usage after about 85 % of the
> > index process and manual trigger of the garbage collector is about 60-70 MB (That low!!!)
> >
> > My problem now is that we have several setups that trigger this reliably but there
> > is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. I also do not know if
> > the error is inside Tika or inside the glue code that makes Tika usable in SOLR.
>
> If downgrading Tika fixes the issue, then it doesn't seem (to me) very
> likely that Solr's glue code for ERH has a problem. If it's not Solr's
> code that has the problem, there will be nothing we can do about it
> other than change the Tika library included with Solr.
>
> Before filing an issue, you should discuss this with the Tika project on
> their mailing list.  They'll want to make sure that they can fix the
> problem in a future version.  It might not be an actual memory leak ...
> it could just be that one of the documents you're trying to index is one
> that Tika requires a huge amount of memory to handle.  But it could be a
> memory leak.
>
> If you know which document is being worked on when it runs out of
> memory, can you try not including that document in your indexing, to see
> if it still has a problem?
>
> Please note that it is strongly recommended that you do not use the
> Extracting Request Handler in production.  Tika is prone to many
> problems, and those problems will generally affect Solr if Tika is being
> run inside Solr.  Because of this, it is recommended that you write a
> separate program using Tika that handles extracting information from
> documents and sending that data to Solr.  If that program crashes, Solr
> remains operational.
>
> There is already an issue to upgrade Tika to the latest version in Solr,
> but you've said that you tried 1.18 already with no change to the
> problem.  So whatever the problem is, it will need to be solved in 1.19
> or later.
>
> Thanks,
> Shawn
>
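
For readers who want to see what the "separate program" advice quoted above could look like, here is a minimal sketch, not anything posted in this thread. It assumes a tika-app jar on disk (the path `tika-app.jar` and the `-Xmx512m` heap cap are placeholders), a Solr collection named `mycollection`, and a text field called `content_txt`; all three are hypothetical and need adjusting. Extraction runs in a child JVM via tika-app's `--text` option, so a crash, hang, or out-of-memory error there only costs one document and leaves Solr untouched.

```python
import json
import subprocess
import sys
import urllib.request

# Hypothetical values: jar path, heap cap, collection and field names are
# assumptions for illustration, not taken from the thread.
TIKA_CMD = ["java", "-Xmx512m", "-jar", "tika-app.jar", "--text"]
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def extract_text(path, cmd=TIKA_CMD, timeout=120):
    """Run the extractor in a child process; return its text, or None on failure."""
    try:
        result = subprocess.run(
            cmd + [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # a hung extractor is killed instead of blocking indexing
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        # Includes the child JVM dying from an OutOfMemoryError.
        return None
    # Assumes the extractor writes UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def build_update_request(doc_id, text, url=SOLR_UPDATE_URL):
    """Build a POST request that sends one document to Solr as JSON."""
    body = json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_file(path):
    """Extract one file and send it to Solr; skip the file if extraction fails."""
    text = extract_text(path)
    if text is None:
        print("skipped " + path, file=sys.stderr)
        return False
    with urllib.request.urlopen(build_update_request(path, text), timeout=30) as resp:
        return resp.status == 200


if __name__ == "__main__":
    for p in sys.argv[1:]:
        index_file(p)
```

Starting a JVM per document is slow; it is used here only because it makes the isolation obvious. The skipped paths printed to stderr also give a list of candidate documents to report on the Tika list.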
