manifoldcf-user mailing list archives

From Priya Arora <pr...@smartshore.nl>
Subject Re: Manifoldcf server Error
Date Fri, 20 Dec 2019 10:07:58 GMT
Hi Markus,

Many thanks for your reply!

I tried reproducing the scenario in a different environment. However, the
error I listed above occurs when I am crawling intranet sites, which are
accessible via a remote server. I am also using the following
transformation connectors: Allow Documents, Tika Parser, Content Limiter
(10000000), and Metadata Adjuster.

When I tried reproducing the error with public sites of the same domain on
a different server (DEV), the crawl succeeded with no errors. There were
also no Postgres-related errors.

Could it depend on server-related configuration such as the firewall? This
case involves some firewall and security-related configuration.

Thanks
Priya




On Fri, Dec 20, 2019 at 3:23 PM Markus Schuch <markus_schuch@web.de> wrote:

> Hi Priya,
>
> In my experience, I would focus on the OutOfMemoryError (OOME).
> 8 GB can be enough, but it does not have to be.
>
> First, I would check whether the JVM is really getting the desired heap
> size. The dockerized environment makes that a little harder to find out,
> since you need access to the JVM metrics, e.g. via jmxremote.
> Being able to monitor the JVM metrics helps you correlate the
> errors with the heap and garbage collection activity.
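
A minimal sketch of the heap check described above, using only the standard `java.lang.management` API. The class name `HeapCheck` is hypothetical. It reports on the JVM it runs in, so it would have to be started inside the container with the same `-Xmx` options as ManifoldCF to say anything about that setup; reading the same MXBeans remotely over JMX is the alternative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    /** Maximum heap this JVM will attempt to use, in MiB (reflects -Xmx when set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Current heap usage as seen by the memory MXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("max heap: %d MiB, used: %d MiB%n",
                maxHeapMiB(), heap.getUsed() / (1024L * 1024L));

        // Cumulative collection counts and times per collector since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

If the reported maximum is far below 8192 MiB, the options file is probably not being picked up by the process in the container.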
>
> The errors you see in the PostgreSQL JDBC driver may well be related to
> the OOME.
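
One plausible link between the two errors, sketched with plain `java.util.Timer`: when a task kills the timer thread with an `Error`, every later `schedule` call on that timer fails with `IllegalStateException`. The class `TimerDeathDemo`, the thread name, and the simulated `OutOfMemoryError` are illustrative assumptions; this is not the driver's code, only the JDK behaviour that would explain a shared timer thread dying first and "Timer already cancelled." appearing afterwards.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class TimerDeathDemo {
    /** Returns the message of the exception thrown by schedule() on a dead Timer, or null if none. */
    static String scheduleAfterTimerThreadDies() {
        Timer timer = new Timer("demo-shared-timer", true);
        AtomicReference<Thread> timerThread = new AtomicReference<>();
        CountDownLatch started = new CountDownLatch(1);

        // First task: stands in for a timer task that hits an OutOfMemoryError.
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                timerThread.set(Thread.currentThread());
                started.countDown();
                throw new OutOfMemoryError("simulated for the demo");
            }
        }, 0);

        try {
            started.await();
            // Wait until the timer thread has fully terminated.
            timerThread.get().join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }

        // Second task: scheduling on the dead timer is rejected.
        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() { }
            }, 1000);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(scheduleAfterTimerThreadDies());
    }
}
```

Under this reading, the `IllegalStateException` is a follow-on symptom, and fixing the memory pressure should make it disappear as well.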
>
> Some questions I would ask myself:
>
> Do the problems occur repeatedly only when crawling this specific
> content source, or only with this specific output connection? Can you
> reproduce them outside of Docker in a controlled dev environment? Or is it
> a more general problem with your ManifoldCF instance?
>
> Maybe there are some huge files being crawled in your content source?
> Do you have any kind of transformations configured (e.g. a content size
> limit)? You should check the job's history for any patterns, such as
> the error always occurring after the same document xy is
> encountered.
>
> Cheers
> Markus
>
>
>
> On 20.12.2019 at 09:59, Priya Arora wrote:
> > Hi Markus,
> >
> > The heap size is defined as 8 GB. In the ManifoldCF start-options-unix
> > file, the Xmx etc. parameters are set to 8192 MB.
> >
> > It seems to be a memory issue, and it also occurs when ManifoldCF tries
> > to communicate with the database. Do you explicitly define a connection
> > timer somewhere for communicating with Postgres?
> > Postgres was installed by pulling a Docker image, followed by some
> > changes in properties.xml (of ManifoldCF) to connect to the database.
> > On the other hand, Elasticsearch also has sufficient memory, and
> > ManifoldCF is provided with 8 CPU cores.
> >
> > Can you suggest a solution?
> >
> > Thanks
> > Priya
> >
> > On Fri, Dec 20, 2019 at 2:23 PM Markus Schuch <markus_schuch@web.de> wrote:
> >
> >     Hi Priya,
> >
> >     Your ManifoldCF JVM is suffering from high garbage collection pressure:
> >
> >         java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> >     What is your current heap size?
> >     Without knowing that, I suggest increasing the heap size (java
> >     -Xmx...).
> >
> >     Cheers,
> >     Markus
> >
> >     On 20.12.2019 at 09:02, Priya Arora wrote:
> >     > Hi All,
> >     >
> >     > I am facing the error below while accessing ManifoldCF. The
> >     > requirement is to crawl data from a website using the "Web"
> >     > repository connector and the "Elastic Search" output connector.
> >     > ManifoldCF is configured inside a Docker container, and Postgres
> >     > also runs as a Docker container.
> >     > When launching ManifoldCF, I get the error below:
> >     > image.png
> >     >
> >     > When I checked the logs:
> >     > *1)sudo docker exec -it 0b872dfafc5c tail -1000
> >     > /usr/share/manifoldcf/example/logs/manifoldcf.log*
> >     > FATAL 2019-12-20T06:06:13,176 (Stuffer thread) - Error tossed: Timer already cancelled.
> >     > java.lang.IllegalStateException: Timer already cancelled.
> >     >         at java.util.Timer.sched(Timer.java:397) ~[?:1.8.0_232]
> >     >         at java.util.Timer.schedule(Timer.java:193) ~[?:1.8.0_232]
> >     >         at org.postgresql.jdbc.PgConnection.addTimerTask(PgConnection.java:1113) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.postgresql.jdbc.PgStatement.startTimer(PgStatement.java:887) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:427) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:136) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.postgresql.jdbc.PgConnection.isValid(PgConnection.java:1311) ~[postgresql-42.1.3.jar:42.1.3]
> >     >         at org.apache.manifoldcf.core.jdbcpool.ConnectionPool.getConnection(ConnectionPool.java:92) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.ConnectionFactory.getConnectionWithRetries(ConnectionFactory.java:126) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.ConnectionFactory.getConnection(ConnectionFactory.java:75) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:797) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221) ~[mcf-core.jar:?]
> >     >         at org.apache.manifoldcf.crawler.jobs.Jobs.getActiveJobConnections(Jobs.java:736) ~[mcf-pull-agent.jar:?]
> >     >         at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:2869) ~[mcf-pull-agent.jar:?]
> >     >         at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186) [mcf-pull-agent.jar:?]
> >     > *2)sudo docker logs <CID> --tail 1000*
> >     > Exception in thread "PostgreSQL-JDBC-SharedTimer-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
> >     >         at java.util.ArrayList.iterator(ArrayList.java:840)
> >     >         at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
> >     >         at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
> >     >         at java.net.InetAddress.getAllByName(InetAddress.java:1193)
> >     >         at java.net.InetAddress.getAllByName(InetAddress.java:1127)
> >     >         at java.net.InetAddress.getByName(InetAddress.java:1077)
> >     >         at java.net.InetSocketAddress.<init>(InetSocketAddress.java:220)
> >     >         at org.postgresql.core.PGStream.<init>(PGStream.java:66)
> >     >         at org.postgresql.core.QueryExecutorBase.sendQueryCancel(QueryExecutorBase.java:155)
> >     >         at org.postgresql.jdbc.PgConnection.cancelQuery(PgConnection.java:971)
> >     >         at org.postgresql.jdbc.PgStatement.cancel(PgStatement.java:812)
> >     >         at org.postgresql.jdbc.PgStatement$1.run(PgStatement.java:880)
> >     >         at java.util.TimerThread.mainLoop(Timer.java:555)
> >     >         at java.util.TimerThread.run(Timer.java:505)
> >     > 2019-12-19 18:09:05,848 Job start thread ERROR Unable to write to stream logs/manifoldcf.log for appender MyFile
> >     > 2019-12-19 18:09:05,848 Seeding thread ERROR Unable to write to stream logs/manifoldcf.log for appender MyFile
> >     > 2019-12-19 18:09:05,848 Job reset thread ERROR Unable to write to stream logs/manifoldcf.log for appender MyFile
> >     > 2019-12-19 18:09:05,848 Job notification thread ERROR Unable to write to stream logs/manifoldcf.log for appender MyFile
> >     > 2019-12-19 18:09:05,849 Seeding thread ERROR An exception occurred processing Appender MyFile
> >     > org.apache.logging.log4j.core.appender.AppenderLoggingException: Error flushing stream logs/manifoldcf.log
> >     >         at org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:159)
> >     >
> >     > _I also tried cleaning up the database by truncating all
> >     > ManifoldCF-related tables, but I still get this error._
> >     >
> >     > The parameters in the *postgresql.conf* file are set as suggested,
> >     > and "max_pred_locks_per_transaction" is set to "256".
> >     > image.png
> >
>
