manifoldcf-user mailing list archives

From Priya Arora <pr...@smartshore.nl>
Subject Re: Manifold Crawler Crashes
Date Thu, 20 Jun 2019 10:27:37 GMT
Hi Karl,
1) It is a single-process deployment.
2) I am not able to access the container through bash while the crash is happening.
3) Server configuration:
   Crawler server: 16 GB RAM, 8-core Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
   Elasticsearch server: 48 GB RAM, 1-core Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
4) ManifoldCF configuration:
   Repository max connections: 48
   Output max connections: 48

The crash happens when we run more than two jobs in parallel, each with
almost the same configuration.
[image: image.png]

We are also seeing the following in the log file, which seems to be the
reason for the crash.

agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3308)
        at java.util.BitSet.ensureCapacity(BitSet.java:337)
        at java.util.BitSet.expandTo(BitSet.java:352)
        at java.util.BitSet.set(BitSet.java:447)
        at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
        at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
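
A note on reading the "Java heap space" error above: the heap ceiling is a JVM setting and is not necessarily related to the 16 GB of RAM on the crawler server. The following is only a minimal sketch for checking what ceiling a JVM actually has. The class name HeapReport is hypothetical and not part of ManifoldCF, and the numbers are only meaningful under the assumption that it is started with the same JVM options as the agents process.

```java
public class HeapReport {
    // Maximum heap the JVM will attempt to use, in MiB
    // (reflects -Xmx if set, otherwise the JVM default).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use by this JVM, in MiB.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Hypothetical helper output; compare the ceiling with the server's RAM.
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

If the reported ceiling is far below the physical memory, several jobs parsing large HTML documents at once could plausibly exhaust it even on a 16 GB machine.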

On Thu, Jun 20, 2019 at 3:36 PM Karl Wright <daddywri@gmail.com> wrote:

> Hi Priya,
>
> Being unable to reach the web interface sounds like either a network issue
> or a problem with the app server.
>
> Can you describe the configuration you are running in?  Is this a
> multiprocess deployment or a single-process deployment?
>
> When your docker container dies, can you still reach it via the standard
> in-container bash tools?  What is happening there?
>
> Karl
>
>
> On Thu, Jun 20, 2019 at 5:54 AM Priya Arora <priya@smartshore.nl> wrote:
>
>> Hi Karl,
>>
>> By "crash" I mean that a "the site could not be reached" HTML page
>> appears when accessing http://localhost:3000/mcf-crawler-ui/index.jsp.
>> Explanation: when running certain jobs on the ManifoldCF server (2.13),
>> after some time in a successfully running state, the browser suddenly
>> gives me a "the site could not be reached" error, and the page does not
>> reload until I restart the container through the docker command.
>> Once I restart the container, MCF loads again.
>>
>> Thanks
>> Priya
>>
>> On Thu, Jun 20, 2019 at 3:08 PM Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Please describe what you mean by "crash".  What actually happens?
>>>
>>> Karl
>>>
>>> On Thu, Jun 20, 2019, 2:04 AM Priya Arora <priya@smartshore.nl> wrote:
>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I am running multiple jobs (2 or 3) simultaneously on a ManifoldCF
>>>> server, and the configuration is:
>>>>
>>>> 1) Crawler server: 16 GB RAM, 8-core Intel(R) Xeon(R) CPU
>>>> E5-2660 v3 @ 2.60GHz
>>>>
>>>> 2) Elasticsearch server: 48 GB RAM, 1-core Intel(R) Xeon(R) CPU
>>>> E5-2660 v3 @ 2.60GHz
>>>>
>>>> The jobs fetch data from some public and intranet sites and
>>>> then ingest it into Elasticsearch.
>>>>
>>>> The maximum connections setting on both the repository connections and
>>>> the output connection is 48 (for all 3 jobs).
>>>>
>>>> The problem I am facing is that when I run multiple jobs, ManifoldCF
>>>> crashes after some time, and there is nothing in the manifold.log
>>>> files that hints at an error.
>>>> Does the total number of connections add up (48+48+48) when all three
>>>> jobs run together?
>>>> If so, do I need to divide the max connections (48) among the three jobs?
>>>> What is the maximum number of connections we can have when running the
>>>> jobs individually and simultaneously?
>>>>
>>>> What should the maximum allowed number of handles be in the
>>>> properties.xml file and the Postgres config file?
>>>>
>>>> So the problem is to figure out the reason for the crawler
>>>> crash.
>>>> Can you please help me with that as soon as possible?
>>>>
>>>> Thanks and regards
>>>> Priya
>>>> priya@smartshore.nl
>>>>
>>>>
>>>>
