storm-user mailing list archives

From Mostafa Gomaa <mgo...@trendak.com>
Subject Re: Identifying the source of the memory error in Storm
Date Tue, 07 Feb 2017 08:55:14 GMT
I had a similar issue and I solved it by setting the worker.heap.memory.mb option.
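A minimal sketch of what that might look like, assuming Storm 1.0.x
(Config.WORKER_HEAP_MEMORY_MB should correspond to the worker.heap.memory.mb
key; the 4096 value is only an example, and the key can also be set
cluster-wide in storm.yaml rather than per topology):

    import org.apache.storm.Config;

    Config conf = new Config();
    // worker.heap.memory.mb: heap size (in MB) assumed per worker process,
    // used when the supervisor launches workers. 4096 is a placeholder value.
    conf.put(Config.WORKER_HEAP_MEMORY_MB, 4096);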


On Feb 7, 2017 10:45 AM, "Navin Ipe" <navin.ipe@searchlighthealth.com>
wrote:

> Hi,
>
> Even though I ran the topology on a server with 30GB RAM, it still crashed.
> I had set stormConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx" + "15g");
>
> But still, when I see the workers using htop, their virtual memory is
> shown as 15G, but toward the right side of the screen, under the command
> column it shows "java -Xmx2048m and a few other options". I assume this was
> the command that storm used to start the worker.
>
> So how come my memory setting isn't getting used by the worker? Why is it
> still using 2GB instead of 15GB?
> Also, out of the 30GB, 25GB was getting used. How could that happen when I
> have only 4 slots and 4 workers running? The exact same thing was taking up
> just 5GB on a system with 10GB RAM, where I configured -Xmx to "2g".
>
> Could you help me understand this?
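> A sketch of one way to check which -Xmx actually wins: log the heap the
> worker JVM really got, e.g. from a bolt's prepare(). The LOG field here is
> assumed to exist:
>
>     @Override
>     public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
>         // Log the heap this worker JVM actually received, to see whether
>         // TOPOLOGY_WORKER_CHILDOPTS or the storm.yaml worker.childopts took effect.
>         long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
>         LOG.info("Worker max heap: {} MB", maxHeapMb);
>         // ... rest of the usual prepare() logic
>     }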
>
>
> On Mon, Feb 6, 2017 at 2:29 PM, Navin Ipe <navin.ipe@searchlighthealth.com> wrote:
>
>> Thank you. I've been monitoring it via JConsole, and this is what I see:
>> Supervisor used memory: 61MB
>>
>> *Supervisor committed memory: 171MB*
>> *Supervisor Max memory: 239.1MB*
>>
>> Nimbus used memory: 44.3MB
>> Nimbus committed memory: 169.3MB
>> Nimbus max memory: 954.7MB
>>
>> Zookeeper used memory: 224MB
>> Zookeeper committed memory: 529MB
>> Zookeeper Max memory: 1.9GB
>>
>> Worker used memory: 941MB
>>
>> *Worker committed memory: 1.4GB*
>> *Worker Max memory: 1.9GB*
>>
>> So it looks like even if the worker memory is managed and kept low, the
>> supervisor can crash because of low memory. The solution appears to be to
>> increase supervisor memory in storm.yaml, use more RAM, and use swap space.
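>> (Side note: those used/committed/max numbers can also be read from inside a
>> process via the MemoryMXBean, which is handy when no GUI is available on the
>> server. A minimal sketch:)
>>
>>     import java.lang.management.ManagementFactory;
>>     import java.lang.management.MemoryUsage;
>>
>>     MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
>>     // Same heap figures JConsole shows, converted from bytes to MB.
>>     System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
>>             heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);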
>>
>> If you have any other opinions, please let me know.
>>
>>
>> On Sun, Feb 5, 2017 at 7:10 PM, Andrea Gazzarini <gxsubs@gmail.com>
>> wrote:
>>
>>> Hi Navin,
>>> I think this line is a good starting point for your analysis:
>>>
>>> "There is insufficient memory for the Java Runtime Environment to continue."
>>>
>>> I don't believe this scenario is caught by the JVM as a checked exception:
>>> in my opinion it belongs to the "Error" class, and that would explain why
>>> the catch block is never reached.
>>> In addition, your assumption could also be right: the part of the code that
>>> raises the error could be anywhere in the worker code, not necessarily
>>> within your class. This is because memory errors, unlike ordinary
>>> exceptions, don't have a deterministic point of failure; they depend on the
>>> state of the system at a given moment.
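>>> To make that concrete: OutOfMemoryError extends Error, not Exception, so a
>>> catch (Exception ...) block never sees it. A contrived sketch (not something
>>> for production code) that shows the difference:
>>>
>>>     try {
>>>         // Force an allocation that cannot succeed.
>>>         byte[] huge = new byte[Integer.MAX_VALUE];
>>>     } catch (Exception ex) {
>>>         // Never reached: OutOfMemoryError is an Error, not an Exception.
>>>         System.out.println("caught Exception");
>>>     } catch (Throwable t) {
>>>         // This would catch it (though catching Throwable is rarely a good idea).
>>>         System.out.println("caught " + t.getClass().getSimpleName());
>>>     }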
>>>
>>> Please expand a bit on (or investigate yourself) your architecture, nodes,
>>> hardware resources, and any other information that can help in
>>> understanding your context. Tools like JVisualVM, JConsole, and the Storm
>>> UI are precious friends in these contexts.
>>>
>>> Best,
>>> Andrea
>>>
>>>
>>> On 05/02/17 12:53, Navin Ipe wrote:
>>>
>>>
>>> Hi,
>>> I have a bolt which sometimes emits around 15000 tuples, and sometimes more
>>> than 20000. I think when this happens there's a memory issue and the
>>> workers get restarted. This is what worker.log.err contains:
>>>
>>>
>>>     Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f1000000, 62914560, 0) failed; error='Cannot allocate memory' (errno=12)
>>>     # There is insufficient memory for the Java Runtime Environment to continue.
>>>     # Native memory allocation (mmap) failed to map 62914560 bytes for committing reserved memory.
>>>     # An error report file with more information is saved as:
>>>     # /home/storm/apache-storm-1.0.0/storm-local/workers/6a1a70ad-d094-437a-a9c5-e837fc1b3535/hs_err_pid2766.log
>>>
>>> The odd part is that in all my bolts I have:
>>>
>>>
>>>     @Override
>>>     public void execute(Tuple tuple) {
>>>         try {
>>>             // ...some code, including the code that emits tuples
>>>         } catch (Exception ex) {
>>>             logger.info("The exception {}, {}", ex.getCause(), ex.getMessage());
>>>         }
>>>     }
>>>
>>> But in the logs I never see the string "The exception". However, worker.log
>>> shows:
>>>
>>>
>>>     2017-02-05 09:14:01.320 STDERR [INFO] Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e6f80000, 37748736, 0) failed; error='Cannot allocate memory' (errno=12)
>>>     2017-02-05 09:14:01.320 STDERR [INFO] #
>>>     2017-02-05 09:14:01.330 STDERR [INFO] # There is insufficient memory for the Java Runtime Environment to continue.
>>>     2017-02-05 09:14:01.330 STDERR [INFO] # Native memory allocation (mmap) failed to map 37748736 bytes for committing reserved memory.
>>>     2017-02-05 09:14:01.331 STDERR [INFO] # An error report file with more information is saved as:
>>>     2017-02-05 09:14:01.331 STDERR [INFO] # /home/storm/apache-storm-1.0.0/storm-local/workers/2685b445-c4a9-4f7e-94e1-1ce3fe13de47/hs_err_pid3022.log
>>>     2017-02-05 09:14:06.904 o.a.s.d.worker [INFO] Launching worker for HydraCellGen-138-1486283223 on 3fc3c05e-9769-4033-bf7d-df609d6c4963:6701 with id 575bd7ed-a3fc-4f7f-a7d0-cdd4054c9fc5 and conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts" "-Xmx1024m", ... etc
>>>
>>> These are the settings I'm using for the topology:
>>>
>>>     Config stormConfig = new Config();
>>>     stormConfig.setNumWorkers(20);
>>>     stormConfig.setNumAckers(20);
>>>     stormConfig.put(Config.TOPOLOGY_DEBUG, false);
>>>     stormConfig.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024);
>>>     stormConfig.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
>>>     stormConfig.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
>>>     stormConfig.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2);
>>>     stormConfig.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 2200);
>>>     stormConfig.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList(new String[]{"localhost"}));
>>>     stormConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx" + "2g");
>>>
>>>
>>> So am I right in assuming the exception is not thrown in my code but is
>>> thrown in the worker thread? Do such errors happen when the worker receives
>>> more tuples than its queue can hold?
>>> What can I do to avoid this problem?
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Navin
>>
>
>
>
> --
> Regards,
> Navin
>
