storm-user mailing list archives

From Navin Ipe <navin....@searchlighthealth.com>
Subject Re: Identifying the source of the memory error in Storm
Date Wed, 08 Feb 2017 09:25:27 GMT
Thank you Mostafa.

On Tue, Feb 7, 2017 at 2:25 PM, Mostafa Gomaa <mgomaa@trendak.com> wrote:

> I had a similar issue and I solved it by setting this option: worker.heap.memory.mb
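>
> (A minimal sketch of one way to set that, assuming Storm 1.x where
> org.apache.storm.Config.WORKER_HEAP_MEMORY_MB is the Java constant for
> worker.heap.memory.mb; it is usually set cluster-wide in storm.yaml, the
> per-topology override below is only illustrative, and 4096 is a
> placeholder value:)
>
>     Config conf = new Config();
>     // worker.heap.memory.mb: heap size in MB used for the -Xmx of
>     // launched workers (placeholder value).
>     conf.put(Config.WORKER_HEAP_MEMORY_MB, 4096);
>     // ...build the topology as usual, then:
>     // StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());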
>
>
> On Feb 7, 2017 10:45 AM, "Navin Ipe" <navin.ipe@searchlighthealth.com>
> wrote:
>
>> Hi,
>>
>> Even though I ran the topology on a server with 30GB RAM, it still
>> crashed. I had set:
>>
>>     stormConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx" + "15g");
>>
>> But still, when I look at the workers in htop, their virtual memory is
>> shown as 15G, yet toward the right side of the screen, under the command
>> column, it shows "java -Xmx2048m" and a few other options. I assume this
>> is the command that Storm used to start the worker.
>>
>> So how come my memory setting isn't being used by the worker? Why is it
>> still using 2GB instead of 15GB?
>> Also, out of the 30GB, 25GB was getting used. How could that happen when
>> I have only 4 slots and 4 workers running? The exact same thing was taking
>> up just 5GB on a system with 10GB RAM, where I had configured -Xmx to "2g".
>>
>> Could you help me understand this?
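>>
>> (A rough sanity check, using only the numbers above: if the -Xmx15g
>> childopt had actually taken effect, 4 workers x 15 GB would permit up to
>> 60 GB of heap, which already exceeds the 30 GB of physical RAM. Each
>> worker JVM also needs native memory beyond its heap (metaspace, thread
>> stacks, messaging buffers), and that native allocation is what the
>> failing os::commit_memory/mmap calls in the log are asking for.)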
>>
>>
>> On Mon, Feb 6, 2017 at 2:29 PM, Navin Ipe <navin.ipe@searchlighthealth.com> wrote:
>>
>>> Thank you. I've been monitoring it via JConsole, and this is what I see:
>>> Supervisor used memory: 61MB
>>> Supervisor committed memory: 171MB
>>> Supervisor max memory: 239.1MB
>>>
>>> Nimbus used memory: 44.3MB
>>> Nimbus committed memory: 169.3MB
>>> Nimbus max memory: 954.7MB
>>>
>>> Zookeeper used memory: 224MB
>>> Zookeeper committed memory: 529MB
>>> Zookeeper max memory: 1.9GB
>>>
>>> Worker used memory: 941MB
>>> Worker committed memory: 1.4GB
>>> Worker max memory: 1.9GB
>>>
>>> So from the looks of it, even if the worker memory is managed and kept
>>> low, the supervisor can crash because of low memory. The solution
>>> appears to be to increase the supervisor memory in storm.yaml, use more
>>> RAM, and add swap space.
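>>>
>>> (One assumption worth double-checking against your storm.yaml: the stock
>>> defaults ship supervisor.childopts: "-Xmx256m", which would line up with
>>> the ~239MB max heap JConsole reports for the supervisor. Raising that
>>> value in storm.yaml, e.g. supervisor.childopts: "-Xmx1024m", is the
>>> corresponding knob.)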
>>>
>>> If you have any other opinions, please let me know.
>>>
>>>
>>> On Sun, Feb 5, 2017 at 7:10 PM, Andrea Gazzarini <gxsubs@gmail.com>
>>> wrote:
>>>
>>>> Hi Navin,
>>>> I think this line is a good starting point for your analysis:
>>>>
>>>> "There is insufficient memory for the Java Runtime Environment to
>>>> continue."
>>>>
>>>> I don't believe this scenario is caught by the JVM as a checked
>>>> exception: in my opinion it belongs to the "Error" class, and that would
>>>> explain why the catch block is never reached.
>>>> In addition, your assumption could also be right: the part of the code
>>>> that raises the error could be anywhere in the worker code, not
>>>> necessarily within your class. This is because memory errors, unlike
>>>> what generally happens with exceptions, don't have a deterministic point
>>>> of failure; they depend on the state of the system at a given moment.
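>>>>
>>>> (A tiny self-contained illustration of that point, not from the original
>>>> code: OutOfMemoryError extends java.lang.Error, not Exception, so a
>>>> catch (Exception ...) block never sees it.)
>>>>
>>>>     public class CatchDemo {
>>>>         public static void main(String[] args) {
>>>>             try {
>>>>                 // Very likely throws java.lang.OutOfMemoryError.
>>>>                 byte[] huge = new byte[Integer.MAX_VALUE];
>>>>                 System.out.println(huge.length);
>>>>             } catch (Exception ex) {
>>>>                 // Never reached for an OutOfMemoryError, because Error
>>>>                 // is not a subclass of Exception.
>>>>                 System.out.println("caught " + ex);
>>>>             }
>>>>             // Instead, the Error propagates up and kills the thread.
>>>>         }
>>>>     }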
>>>>
>>>> Please tell us a bit more about (or investigate yourself) your
>>>> architecture, nodes, hardware resources, and any other information that
>>>> can help in understanding your context. Tools like JVisualVM, JConsole,
>>>> and the Storm GUI are precious friends in these contexts.
>>>>
>>>> Best,
>>>> Andrea
>>>>
>>>>
>>>> On 05/02/17 12:53, Navin Ipe wrote:
>>>>
>>>> Hi,
>>>> I have a bolt which sometimes emits around 15000 tuples, and sometimes
>>>> more than 20000. I think when this happens there's a memory issue and
>>>> the workers get restarted. This is what worker.log.err contains:
>>>>
>>>> Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f1000000, 62914560, 0) failed; error='Cannot allocate memory' (errno=12)
>>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>>> # Native memory allocation (mmap) failed to map 62914560 bytes for committing reserved memory.
>>>> # An error report file with more information is saved as:
>>>> # /home/storm/apache-storm-1.0.0/storm-local/workers/6a1a70ad-d094-437a-a9c5-e837fc1b3535/hs_err_pid2766.log
>>>>
>>>> The odd part is that in all my bolts I have:
>>>>
>>>>     @Override
>>>>     public void execute(Tuple tuple) {
>>>>         try {
>>>>             // ...some code, including the code that emits tuples
>>>>         } catch (Exception ex) {
>>>>             logger.info("The exception {}, {}", ex.getCause(), ex.getMessage());
>>>>         }
>>>>     }
>>>>
>>>> But in the logs I never see the string "The exception". However,
>>>> worker.log shows:
>>>>
>>>> 2017-02-05 09:14:01.320 STDERR [INFO] Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e6f80000, 37748736, 0) failed; error='Cannot allocate memory' (errno=12)
>>>> 2017-02-05 09:14:01.320 STDERR [INFO] #
>>>> 2017-02-05 09:14:01.330 STDERR [INFO] # There is insufficient memory for the Java Runtime Environment to continue.
>>>> 2017-02-05 09:14:01.330 STDERR [INFO] # Native memory allocation (mmap) failed to map 37748736 bytes for committing reserved memory.
>>>> 2017-02-05 09:14:01.331 STDERR [INFO] # An error report file with more information is saved as:
>>>> 2017-02-05 09:14:01.331 STDERR [INFO] # /home/storm/apache-storm-1.0.0/storm-local/workers/2685b445-c4a9-4f7e-94e1-1ce3fe13de47/hs_err_pid3022.log
>>>> 2017-02-05 09:14:06.904 o.a.s.d.worker [INFO] Launching worker for HydraCellGen-138-1486283223 on 3fc3c05e-9769-4033-bf7d-df609d6c4963:6701 with id 575bd7ed-a3fc-4f7f-a7d0-cdd4054c9fc5 and conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts" "-Xmx1024m", ... etc
>>>>
>>>> These are the settings I'm using for the topology:
>>>>
>>>>     Config stormConfig = new Config();
>>>>     stormConfig.setNumWorkers(20);
>>>>     stormConfig.setNumAckers(20);
>>>>     stormConfig.put(Config.TOPOLOGY_DEBUG, false);
>>>>     stormConfig.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024);
>>>>     stormConfig.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
>>>>     stormConfig.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
>>>>     stormConfig.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2);
>>>>     stormConfig.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 2200);
>>>>     stormConfig.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList(new String[]{"localhost"}));
>>>>     stormConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx" + "2g");
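>>>>
>>>> (One way to read these numbers, assuming the default of four worker
>>>> slots per supervisor caps the 20 requested workers at 4 on a single
>>>> machine: 4 workers x -Xmx2g allows roughly 8 GB of heap, and each worker
>>>> JVM needs additional native memory for thread stacks, metaspace, and the
>>>> send/receive buffers configured above, on top of Nimbus, the supervisor,
>>>> and ZooKeeper. On a host with about 10 GB of RAM that leaves very little
>>>> headroom, which matches os::commit_memory failing with errno=12 rather
>>>> than a Java-level OutOfMemoryError inside the bolt.)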
>>>>
>>>> So am I right in assuming the exception is not thrown in my code but is
>>>> thrown in the worker thread? Do such exceptions happen when the worker
>>>> receives more tuples than its queue can handle?
>>>> What can I do to avoid this problem?
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>
>>
>>
>> --
>> Regards,
>> Navin
>>
>


-- 
Regards,
Navin
