It would have been far more useful if they had measured the systems in terms of dollars, since each system makes different tradeoffs. When you enable acking you may become bottlenecked on CPU instead of on disk/Kafka, and one way to fix that is to move to hardware with higher-class CPUs. The system they built persists intermediate queues between the components of a topology. While that reduces CPU load by removing the need for an acking system, it requires more disk: any of the intermediate queues can now start to fill up, so you have to reserve capacity for the worst-case scenario. It may be that, in dollar terms, the tradeoff of using more disks works out marginally cheaper in total.
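
To make the CPU cost concrete: with acking enabled, every bolt has to anchor its emits to the input tuple and then ack it, and each ack is an extra message routed through the acker tasks. A minimal sketch of what that looks like with the 0.8.x Java API (the bolt and field names here are made up for illustration, not taken from the Loggly post):

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt -- names are illustrative only.
    public class NormalizeLogLineBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Anchored emit: links the new tuple to its input so the acker
            // tasks can track the tuple tree.
            collector.emit(input, new Values(input.getString(0).trim()));
            // Explicit ack: sends another message to the acker tasks. This
            // per-tuple bookkeeping is the CPU overhead being discussed.
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

Multiply that bookkeeping by 80-100K events/s and it is easy to see why the bottleneck shifts from disk/Kafka to CPU.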




On Fri, Apr 4, 2014 at 6:55 PM, Benjamin Black <b@b3k.us> wrote:
No part of the post made any sense to me. There is a significant performance hit when moving to reliable operation in any system, and Storm is clearly doing a good job if a custom-built solution can only manage 25% more throughput.


On Fri, Apr 4, 2014 at 4:10 PM, Neelesh <neeleshs@gmail.com> wrote:
It's an interesting read. The blog is vague on some details - with acking on, the throughput was 80K/s; with their custom solution it's 100K/s. Assuming they were both deployed on similar hardware (I do not know, the blog does not confirm either way), the difference is not something that warrants a custom framework to me. Obviously it's working better for Loggly.


On Fri, Apr 4, 2014 at 8:26 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
Hi,

Apparently Loggly decided to ditch Storm after they got hit by a 2.5x performance degradation from turning on acking:

How does one minimize this performance hit?
Or maybe newer versions of Storm perform better with acking? (Loggly tested 0.8.2, they say)
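
The only knobs I can think of are the acker parallelism and the max-spout-pending cap, something along these lines (the numbers are placeholders, not recommendations):

    import backtype.storm.Config;

    public class ReliabilityTuning {
        public static Config conf() {
            Config conf = new Config();

            // More acker tasks spreads the tuple-tracking CPU load across
            // workers; setting this to 0 turns tracking off entirely (and
            // the replay guarantee along with it).
            conf.setNumAckers(2);

            // Bounds the number of un-acked tuples per spout task so the
            // tracking state stays manageable when a bolt slows down.
            conf.setMaxSpoutPending(5000);

            return conf;
        }
    }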

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/