nifi-users mailing list archives

From Russell Bateman <russell.bate...@perfectsearchcorp.com>
Subject Re: Nifi hardware recommendation
Date Fri, 14 Oct 2016 16:44:09 GMT
Ali,

"not recommended to dedicate more than 8-10 GB to JVM heap space" by 
whom? Do you have links/references establishing this? I couldn't find 
anyone saying this or why.

Russ

On 10/13/2016 05:47 PM, Ali Nazemian wrote:
> Hi,
>
> I have another question regarding the hardware recommendation. As far 
> as I can tell, NiFi currently uses on-heap memory, and it does not 
> try to load whole objects into memory. From the garbage collection 
> perspective, it is not recommended to dedicate more than 8-10 GB to 
> JVM heap space. In that case, is spending money on additional system 
> memory pointless? Probably 16 GB per system is enough with this 
> architecture, unless some future architecture change makes use of 
> off-heap memory as well. However, I found some articles about best 
> practices, and their memory recommendations do not seem to make sense 
> in that light. Would you please clarify this part for me?
> Thank you very much.
>
> Best regards,
> Ali
>
>
> On Thu, Oct 13, 2016 at 11:38 PM, Ali Nazemian <alinazemian@gmail.com 
> <mailto:alinazemian@gmail.com>> wrote:
>
>     Thank you very much.
>     I would be more than happy to provide some benchmark results after
>     the implementation.
>     Sincerely yours,
>     Ali
>
>     On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt <joe.witt@gmail.com
>     <mailto:joe.witt@gmail.com>> wrote:
>
>         Ali,
>
>         I agree with your assumption.  It would be great to test that
>         out and provide some numbers but intuitively I agree.
>
>         I could envision certain scatter/gather data flows that could
>         challenge that sequential access assumption, but honestly, with
>         how awesome disk caching is in Linux these days, I think
>         practically speaking this is the right way to think about it.
>
>         Thanks
>         Joe
>
>         On Thu, Oct 13, 2016 at 8:29 AM, Ali Nazemian
>         <alinazemian@gmail.com <mailto:alinazemian@gmail.com>> wrote:
>
>             Dear Joe,
>
>             Thank you very much. That was a really great explanation.
>             I investigated the NiFi architecture, and it seems that
>             most of the read/write operations for the flow file repo
>             and provenance repo are random, whereas for the content
>             repo most of the read/write operations are sequential.
>             Let's say cost does not matter. Even then, choosing SSD
>             for the content repo cannot provide a huge performance
>             gain over HDD. Am I right? Hence, it would be better to
>             spend the content repo SSD money on network infrastructure.
>
>             Best regards,
>             Ali
>
>             On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt
>             <joe.witt@gmail.com <mailto:joe.witt@gmail.com>> wrote:
>
>                 Ali,
>
>                 You have a lot of nice resources to work with there. 
>                 I'd recommend the series of RAID-1 configurations
>                 personally, provided you keep in mind this means you
>                 can only lose a single disk for any one partition.  As
>                 long as they're being monitored and would be quickly
>                 replaced this in practice works well.  If there could
>                 be lapses in monitoring or time to replace then it is
>                 perhaps safer to go with more redundancy or an
>                 alternative RAID type.
>
>                 I'd say do the OS, app installs w/user and audit db
>                 stuff, application logs on one physical RAID volume. 
>                 Have a dedicated physical volume for the flow file
>                 repository.  It will not be able to use all the space
>                 but it certainly could benefit from having no other
>                 contention.  This could be a great thing to have SSDs
>                 for actually.  And for the remaining volumes split
>                 them up for content and provenance as you have. You
>                 get to make the overall performance versus retention
>                 decision. Frankly, you have a great system to work
>                 with and I suspect you're going to see excellent
>                 results anyway.
>
>                 Conservatively speaking, expect say 50MB/s of
>                 throughput per volume in the content repository, so if
>                 you end up with 8 of them you could achieve upwards of
>                 400MB/s sustained. You'll also then want to make sure
>                 you have a good 10G based network setup as well.  Or,
>                 you could dial back on the speed tradeoff and simply
>                 increase retention or disk loss tolerance.  Lots of
>                 ways to play the game.
>
>                 There are no published SSD vs HDD performance
>                 benchmarks that I am aware of though this is a good
>                 idea.  Having a hybrid of SSDs and HDDs could offer a
>                 really solid performance/retention/cost tradeoff.  For
>                 example having SSDs for the
>                 OS/logs/provenance/flowfile with HDDs for the content
>                 - that would be quite nice.  At that rate to take full
>                 advantage of the system you'd need to have very strong
>                 network infrastructure between NiFi and any systems it
>                 is interfacing with, and your flows would need to be
>                 well tuned for GC/memory efficiency.
>
>                 Thanks
>                 Joe
>
>                 On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian
>                 <alinazemian@gmail.com <mailto:alinazemian@gmail.com>>
>                 wrote:
>
>                     Dear Nifi Users/ developers,
>                     Hi,
>
>                     Is there any benchmark on whether it is better
>                     to let NiFi manage the individual disks itself
>                     or to use RAID for this purpose? For example,
>                     which of these scenarios is recommended from
>                     the performance point of view?
>                     Scenario 1:
>                     24 disks in total
>                     2 disk- raid 1 for OS and flowfile repo
>                     2 disk- raid 1 for provenance repo1
>                     2 disk- raid 1 for provenance repo2
>                     2 disk- raid 1 for content repo1
>                     2 disk- raid 1 for content repo2
>                     2 disk- raid 1 for content repo3
>                     2 disk- raid 1 for content repo4
>                     2 disk- raid 1 for content repo5
>                     2 disk- raid 1 for content repo6
>                     2 disk- raid 1 for content repo7
>                     2 disk- raid 1 for content repo8
>                     2 disk- raid 1 for content repo9
>
>
>                     Scenario 2:
>                     24 disks in total
>                     2 disk- raid 1 for OS and flowfile repo
>                     4 disk- raid 10 for provenance repo1
>                     18 disk- raid 10 for content repo1
>
>                     Moreover, is there any benchmark for SSD vs HDD
>                     performance for Nifi?
>                     Thank you very much.
>
>                     Best regards,
>                     Ali
>
>
>
>
>
>             -- 
>             A.Nazemian
>
>
>
>
>
>     -- 
>     A.Nazemian
>
>
>
>
> -- 
> A.Nazemian
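
The throughput estimate in the quoted thread (roughly 50MB/s per content repository volume, so about 400MB/s across 8 volumes, paired with a 10G network) is simple arithmetic, and a minimal Python sketch can make the comparison explicit. The function names are hypothetical, the 50MB/s figure is only the conservative guess from the thread, and the link capacity is the theoretical line rate in decimal units with protocol overhead ignored, so treat the result as a rough sanity check and not as a benchmark.

```python
def aggregate_throughput_mb_s(volumes: int, per_volume_mb_s: float = 50.0) -> float:
    """Estimated sustained content-repo throughput across independent volumes.

    per_volume_mb_s defaults to the conservative 50 MB/s guess from the thread.
    """
    return volumes * per_volume_mb_s


def link_capacity_mb_s(gigabits_per_s: float) -> float:
    """Theoretical link capacity in MB/s (1 Gbit/s = 125 MB/s, no overhead)."""
    return gigabits_per_s * 1000 / 8


if __name__ == "__main__":
    disk = aggregate_throughput_mb_s(8)
    for gbit in (1, 10):
        link = link_capacity_mb_s(gbit)
        bottleneck = "network" if link < disk else "disks"
        print(f"{gbit}G link: {link:.0f} MB/s vs disks {disk:.0f} MB/s "
              f"-> bottleneck: {bottleneck}")
```

Under these assumptions a 1G link would cap the flow well below what 8 volumes could sustain, while a 10G link leaves the disks as the limiting factor.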

