nifi-dev mailing list archives

From Phil H <gippyp...@gmail.com>
Subject Re: Ideal hardware for NiFi
Date Fri, 14 Sep 2018 01:23:14 GMT
Potentially. We're looking to see how the multiple disks help before
committing to spending money on new hardware :)

On Fri, 14 Sep 2018 at 10:48, Joe Witt <joe.witt@gmail.com> wrote:

> phil,
>
> as you add dirs it will just start using them.  if you want to no longer
> use the current dir it might be more involved.
>
> does that help?
>
> thanks
>
> On Thu, Sep 13, 2018, 4:36 PM Phil H <gippyphil@gmail.com> wrote:
>
> > Follow up question - how do I transition to this new structure? Should I
> > shut down NiFi and move the contents of the legacy single directories
> > into one of the new ones? For example:
> >
> > mv /usr/nifi/content_repository /nifi/repos/content-1
> >
> > TIA
> > Phil
> >
> >
> > On Wed, 12 Sep 2018 at 06:15, Mark Payne <markap14@hotmail.com> wrote:
> >
> > > Phil,
> > >
> > > For the content repository, you can configure the directory by changing
> > > the value of
> > > the "nifi.content.repository.directory.default" property in
> > > nifi.properties. The suffix here,
> > > "default" is the name of this "container". You can have multiple
> > > containers by adding extra
> > > properties. So, for example, you could set:
> > >
> > > nifi.content.repository.directory.content1=/nifi/repos/content-1
> > > nifi.content.repository.directory.content2=/nifi/repos/content-2
> > > nifi.content.repository.directory.content3=/nifi/repos/content-3
> > > nifi.content.repository.directory.content4=/nifi/repos/content-4
> > >
> > > Similarly, the Provenance Repo property is named
> > > "nifi.provenance.repository.directory.default"
> > > and can have any number of "containers":
> > >
> > > nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
> > > nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
> > > nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
> > > nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4
> > >
> > > When NiFi writes to these, it does a Round Robin so that if you're
> > > writing to 4 Flow Files' content simultaneously with different threads,
> > > you're able to get the full throughput of each disk. (So if you have 4
> > > disks for your content repo, each capable of writing 100 MB/sec, then
> > > your effective write rate to the content repo is 400 MB/sec). Similar
> > > with Provenance Repository.
> > >
> > > Doing this also will allow you to hold a larger 'archive' of content and
> > > provenance data, because it will span the archive across all of the
> > > listed directories, as well.
> > >
> > > Thanks
> > > -Mark
> > >
> > >
> > >
> > > > On Sep 11, 2018, at 3:35 PM, Phil H <gippyphil@gmail.com> wrote:
> > > >
> > > > Thanks Mark, this is great advice.
> > > >
> > > > Disk access is certainly an issue with the current set up. I will
> > > > certainly shoot for NVMe disks in the build. How does NiFi get
> > > > configured to span its repositories across multiple physical disks?
> > > >
> > > > Thanks,
> > > > Phil
> > > >
> > > > On Wed, 12 Sep 2018 at 01:32, Mark Payne <markap14@hotmail.com> wrote:
> > > >
> > > >> Phil,
> > > >>
> > > >> As Sivaprasanna mentioned, your bottleneck will certainly depend on your
> > > >> flow. There's nothing inherent about NiFi or the JVM, AFAIK, that would
> > > >> limit you. I've seen NiFi run on VMs containing 4-8 cores, and I've seen
> > > >> it run on bare metal on servers containing 96+ cores. Most often, I see
> > > >> people with a lot of CPU cores but insufficient disk, so if you're
> > > >> running several cores ensure that you're using SSDs / NVMe drives or
> > > >> enough spinning disks to accommodate the flow. NiFi does a good job of
> > > >> spanning the content and FlowFile repositories across multiple disks to
> > > >> take full advantage of the hardware, and scales the CPU vertically by way
> > > >> of multiple Processors and multiple concurrent tasks (threads) on a given
> > > >> Processor.
> > > >>
> > > >> It really comes down to what you're doing in your flow, though. If you've
> > > >> got 96 cores and you're trying to perform 5 dozen transformations against
> > > >> a large number of FlowFiles but have only a single spinning disk, then
> > > >> those 96 cores will likely go to waste, because your disk will bottleneck
> > > >> you.
> > > >>
> > > >> Likewise, if you have 10 SSDs and only 8 cores you're likely going to
> > > >> waste a lot of disk because you won't have the CPU needed to reach the
> > > >> disks' full potential. So you'll need to strike the correct balance for
> > > >> your use case. Since you have the flow running right now, I would
> > > >> recommend looking at things like `top` and `iostat` in order to
> > > >> understand if you're reaching your limit on CPU, disk, etc.
> > > >>
> > > >> As far as RAM is concerned, NiFi typically only needs 4-8 GB of RAM for
> > > >> the heap. However, more RAM means that your operating system can make
> > > >> better use of disk caching, which can certainly speed things up,
> > > >> especially if you're reading the content several times for each FlowFile.
> > > >>
> > > >> Does this help at all?
> > > >>
> > > >> Thanks
> > > >> -Mark
> > > >>
> > > >>
> > > >>> On Sep 10, 2018, at 6:05 AM, Phil H <gippyphil@gmail.com> wrote:
> > > >>>
> > > >>> Thanks for that. Sorry I should have been more specific - we have a
> > > >>> flow running already on non-dedicated hardware. Looking to identify any
> > > >>> limitations in NiFi/JVM that would limit how much parallelism it can
> > > >>> take advantage of
> > > >>>
> > > >>> On Mon, 10 Sep 2018 at 14:32, Sivaprasanna <sivaprasanna246@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Phil,
> > > >>>>
> > > >>>> The hardware requirements are driven by the nature of the dataflow you
> > > >>>> are developing. If you're looking to play around with NiFi and gain some
> > > >>>> hands-on experience, go for 4 cores and 8GB of RAM, i.e. any modern
> > > >>>> laptop/computer would do the job. In my case, where I have 100s of
> > > >>>> dataflows, I have it clustered with 3 nodes, each having 16GB RAM and
> > > >>>> 4(8) cores. I went with SSDs of smaller size because my flows are
> > > >>>> involved in writing to object stores like Google Cloud Storage, Azure
> > > >>>> Blob and Amazon S3 and NoSQL DBs. Hope this helps.
> > > >>>>
> > > >>>> -
> > > >>>> Sivaprasanna
> > > >>>>
> > > >>>> On Mon, Sep 10, 2018 at 4:09 AM Phil H <gippyphil@gmail.com> wrote:
> > > >>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > >>>>> I've been asked to spec some hardware for a NiFi installation. Does
> > > >>>>> anyone have any advice? My gut feel is lots of processor cores and RAM,
> > > >>>>> with less emphasis on storage (small fast disks). Are there any
> > > >>>>> limitations on how many cores the JRE/NiFi can actually make use of, or
> > > >>>>> any other considerations like that I should be aware of?
> > > >>>>>
> > > >>>>> Most likely will be pairs of servers in a cluster, but again any advice
> > > >>>>> to the contrary would be appreciated.
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>> Phil
> > > >>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>
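
Mark's reply in the thread describes two things that a small sketch may make easier to picture: how the suffix of each "nifi.content.repository.directory.*" property names a container, and how round-robin writes across containers lead to the 4 x 100 MB/sec = 400 MB/sec figure. The Python below is only an illustrative model under those stated assumptions, not NiFi's actual implementation. The function names (parse_containers, assign_round_robin, aggregate_write_rate) are hypothetical, and the throughput figure is an idealized upper bound that assumes each container sits on its own physical disk and that enough threads are writing at once.

```python
from itertools import cycle

# Property prefix quoted in the thread; the text after it is the container name.
CONTENT_PREFIX = "nifi.content.repository.directory."

# Example values copied from Mark's message.
PROPERTIES_TEXT = """\
nifi.content.repository.directory.content1=/nifi/repos/content-1
nifi.content.repository.directory.content2=/nifi/repos/content-2
nifi.content.repository.directory.content3=/nifi/repos/content-3
nifi.content.repository.directory.content4=/nifi/repos/content-4
"""


def parse_containers(text, prefix=CONTENT_PREFIX):
    """Return {container_name: directory} for properties starting with prefix."""
    containers = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(prefix):
            containers[key[len(prefix):]] = value.strip()
    return containers


def assign_round_robin(containers, write_count):
    """Hypothetical model: hand each new write to the next container in turn."""
    names = cycle(containers)
    return [next(names) for _ in range(write_count)]


def aggregate_write_rate(containers, per_disk_mb_per_sec):
    """Idealized upper bound, assuming one physical disk per container."""
    return len(containers) * per_disk_mb_per_sec


containers = parse_containers(PROPERTIES_TEXT)
assignments = assign_round_robin(containers, 8)
print(assignments)
print(aggregate_write_rate(containers, 100), "MB/sec")
```

In this model, eight consecutive writes land on content1 through content4 twice over, which is why four disks that each sustain 100 MB/sec could in principle absorb 400 MB/sec in aggregate. Real throughput will depend on the flow, as the thread points out, so tools such as `iostat` remain the way to check what each disk is actually doing.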
