nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Ideal hardware for NiFi
Date Fri, 14 Sep 2018 03:36:50 GMT
The ff repo disk needs to be the quickest disk and should have no other
contention, just like a db transaction log would require.

The prov repo should also have its own disk.

The content repo can have one or more physical disks.
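
As Mark describes further down the thread, extra content repo disks are
configured as additional "containers" that NiFi writes to in round-robin
fashion. A minimal Python sketch of that idea follows. It only illustrates
the arithmetic and is not NiFi's actual implementation; the function names
are hypothetical, and the container names are the ones from Mark's example.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(containers, num_writes):
    """Assign each write to the next container in turn."""
    picker = cycle(containers)
    return [next(picker) for _ in range(num_writes)]


def aggregate_write_rate(num_disks, per_disk_mb_per_sec):
    """Ideal upper bound: all disks written concurrently at full speed."""
    return num_disks * per_disk_mb_per_sec


containers = ["content1", "content2", "content3", "content4"]
# Eight concurrent writes land evenly, two per container.
print(Counter(assign_round_robin(containers, 8)))
# Four disks at 100 MB/sec each.
print(aggregate_write_rate(4, 100))  # 400
```

The 400 MB/sec figure is a best case. It assumes enough concurrent writer
threads to keep every disk busy and that nothing else contends for them.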

The best case is each repo on physically separate disks/underlying
storage.  Not always an option, I realize, but to maximize performance it
matters.
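
One rough way to check this is to read the repository directory properties
out of nifi.properties and group them by the device they live on. The Python
sketch below does that, assuming the directories already exist. The helper
names are hypothetical. Note that a distinct st_dev only means a distinct
filesystem/device node, which is not proof of a separate physical disk (two
partitions on one disk will look separate).

```python
import os
from collections import defaultdict

# Property prefixes for the flowfile, content and provenance repositories.
REPO_PREFIXES = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.",
    "nifi.provenance.repository.directory.",
)


def parse_repo_dirs(properties_text):
    """Return {property_name: path} for repository directory properties."""
    dirs = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key.startswith(REPO_PREFIXES) and value:
            dirs[key] = value
    return dirs


def group_by_device(dirs):
    """Group property names by the device ID (st_dev) of their directory."""
    groups = defaultdict(list)
    for name, path in dirs.items():
        groups[os.stat(path).st_dev].append(name)
    return dict(groups)
```

Any group holding more than one property name means those repos share a
device and will contend with each other for I/O.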

Then it's about proper config and optimal flow design.

It's ok if your ff repo disk is always busy...that's a good thing.  If iostat
always shows 100% usage then things probably aren't ideal.
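
The busy-versus-saturated distinction can be sketched in Python, assuming
you have already collected a series of utilization percentages for the disk
(for example the %util column that `iostat -x` reports). The function name
and the 99% threshold are assumptions for illustration, not a NiFi or
iostat convention.

```python
def is_saturated(util_samples, threshold=99.0):
    """True only if every sample is at or above the threshold.

    A disk that is merely busy will dip below the threshold now and then;
    one that never does is likely the bottleneck.
    """
    samples = list(util_samples)
    return bool(samples) and all(s >= threshold for s in samples)


print(is_saturated([100.0, 99.8, 100.0]))  # True
print(is_saturated([100.0, 62.5, 97.0]))   # False
```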

thanks

On Thu, Sep 13, 2018, 9:16 PM Phil H <gippyphil@gmail.com> wrote:

> Hi Joe,
>
> I moved the content and provenance repositories off to two new disks, but
> it seems like the vast majority of the writes are still occurring on the
> disk where the flowfile and database repositories are. I note they don't
> appear to be able to be split across disks in the same way?
>
> On Fri, 14 Sep 2018 at 12:37, Joe Witt <joe.witt@gmail.com> wrote:
>
> > if they are physically separate the diff should be quite noticeable.
> >
> > On Thu, Sep 13, 2018, 7:36 PM Phil H <gippyphil@gmail.com> wrote:
> >
> > > Potentially. We're looking to see how the multiple disks help before
> > > committing to spending money on new hardware :)
> > >
> > > On Fri, 14 Sep 2018 at 10:48, Joe Witt <joe.witt@gmail.com> wrote:
> > >
> > > > phil,
> > > >
> > > > as you add dirs it will just start using them.  if you want to no
> > > > longer use the current dir it might be more involved.
> > > >
> > > > does that help?
> > > >
> > > > thanks
> > > >
> > > > On Thu, Sep 13, 2018, 4:36 PM Phil H <gippyphil@gmail.com> wrote:
> > > >
> > > > > Follow up question - how do I transition to this new structure?
> > > > > Should I shut down NiFi and move the contents of the legacy single
> > > > > directories into one of the new ones? For example:
> > > > >
> > > > > mv /usr/nifi/content_repository /nifi/repos/content-1
> > > > >
> > > > > TIA
> > > > > Phil
> > > > >
> > > > >
> > > > > On Wed, 12 Sep 2018 at 06:15, Mark Payne <markap14@hotmail.com> wrote:
> > > > >
> > > > > > Phil,
> > > > > >
> > > > > > For the content repository, you can configure the directory by
> > > > > > changing the value of the
> > > > > > "nifi.content.repository.directory.default" property in
> > > > > > nifi.properties. The suffix here, "default", is the name of this
> > > > > > "container". You can have multiple containers by adding extra
> > > > > > properties. So, for example, you could set:
> > > > > >
> > > > > > nifi.content.repository.directory.content1=/nifi/repos/content-1
> > > > > > nifi.content.repository.directory.content2=/nifi/repos/content-2
> > > > > > nifi.content.repository.directory.content3=/nifi/repos/content-3
> > > > > > nifi.content.repository.directory.content4=/nifi/repos/content-4
> > > > > >
> > > > > > Similarly, the Provenance Repo property is named
> > > > > > "nifi.provenance.repository.directory.default"
> > > > > > and can have any number of "containers":
> > > > > >
> > > > > > nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
> > > > > > nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
> > > > > > nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
> > > > > > nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4
> > > > > >
> > > > > > When NiFi writes to these, it does a Round Robin so that if you're
> > > > > > writing to 4 Flow Files' content simultaneously with different
> > > > > > threads, you're able to get the full throughput of each disk. (So
> > > > > > if you have 4 disks for your content repo, each capable of writing
> > > > > > 100 MB/sec, then your effective write rate to the content repo is
> > > > > > 400 MB/sec). Similar with Provenance Repository.
> > > > > >
> > > > > > Doing this also will allow you to hold a larger 'archive' of
> > > > > > content and provenance data, because it will span the archive
> > > > > > across all of the listed directories, as well.
> > > > > > Thanks
> > > > > > -Mark
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Sep 11, 2018, at 3:35 PM, Phil H <gippyphil@gmail.com> wrote:
> > > > > > >
> > > > > > > Thanks Mark, this is great advice.
> > > > > > >
> > > > > > > Disk access is certainly an issue with the current set up. I will
> > > > > > > certainly shoot for NVMe disks in the build. How does NiFi get
> > > > > > > configured to span its repositories across multiple physical
> > > > > > > disks?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Phil
> > > > > > >
> > > > > > > On Wed, 12 Sep 2018 at 01:32, Mark Payne <markap14@hotmail.com> wrote:
> > > > > > >
> > > > > > >> Phil,
> > > > > > >>
> > > > > > >> As Sivaprasanna mentioned, your bottleneck will certainly depend on
> > > > > > >> your flow. There's nothing inherent about NiFi or the JVM, AFAIK,
> > > > > > >> that would limit you. I've seen NiFi run on VMs containing 4-8
> > > > > > >> cores, and I've seen it run on bare metal on servers containing 96+
> > > > > > >> cores. Most often, I see people with a lot of CPU cores but
> > > > > > >> insufficient disk, so if you're running several cores ensure that
> > > > > > >> you're using SSDs / NVMe drives or enough spinning disks to
> > > > > > >> accommodate the flow. NiFi does a good job of spanning the content
> > > > > > >> and FlowFile repositories across multiple disks to take full
> > > > > > >> advantage of the hardware, and scales the CPU vertically by way of
> > > > > > >> multiple Processors and multiple concurrent tasks (threads) on a
> > > > > > >> given Processor.
> > > > > > >>
> > > > > > >> It really comes down to what you're doing in your flow, though. If
> > > > > > >> you've got 96 cores and you're trying to perform 5 dozen
> > > > > > >> transformations against a large number of FlowFiles but have only a
> > > > > > >> single spinning disk, then those 96 cores will likely go to waste,
> > > > > > >> because your disk will bottleneck you.
> > > > > > >>
> > > > > > >> Likewise, if you have 10 SSDs and only 8 cores you're likely going
> > > > > > >> to waste a lot of disk because you won't have the CPU needed to
> > > > > > >> reach the disks' full potential. So you'll need to strike the
> > > > > > >> correct balance for your use case. Since you have the flow running
> > > > > > >> right now, I would recommend looking at things like `top` and
> > > > > > >> `iostat` in order to understand if you're reaching your limit on
> > > > > > >> CPU, disk, etc.
> > > > > > >>
> > > > > > >> As far as RAM is concerned, NiFi typically only needs 4-8 GB of RAM
> > > > > > >> for the heap. However, more RAM means that your operating system
> > > > > > >> can make better use of disk caching, which can certainly speed
> > > > > > >> things up, especially if you're reading the content several times
> > > > > > >> for each FlowFile.
> > > > > > >>
> > > > > > >> Does this help at all?
> > > > > > >>
> > > > > > >> Thanks
> > > > > > >> -Mark
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Sep 10, 2018, at 6:05 AM, Phil H <gippyphil@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>> Thanks for that. Sorry I should have been more specific - we have
> > > > > > >>> a flow running already on non-dedicated hardware. Looking to
> > > > > > >>> identify any limitations in NiFi/JVM that would limit how much
> > > > > > >>> parallelism it can take advantage of
> > > > > > >>>
> > > > > > >>> On Mon, 10 Sep 2018 at 14:32, Sivaprasanna <sivaprasanna246@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> Phil,
> > > > > > >>>>
> > > > > > >>>> The hardware requirements are driven by the nature of the
> > > > > > >>>> dataflow you are developing. If you're looking to play around
> > > > > > >>>> with NiFi and gain some hands-on experience, go for 4 cores and
> > > > > > >>>> 8GB RAM, i.e. any modern laptop/computer would do the job. In my
> > > > > > >>>> case, where I have 100s of dataflows, I have it clustered with 3
> > > > > > >>>> nodes, each having 16GB RAM and 4(8) cores. I went with SSDs of
> > > > > > >>>> smaller size because my flows are involved in writing to object
> > > > > > >>>> stores like Google Cloud Storage, Azure Blob and Amazon S3 and
> > > > > > >>>> NoSQL DBs. Hope this helps.
> > > > > > >>>>
> > > > > > >>>> -
> > > > > > >>>> Sivaprasanna
> > > > > > >>>>
> > > > > > >>>> On Mon, Sep 10, 2018 at 4:09 AM Phil H <gippyphil@gmail.com> wrote:
> > > > > > >>>>
> > > > > > >>>>> Hi all,
> > > > > > >>>>>
> > > > > > >>>>> I've been asked to spec some hardware for a NiFi installation.
> > > > > > >>>>> Does anyone have any advice? My gut feel is lots of processor
> > > > > > >>>>> cores and RAM, with less emphasis on storage (small fast
> > > > > > >>>>> disks). Are there any limitations on how many cores the
> > > > > > >>>>> JRE/NiFi can actually make use of, or any other considerations
> > > > > > >>>>> like that I should be aware of?
> > > > > > >>>>>
> > > > > > >>>>> Most likely will be pairs of servers in a cluster, but again
> > > > > > >>>>> any advice to the contrary would be appreciated.
> > > > > > >>>>>
> > > > > > >>>>> Cheers,
> > > > > > >>>>> Phil
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
