nifi-users mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: Best practices for handling large files
Date Tue, 06 Jun 2017 23:23:41 GMT
Thanks, that's actually what I ended up doing. In case anyone comes along
looking for this, the approach we used for development was:

GetFile -> SplitText (50k chunks) -> SplitText (1 line/flowfile) -> the rest
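
For anyone who wants to see the idea outside of NiFi, here is a minimal Python sketch of the same two-stage split. It is not NiFi code: the function names (split_lines, two_stage_split) are made up for illustration, and it assumes "50k chunks" means 50,000 lines per chunk.

```python
from typing import Iterable, Iterator, List


def split_lines(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most chunk_size lines (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def two_stage_split(lines: Iterable[str], medium_chunk: int = 50_000) -> Iterator[str]:
    """Split into medium chunks first, then into single records.

    Assumption: medium_chunk is a line count, mirroring the first SplitText.
    """
    for chunk in split_lines(lines, medium_chunk):
        # Second stage: one record per output, mirroring the second SplitText.
        for record in split_lines(chunk, 1):
            yield record[0]
```

Because both stages are generators, only one medium chunk is held in memory at a time, and each split step emits a bounded number of outputs rather than every record at once.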

On Fri, Apr 7, 2017 at 1:11 PM, Andy LoPresto <alopresto@apache.org> wrote:

> Mike,
>
> Are the files a single coherent piece of information (e.g. a video file)
> or collections of smaller atomic units of data (e.g. CSV, JSON batches)? In
> the first case, it’s important to ensure that the processors which deal
> with the content do so in a streaming manner so as not to exhaust your heap
> (and ensure any custom processors you develop do the same), and with
> the latter, when splitting and merging these records, we generally propose
> a two-step approach, where a single giant file is split into medium size
> flowfiles, and then each of these is split into individual records (i.e. 1
> * 1MM -> 10 * 100K -> 10 * 100K * 1 as opposed to 1 * 1MM -> 1MM * 1).
>
> Other than that, be sure to follow the best practices for configuration in
> the Admin Guide [1] and read about performance expectations [2].
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
> [2] https://nifi.apache.org/docs/nifi-docs/html/overview.html#performance-expectations-and-characteristics-of-nifi
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Apr 7, 2017, at 5:26 AM, Mike Thomsen <mikerthomsen@gmail.com> wrote:
>
> I have one flow that will have to handle files that are anywhere from
> 500 MB to several GB in size. The current plan is to store them in HDFS or S3
> and then bring them down for processing in NiFi. Are there any suggestions
> on how to handle such large single files?
>
> Thanks,
>
> Mike
>
>
>
