nifi-users mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Bulk inserting into HBase with NiFi
Date Tue, 06 Jun 2017 23:21:52 GMT
We have a very large body of CSV files (well over 1TB) that needs to be
imported into HBase. For a single 20GB segment, we are looking at pushing
easily 100M flowfiles into HBase, and most of the JSON files generated from
the CSV are rather small (roughly 20-250 bytes each).

It's going very slowly, and I assume that is because the content and
provenance repositories are taxing the disk very heavily with that many tiny
flowfiles. So I'm wondering if anyone has a suggestion on a good NiFiesque
way of solving this. Right now, I'm considering two options:

1. Looking for a way to inject the HBase controller service into an
ExecuteScript processor so I can handle the data in large chunks (splitting
text and generating a List<Put> inside the processor myself and doing one
huge Put)

2. Creating a library that lets me generate HFiles from within an
ExecuteScript processor.
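
To make option 1 concrete, here is a rough sketch of the chunking side only,
in Python. It is not working NiFi or HBase code: `batched`, `bulk_write`, and
the `put_batch` callback are hypothetical names, and `put_batch` stands in for
whatever would actually hand a list of rows to HBase in one call (for
example, something wrapping the HBase client's `Table.put(List<Put>)`). The
batch size of 10000 is an arbitrary placeholder, not a tuned value.

```python
import csv
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def bulk_write(
    lines: Iterable[str],
    put_batch: Callable[[List[List[str]]], None],  # hypothetical HBase writer
    batch_size: int = 10000,  # assumed value, not a recommendation
) -> int:
    """Parse CSV lines in chunks and issue one put_batch call per chunk.

    Returns the number of put_batch calls made.
    """
    calls = 0
    for chunk in batched(lines, batch_size):
        # csv.reader yields [] for blank lines, so filter those out.
        rows = [row for row in csv.reader(chunk) if row]
        if rows:
            put_batch(rows)
            calls += 1
    return calls
```

The point is that one large incoming flowfile would turn into a handful of
batched writes instead of one flowfile per row, so the repositories would see
far fewer entries.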

What I really need is something fast within NiFi that would let me generate
huge blocks of updates for HBase and push them out. Any ideas?

Thanks,

Mike
