nifi-users mailing list archives

From Pierre Villard <>
Subject Re: Fetch Contents of HDFS Directory as a Part of a Larger Flow
Date Thu, 03 May 2018 15:45:37 GMT
Hi Shawn,

If you know the paths of the files to retrieve from HDFS, you could use the
FetchHDFS processor.
If you need to retrieve all the files within the directory created by Hive,
I guess you could list the existing files by calling the WebHDFS REST API
and then fetch each one with the FetchHDFS processor.

Not sure that's the best solution to your requirement though.
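
To make the listing step concrete, here is a rough sketch in Python of what
the WebHDFS call and its response look like. The LISTSTATUS operation and the
FileStatuses/FileStatus/pathSuffix/type fields are part of the WebHDFS REST
API; the host name, port and directory are made up for illustration, and the
helper names are hypothetical. Inside NiFi the HTTP call itself would
presumably be made by a processor such as InvokeHTTP rather than a script.

```python
import json
import posixpath
from urllib.parse import quote


def liststatus_url(namenode_http, directory):
    """Build the WebHDFS LISTSTATUS URL for an absolute HDFS directory."""
    base = namenode_http.rstrip("/")
    return f"{base}/webhdfs/v1{quote(directory)}?op=LISTSTATUS"


def file_paths(liststatus_json, directory):
    """Return the full HDFS path of every plain file in a LISTSTATUS response."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [
        posixpath.join(directory, status["pathSuffix"])
        for status in statuses
        if status["type"] == "FILE"  # skip sub-directories
    ]


if __name__ == "__main__":
    # Hypothetical NameNode address and Hive output directory.
    print(liststatus_url("http://namenode:9870", "/tmp/extract_1"))

    # Trimmed-down example of a LISTSTATUS response body.
    sample = json.dumps({
        "FileStatuses": {
            "FileStatus": [
                {"pathSuffix": "000000_0", "type": "FILE"},
                {"pathSuffix": "sub", "type": "DIRECTORY"},
            ]
        }
    })
    for path in file_paths(sample, "/tmp/extract_1"):
        print(path)
```

Each resulting path could then be handed to FetchHDFS as one flow file per
HDFS file, for example by splitting the JSON response and setting the
filename from an attribute.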


2018-05-03 17:35 GMT+02:00 Shawn Weeks <>:

> I'm building a REST service with the HTTP Request and Response
> Processors to support data extracts from Hive. Since some of the extracts
> can be quite large, using the SelectHiveQL Processor isn't a performant
> option, and instead I'm trying to use on-demand Hive Temporary Tables to do
> the heavy lifting via CTAS (Create Table As Select). Since GetHDFS doesn't
> support an incoming connection, I'm trying to figure out another way to
> fetch the files Hive creates and return them as a download in the web
> service. Has anyone else worked out a good solution for fetching the
> contents of a directory from HDFS as part of a larger flow?
> Thanks
> Shawn
