nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Fetch Contents of HDFS Directory as a Part of a Larger Flow
Date Thu, 03 May 2018 15:51:03 GMT
Another option: if the Hadoop client is installed on the NiFi node,
you could use one of the script processors to make a call to
"hadoop fs -ls ...".

If the response is so large that it requires the heavy lifting of
writing out temp tables to HDFS, fetching those files into NiFi, and
most likely merging them into a single response flow file, is that
really expected to happen in the context of a single web
request/response?

On Thu, May 3, 2018 at 11:45 AM, Pierre Villard
<pierre.villard.fr@gmail.com> wrote:
> Hi Shawn,
>
> If you know the paths of the files to retrieve from HDFS, you could use
> the FetchHDFS processor.
> If you need to retrieve all the files within the directory created by Hive,
> I guess you could list the existing files by calling the WebHDFS REST API
> and then use the FetchHDFS processor (see the sketch after the quoted
> thread for what that listing call looks like).
>
> Not sure that's the best solution for your requirement, though.
>
> Pierre
>
> 2018-05-03 17:35 GMT+02:00 Shawn Weeks <sweeks@weeksconsulting.us>:
>>
>> I'm building a REST service with the HandleHttpRequest and
>> HandleHttpResponse processors to support data extracts from Hive. Since
>> some of the extracts can be quite large, using the SelectHiveQL processor
>> isn't a performant option; instead I'm using on-demand Hive temporary
>> tables to do the heavy lifting via CTAS (Create Table As Select). Since
>> GetHDFS doesn't support an incoming connection, I'm trying to figure out
>> another way to fetch the files Hive creates and return them as a download
>> in the web service. Has anyone else worked out a good solution for
>> fetching the contents of a directory from HDFS as part of a larger flow?
>>
>>
>> Thanks
>>
>> Shawn
>
>
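
For the WebHDFS approach Pierre described, here's a rough sketch of the
LISTSTATUS call (the namenode address and directory are placeholders; in
a flow you would typically issue this via InvokeHTTP and feed each path
to FetchHDFS):

    # Sketch of a WebHDFS LISTSTATUS request to enumerate a directory.
    # The namenode host/port and directory path below are assumptions.
    import json
    import urllib2

    NAMENODE = 'http://namenode.example.com:50070'  # default WebHDFS HTTP port
    directory = '/tmp/ctas_output'                  # hypothetical CTAS target dir

    url = '%s/webhdfs/v1%s?op=LISTSTATUS' % (NAMENODE, directory)
    statuses = json.load(urllib2.urlopen(url))['FileStatuses']['FileStatus']

    # Each full path printed below could drive a FetchHDFS processor,
    # e.g. via a flow file attribute used in its "HDFS Filename" property.
    for status in statuses:
        if status['type'] == 'FILE':
            print('%s/%s' % (directory, status['pathSuffix']))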
