nifi-users mailing list archives

From James Srinivasan <james.sriniva...@gmail.com>
Subject Re: Fetch Contents of HDFS Directory as a Part of a Larger Flow
Date Thu, 03 May 2018 19:11:25 GMT
We handle a similar situation using CTAS, then retrieve the resulting
data via WebHDFS.
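A minimal sketch of that pattern, for concreteness. The namenode address, user, table, and paths below are placeholders, not details from this thread; the CTAS would run through whatever Hive client you already use, and the WebHDFS `OPEN` URL streams the resulting file back.

```python
import urllib.parse

# Hypothetical CTAS that materializes the query result in a known HDFS
# directory (run via beeline, JDBC, etc. -- not shown here).
ctas = """
CREATE TEMPORARY TABLE extract_{job_id}
STORED AS TEXTFILE
LOCATION '/tmp/extracts/{job_id}'
AS SELECT * FROM source_table
""".strip()

def webhdfs_open_url(namenode, path, user):
    """Build the WebHDFS URL that streams a file's contents (op=OPEN)."""
    return "http://%s/webhdfs/v1%s?%s" % (
        namenode,
        path,
        urllib.parse.urlencode({"op": "OPEN", "user.name": user}),
    )

url = webhdfs_open_url("namenode:9870", "/tmp/extracts/job42/000000_0", "etl")
print(url)
```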

James

On Thu, 3 May 2018, 17:18 Bryan Bende, <bbende@gmail.com> wrote:

> The two-step idea makes sense...
>
> If you did want to go with the OS call you would probably want
> ExecuteStreamCommand.
>
> On Thu, May 3, 2018 at 12:06 PM, Shawn Weeks <sweeks@weeksconsulting.us>
> wrote:
> > I'm thinking about ways to do the operation in two steps, where the first
> > request starts the process of generating the data and returns a UUID, and
> > the second request can check on the status and download the file. I still
> > have to work out how to collect the output from the Hive table, so I'll
> > look at the REST calls. Not sure of a good way to make an OS call, as
> > ExecuteProcess doesn't support inputs either.
> >
> >
> > Thanks
> >
> > Shawn
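The two-step pattern described above can be sketched as follows. This is an illustration only: the in-memory registry and function names are invented for the example; in NiFi the state would live in flow file attributes or an external store, and the two requests would be separate HandleHttpRequest/Response flows.

```python
import uuid

# In-memory job registry (stand-in for real shared state).
jobs = {}

def start_extract(query):
    """First request: kick off generation, hand back a job id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"query": query, "status": "RUNNING", "path": None}
    return job_id

def finish_extract(job_id, hdfs_path):
    """Called once the CTAS has landed its files in HDFS."""
    jobs[job_id].update(status="DONE", path=hdfs_path)

def poll(job_id):
    """Second request: report status (and the file location when ready)."""
    return jobs.get(job_id, {"status": "UNKNOWN"})

jid = start_extract("SELECT * FROM source_table")
finish_extract(jid, "/tmp/extracts/%s/000000_0" % jid)
print(poll(jid)["status"])  # DONE
```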
> >
> > ________________________________
> > From: Bryan Bende <bbende@gmail.com>
> > Sent: Thursday, May 3, 2018 10:51:03 AM
> > To: users@nifi.apache.org
> > Subject: Re: Fetch Contents of HDFS Directory as a Part of a Larger Flow
> >
> > Another option: if the Hadoop client is installed on the NiFi node,
> > you could use one of the script processors to make a call to
> > "hadoop fs -ls ...".
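A sketch of the parsing half of that approach. The sample line below is illustrative, not captured output; in a script processor the real text would come from something like `subprocess.run(["hadoop", "fs", "-ls", directory], capture_output=True)`, assuming the Hadoop client is on the node's PATH.

```python
def parse_ls_line(line):
    """Parse one `hadoop fs -ls` line into (size_bytes, path).

    Line format: perms replication owner group size date time path
    """
    fields = line.split(None, 7)
    return int(fields[4]), fields[7]

# Illustrative sample line (hypothetical owner/group/path).
sample = ("-rw-r--r--   3 etl hadoop   104857600 "
          "2018-05-03 12:00 /tmp/extracts/job42/000000_0")
size, path = parse_ls_line(sample)
print(size, path)  # 104857600 /tmp/extracts/job42/000000_0
```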
> >
> > If the response is so large that it requires the heavy lifting of
> > writing out temp tables to HDFS, fetching those files into NiFi, and
> > most likely merging them into a single response flow file, is that
> > really expected to happen in the context of a single web
> > request/response?
> >
> > On Thu, May 3, 2018 at 11:45 AM, Pierre Villard
> > <pierre.villard.fr@gmail.com> wrote:
> >> Hi Shawn,
> >>
> >> If you know the path of the files to retrieve in HDFS, you could use
> >> the FetchHDFS processor.
> >> If you need to retrieve all the files within the directory created by
> >> Hive, I guess you could list the existing files by calling the REST
> >> API of WebHDFS and then use the FetchHDFS processor.
> >>
> >> Not sure that's the best solution to your requirement though.
> >>
> >> Pierre
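The list-then-fetch idea above can be sketched as follows: pull the file names out of a WebHDFS `LISTSTATUS` response and join them into full paths that could feed a FetchHDFS processor. The JSON below is a trimmed example of the documented response shape, not output from a real cluster.

```python
import json

# Trimmed example of a WebHDFS LISTSTATUS response body.
resp = json.loads("""
{"FileStatuses": {"FileStatus": [
  {"pathSuffix": "000000_0", "type": "FILE", "length": 524288},
  {"pathSuffix": "000001_0", "type": "FILE", "length": 131072}
]}}
""")

def list_paths(liststatus_json, directory):
    """Join the listed directory with each FILE entry's pathSuffix."""
    return [
        directory.rstrip("/") + "/" + fs["pathSuffix"]
        for fs in liststatus_json["FileStatuses"]["FileStatus"]
        if fs["type"] == "FILE"
    ]

print(list_paths(resp, "/tmp/extracts/job42"))
```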
> >>
> >> 2018-05-03 17:35 GMT+02:00 Shawn Weeks <sweeks@weeksconsulting.us>:
> >>>
> >>> I'm building a REST service with the HTTP Request and Response
> >>> processors to support data extracts from Hive. Since some of the
> >>> extracts can be quite large, using the SelectHiveQL processor isn't a
> >>> performant option, and instead I'm trying to use on-demand Hive
> >>> temporary tables to do the heavy lifting via CTAS (Create Table As
> >>> Select). Since GetHDFS doesn't support an incoming connection, I'm
> >>> trying to figure out another way to fetch the files Hive creates and
> >>> return them as a download in the web service. Has anyone else worked
> >>> out a good solution for fetching the contents of a directory from
> >>> HDFS as part of a larger flow?
> >>>
> >>>
> >>> Thanks
> >>>
> >>> Shawn
> >>
> >>
>
