nifi-users mailing list archives

From scott <tcots8...@gmail.com>
Subject Re: Simple CSV to Parquet without Hadoop
Date Wed, 15 Aug 2018 14:16:36 GMT
If I install a Hadoop client on my NiFi host, would I be able to get past
this error?
I don't understand why this processor depends on Hadoop. Other projects
like Drill and Spark don't have this dependency just to be able to write
Parquet files.

On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella <
gardellajuanpablo@gmail.com> wrote:

> It's a warning. You can ignore that.
>
> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <bbende@gmail.com> wrote:
>
>> Scott,
>>
>> Sorry, I did not realize the Hadoop client would be looking for
>> winutils.exe when running on Windows.
>>
>> On Linux and macOS you don't need anything installed outside of NiFi,
>> so I wasn't expecting this.
>>
>> Not sure if there is any other good option here regarding Parquet.
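>> One thing that might work, though I haven't tried it myself: Hadoop
>> only looks for winutils.exe under the directory named by the
>> hadoop.home.dir system property (or HADOOP_HOME). Dropping a
>> winutils.exe into, say, C:\hadoop\bin and adding a JVM argument to
>> NiFi's conf/bootstrap.conf might get past the first error (the path
>> and the arg number below are just examples; use any unused java.arg
>> number):
>>
>> java.arg.20=-Dhadoop.home.dir=C:\hadoop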
>>
>> Thanks,
>>
>> Bryan
>>
>>
>> On Tue, Aug 14, 2018 at 5:31 PM, scott <tcots8888@gmail.com> wrote:
>> > Hi Bryan,
>> > I'm fine if I have to trick the API, but don't I still need Hadoop
>> > installed somewhere? After creating the core-site.xml as you described,
>> > I get the following errors:
>> >
>> > Failed to locate the winutils binary in the hadoop binary path
>> > IOException: Could not locate executable null\bin\winutils.exe in the
>> > Hadoop binaries
>> > Unable to load native-hadoop library for your platform... using
>> > builtin-java classes where applicable
>> > Failed to write due to java.io.IOException: No FileSystem for scheme
>> >
>> > BTW, I'm using NiFi version 1.5
>> >
>> > Thanks,
>> > Scott
>> >
>> >
>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <bbende@gmail.com> wrote:
>> >>
>> >> Scott,
>> >>
>> >> Unfortunately the Parquet API itself is tied to the Hadoop FileSystem
>> >> object, which is why NiFi can't read and write Parquet directly to and
>> >> from flow files (i.e. the API doesn't provide a way to read/write
>> >> to/from Java input and output streams).
>> >>
>> >> The best you can do is trick the Hadoop API into using the local
>> >> file-system by creating a core-site.xml with the following:
>> >>
>> >> <configuration>
>> >>     <property>
>> >>         <name>fs.defaultFS</name>
>> >>         <value>file:///</value>
>> >>     </property>
>> >> </configuration>
>> >>
>> >> That will make PutParquet or FetchParquet work with your local
>> >> file-system.
>> >>
>> >> Thanks,
>> >>
>> >> Bryan
>> >>
>> >>
>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <tcots8888@gmail.com> wrote:
>> >> > Hello NiFi community,
>> >> > Is there a simple way to read CSV files and write them out as Parquet
>> >> > files without Hadoop? I run NiFi on Windows and don't have access to a
>> >> > Hadoop environment. I'm trying to write the output of my ETL in a
>> >> > compressed and still queryable format. Is there something I should be
>> >> > using instead of Parquet?
>> >> >
>> >> > Thanks for your time,
>> >> > Scott
>> >
>> >
>>
>
