nifi-users mailing list archives

From scott <tcots8...@gmail.com>
Subject Re: Simple CSV to Parquet without Hadoop
Date Wed, 15 Aug 2018 19:23:00 GMT
Just tested in my CentOS VM; it worked like a charm without Hadoop. I'll open
a Jira bug on PutParquet, since it doesn't seem to run on Windows.
Still not sure what I can do. Converting our production Windows NiFi
install to Docker would be a major effort.
Has anyone heard of a Parquet writer tool I can download and call from NiFi?
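For example, would a small pyarrow script work, called from ExecuteProcess
(or ExecuteStreamCommand) after the CSV lands on disk? Untested sketch,
assuming Python with pyarrow installed on the NiFi host (script name is just
an example):

#!/usr/bin/env python
# csv_to_parquet.py: convert one CSV file to a Parquet file, no Hadoop needed.
# Usage: python csv_to_parquet.py input.csv output.parquet
# Assumes pyarrow is installed: pip install pyarrow
import sys

import pyarrow.csv as pv
import pyarrow.parquet as pq

src, dst = sys.argv[1], sys.argv[2]  # input CSV path, output Parquet path
table = pv.read_csv(src)             # schema is inferred from the CSV
pq.write_table(table, dst, compression="snappy")

That would sidestep parquet-mr entirely, since pyarrow ships its own Parquet
writer.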

On Wed, Aug 15, 2018 at 12:01 PM, Mike Thomsen <mikerthomsen@gmail.com>
wrote:

> > Mike, that's a good tip. I'll test that, but unfortunately, I've already
> > committed to Windows.
>
> You can run Docker on Windows, and with it the standard NiFi Docker image.
>
> On Wed, Aug 15, 2018 at 2:52 PM scott <tcots8888@gmail.com> wrote:
>
>> Mike, that's a good tip. I'll test that, but unfortunately, I've already
>> committed to Windows.
>> What about a script? Is there some tool you know of that can just be
>> called by NiFi to convert an input CSV file to a Parquet file?
>>
>> On Wed, Aug 15, 2018 at 8:32 AM, Mike Thomsen <mikerthomsen@gmail.com>
>> wrote:
>>
>>> Scott,
>>>
>>> You can also try Docker on Windows. Something like this should work:
>>>
>>> docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest
>>>
>>> I don't have Windows either, but Docker seems to work fine for my
>>> colleagues that have to use it on Windows. That should bridge C:\nifi_temp
>>> and /opt/data_output between host and container and remap localhost:8080 to
>>> the container on 8080 so you don't have to mess with a Hadoop client just
>>> to try out some Parquet stuff.
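>>>
>>> (Then inside that container you could, say, point PutParquet, with the
>>> core-site.xml trick Bryan described, at a path under /opt/data_output,
>>> and the files would land in C:\nifi_temp on the host.)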
>>>
>>> Mike
>>>
>>> On Wed, Aug 15, 2018 at 11:20 AM scott <tcots8888@gmail.com> wrote:
>>>
>>>> Thanks Bryan. I'll give the Hadoop client a try.
>>>>
>>>> On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <bbende@gmail.com> wrote:
>>>>
>>>>> I think there is a good chance that installing the Hadoop client would
>>>>> solve the issue, but I can't say for sure since I don't have a Windows
>>>>> machine to test.
>>>>>
>>>>> The processor depends on the Apache Parquet Java client library, which
>>>>> depends on the Apache Hadoop client [1], and the Hadoop client has a
>>>>> limitation on Windows where it needs an extra native helper (the
>>>>> winutils.exe binary).
>>>>>
>>>>> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
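>>>>>
>>>>> (Untested thought: the usual workaround for other Hadoop clients on
>>>>> Windows is to grab a winutils.exe built against a matching Hadoop
>>>>> version, drop it in something like C:\hadoop\bin, and set
>>>>> HADOOP_HOME=C:\hadoop before starting NiFi, or pass
>>>>> -Dhadoop.home.dir=C:/hadoop through a spare java.arg.N line in
>>>>> conf/bootstrap.conf. I can't confirm that fixes PutParquet.)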
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 15, 2018 at 10:16 AM, scott <tcots8888@gmail.com> wrote:
>>>>> > If I install a Hadoop client on my NiFi host, would I be able to get
>>>>> > past this error?
>>>>> > I don't understand why this processor depends on Hadoop. Other projects
>>>>> > like Drill and Spark don't have such a dependency to be able to write
>>>>> > Parquet files.
>>>>> >
>>>>> > On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella
>>>>> > <gardellajuanpablo@gmail.com> wrote:
>>>>> >>
>>>>> >> It's a warning. You can ignore that.
>>>>> >>
>>>>> >> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <bbende@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Scott,
>>>>> >>>
>>>>> >>> Sorry, I did not realize the Hadoop client would be looking for this
>>>>> >>> winutils.exe when running on Windows.
>>>>> >>>
>>>>> >>> On Linux and macOS you don't need anything external installed outside
>>>>> >>> of NiFi, so I wasn't expecting this.
>>>>> >>>
>>>>> >>> Not sure if there is any other good option here regarding Parquet.
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>>
>>>>> >>> Bryan
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, Aug 14, 2018 at 5:31 PM, scott <tcots8888@gmail.com> wrote:
>>>>> >>> > Hi Bryan,
>>>>> >>> > I'm fine if I have to trick the API, but don't I still need Hadoop
>>>>> >>> > installed somewhere? After creating the core-site.xml as you
>>>>> >>> > described, I get the following errors:
>>>>> >>> >
>>>>> >>> > Failed to locate the winutils binary in the hadoop binary path
>>>>> >>> > IOException: Could not locate executable null\bin\winutils.exe in
>>>>> >>> > the Hadoop binaries
>>>>> >>> > Unable to load native-hadoop library for your platform... using
>>>>> >>> > builtin-java classes where applicable
>>>>> >>> > Failed to write due to java.io.IOException: No FileSystem for scheme
>>>>> >>> >
>>>>> >>> > BTW, I'm using NiFi version 1.5
>>>>> >>> >
>>>>> >>> > Thanks,
>>>>> >>> > Scott
>>>>> >>> >
>>>>> >>> >
>>>>> >>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <bbende@gmail.com> wrote:
>>>>> >>> >>
>>>>> >>> >> Scott,
>>>>> >>> >>
>>>>> >>> >> Unfortunately the Parquet API itself is tied to the Hadoop
>>>>> >>> >> Filesystem object, which is why NiFi can't read and write Parquet
>>>>> >>> >> directly to flow files (i.e. they don't provide a way to read/write
>>>>> >>> >> to/from Java input and output streams).
>>>>> >>> >>
>>>>> >>> >> The best you can do is trick the Hadoop API into using the local
>>>>> >>> >> file-system by creating a core-site.xml with the following:
>>>>> >>> >>
>>>>> >>> >> <configuration>
>>>>> >>> >>     <property>
>>>>> >>> >>         <name>fs.defaultFS</name>
>>>>> >>> >>         <value>file:///</value>
>>>>> >>> >>     </property>
>>>>> >>> >> </configuration>
>>>>> >>> >>
>>>>> >>> >> That will make PutParquet or FetchParquet work with your local
>>>>> >>> >> file-system.
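>>>>> >>> >>
>>>>> >>> >> (Point the processor's "Hadoop Configuration Resources" property
>>>>> >>> >> at that core-site.xml so it actually gets picked up.)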
>>>>> >>> >>
>>>>> >>> >> Thanks,
>>>>> >>> >>
>>>>> >>> >> Bryan
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <tcots8888@gmail.com> wrote:
>>>>> >>> >> > Hello NiFi community,
>>>>> >>> >> > Is there a simple way to read CSV files and write them out as
>>>>> >>> >> > Parquet files without Hadoop? I run NiFi on Windows and don't
>>>>> >>> >> > have access to a Hadoop environment. I'm trying to write the
>>>>> >>> >> > output of my ETL in a compressed and still query-able format. Is
>>>>> >>> >> > there something I should be using instead of Parquet?
>>>>> >>> >> >
>>>>> >>> >> > Thanks for your time,
>>>>> >>> >> > Scott
>>>>> >>> >
>>>>> >>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>
