nifi-users mailing list archives

From scott <tcots8...@gmail.com>
Subject Re: Simple CSV to Parquet without Hadoop
Date Tue, 21 Aug 2018 18:20:53 GMT
Matt,
After installing winutils and setting my PATH and HADOOP_HOME
appropriately, I get past that one error.
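
For reference, the setup was roughly this (example install path; adjust
to wherever your winutils.exe actually lives):

rem Example only: HADOOP_HOME must be the folder containing bin\winutils.exe
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin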
However, I still have this error:

2018-08-21 10:55:50,838 ERROR [Timer-Driven Process Thread-1]
o.a.nifi.processors.parquet.PutParquet
PutParquet[id=3e674cc6-0165-1000-d4ac-8d4b225485a2] Failed to write due to
java.io.IOException: No FileSystem for scheme: null: {}
java.io.IOException: No FileSystem for scheme: null

Do you have any suggestion how to resolve this?
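
For reference, my core-site.xml is the minimal one Bryan suggested
earlier in this thread (and the processor's Hadoop Configuration
Resources property points at it):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>file:///</value>
    </property>
</configuration>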


On Wed, Aug 15, 2018 at 12:43 PM, Matt Burgess <mattyb149@apache.org> wrote:

> I don't think you have to install Hadoop on Windows in order to get it
> to work, just the winutils.exe and I guess put it wherever it's
> looking for it (that might be configurable via an environment variable
> or something).
>
> There are pre-built binaries [1] for various versions of Hadoop. Even
> though you'll be writing to a local file system, you'll want to match
> the version of winutils.exe with the version of Hadoop (usually 2.7.3
> for slightly older NiFi versions or 3.0.0 for the latest version(s), I
> think) for best results.
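>
> For example (C:\hadoop is just an illustrative location; as far as I
> know the client looks for %HADOOP_HOME%\bin\winutils.exe):
>
> mkdir C:\hadoop\bin
> copy winutils.exe C:\hadoop\bin\
> setx HADOOP_HOME C:\hadoop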
>
> Regards,
> Matt
>
> [1] https://github.com/steveloughran/winutils
>
> On Wed, Aug 15, 2018 at 3:23 PM scott <tcots8888@gmail.com> wrote:
> >
> > Just tested in my CentOS VM, worked like a charm without Hadoop. I'll
> > open a Jira bug on PutParquet; it doesn't seem to run on Windows.
> > Still not sure what I can do. Converting our production Windows NiFi
> > install to Docker would be a major effort.
> > Has anyone heard of a Parquet writer tool I can download and call from
> > NiFi?
> >
> > On Wed, Aug 15, 2018 at 12:01 PM, Mike Thomsen <mikerthomsen@gmail.com> wrote:
> >>
> >> > Mike, that's a good tip. I'll test that, but unfortunately, I've
> >> > already committed to Windows.
> >>
> >> You can run both Docker and the standard NiFi docker image on Windows.
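> >>
> >> And if you'd rather call a small script from something like
> >> ExecuteStreamCommand, a sketch along these lines might work (untested;
> >> assumes Python with pandas and pyarrow installed, and csv_to_parquet.py
> >> is just a name I made up):
> >>
> >> # csv_to_parquet.py -- read CSV from stdin, write Parquet to the given path
> >> import sys
> >> import pandas as pd
> >>
> >> df = pd.read_csv(sys.stdin)
> >> df.to_parquet(sys.argv[1], engine="pyarrow", compression="snappy")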
> >>
> >> On Wed, Aug 15, 2018 at 2:52 PM scott <tcots8888@gmail.com> wrote:
> >>>
> >>> Mike, that's a good tip. I'll test that, but unfortunately, I've
> >>> already committed to Windows.
> >>> What about a script? Is there some tool you know of that can just be
> >>> called by NiFi to convert an input CSV file to a Parquet file?
> >>>
> >>> On Wed, Aug 15, 2018 at 8:32 AM, Mike Thomsen <mikerthomsen@gmail.com> wrote:
> >>>>
> >>>> Scott,
> >>>>
> >>>> You can also try Docker on Windows. Something like this should work:
> >>>>
> >>>> docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest
> >>>>
> >>>> I don't have Windows either, but Docker seems to work fine for my
> >>>> colleagues who have to use it on Windows. That should bridge
> >>>> C:\nifi_temp and /opt/data_output between host and container and map
> >>>> localhost:8080 to the container's port 8080, so you don't have to
> >>>> mess with a Hadoop client just to try out some Parquet stuff.
> >>>>
> >>>> Mike
> >>>>
> >>>> On Wed, Aug 15, 2018 at 11:20 AM scott <tcots8888@gmail.com> wrote:
> >>>>>
> >>>>> Thanks Bryan. I'll give the Hadoop client a try.
> >>>>>
> >>>>> On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <bbende@gmail.com> wrote:
> >>>>>>
> >>>>>> I think there is a good chance that installing the Hadoop client
> >>>>>> would solve the issue, but I can't say for sure since I don't have
> >>>>>> a Windows machine to test.
> >>>>>>
> >>>>>> The processor depends on the Apache Parquet Java client library,
> >>>>>> which depends on the Apache Hadoop client [1], and the Hadoop
> >>>>>> client has a limitation on Windows where it requires something
> >>>>>> additional.
> >>>>>>
> >>>>>> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Aug 15, 2018 at 10:16 AM, scott <tcots8888@gmail.com> wrote:
> >>>>>> > If I install a Hadoop client on my NiFi host, would I be able to
> >>>>>> > get past this error?
> >>>>>> > I don't understand why this processor depends on Hadoop. Other
> >>>>>> > projects like Drill and Spark don't have such a dependency to be
> >>>>>> > able to write Parquet files.
> >>>>>> >
> >>>>>> > On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella
> >>>>>> > <gardellajuanpablo@gmail.com> wrote:
> >>>>>> >>
> >>>>>> >> It's a warning. You can ignore that.
> >>>>>> >>
> >>>>>> >> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <bbende@gmail.com> wrote:
> >>>>>> >>>
> >>>>>> >>> Scott,
> >>>>>> >>>
> >>>>>> >>> Sorry, I did not realize the Hadoop client would be looking
> >>>>>> >>> for this winutils.exe when running on Windows.
> >>>>>> >>>
> >>>>>> >>> On Linux and macOS you don't need anything external installed
> >>>>>> >>> outside of NiFi, so I wasn't expecting this.
> >>>>>> >>>
> >>>>>> >>> Not sure if there is any other good option here regarding
> >>>>>> >>> Parquet.
> >>>>>> >>>
> >>>>>> >>> Thanks,
> >>>>>> >>>
> >>>>>> >>> Bryan
> >>>>>> >>>
> >>>>>> >>>
> >>>>>> >>> On Tue, Aug 14, 2018 at 5:31 PM, scott <tcots8888@gmail.com> wrote:
> >>>>>> >>> > Hi Bryan,
> >>>>>> >>> > I'm fine if I have to trick the API, but don't I still need
> >>>>>> >>> > Hadoop installed somewhere? After creating the core-site.xml
> >>>>>> >>> > as you described, I get the following errors:
> >>>>>> >>> >
> >>>>>> >>> > Failed to locate the winutils binary in the hadoop binary path
> >>>>>> >>> > IOException: Could not locate executable null\bin\winutils.exe
> >>>>>> >>> > in the Hadoop binaries
> >>>>>> >>> > Unable to load native-hadoop library for your platform...
> >>>>>> >>> > using builtin-java classes where applicable
> >>>>>> >>> > Failed to write due to java.io.IOException: No FileSystem for
> >>>>>> >>> > scheme
> >>>>>> >>> >
> >>>>>> >>> > BTW, I'm using NiFi version 1.5
> >>>>>> >>> >
> >>>>>> >>> > Thanks,
> >>>>>> >>> > Scott
> >>>>>> >>> >
> >>>>>> >>> >
> >>>>>> >>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <bbende@gmail.com> wrote:
> >>>>>> >>> >>
> >>>>>> >>> >> Scott,
> >>>>>> >>> >>
> >>>>>> >>> >> Unfortunately, the Parquet API itself is tied to the Hadoop
> >>>>>> >>> >> FileSystem object, which is why NiFi can't read and write
> >>>>>> >>> >> Parquet directly to flow files (i.e. they don't provide a
> >>>>>> >>> >> way to read/write to/from Java input and output streams).
> >>>>>> >>> >>
> >>>>>> >>> >> The best you can do is trick the Hadoop API into using the
> >>>>>> >>> >> local file-system by creating a core-site.xml with the
> >>>>>> >>> >> following:
> >>>>>> >>> >>
> >>>>>> >>> >> <configuration>
> >>>>>> >>> >>     <property>
> >>>>>> >>> >>         <name>fs.defaultFS</name>
> >>>>>> >>> >>         <value>file:///</value>
> >>>>>> >>> >>     </property>
> >>>>>> >>> >> </configuration>
> >>>>>> >>> >>
> >>>>>> >>> >> That will make PutParquet or FetchParquet work with your
> >>>>>> >>> >> local file-system (just point the processor's Hadoop
> >>>>>> >>> >> Configuration Resources property at that core-site.xml).
> >>>>>> >>> >>
> >>>>>> >>> >> Thanks,
> >>>>>> >>> >>
> >>>>>> >>> >> Bryan
> >>>>>> >>> >>
> >>>>>> >>> >>
> >>>>>> >>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <tcots8888@gmail.com> wrote:
> >>>>>> >>> >> > Hello NiFi community,
> >>>>>> >>> >> > Is there a simple way to read CSV files and write them
> >>>>>> >>> >> > out as Parquet files without Hadoop? I run NiFi on
> >>>>>> >>> >> > Windows and don't have access to a Hadoop environment.
> >>>>>> >>> >> > I'm trying to write the output of my ETL in a compressed
> >>>>>> >>> >> > and still query-able format. Is there something I should
> >>>>>> >>> >> > be using instead of Parquet?
> >>>>>> >>> >> >
> >>>>>> >>> >> > Thanks for your time,
> >>>>>> >>> >> > Scott
> >>>>>> >>> >
> >>>>>> >>> >
> >>>>>> >
> >>>>>> >
> >>>>>
> >>>>>
> >>>
> >
>
