nifi-users mailing list archives

From sudeep mishra <sudeepshekh...@gmail.com>
Subject Re: How to validate records in Hadoop using NiFi?
Date Mon, 11 Jan 2016 06:42:10 GMT
Thank you Joe.

The Sqoop-to-HDFS data load is outside the NiFi flow. Once the data is pushed
to HDFS, I have to process each record and perform validations.

By validation I mean that we will pick a particular column for each
record stored in HDFS and then run a SQL query against another
database.
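To make that concrete, here is a rough sketch of the check I have in mind, using an in-memory SQLite table as a stand-in for the other database (the table name, column position, and delimiter are placeholders, not my actual schema):

```python
import sqlite3

# Toy stand-in for the "other database" that holds the reference values.
# Table and column names here are placeholders for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_customers (customer_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO ref_customers VALUES (?)",
                 [("C001",), ("C002",), ("C003",)])

def is_valid(record, column_index=0, delimiter=","):
    """Check one HDFS record: look up a single column against the reference DB."""
    value = record.split(delimiter)[column_index]
    row = conn.execute(
        "SELECT 1 FROM ref_customers WHERE customer_id = ?", (value,)
    ).fetchone()
    return row is not None

records = ["C001,Alice,2016-01-09", "C999,Mallory,2016-01-10"]
print([is_valid(r) for r in records])  # [True, False]
```

The per-record lookup is what I would like to avoid in practice, which is why I asked about executing the queries in bulk.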

On Sun, Jan 10, 2016 at 9:17 AM, Joe Witt <joe.witt@gmail.com> wrote:

> Hello Sudeep,
>
> "Which NiFi processor can I use to split each record (separated by a
> new line character)"
>
>   For this the SplitText processor is rather helpful if you want to
> split each line.  I recommend you use two SplitText processors in a
> chain, where the first splits on every 1000 lines, for example, and the
> next splits each line.  As long as you have back-pressure set up,
> this means you can split arbitrarily large (in terms of number of
> lines) source files and still have good behavior.
>
> ..."and perform validations?"
>
>   Consider if you want to validate each line in a text file and route
> valid lines one way and invalid lines another.  If so, you may be
> able to avoid SplitText entirely and simply use RouteText instead, as
> it can operate on the original file in a line-by-line manner and
> perform expression-based validation.  This operates in bulk and is
> quite efficient.
>
> "For validations I want to verify a particular column value for each
> record using a SQL query"
>
>   Our ExecuteSQL processor is designed for executing SQL against a
> JDBC-accessible database.  It is not helpful at this point for
> executing queries on line-oriented data, even if that data were valid
> DML or something similar.  Interesting idea, but not something we
> support at this time.
>
> I'm interested to understand your case more, if you don't mind.
> You mention you're getting data from Sqoop into HDFS.  How is NiFi
> involved in that flow - is it that you're pulling the data into NiFi
> after it lands in HDFS?
>
> Thanks
> Joe
>
> On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra <sudeepshekharm@gmail.com>
> wrote:
> > Hi,
> >
> > I am pushing some database records into HDFS using Sqoop.
> >
> > I want to perform some validations on each record in the HDFS data. Which
> > NiFi processor can I use to split each record (separated by a new line
> > character) and perform validations?
> >
> > For validations I want to verify a particular column value for each
> > record using a SQL query. I can see an ExecuteSQL processor. How can
> > I dynamically pass query parameters to it? Also, is there a way to
> > execute the queries in bulk rather than for each record?
> >
> > Kindly suggest.
> >
> > Appreciate your help.
> >
> >
> > Thanks & Regards,
> >
> > Sudeep Shekhar Mishra
>
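For my own reference, the route-then-validate idea Joe describes with RouteText could be sketched roughly like this in Python (the validity pattern is a placeholder for my real rule, not NiFi's actual expression syntax):

```python
import re

# Placeholder rule: a line is "valid" if its first column looks like
# the letter C followed by digits. The real rule would match my schema.
VALID_FIRST_COLUMN = re.compile(r"^C\d+,")

def route(lines):
    """Split lines into (valid, invalid), the way RouteText routes them."""
    valid, invalid = [], []
    for line in lines:
        (valid if VALID_FIRST_COLUMN.match(line) else invalid).append(line)
    return valid, invalid

good, bad = route(["C001,Alice", "oops,Bob", "C002,Carol"])
print(good)  # ['C001,Alice', 'C002,Carol']
print(bad)   # ['oops,Bob']
```

This covers the cheap structural check in bulk; the SQL lookup against the other database would still apply only to the lines routed as valid.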



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com
