nifi-users mailing list archives

From Koji Kawamura <ijokaruma...@gmail.com>
Subject Re: Nulls in input data throwing exceptions when using QueryRecord
Date Thu, 08 Nov 2018 02:42:27 GMT
Hi Mandeep,

Thanks for reporting the issue and for the detailed explanation. That's very helpful!
I was able to reproduce the issue and found a possible solution.
I've filed a JIRA; a PR to fix it will be submitted shortly.
https://issues.apache.org/jira/browse/NIFI-5802

Thanks,
Koji
On Wed, Nov 7, 2018 at 8:54 PM Mandeep Gill <mandeep@nstack.com> wrote:
>
> Hi,
>
> We're hitting a couple of issues working with nulls in QueryRecord, on both NiFi 1.7.1 and 1.8.0.
>
> Things work as expected for strings. However, for the other primitive types defined by the Avro schema, such as boolean, long, and double, null values in the input data aren't converted to NULLs within the SQL engine (Calcite). Instead they appear to remain Java null values and throw NPEs when we attempt to use them within a query or simply return them as output.
>
> To give some examples, given the following record data and schema (tested using both JSON and Avro record readers/writers):
>
> [ { "str_test" : "hello1", "bool_test" : true }, { "str_test" : null, "bool_test" : null } ]
>
> {
>   "type": "record",
>   "name": "schema",
>   "fields": [
>     {
>       "name": "str_test",
>       "type": [ "string", "null" ],
>       "default": null
>     },
>     {
>       "name": "bool_test",
>       "type": [ "boolean", "null" ],
>       "default": null
>     }
>   ]
> }
>
> The following queries return the empty resultset,
>
> select 'res' as res from FLOWFILE where bool_test IS NULL
> select 'res' as res from FLOWFILE where bool_test IS UNKNOWN
>
> and the query below returns a resultset of count 2,
>
> select 'res' from FLOWFILE where bool_test IS NOT NULL
>
> The query below works as expected, suggesting things work fine for strings
>
> select 'res' as res from FLOWFILE where str_test IS NULL
>
> Finally, the following query throws a NullPointerException (see [1]) when trying to convert the null to a boolean within the output writer:
>
> select * from FLOWFILE where bool_test IS NOT NULL
>
> The null values for these types seem to be treated as distinct from the NULLs within the SQL engine, as the following query returns the empty resultset.
>
> select 'res' as res from FLOWFILE where CAST(NULL as boolean) IS DISTINCT FROM bool_test
>
> and the following query gives a RuntimeException (see [2]),
>
> select (COALESCE(bool_test, TRUE)) as res from flowfile
>
> Given all this, we're unable to make use of datasets with nulls. Are nulls only supported for strings, or is there perhaps something we're doing wrong in our setup/config? One thing we've noticed is that running a simple "SELECT * from FLOWFILE" returns a nullable type for strings in the output Avro schema but not for other primitives, even if they were nullable in the input schema, which could be related.
>
> Cheers,
> Mandeep
>
> [1] org.apache.nifi.processor.exception.ProcessException: IOException thrown from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord
> at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
> at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
> at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord
> at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
> at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
> ... 12 common frames omitted
> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
> at org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:61)
> at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
> at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52)
> at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
> ... 13 common frames omitted
> Caused by: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord
> at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132)
> at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
> ... 17 common frames omitted
> Caused by: java.lang.NullPointerException: null
> at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:121)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
> at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
> at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
> at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
> ... 20 common frames omitted
>
>
> [2] org.apache.nifi.processor.exception.ProcessException: IOException thrown from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]: java.io.IOException: java.lang.RuntimeException: Cannot convert null to boolean
> at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
> at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
> at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
> at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
> at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.lang.RuntimeException: Cannot convert null to boolean
> at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
> at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
> ... 12 common frames omitted
> Caused by: java.lang.RuntimeException: Cannot convert null to boolean
> at org.apache.calcite.runtime.SqlFunctions.cannotConvert(SqlFunctions.java:1460)
> at org.apache.calcite.runtime.SqlFunctions.toBoolean(SqlFunctions.java:1483)
> at Baz$1$1.current(Unknown Source)
> at org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:684)
> at org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46)
> at org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
> at org.apache.nifi.serialization.record.ResultSetRecordSet.next(ResultSetRecordSet.java:84)
> at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:51)
> at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
> ... 13 common frames omitted
>
> --
>
> Mandeep Gill
>
> nstack.com / +44 7961822575
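
As a point of comparison for the queries quoted above, the sketch below runs the same predicates against SQLite through Python's sqlite3 module. SQLite is only a stand-in chosen for illustration (QueryRecord itself runs on Calcite), and the table definition, the column types, and the helper name run_null_checks are hypothetical. It is meant to show what ordinary SQL NULL handling would be expected to return for the two sample records, not how NiFi evaluates them. IS UNKNOWN and IS DISTINCT FROM are left out because not every SQLite version supports them.

```python
import sqlite3

# The two sample records from the report: one fully populated, one all-null.
ROWS = [("hello1", True), (None, None)]


def run_null_checks():
    """Run the reported predicates on an in-memory SQLite table.

    SQLite stands in for the FLOWFILE table here; this is an assumption
    made for illustration and says nothing about Calcite internals.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flowfile (str_test TEXT, bool_test BOOLEAN)")
    # Python None is stored as SQL NULL; True is stored as the integer 1.
    conn.executemany("INSERT INTO flowfile VALUES (?, ?)", ROWS)

    def rows(sql):
        return conn.execute(sql).fetchall()

    results = {
        "bool_is_null": rows(
            "SELECT 'res' AS res FROM flowfile WHERE bool_test IS NULL"),
        "bool_is_not_null": rows(
            "SELECT 'res' FROM flowfile WHERE bool_test IS NOT NULL"),
        "str_is_null": rows(
            "SELECT 'res' AS res FROM flowfile WHERE str_test IS NULL"),
        # 1 plays the role of TRUE so the query also runs on older SQLite.
        "coalesce": rows(
            "SELECT COALESCE(bool_test, 1) AS res FROM flowfile ORDER BY rowid"),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, result in run_null_checks().items():
        print(name, result)
```

With the two sample records, each IS NULL and IS NOT NULL predicate matches exactly one row in this stand-in, and COALESCE yields a non-null value for both rows, which is the behaviour the report expected from QueryRecord.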