nifi-users mailing list archives

From Mandeep Gill <mand...@nstack.com>
Subject Re: Nulls in input data throwing exceptions when using QueryRecord
Date Fri, 09 Nov 2018 15:15:40 GMT
Hi there,

Thanks so much for your help and the quick turnaround! Can confirm that the
patch works well and resolves the main issues we were hitting with nulls.

We have been hitting a few more issues with QueryRecord, which I've listed
below. There are a few of them, and I'm happy to create bug reports for each
and help fix them where we can. Where would be the best place to submit
them going forwards?

1. When using the JsonRecordSetWriter service with QueryRecord, strings
are truncated to their first character, so "select 'hello' as world"
returns "[{"world": "h"}]"
2. count behaves slightly oddly with nulls, so given the following data,

      {"id": "129984bf31e025599c0e9232df5c7b7c", "price": 19.47},
      {"id": "a6cfcb7c7178b9d18c50f2f2dc41dab3", "price": null}

the query "select count(*) as c from flowfile where price is null"
returns [{"c": 0}]. This may be an upstream Calcite issue, however.

3. When returning the result of any SQL function, e.g. "select count(*)
from flowfile", the auto-generated field names, e.g. "EXPR$0", are not
valid Avro field names and so cause an exception [1] unless explicitly
renamed, as in the previous example

4. When returning a single-width column, any rows that are null are
silently discarded. As an example, given the data as above, the query "select
price from flowfile" will drop the second row.
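
For comparison with items 2 and 4, here is a minimal sketch of how another
SQL engine handles the same two rows. It uses Python's built-in sqlite3
module rather than Calcite or NiFi, so it only illustrates the NULL
handling one would normally expect; the in-memory table named flowfile is
an assumption chosen to mirror the queries above.

```python
import sqlite3

# In-memory SQLite database standing in for the FlowFile contents.
# This is NOT Calcite or NiFi; it only shows conventional SQL NULL handling.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowfile (id TEXT, price REAL)")
conn.executemany(
    "INSERT INTO flowfile VALUES (?, ?)",
    [
        ("129984bf31e025599c0e9232df5c7b7c", 19.47),
        ("a6cfcb7c7178b9d18c50f2f2dc41dab3", None),  # JSON null becomes SQL NULL
    ],
)

# Item 2: count the rows whose price is NULL.
null_count = conn.execute(
    "SELECT count(*) AS c FROM flowfile WHERE price IS NULL"
).fetchone()[0]

# Item 4: selecting a single nullable column should keep the NULL row.
prices = [row[0] for row in conn.execute("SELECT price FROM flowfile ORDER BY id")]

print(null_count)
print(prices)
```

Here SQLite counts one NULL row and returns both rows for the single-column
select, which is the behaviour we expected from QueryRecord.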

Please let me know if I can provide any further information. We'd love to
help and start contributing, although we're still getting familiar with Java
and the NiFi codebase.

Cheers!
Mandeep

[1] Using JsonRecordSetWriter:
org.apache.nifi.processor.exception.ProcessException: IOException thrown
from QueryRecord[id=f3ba8f30-0166-1000-d46d-d8f7ddfd96b6]:
java.io.IOException: org.apache.avro.SchemaParseException: Illegal
character in: EXPR$0
at
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
at
org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.avro.SchemaParseException:
Illegal character in: EXPR$0
at
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
at
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
... 12 common frames omitted
Caused by: org.apache.avro.SchemaParseException: Illegal character in:
EXPR$0
at org.apache.avro.Schema.validateName(Schema.java:1151)
at org.apache.avro.Schema.access$200(Schema.java:81)
at org.apache.avro.Schema$Field.<init>(Schema.java:403)
at org.apache.avro.Schema$Field.<init>(Schema.java:423)
at org.apache.avro.Schema$Field.<init>(Schema.java:415)
at org.apache.nifi.avro.AvroTypeUtil.buildAvroField(AvroTypeUtil.java:122)
at org.apache.nifi.avro.AvroTypeUtil.buildAvroSchema(AvroTypeUtil.java:113)
at org.apache.nifi.avro.AvroTypeUtil.extractAvroSchema(AvroTypeUtil.java:93)
at
org.apache.nifi.schema.access.WriteAvroSchemaAttributeStrategy.getAttributes(WriteAvroSchemaAttributeStrategy.java:62)
at
org.apache.nifi.json.WriteJsonResult.writeRecord(WriteJsonResult.java:137)
at
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
at
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52)
at
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
... 13 common frames omitted
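
The innermost cause in the trace above is Avro's name validation. As a rough
sketch of why "EXPR$0" is rejected: the Avro specification requires names to
start with a letter or underscore and to contain only letters, digits, and
underscores. The Python check below mirrors that rule; the helper name
is_valid_avro_name is made up for illustration and this is not the Avro
library's own code.

```python
import re

# Avro spec: a name starts with [A-Za-z_] and continues with [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Hypothetical helper: True if `name` satisfies the Avro naming rule."""
    return AVRO_NAME.fullmatch(name) is not None


print(is_valid_avro_name("EXPR$0"))  # False: '$' is not an allowed character
print(is_valid_avro_name("c"))       # True: the explicit alias from "count(*) as c"
```

This is why aliasing the column (for example "as c") avoids the exception.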


On Thu, 8 Nov 2018 at 21:46 Pierre Villard <pierre.villard.fr@gmail.com>
wrote:

> Hi Mandeep,
>
> Thanks for reporting this issue! Koji filed the JIRA [1] and submitted a
> PR for it [2]. I just merged it into master and it will be released with
> NiFi 1.9.0. You can also build the standard processors NAR from the master
> branch if you need the fix quickly.
>
> [1] https://issues.apache.org/jira/browse/NIFI-5802
> [2] https://github.com/apache/nifi/pull/3158
>
> Pierre
>
> On Wed, 7 Nov 2018 at 12:54, Mandeep Gill <mandeep@nstack.com> wrote:
>
>> Hi,
>>
>> We're hitting a couple of issues working with nulls when using
>> QueryRecord using both NiFi 1.7.1 and 1.8.0.
>>
>> Things work as expected for strings, however when using other primitive
>> types as defined by the avro schema, such as boolean, long, and double,
>> null values in the input data aren't converted to NULLs within the SQL
>> engine / Calcite. Instead they appear to remain as java null values and
>> throw NPEs when attempting to use them within a query or simply return them
>> as the output.
>>
>> To give some examples, given the following record data and schema (tested
>> using both JSON and Avro record reader/writers)
>>
>> [ {  "str_test" : "hello1",  "bool_test" : true }, {  "str_test" : null,  "bool_test" : null } ]
>>
>> {
>>   "type": "record",
>>   "name": "schema",
>>   "fields": [
>>     {
>>       "name": "str_test",
>>       "type": [ "string", "null" ],
>>       "default": null
>>     },
>>     {
>>       "name": "bool_test",
>>       "type": [ "boolean", "null" ],
>>       "default": null
>>     }
>>   ]
>> }
>>
>> The following queries return the empty resultset,
>>
>> select 'res' as res from FLOWFILE where bool_test IS NULL
>> select 'res' as res from FLOWFILE where bool_test IS UNKNOWN
>>
>> and the query below returns a resultset of count 2,
>>
>> select 'res' from FLOWFILE where bool_test IS NOT NULL
>>
>> The query below works as expected, suggesting things work fine for strings
>>
>> select 'res' as res from FLOWFILE where str_test IS NULL
>>
>> Finally, however, the following query throws a NullPointerException (see
>> [1]) when trying to convert the null to a boolean within the output writer
>>
>> select * from FLOWFILE where bool_test IS NOT NULL
>>
>> The null values for these types seem to be treated as distinct to the
>> NULLs within the SQL engine, as the following query returns the empty
>> resultset.
>>
>> select 'res' as res from FLOWFILE where CAST(NULL as boolean) IS DISTINCT FROM bool_test
>>
>> and the following query gives a RuntimeException (see [2]),
>>
>> select (COALESCE(bool_test, TRUE)) as res from flowfile
>>
>> Given all this, we're unable to make use of datasets with nulls. Are nulls
>> only supported for strings, or is there perhaps something we're doing wrong
>> in our setup/config? One thing we've noticed is that running a simple
>> "SELECT * from FLOWFILE" returns a nullable type for strings in the output
>> Avro schema but not for other primitives, even if they were nullable in the
>> input schema, which could be related.
>>
>> Cheers,
>> Mandeep
>>
>> [1] org.apache.nifi.processor.exception.ProcessException: IOException
>> thrown from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]:
>> java.io.IOException:
>> org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: null of boolean in field bool_test of
>> org.apache.nifi.nifiRecord
>> at
>> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
>> at
>> org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
>> at
>> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>> at
>> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
>> at
>> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
>> at
>> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.io.IOException:
>> org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: null of boolean in field bool_test of
>> org.apache.nifi.nifiRecord
>> at
>> org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
>> at
>> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
>> ... 12 common frames omitted
>> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: null of boolean in field bool_test of
>> org.apache.nifi.nifiRecord
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
>> at
>> org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:61)
>> at
>> org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
>> at
>> org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52)
>> at
>> org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
>> ... 13 common frames omitted
>> Caused by: java.lang.NullPointerException: null of boolean in field
>> bool_test of org.apache.nifi.nifiRecord
>> at
>> org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
>> ... 17 common frames omitted
>> Caused by: java.lang.NullPointerException: null
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:121)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
>> ... 20 common frames omitted
>>
>>
>> [2] org.apache.nifi.processor.exception.ProcessException: IOException
>> thrown from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]:
>> java.io.IOException: java.lang.RuntimeException: Cannot convert null to
>> boolean
>> at
>> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
>> at
>> org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
>> at
>> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>> at
>> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
>> at
>> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
>> at
>> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.io.IOException: java.lang.RuntimeException: Cannot
>> convert null to boolean
>> at
>> org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
>> at
>> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
>> ... 12 common frames omitted
>> Caused by: java.lang.RuntimeException: Cannot convert null to boolean
>> at
>> org.apache.calcite.runtime.SqlFunctions.cannotConvert(SqlFunctions.java:1460)
>> at
>> org.apache.calcite.runtime.SqlFunctions.toBoolean(SqlFunctions.java:1483)
>> at Baz$1$1.current(Unknown Source)
>> at
>> org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:684)
>> at
>> org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46)
>> at
>> org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
>> at
>> org.apache.nifi.serialization.record.ResultSetRecordSet.next(ResultSetRecordSet.java:84)
>> at
>> org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:51)
>> at
>> org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
>> ... 13 common frames omitted
>>
>> --
>>
>> Mandeep Gill
>>
>> nstack.com <http://www.nstack.com/> / +44 7961822575
>>
> --

Mandeep Gill

nstack.com <http://www.nstack.com/> / +44 7961822575
