nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: [EXT] Re: Convert JSON to single line
Date Thu, 24 Jan 2019 22:13:22 GMT
If you have fairly straightforward JSON, you can run InferAvroSchema
first. It writes an attribute called "avro.schema" to the flow file;
then, in your JsonTreeReader, set the Schema Access Strategy to
"Use 'Schema Text' Property" and keep the default value of the Schema
Text property ("${avro.schema}").

Another (likely slower) way is to use a scripting processor: Groovy
has a JsonOutput class that writes JSON on a single line by default.
The script is only a few lines in ExecuteScript (which can be slow),
but here's a full script you can use in InvokeScriptedProcessor (the
faster scripting processor). It has a lot of boilerplate; the two
methods that do the real work are JsonSlurper.parse() and
JsonOutput.toJson():

import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets

class GroovyProcessor implements Processor {
    def REL_SUCCESS = new Relationship.Builder().name('success')
            .description('FlowFiles that were successfully processed are routed here').build()
    def REL_FAILURE = new Relationship.Builder().name('failure')
            .description('FlowFiles that were not successfully processed are routed here').build()
    ComponentLog log

    void initialize(ProcessorInitializationContext context) { log = context.logger }

    Set<Relationship> getRelationships() { [REL_FAILURE, REL_SUCCESS] as Set }

    // No configurable properties, so these ConfigurableComponent methods are stubs
    Collection<ValidationResult> validate(ValidationContext context) { null }
    PropertyDescriptor getPropertyDescriptor(String name) { null }
    void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }
    List<PropertyDescriptor> getPropertyDescriptors() { null }
    String getIdentifier() { null }

    void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
        def session = sessionFactory.createSession()
        try {
            def flowFile = session.get()
            if (!flowFile) return
            try {
                // Parse the (possibly pretty-printed) JSON; the callback form lets
                // the framework close the input stream even if parsing fails
                def jsonObj = null
                session.read(flowFile, { inStream ->
                    jsonObj = new JsonSlurper().parse(inStream, 'UTF-8')
                } as InputStreamCallback)
                // JsonOutput.toJson() writes compact JSON, i.e. all on one line
                flowFile = session.write(flowFile, { outStream ->
                    outStream.write(JsonOutput.toJson(jsonObj).getBytes(StandardCharsets.UTF_8))
                } as OutputStreamCallback)
                session.transfer(flowFile, REL_SUCCESS)
            } catch (e) {
                log.error('Couldn\'t process', e)
                session.transfer(flowFile, REL_FAILURE)
            }
            session.commit()
        } catch (final Throwable t) {
            log.error('{} failed to process due to {}; rolling back session', [this, t] as Object[])
            session.rollback(true)
            throw t
        }
    }
}

processor = new GroovyProcessor()
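To try the two calls outside NiFi, here is a minimal standalone Groovy sketch (it runs with the plain groovy command and needs no NiFi classes). The event and its field names are hypothetical; the sketch only shows that parsing with JsonSlurper and re-serializing with JsonOutput.toJson() drops the pretty-printing. In ExecuteScript the same two calls would sit inside the session.read()/session.write() callbacks.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical pretty-printed event; the field names are made up
String pretty = '''{
    "EventID": 4624,
    "Channel": "Security"
}'''

// Parse into Groovy maps/lists, then re-serialize; JsonOutput.toJson()
// emits no indentation or line breaks
def parsed = new JsonSlurper().parseText(pretty)
String compact = JsonOutput.toJson(parsed)

println compact  // the same object, printed on a single line
```

If you ever need the indented form back for debugging, JsonOutput.prettyPrint() does the reverse.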


Regards,
Matt

On Thu, Jan 24, 2019 at 3:02 PM Vincent, Mike <mvincent@mitre.org> wrote:
>
> Running 1.8, so I see MergeRecord. I think I need to use JsonTreeReader as the reader,
> but it requires a schema. I don't have a schema for the JSON; I'm wondering why I can't just
> take the JSON I'm already receiving and un-pretty-print it, i.e. squash it to one line. I'm
> new to NiFi, so please pardon what may be an elementary request.
>
> Cheers,
>
> Michael J. Vincent
> Lead Network Systems Engineer | The MITRE Corporation | Network Technology & Security (T864) | +1 (781) 271-8381
>
> -----Original Message-----
> From: Matt Burgess <mattyb149@apache.org>
> Sent: Thursday, January 24, 2019 1:17 PM
> To: users@nifi.apache.org
> Subject: [EXT] Re: Convert JSON to single line
>
> Michael,
>
> As of NiFi 1.7.0, if you use MergeRecord instead of MergeContent, you can choose a JsonRecordSetWriter
> with "Pretty Print JSON" set to false and "Output Grouping" set to "One Line Per Object",
> which should output one JSON object per line (as well as merge individual flow files/records together).
> Any record-based processor would work in that case, so if MergeRecord isn't an option,
> ConvertRecord will work just as well. In either case, the JsonRecordSetWriter should be set
> to inherit the schema from the reader.
>
> Regards,
> Matt
>
> On Thu, Jan 24, 2019 at 1:09 PM Vincent, Mike <mvincent@mitre.org> wrote:
> >
> > I’m ingesting Windows Event logs with ConsumeWindowsEventLog and then using TransformXML according to:
> >
> > https://community.hortonworks.com/articles/29474/nifi-converting-xml-to-json.html
> >
> > to make them JSON. The flow continues to MergeContent, CompressContent and then PutS3Object.
> >
> > The issue I’m having is that when I examine the content of the uploaded files (i.e., download and unzip them),
> > they are pretty-printed JSON structures rather than single-line JSON “records”.
> >
> > For example:
> >
> > {
> >     Field: 1,
> >     Field: 2
> > }
> >
> > I don’t want that; I *want*:
> >
> > {Field:1,Field:2}
> >
> > What Convert / Transform / other processor can I use, and how do I configure it, to squish
> > the JSON structure into a single-line record *before* MergeContent?
> >
> > Cheers,
> >
> > Michael J. Vincent
> > Lead Network Systems Engineer | The MITRE Corporation | Network Technology & Security (T864) | +1 (781) 271-8381
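The "One Line Per Object" output grouping mentioned in the earlier reply produces newline-delimited JSON: one compact object per line, with no enclosing array. The "Array" grouping instead wraps all records in a single JSON array. The Groovy sketch below only illustrates those two shapes using hypothetical records; it is not how JsonRecordSetWriter is implemented, and the writer's exact output may differ.

```groovy
import groovy.json.JsonOutput

// Hypothetical records standing in for individual converted flow files;
// Groovy map literals are LinkedHashMaps, so key order is preserved
def records = [
    [EventID: 4624, Channel: 'Security'],
    [EventID: 4625, Channel: 'Security']
]

// One compact JSON object per line, joined with newlines (no enclosing array)
String oneLinePerObject = records.collect { JsonOutput.toJson(it) }.join('\n')
println oneLinePerObject
// {"EventID":4624,"Channel":"Security"}
// {"EventID":4625,"Channel":"Security"}

// For contrast, serializing the whole list gives a single JSON array
String array = JsonOutput.toJson(records)
println array
// [{"EventID":4624,"Channel":"Security"},{"EventID":4625,"Channel":"Security"}]
```

The newline-delimited shape is what makes each merged record easy to process line by line after download.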
