drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Help needed : logfile plugin multiline parsing
Date Tue, 23 Jul 2019 19:57:13 GMT
Hi Vincent,

The log (regex) plugin uses newlines as the delimiter between records, so it cannot currently
handle newlines within a record. That is, the plugin really only works for single-line messages,
or for cases in which we want to ignore all but the header line.

If you are up for a Java coding effort, you could modify the plugin to take another config
parameter which is the record delimiter. (The text (CSV) plugin already does this.) You would
need a unique marker that gives a context-free record split. The project would welcome such
a contribution. If you made such an enhancement, you could have the plugin look for, say,
double-newline as the record delimiter.
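To illustrate the idea outside of Drill, here is a minimal Python sketch (not the plugin's Java code) that splits text on a configurable record delimiter and then applies one regex per record. The names `split_records`, `parse_record`, and `RECORD_RE` are hypothetical, and the pattern is only an assumption about how the sample records are laid out.

```python
import re

def split_records(text, delimiter="\n\n"):
    """Split text into records on a configurable delimiter, dropping empty records."""
    return [r.strip("\r\n") for r in text.split(delimiter) if r.strip()]

# Assumed layout: [timestamp]source<newline>message. DOTALL lets the last
# group span any further newlines inside the record.
RECORD_RE = re.compile(r"\[(.+?)\](\S+)\s+(.*)", re.DOTALL)

def parse_record(record):
    """Return (timestamp, source, message), or None if the record does not match."""
    m = RECORD_RE.match(record)
    return m.groups() if m else None

sample = (
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)\n"
    "External [GLOBAL] macro [@PHASE_INPUT] registered OK\n"
    "\n"
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)\n"
    "Reading Application Definition For [ACTUAL]\n"
)

rows = [parse_record(r) for r in split_records(sample)]
for row in rows:
    print(row)
```

Because the split happens before the regex runs, the pattern never has to cross a record boundary, which is what makes the delimiter a context-free split.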

Recall that Drill works with HDFS files. Each scan operator may be given a block of a file.
When reading the second or later block of a file, the reader must scan forward to find the
start of the next record using the record delimiter.
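The scan-forward step can be sketched as follows. This is an illustrative Python model over an in-memory byte string, not Drill's reader API; `first_record_start` and `records_in_block` are hypothetical names, and the rule that a block owns every record that starts inside it is an assumption about how the work would be divided.

```python
def first_record_start(data: bytes, block_start: int, delimiter: bytes = b"\n\n") -> int:
    """Return the offset of the first record that begins at or after block_start."""
    if block_start == 0:
        return 0  # the first block always begins with a record
    # Back up by one delimiter length so a delimiter that ends exactly at
    # block_start is still detected.
    search_from = max(0, block_start - len(delimiter))
    idx = data.find(delimiter, search_from)
    if idx == -1:
        return len(data)  # no record starts in or after this block
    return idx + len(delimiter)

def records_in_block(data: bytes, block_start: int, block_end: int,
                     delimiter: bytes = b"\n\n") -> list:
    """Read every record that starts inside [block_start, block_end)."""
    pos = first_record_start(data, block_start, delimiter)
    records = []
    while pos < block_end and pos < len(data):
        end = data.find(delimiter, pos)
        if end == -1:
            end = len(data)
        # The last record of a block may run past block_end; read it whole.
        records.append(data[pos:end])
        pos = end + len(delimiter)
    return records

data = b"aaaa\n\nbbbb\n\ncccc"
first_block = records_in_block(data, 0, 7)
second_block = records_in_block(data, 7, len(data))
print(first_block, second_block)
```

Under these assumptions each record is read by exactly one block, even when the block boundary falls in the middle of a record.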


For now, I'd suggest transforming your file: replace the newlines inside each record with some
other character (such as "|"), and replace the existing record delimiter (the blank line) with a
single newline. Then you can use the log (regex) plugin.

That is:

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)
External [GLOBAL] macro [@PHASE_INPUT] registered OK

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)
Reading Application Definition For [ACTUAL]

Becomes:

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)|External [GLOBAL] macro [@PHASE_INPUT] registered OK
[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)|Reading Application Definition For [ACTUAL]
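A small preprocessing script along these lines could do the transformation. This is a sketch, assuming records are separated by one or more blank lines and that "|" never appears in the data; `flatten_records` and `FLAT_RE` are hypothetical names.

```python
import re

def flatten_records(text, line_sep="|"):
    """Join the lines of each record with line_sep and emit one record per line."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    records = [r for r in re.split(r"\n{2,}", text) if r.strip()]
    out = []
    for record in records:
        parts = [p for p in record.split("\n") if p]
        out.append(line_sep.join(parts))
    return "".join(line + "\n" for line in out)

sample = (
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)\r\n"
    "External [GLOBAL] macro [@PHASE_INPUT] registered OK\r\n"
    "\r\n"
    "[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)\r\n"
    "Reading Application Definition For [ACTUAL]\r\n"
)

flattened = flatten_records(sample)
print(flattened, end="")

# A single-line pattern that should then fit each flattened record
# (an assumption about the layout; adjust the groups as needed).
FLAT_RE = re.compile(r"\[(.+?)\]([^|]+)\|(.*)")
first = FLAT_RE.match(flattened.splitlines()[0]).groups()
```

To process a real file, read it, pass the contents to `flatten_records`, and write the result to a new file for Drill to query. In the Drill plugin configuration the backslashes in the regex would need to be doubled, as in the patterns quoted below.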


Maybe Charles has a better idea?

Thanks,
- Paul

 

On Tuesday, July 23, 2019, 05:11:26 AM PDT, Vincent BENATIER <vbenatier@sp2.fr> wrote:

Hi all,

I was wondering whether the logfile plugin can handle multiline parsing.

When I try my regex syntax online, it works well, but it seems that the
"\\r\\n" sequences are not recognized when I configure a logfile plugin in
Apache Drill.
Or perhaps there is another way to do this, but I could not find anything in the
documentation or in the "Learning Apache Drill" book.

Could someone help?

Vincent

Regex syntaxes I tried
--------------------------
"(\\[.+\\])(.+\\r\\n)(.+)"
"(\\[.+\\])(.+)(\\r\\n.+)"
"(\\[.+\\])(.+) \\r\\n (.+)"

File sample
--------------
[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)
External [GLOBAL] macro [@PHASE_INPUT] registered OK

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)
Reading Application Definition For [ACTUAL]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019009)
Reading Database Definition For [Actual]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019021)
Reading Database Mapping For [ACTUAL]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019010)
Writing Application Definition For [ACTUAL]

[Thu May  2 00:17:50 2019]Local/ACTUAL///1/Info(1019011)
Writing Database Definition For [Actual]

  