I've recently been running Drill to explore data from our Hadoop cluster, but
I have a problem when running queries against our data source. The queries
always fail due to malformed JSON data. Because the data itself is pretty
raw, it may contain some malformed JSON here and there. It's difficult to
clean the data itself because of its size (around hundreds of gigabytes of
text files).
My question is: is there any way to exclude/skip the malformed records and
collect them into a separate result, just like Spark/Shark does? How can I
solve this problem elegantly?
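To make it concrete, the kind of split I have in mind could be sketched as a
pre-filtering pass like the one below (a rough Python sketch, assuming one
JSON record per line; the function name is just for illustration):

```python
import json

def split_json_lines(lines):
    """Partition an iterable of JSON-lines records into (valid, invalid)."""
    valid, invalid = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)       # parse only to validate, discard the result
            valid.append(line)
        except ValueError:         # json.JSONDecodeError subclasses ValueError
            invalid.append(line)
    return valid, invalid
```

Streaming over the files line by line like this would keep memory flat even at
hundreds of gigabytes, but it is an extra pass over the data, which is exactly
what I was hoping Drill could avoid.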
Thank you.
Regards,
Fritz