drill-user mailing list archives

From "Lee, David" <David....@blackrock.com>
Subject RE: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record
Date Tue, 28 Aug 2018 15:19:10 GMT
The other JSON format is officially called JSONL (JSON Lines). Could the next version of Drill
include jsonl in the default extensions under Storage Plugins?

http://jsonlines.org/

From:

    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },

To

    "json": {
      "type": "json",
      "extensions": [
        "json", "jsonl"
      ]
    },

After working with both JSON and JSONL, I find JSONL much easier to handle with other tools
and programming languages.

A simple Linux grep command can find data in a JSONL file, but grepping a JSON file with
no line breaks just returns a wall of text.
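
As a rough illustration of why line-delimited records are easy to work with, here is a minimal Python sketch that filters JSONL one line at a time, much as grep would. The function name `filter_jsonl` and the sample field names are assumptions for illustration only, and each line is assumed to hold one JSON object.

```python
import json

def filter_jsonl(lines, key, value):
    """Return the records whose `key` equals `value`, reading one line at a time."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # each line is a complete JSON document
        if record.get(key) == value:
            matches.append(record)
    return matches

if __name__ == "__main__":
    sample = [
        '{"var1": "foo", "var2": "bar"}',
        '{"var1": "fo", "var2": "baz"}',
    ]
    print(filter_jsonl(sample, "var2", "baz"))
```

Because every line parses on its own, the same loop works on an open file handle without loading the whole file into memory.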


-----Original Message-----
From: Paul Rogers [mailto:par0328@yahoo.com.INVALID] 
Sent: Monday, August 27, 2018 5:47 PM
To: user@drill.apache.org
Subject: Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record



Hi David,

JSON files are never splittable: there is no single-character way to find the start of a JSON
record within a file.
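
To illustrate the point, here is a small Python sketch (not Drill's reader; the helper names are hypothetical) comparing a naive scan for `{` with the record starts found by parsing from the beginning of the text. Braces inside strings and nested objects make the naive scan report offsets that are not record starts.

```python
import json

def naive_record_starts(text):
    """Guess record starts by treating every '{' as the start of a record."""
    return [i for i, ch in enumerate(text) if ch == "{"]

def true_record_starts(text):
    """Find record starts by parsing sequentially from the first character."""
    decoder = json.JSONDecoder()
    starts, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace
            continue
        starts.append(pos)
        _, pos = decoder.raw_decode(text, pos)  # returns (object, end offset)
    return starts

# Two records; the first has a brace inside a string and a nested object.
doc = '{"var1": "a{b", "var2": {"x": 1}}\n{"var1": "c", "var2": {}}'

if __name__ == "__main__":
    print("naive guesses:", naive_record_starts(doc))
    print("actual starts:", true_record_starts(doc))
```

A reader dropped into the middle of the file cannot tell which `{` begins a record without the parse state that precedes it.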

Drill is supposed to support two JSON formats: the array format from the earlier post, and
the non-JSON (but very common) list of objects format in this example.

Thanks,
- Paul



    On Monday, August 27, 2018, 5:38:32 PM PDT, Lee, David <David.Lee@blackrock.com> wrote:

Get rid of the opening and closing brackets and see if you can turn the commas between records
into newlines. I think the file needs to be splittable to reduce memory overhead compared with
parsing one giant string:

{"var1": "foo", "var2":"bar"}
{"var1": "fo", "var2": "baz"}
{"var1": "f2o", "var2": "baz2"}
{"var1": "f3o", "var2": "baz3"}
{"var1": "f4o", "var2": "baz4"}
{"var1": "f5o", "var2": "baz5"}
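
One way to do that conversion is sketched below in Python, assuming the input is a single top-level JSON array of records. The function name `array_to_jsonl` is hypothetical, and this version parses the whole array in memory, so it suits files that fit in RAM.

```python
import json

def array_to_jsonl(array_text):
    """Convert the text of a JSON array into one JSON record per line."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Serialize each element on its own line; json.dumps never emits raw newlines.
    return "".join(json.dumps(record) + "\n" for record in records)

if __name__ == "__main__":
    sample = '[ { "var1": "foo", "var2": "bar"},{"var1": "fo", "var2": "baz"}]'
    print(array_to_jsonl(sample), end="")
```

Parsing and re-serializing is safer than a textual search-and-replace on commas, since commas and brackets can also appear inside string values.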

-----Original Message-----
From: scott [mailto:tcots8888@gmail.com]
Sent: Monday, August 27, 2018 4:59 PM
To: user@drill.apache.org
Subject: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record



Hi All,
I'm getting an error when querying some of my JSON files:

Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record. Current token was START_ARRAY

The JSON files are in array format, like:

[ { "var1": "foo", "var2": "bar"},{"var1": "fo", "var2": "baz"}]

I found a ticket that indicates this format is not supported by Drill yet,
DRILL-1755 <https://jira.apache.org/jira/browse/DRILL-1755>, but I find it hard to believe
there is no workaround or solution, since this was reported 4 years ago. Does anyone have a
solution or workaround for this problem?

Thanks,
Scott

