drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record
Date Tue, 28 Aug 2018 06:58:06 GMT
Hi Scott,

I created a file, "test.json", using the data from your e-mail:

[ { "var1": "foo", "var2":"bar"},{"var1": "fo", "var2": "baz"}]

The oldest build I have readily available is Drill 1.13. I ran that as a server, then connected
with sqlline as a client. I ran a query:

select * from `test.json`;
+-------+-------+
| var1  | var2  |
+-------+-------+
| foo   | bar   |
| fo    | baz   |
+-------+-------+

I can try with Drill 1.12 once I find and download it. Or you can try with Drill 1.14 (the
latest release).

I do wonder, however, whether we are talking about the same thing. My test puts your JSON in
a file with a ".json" extension so that Drill chooses the JSON parser. I'm using the default
JSON (session) options.

Is this what you are doing? Or, is your JSON coming from some other source? Kafka? A field
from a CSV file, say?

Thanks,
- Paul

 

    On Monday, August 27, 2018, 10:31:00 PM PDT, scott <tcots8888@gmail.com> wrote:
 
 
 Paul,
I'm using version 1.12. Can you tell me what version you think that was
fixed in? The ticket I referenced is still open, with no comments.

Scott

On Mon, Aug 27, 2018 at 5:47 PM Paul Rogers <par0328@yahoo.com.invalid>
wrote:

> Hi David,
>
> JSON files are never splittable: there is no single-character way to find
> the start of a JSON record within a file.
>
> Drill is supposed to support two JSON formats: the array format from the
> earlier post, and the non-JSON (but very common) list of objects format in
> this example.
>
> Thanks,
> - Paul
>
>
>
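The two layouts Paul describes can be told apart while parsing. The following is a minimal Python sketch, not Drill's actual JSON reader; `iter_records` is a hypothetical helper. It accepts either a single top-level array or a run of objects placed one after another, and it holds the entire input in memory.

```python
import json
from typing import Any, Iterator


def iter_records(text: str) -> Iterator[Any]:
    """Yield records from either layout (hypothetical helper, not Drill code):

    * array format:           [ {...}, {...} ]
    * list-of-objects format: {...} {...}   (one after another, e.g. one per line)
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        # Parse one complete top-level value starting at pos.
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # Array format: each element of the array is one record.
            yield from value
        else:
            # List-of-objects format: the value itself is one record.
            yield value
```

Both the two-record array from the original question and the same records written one per line yield the same two dicts.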
>    On Monday, August 27, 2018, 5:38:32 PM PDT, Lee, David <
> David.Lee@blackrock.com> wrote:
>
> Get rid of the opening and closing brackets and see if you can turn the
> commas between the objects into newlines. The file needs to be splittable,
> I think, to reduce memory overhead vs. parsing a giant string...
>
> {"var1": "foo", "var2":"bar"}
> {"var1": "fo", "var2": "baz"}
> {"var1": "f2o", "var2": "baz2"}
> {"var1": "f3o", "var2": "baz3"}
> {"var1": "f4o", "var2": "baz4"}
> {"var1": "f5o", "var2": "baz5"}
>
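Rather than removing brackets and replacing commas by hand (a plain text substitution would also hit the commas inside each object), the conversion suggested above can be scripted. This is a minimal Python sketch, assuming each input file holds a single top-level array small enough to load into memory; the function name `array_to_json_lines` and any file paths are hypothetical.

```python
import json


def array_to_json_lines(src_path: str, dst_path: str) -> int:
    """Rewrite a file holding one top-level JSON array as one object per line.

    Returns the number of records written. The whole array is loaded into
    memory, so this only suits files that fit comfortably in RAM.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError(f"{src_path}: expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # With the default indent=None, json.dumps emits compact output and
            # escapes newlines inside strings, so each record stays on one line.
            dst.write(json.dumps(record) + "\n")
    return len(records)
```

Keeping a ".json" extension on the output file should let Drill keep choosing its JSON reader.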
> -----Original Message-----
> From: scott [mailto:tcots8888@gmail.com]
> Sent: Monday, August 27, 2018 4:59 PM
> To: user@drill.apache.org
> Subject: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the
> middle of a record
>
>
>
> Hi All,
> I'm getting an error querying some of my JSON files.
> The error I'm getting is: Error: DATA_READ ERROR: Error parsing JSON -
> Cannot read from the middle of a record. Current token was START_ARRAY
>
> The JSON files are in array format, like [ { "var1": "foo", "var2":
> "bar"},{"var1": "fo", "var2": "baz"}]
>
> I found a ticket, DRILL-1755
> <https://jira.apache.org/jira/browse/DRILL-1755>, that indicates this
> format is not yet supported by Drill, but I find it hard to believe there
> is no workaround or solution, since it was reported 4 years ago. Does
> anyone have a solution or workaround for this problem?
>
> Thanks,
> Scott
>