drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record
Date Tue, 28 Aug 2018 18:22:30 GMT
Hi Scott,

Bingo. Just tried this very case with the sample file from the previous post. Got exactly
the failure in the post you provided. I notice that a "select *" query returns immediately,
but a "count(*)" query hangs for 30+ seconds before it errors out. Mine is only a two-record
file, so taking 30 seconds to fail is excessive.

Clearly, something is wrong. At the very least, a count(*) should simply read all records
and discard the data, using exactly the same JSON parser as for a "SELECT *" query. That Drill
is not doing so suggests that perhaps the code is trying to be clever to optimize for the
"count(*)" case, and is doing so incorrectly.

Here is a clunky workaround: just add a WHERE clause that accepts all records:

SELECT COUNT(*) FROM `test.json` WHERE 1 = 1;
+---------+
| EXPR$0  |
+---------+
| 2       |
+---------+
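
Another option, along the lines of the suggestion quoted further down in this thread, is to rewrite the array-format file as one object per line. Here is a minimal sketch in Python, assuming the data is small enough to load into memory. The function name array_to_json_lines is made up for illustration, and nothing in this thread confirms that the converted file avoids the count(*) error:

```python
import json


def array_to_json_lines(text):
    """Convert a JSON array of objects into one JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        # Only the array format from the original post is handled here.
        raise ValueError("expected a top-level JSON array")
    # Serialize each record on its own line, ending with a final newline.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Sample data taken from the original post in this thread.
sample = '[ { "var1": "foo", "var2":"bar"},{"var1": "fo", "var2": "baz"}]'
converted = array_to_json_lines(sample)
print(converted, end="")
```

To convert a real file, read its contents, pass them to the function, and write the result to a new file with a ".json" extension.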

As it turns out, I'm in the (very slow) process of issuing PRs for a revised JSON record reader
to handle other issues. A side effect of that change is that the new implementation does use
the same parse path for both the "SELECT *" and "SELECT count(*)" paths. So, even if someone
cannot fix this bug short term, there is a longer-term fix coming.

Thanks,
- Paul

 

    On Tuesday, August 28, 2018, 8:46:11 AM PDT, scott <tcots8888@gmail.com> wrote:
 
 
 Paul,
Thanks for prompting the right questions. I went back and took another look
at my queries. It turns out that some condition triggers this error when
running functions like "count(*)" on the data, whereas a normal unqualified
select does not hit it. I also ran across this
article from MapR that led me to conclude Drill just doesn't support it.

https://mapr.com/support/s/article/Apache-Drill-cannot-read-from-middle-of-a-record?language=en_US

I think if we can confirm exactly which conditions cause the problem, we
should open a high priority Jira. What do you think?


On Mon, Aug 27, 2018 at 11:58 PM Paul Rogers <par0328@yahoo.com.invalid>
wrote:

> Hi Scott,
>
> I created a file, "test.json", using the data from your e-mail:
>
> [ { "var1": "foo", "var2":"bar"},{"var1": "fo", "var2": "baz"}]
>
> The oldest build I have readily available is Drill 1.13. I ran that as a
> server, then connected with sqlline as a client. I ran a query:
>
> select * from `test.json`;
> +-------+-------+
> | var1  | var2  |
> +-------+-------+
> | foo   | bar   |
> | fo    | baz   |
> +-------+-------+
>
> I can try with Drill 1.12, once I find and download it. Or, you can try
> with Drill 1.14 (the latest release.)
>
> I do wonder, however, if we are talking about the same thing. My test puts
> your JSON in a JSON file with ".json" extension so that Drill chooses the
> JSON parser. I'm using default JSON (session) options.
>
> Is this what you are doing? Or, is your JSON coming from some other
> source? Kafka? A field from a CSV file, say?
>
> Thanks,
> - Paul
>
>
>
>    On Monday, August 27, 2018, 10:31:00 PM PDT, scott <
> tcots8888@gmail.com> wrote:
>
>  Paul,
> I'm using version 1.12. Can you tell me what version you think that was
> fixed in? The ticket I referenced is still open, with no comments.
>
> Scott
>
> On Mon, Aug 27, 2018 at 5:47 PM Paul Rogers <par0328@yahoo.com.invalid>
> wrote:
>
> > Hi David,
> >
> > JSON files are never splittable: there is no single-character way to find
> > the start of a JSON record within a file.
> >
> > Drill is supposed to support two JSON formats: the array format from the
> > earlier post, and the non-JSON (but very common) list of objects format in
> > this example.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >    On Monday, August 27, 2018, 5:38:32 PM PDT, Lee, David <
> > David.Lee@blackrock.com> wrote:
> >
> >  Get rid of the opening and closing brackets and see if you can turn the
> > commas into newlines. The file needs to be splittable, I think, to reduce
> > memory overhead vs. parsing a giant string:
> >
> > {"var1": "foo", "var2":"bar"}
> > {"var1": "fo", "var2": "baz"}
> > {"var1": "f2o", "var2": "baz2"}
> > {"var1": "f3o", "var2": "baz3"}
> > {"var1": "f4o", "var2": "baz4"}
> > {"var1": "f5o", "var2": "baz5"}
> >
> > -----Original Message-----
> > From: scott [mailto:tcots8888@gmail.com]
> > Sent: Monday, August 27, 2018 4:59 PM
> > To: user@drill.apache.org
> > Subject: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the
> > middle of a record
> >
> > Hi All,
> > I'm getting an error querying some of my json files.
> > The error I'm getting is: Error: DATA_READ ERROR: Error parsing JSON -
> > Cannot read from the middle of a record. Current token was START_ARRAY
> >
> > The json files are in array format, like [ { "var1": "foo", "var2":
> > "bar"},{"var1": "fo", "var2": "baz"}]
> >
> > I found a ticket that indicates this format is not supported by Drill yet,
> > DRILL-1755
> > <https://urldefense.proofpoint.com/v2/url?u=https-3A__jira.apache.org_jira_browse_DRILL-2D1755&d=DwIBaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=SpeiLeBTifecUrj1SErsTRw4nAqzMxT043sp_gndNeI&m=G0Hsj4vSq2tBbv1c1dW6zC3pOzA_kSuhlQoFvFKpdJo&s=Dh8nYVKoOA8nQ3XdDmauSethwq9x4ric2_MsYMcfDdc&e=>,
> > but I find it hard to believe there is no workaround or solution since
> > this was reported 4 years back. Does anyone have a solution or workaround
> > to this problem?
> >
> > Thanks,
> > Scott
> >