drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Apache drill not able to query lengthy JSON
Date Fri, 10 Aug 2018 23:07:32 GMT
Hi Gayathri,

Please provide a few more details. Drill handles JSON in many forms; you did not indicate
which form you are trying to use.

The most common form of JSON is in a file, with each record as an object. This is not true
JSON, but is a very common format. Example:

{"a": 10, "b": "fred"}{"a": 20, "b": "Wilma"}

Drill also handles JSON as one big array. This is true JSON, but does not seem to occur very
often in practice:

[ {"a": 10, "b": "fred"}, {"a": 20, "b": "Wilma"} ]

In both cases, the JSON parser that Drill uses correctly parses one record at a time. That
is, even if we had a file with 10 billion records, Drill reads them one at a time. Drill batches
up data into "record batches" of something like a few thousand records, so the amount of memory
used to hold the data is independent of the file size on disk.
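
To make the idea concrete, here is a rough sketch in Python. It is not Drill's reader code, only
an illustration of the same technique. The names iter_records and iter_batches, the 64 KB chunk
size and the 4096-record batch size are all assumptions made for the example. It handles the
first form (concatenated objects) and, because Python's json module is strict, it expects quoted
keys.

```python
import io
import json
from itertools import islice


def iter_records(stream, chunk_size=65536):
    """Yield one JSON value at a time from a text stream of concatenated objects.

    Only the current chunk plus any partially read record is held in memory.
    (Hypothetical helper for illustration; not part of Drill.)
    """
    decoder = json.JSONDecoder()
    buf = ""
    eof = False
    while True:
        buf = buf.lstrip()  # raw_decode does not skip leading whitespace
        if buf:
            try:
                record, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # malformed or truncated input
                # Otherwise the record is probably incomplete: read more below.
            else:
                yield record
                buf = buf[end:]
                continue
        elif eof:
            return
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True


def iter_batches(records, batch_size=4096):
    """Group records into lists of at most batch_size (an assumed batch size)."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Usage: memory is bounded by chunk_size and batch_size, not by input size.
data = io.StringIO('{"a": 10, "b": "fred"}{"a": 20, "b": "Wilma"}')
for batch in iter_batches(iter_records(data, chunk_size=8), batch_size=1):
    print(batch)
```

The point of the sketch is that the reader never needs the whole file: it only keeps the current
chunk and the current batch, which is why the file size on disk should not matter.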

The doc link you listed is interesting; I don't believe it is actually true. The passage says
"Drill cannot manage lengthy JSON objects, such as a gigabit JSON file. Finding the beginning
and end of records can be time consuming and require scanning the whole file." This is plain
wrong. Drill does not attempt to split JSON into blocks: each file is read by a single "record
reader" which starts at byte 0 and works its way to the end, one byte at a time, as described
above. It is true that Drill cannot parallelize reads of huge JSON files, but the (single-threaded)
read should work.

You cited an error about a string being too long. You also mentioned not storing JSON in the
file system. Are you trying to read JSON from some other data source as a big long string?
Can you provide a bit more detail?

Thanks,
- Paul

 

On Friday, August 10, 2018, 11:48:50 AM PDT, Gayathri Selvaraj <gayathri.selvaraaj@gmail.com> wrote:
 
 Hi Team,

I am using Apache Drill to query JSON files. The JSON file I have is larger
than 1 GB. Because of that, Apache Drill throws an error saying
"string is too long".

From the following link, I learned that Apache Drill currently does not
support lengthy JSON
(https://drill.apache.org/docs/json-data-model/#lengthy-json-objects).

My requirements do not allow me to store the JSON in the file system. It
must be kept in memory only.

Do you have any workaround for this? Any solution would be really appreciated.

Expecting a quick response from you.

Thanks,
Gayathri.
  