drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Querying encrypted JSON file
Date Sun, 12 Apr 2020 18:37:38 GMT
Hi Prabhakar,

Looking at the Drill code, the existing compression support (via "codecs") is in the FileSystemPlugin
class [1]. It looks like Drill uses the compression codec feature of Hadoop [2], which is based on the
CompressionCodec interface [3].

This means that you just need to use standard Hadoop mechanisms to define a custom codec; [4] has one discussion of how to do that.

If you are storing JSON, it might be worthwhile combining compression and encryption together,
since JSON files tend to be large (especially if the JSON is indented). Perhaps one of the
existing Hadoop codecs (see [2]) might do the job for you.
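
To make the idea of combining the two steps concrete, here is a minimal sketch that uses only the JDK (javax.crypto and java.util.zip). It does not implement Hadoop's CompressionCodec interface; it only shows the stream wrapping that a custom codec's createInputStream would presumably delegate to. The class name, the choice of AES/CBC, the "IV in the first 16 bytes" file layout, and the raw key handling are all assumptions made for illustration, not anything Drill or Hadoop prescribes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of Drill or Hadoop.
final class EncryptedGzipStreams {
    // Assumption: AES in CBC mode; a real design should pick its own scheme.
    private static final String TRANSFORM = "AES/CBC/PKCS5Padding";
    private static final int IV_LENGTH = 16;

    // Write side: gzip first, then encrypt, because encrypted bytes
    // no longer compress well.
    static OutputStream wrapForWrite(OutputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        raw.write(iv); // assumed layout: IV in the clear, then ciphertext
        Cipher cipher = Cipher.getInstance(TRANSFORM);
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new GZIPOutputStream(new CipherOutputStream(raw, cipher));
    }

    // Read side: read the IV, decrypt, then gunzip. The caller sees
    // plain JSON bytes, which is what a JSON reader would expect.
    static InputStream wrapForRead(InputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(raw).readFully(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORM);
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new GZIPInputStream(new CipherInputStream(raw, cipher));
    }
}
```

Writing through wrapForWrite and reading back through wrapForRead with the same 16-byte key should return the original JSON text. How the key reaches the Drillbits is left open here and would need its own design.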

Here it might be worth pointing out that you'll need a file system to store the files. If
your use case is small enough that your files fit on a single machine, you can use a single
Drillbit to query local files. If the set of files is large, then one node will not provide
adequate performance so you'll need a Drill cluster. For that, you'll need a distributed file
system: HDFS, MapR-FS, S3 or whatever.

Note also that JSON is a convenient, but inefficient, format. Compressing the files, as suggested
above, helps with size but not with parallelism: JSON files are not block-splittable, so
a big JSON file must be read in a single thread. (This is less of a problem
if you instead have many smaller files.) A format such as Parquet is better suited for queries.
So, if you must convert your file to encrypt it, consider converting the files to Parquet
to get better query performance. Drill can even do the conversion for you with the CREATE
TABLE AS (CTAS) command.
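
A minimal sketch of driving that CTAS conversion from Java over JDBC might look like the following. The JDBC URL, the dfs.tmp target table, and the source path are placeholders made up for illustration. It assumes the Drill JDBC driver is on the classpath and that the source JSON is already readable by Drill (that is, not still encrypted).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; names and paths are illustrative only.
final class JsonToParquetSketch {
    // Builds a CTAS statement by plain concatenation. No escaping is done,
    // so only pass trusted table names and paths.
    static String buildCtas(String targetTable, String sourcePath) {
        return "CREATE TABLE " + targetTable
                + " AS SELECT * FROM dfs.`" + sourcePath + "`";
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: an embedded/local Drill reachable at this URL.
        try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
             Statement stmt = conn.createStatement()) {
            // Make the output format explicit for this session.
            stmt.execute("ALTER SESSION SET `store.format` = 'parquet'");
            stmt.execute(buildCtas("dfs.tmp.`archive_parquet`", "/data/archive.json"));
        }
    }
}
```

The same two statements can also be run directly from the Drill shell; the Java wrapper is only useful if the conversion is part of an archiving job.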

- Paul

[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java#L141

[2] https://netjs.blogspot.com/2018/04/data-compression-in-hadoop.html

[3] https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html

[4] https://stackoverflow.com/questions/37608227/adding-custom-code-to-hadoop-spark-compression-codec

On Sunday, April 12, 2020, 12:03:13 AM PDT, Prabhakar Bhosaale <bhosale.p.v@gmail.com> wrote:

Hi Paul,
Thanks for the details. As of now I have not settled on any encryption
technique, as I first wanted to understand Drill's capabilities for encryption
and decryption.
To give you more details on my requirement: I will be archiving data from a
database in JSON format, and that archived data will be accessed using
Drill for reporting purposes. I am already compressing the JSON files using gzip,
but for security reasons I also need to encrypt the files. Thx


On Sun, Apr 12, 2020, 11:38 Paul Rogers <par0328@yahoo.com.invalid> wrote:

> Hi Prabhakar,
> Depending on how you perform encryption, you may be able to treat it
> similarly to compression. Drill handles compression (zip, gzip, etc.) via an
> extra layer of functionality on top of any format plugin. That means,
> rather than writing a new JSON file reader, you write a new compression
> plugin (which will actually do decryption). I have not added one of these,
> but I'll poke around to see if I can find some pointers.
> On the other hand, if encryption is part of the access protocol (such as
> S3), then you can configure it via the S3 client.
> Can you describe a bit more how you encrypt your files and what is needed
> to decrypt?
> Thanks,
> - Paul
>    On Saturday, April 11, 2020, 10:39:15 PM PDT, Prabhakar Bhosaale <
> bhosale.p.v@gmail.com> wrote:
> Hi Ted,
> Thanks for your reply. Could you please give some more details on how to
> create such a file format and how to use it? Any pointers will be
> appreciated. Thx
> Regards
> Prabhakar
> On Sun, Apr 12, 2020, 00:19 Ted Dunning <ted.dunning@gmail.com> wrote:
> > Yes.
> >
> > You need to write a special file format for that, though.
> >
> >
> > On Sat, Apr 11, 2020 at 6:58 AM Prabhakar Bhosaale <bhosale.p.v@gmail.com> wrote:
> >
> > > Hi All,
> > > I have an encrypted JSON file. Is there any way in Drill to query the
> > > encrypted JSON file? Thanks
> > >
> > > Regards
> > > Prabhakar
> > >
> >