drill-user mailing list archives

From Alexander Reshetov <alexander.v.reshe...@gmail.com>
Subject Re: Report issues with sensitive data
Date Fri, 03 Apr 2015 20:44:51 GMT

Andries, Ted, thanks for the quick replies.
Yes, I'm using the latest official build of 0.8.

I investigated the possible causes and also found a way to
hide the sensitive data.
Please see the issue I filed about this [1].

While doing so I found one strange behavior, which I assume leads to this issue
(if dataset files are missing, they are still uploading).

[1] https://issues.apache.org/jira/browse/DRILL-2677

On Wed, Apr 1, 2015 at 7:46 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> One idea is to post a log-synth [1] schema that generates data with the same
> shape as your real data.  If you can generate fake data that causes the
> same problem, you give developers a huge head start in solving your problem.
> For the record, are you using the recently announced 0.8 version of Drill?
> [1] https://github.com/tdunning/log-synth
> On Wed, Apr 1, 2015 at 3:29 AM, Alexander Reshetov <
> alexander.v.reshetov@gmail.com> wrote:
>> Hello all,
>> I have an 80GB JSON dataset with many nested arrays.
>> I'm trying to flatten it and run some calculations, but I get
>> exceptions after reading about 2/3 of the file.
>> I could (and want to) post an issue in Jira, but I cannot attach my dataset
>> because it contains sensitive data and is also too large.
>> Is there any way to help investigate the issue without posting my dataset?
>> To give a hint about the issue, I've attached a file with the exception text.
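
Ted's suggestion above (generate fake data with the same shape as the real data) can be sketched without any real records. The snippet below is only an illustration in plain Python, not log-synth and not its schema format; the field names (id, name, orders, items, sku, qty) are hypothetical stand-ins for whatever nested arrays the real dataset has.

```python
import json
import random
import string


def fake_string(rng, length=8):
    """Return a random lowercase string; stands in for any sensitive text field."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def fake_record(rng, max_items=3):
    """Build one record with two levels of nested arrays (hypothetical shape)."""
    return {
        "id": rng.randrange(1_000_000),
        "name": fake_string(rng),
        "orders": [
            {
                "order_id": rng.randrange(1_000_000),
                "items": [
                    {"sku": fake_string(rng, 6), "qty": rng.randint(1, 5)}
                    for _ in range(rng.randint(0, max_items))
                ],
            }
            for _ in range(rng.randint(0, max_items))
        ],
    }


def fake_records(count, seed=0):
    """Yield `count` fake records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    for _ in range(count):
        yield fake_record(rng)


if __name__ == "__main__":
    # Print a few records, one JSON document per line.
    for record in fake_records(3):
        print(json.dumps(record))
```

Redirecting the output to a file and raising the record count gives a dataset that can be attached to a Jira issue, provided the fake data still triggers the same exception.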
