drill-user mailing list archives

From Bob Rudis <...@rud.is>
Subject Re: New Drill R "dplyr" interface
Date Fri, 30 Dec 2016 16:23:02 GMT
Aye. I'll file an issue. I can definitely reproduce it in any Drill
context I have.

(I'm hyper-threading here as your CSV reply is later than this one).

If one has a CSV or JSON, then I'm not sure why one would use R for
the Drill machinations. Just as easy (if not easier) to fire off a
small script in `drill-localhost` or `drill-embedded` to do the work.
I'm still (it's probably just my inability to grok this use case) not
sure why one would turn an in-memory R data frame into a parquet file
unless one really works with multi-GB data frames directly in R (in which
case I have to ask "why?" when there's Drill, Spark, H2O, etc. that do a
good amount of that work better, albeit without many of the more
advanced stats/ML packages R has available to it).

On Mon, Dec 26, 2016 at 3:24 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Mon, Dec 26, 2016 at 7:22 AM, Bob Rudis <bob@rud.is> wrote:
>
>> Also, when attempting to make a "data frame to parquet" function which
>> sends a CTAS query with > 1000 (VALUES((1,2,3,...)) rows Drill tosses
>> stack exceptions and hangs (in the embedded console, localhost console
>> and REST POST query).
>>
>
> Can you be more explicit here (and possibly file a bug on JIRA
> <https://issues.apache.org/jira/browse/DRILL/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel>
> )?
>
> I just tried this with 1100 rows and it worked fine.
>
> If you can attach an actual CTAS query to the JIRA, that would be awesome.
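
For anyone wanting to try the case discussed above, a CTAS statement with a large inline VALUES list can be generated with a few lines of script. This is only a minimal sketch and not the query from the original report: the table name `dfs.tmp.df_test`, the column names, the row contents, and the exact SQL shape (a `SELECT * FROM (VALUES ...)` subquery with a column alias list) are all assumptions made for illustration.

```python
def build_ctas(table, columns, rows):
    """Build a CTAS statement that inlines `rows` as a VALUES list.

    Values are rendered with str(), so this sketch only handles numbers;
    string values would need proper SQL quoting and escaping.
    """
    values = ",\n  ".join(
        "(" + ", ".join(str(v) for v in row) + ")" for row in rows
    )
    col_list = ", ".join(columns)
    return (
        f"CREATE TABLE {table} AS\n"
        f"SELECT * FROM (VALUES\n  {values}\n) AS t({col_list})"
    )


# Hypothetical table name and data: 1100 rows of three integers each,
# i.e. just past the ~1000-row mark mentioned in the thread.
rows = [(i, i + 1, i + 2) for i in range(1100)]
query = build_ctas("dfs.tmp.`df_test`", ["a", "b", "c"], rows)

print(query[:120])  # show the start of the generated statement
```

The resulting string can be pasted into the embedded or localhost console, or sent as the query text of a REST request, and attached to the JIRA as-is if it reproduces the problem.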
