drill-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Reading and converting Parquet files intended for Impala
Date Wed, 08 Jun 2016 15:36:30 GMT
So on this subject, I believe
https://issues.apache.org/jira/browse/DRILL-4464 may be related. While the
error messages differ slightly depending on how I tweak the settings, I can
reproduce my problem with the test data that's included on the JIRA. I do
believe my problem matches this issue, and I posted the similarities to the
JIRA.

Thanks!

John

On Mon, May 30, 2016 at 7:06 PM, John Omernik <john@omernik.com> wrote:

> What I don't understand is the substitution in general. Why have
>
>  export SERVER_GC_OPTS=${SERVER_GC_OPTS/"-Xloggc:<FILE-PATH>"/"-Xloggc:${loggc}"}
>
> instead of
>
>  export SERVER_GC_OPTS="${SERVER_GC_OPTS} -Xloggc:${loggc}"
>
> The latter seems much more straightforward and understandable, and less
> prone to oddball issues. Maybe add one more test to the if, to ensure that
> ${loggc} is set as well:
>
>  if [ -n "$SERVER_GC_OPTS" ] && [ -n "${loggc}" ]; then
>    export SERVER_GC_OPTS="${SERVER_GC_OPTS} -Xloggc:${loggc}"
>  fi
>
> I guess I am just a big fan of simplification...
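
The guarded append proposed above can be sketched as a small function. This is only an illustration, not code from the Drill scripts: append_gc_log is a hypothetical helper name, and the option string and log path are placeholder values. Note that with the single-bracket test, the two conditions are written as two separate [ ... ] tests joined by &&.

```shell
#!/usr/bin/env bash
# Sketch only: append_gc_log is a hypothetical helper, not part of Drill.
# It appends -Xloggc:<path> only when both the existing options and the
# log path are non-empty; otherwise it returns the options unchanged.
append_gc_log() {
  local opts=$1 loggc=$2
  # Two separate tests joined by &&; "&&" inside a single [ ... ] is an error.
  if [ -n "$opts" ] && [ -n "$loggc" ]; then
    opts="$opts -Xloggc:$loggc"
  fi
  printf '%s\n' "$opts"
}

# Placeholder values for illustration.
SERVER_GC_OPTS="-XX:+UseG1GC"
loggc=/foo/bar.log
SERVER_GC_OPTS=$(append_gc_log "$SERVER_GC_OPTS" "$loggc")
export SERVER_GC_OPTS
echo "Server: $SERVER_GC_OPTS"
```

Keeping the logic in a function makes it easy to apply the same rule to CLIENT_GC_OPTS without duplicating the test.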
>
> On Mon, May 30, 2016 at 5:01 PM, Paul Rogers <progers@maprtech.com> wrote:
>
>> Hi John,
>>
>> The Drill scripts need quite a bit of TLC. (See DRILL-4581.)
>> drill-config.sh tries to set up both the Drillbit (server) and sqlline
>> (client). Work was needed to fully separate the two. The CLIENT_GC_OPTS are
>> only for sqlline, SERVER_GC_OPTS are for the drillbit.
>>
>> The problem is that SERVER_GC_OPTS does two things that conflict. If it
>> only did logging, it would work:
>>
>> $ loggc=/foo/bar.log
>> $ export SERVER_GC_OPTS="-Xloggc:<FILE-PATH>"
>> $ echo ${SERVER_GC_OPTS/"-Xloggc:<FILE-PATH>"/"-Xloggc:${loggc}"}
>> -Xloggc:/foo/bar.log
>>
>> But the current version of drill-env.sh helpfully sets SERVER_GC_OPTS to
>> other options, with no -Xloggc:<FILE-PATH> placeholder in the value, so the
>> substitution finds nothing to replace:
>>
>> export SERVER_GC_OPTS="-XX:+CMSClassUnloadingEnabled -XX:+UseG1GC "
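
The two cases described above can be reproduced side by side. This is a minimal sketch, assuming bash; substitute_gc_log is a hypothetical helper name, and the log path is the same illustrative value used earlier. The placeholder and replacement are held in variables so the expansion does not depend on how a given bash version treats quotes nested inside ${var/pattern/replacement}.

```shell
#!/usr/bin/env bash
# Sketch only: substitute_gc_log is a hypothetical helper, not part of Drill.
# It replaces the literal placeholder -Xloggc:<FILE-PATH> with a real path.
substitute_gc_log() {
  local opts=$1 loggc=$2
  # The placeholder contains no glob characters, so it matches literally.
  local placeholder='-Xloggc:<FILE-PATH>'
  local replacement="-Xloggc:$loggc"
  # If the placeholder is absent, bash returns the string unchanged.
  printf '%s\n' "${opts/$placeholder/$replacement}"
}

loggc=/foo/bar.log

# Case 1: the placeholder is present, so it is rewritten.
substitute_gc_log "-XX:+UseG1GC -Xloggc:<FILE-PATH>" "$loggc"

# Case 2: the value has no placeholder, so nothing changes.
substitute_gc_log "-XX:+CMSClassUnloadingEnabled -XX:+UseG1GC " "$loggc"
```

In the second case the options come back exactly as they went in, with no -Xloggc flag added and no error reported, which is why the problem is easy to miss.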
>>
>> Sigh… More bugs to fix… I’ve added this issue as a comment to DRILL-4581.
>>
>> For now, just work around the problem using DRILL_JAVA_OPTS. The
>> following exists today in drill-env.sh:
>>
>> export DRILL_JAVA_OPTS="-Xms$DRILL_HEAP -Xmx$DRILL_HEAP…
>>
>> Add another line:
>>
>> export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Xloggc:/path/to/gc.log"
>>
>> You’ll have to specify the log path, but it sounds like you do that
>> anyway for your Mesos setup.
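
If drill-env.sh may be sourced more than once, the workaround above could be made idempotent. The following is a sketch under that assumption only: with_gc_log is a hypothetical helper name, and the heap options and log path are placeholder values rather than Drill defaults.

```shell
#!/usr/bin/env bash
# Sketch only: with_gc_log is a hypothetical helper, not part of Drill.
# It adds -Xloggc:<path> unless the options already contain an -Xloggc flag.
with_gc_log() {
  local opts=$1 log_path=$2
  case "$opts" in
    *-Xloggc:*) ;;                                  # already present: leave as is
    *) opts="${opts:+$opts }-Xloggc:$log_path" ;;   # append, with a space if needed
  esac
  printf '%s\n' "$opts"
}

# Placeholder values for illustration.
DRILL_JAVA_OPTS="-Xms4G -Xmx4G"
DRILL_JAVA_OPTS=$(with_gc_log "$DRILL_JAVA_OPTS" "/path/to/gc.log")
# A second call is a no-op because the flag is already there.
DRILL_JAVA_OPTS=$(with_gc_log "$DRILL_JAVA_OPTS" "/path/to/gc.log")
export DRILL_JAVA_OPTS
echo "$DRILL_JAVA_OPTS"
```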
>>
>> By the way, another change we’re making for DoY is to split drill-env.sh
>> into three parts: Drill defaults move into drill-config.sh,
>> distribution-specific stuff moves into its own file, and drill-env.sh will
>> contain only site-specific settings.
>>
>> - Paul
>>
>> > On May 30, 2016, at 5:28 AM, John Omernik <john@omernik.com> wrote:
>> >
>> > More importantly, I am not sure how the strings inside the curly braces
>> > actually work either. Based on testing (echoing out SERVER_GC_OPTS and
>> > CLIENT_GC_OPTS), it's not actually working.
>> >
>> > If I am reading the bash correctly, then it checks whether SERVER_GC_OPTS
>> > (or CLIENT) is set (-n returns true if the length of the string is
>> > nonzero; since the variable is expanded, we are checking whether there is
>> > something in the variable), and if so we should be adding the -Xloggc
>> > (both of them) to SERVER_GC_OPTS (and client).
>> >
>> > As you can see from the testing, SERVER_GC_OPTS is only the value that
>> > I am setting in my drill-env.sh (default setting), which is loaded by
>> > drill-config.sh, sourced earlier in drillbit.sh. Thus, this code in
>> > drillbit.sh is effectively doing nothing. I guess my thought process
>> > here would be to have someone help decide what is intended here (I am not
>> > sure "nothing" is intended, based on the amount of code), and then we can
>> > do some updating here to clarify and ensure efficacy.
>> >
>> >
>> > Testing:
>> >
>> > if [ -n "$SERVER_GC_OPTS" ]; then
>> >   export SERVER_GC_OPTS=${SERVER_GC_OPTS/"-Xloggc:<FILE-PATH>"/"-Xloggc:${loggc}"}
>> > fi
>> >
>> > if [ -n "$CLIENT_GC_OPTS" ]; then
>> >   export CLIENT_GC_OPTS=${CLIENT_GC_OPTS/"-Xloggc:<FILE-PATH>"/"-Xloggc:${loggc}"}
>> > fi
>> >
>> > echo "Server: $SERVER_GC_OPTS"
>> > echo "Client: $CLIENT_GC_OPTS"
>> >
>> > exit 1
>> >
>> >
>> > Server: -XX:+CMSClassUnloadingEnabled -XX:+UseG1GC
>> >
>> > Client:
>> >
>> > On Mon, May 30, 2016 at 6:43 AM, John Omernik <john@omernik.com> wrote:
>> >
>> >> So based on Paul's drillbit.sh comment and this, I decided to make sure I
>> >> was enabling the proper GC logging, because I am skipping drillbit.sh.
>> >> I looked at drillbit.sh, and frankly, it looks like there may be a goofy
>> >> error in it. The <FILE-PATH> seems to be in documentation for other
>> >> hadoop-ish projects, but I don't think Java or bash does anything with it.
>> >> Thus having that in drillbit.sh (which to me shouldn't be changed)
>> >> seems to be a mistake. (Yes, the -Xloggc after it may just overwrite what
>> >> was passed in the <FILE-PATH>.) But am I correct in saying that this is
>> >> actually just a mistake in drillbit.sh, and all it does is add
>> >> confusion? I hope I am wrong here and I get to learn something, but I
>> >> just can't see how <FILE-PATH> is interpreted by bash or Java.
>> >>
>> >>
>> >> John
>> >>
>> >>
>> >>
>> >> if [ -n "$SERVER_GC_OPTS" ]; then
>> >>   export SERVER_GC_OPTS=${SERVER_GC_OPTS/"-Xloggc:<FILE-PATH>"/"-Xloggc:${loggc}"}
>> >> fi
>> >>
>> >> if [ -n "$CLIENT_GC_OPTS" ]; then
>> >>   export CLIENT_GC_OPTS=${CLIENT_GC_OPTS/"-Xloggc:<FILE-PATH>"/"-Xloggc:${loggc}"}
>> >> fi
>> >>
>> >> On Mon, May 30, 2016 at 3:42 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> >>
>> >>> On Sun, May 29, 2016 at 2:29 PM, John Omernik <john@omernik.com> wrote:
>> >>>
>> >>>> (It's a very weird situation that the bits get into:
>> >>>> everything hangs, some things work, other things seem to be in
>> >>>> an in-between state between working and not working, etc. For example,
>> >>>> describe table operations eventually return, but after 10+ seconds.) I
>> >>>> resolve this by restarting all bits, and then things are right as rain.
>> >>>>
>> >>>
>> >>> Sounds like GC pressure, possibly.
>> >>>
>> >>> The GC logging that was mentioned in connection with drill.sh would be
>> >>> helpful here.
>> >>>
>> >>>
>> >>
>> >>
>>
>>
>
