hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-13507) Improved logging for ptest
Date Wed, 20 Apr 2016 18:33:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250463#comment-15250463
] 

Sergio Peña commented on HIVE-13507:
------------------------------------

hi [~sseth], I will need to revert this patch, as it is causing some issues with the ptest
infra.
While running some tests, I found that ptest was spinning up a lot of instances due to an
exception:

{noformat}
2016-04-20 13:01:25 INFO  CloudExecutionContextProvider:213 - Attempting to create 12 nodes
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:281 - Verify number of hots: 1
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:291 - Verifying node: {id=us-west-1/i-b245ef07,
providerId=i-b245ef07, name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE,
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, group=spena-hive-spark-ptest-slaves,
imageId=us-west-1/ami-1ac6dc5f, os={family=unrecognized, arch=paravirtual, version=, description=360379543683/hive-spark-ptest-7,
is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, privateAddresses=[10.236.128.180],
publicAddresses=[54.241.234.115], hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0,
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, bootDevice=false,
durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, bootDevice=false, durable=false},
{id=vol-df82d662, type=SAN, device=/dev/sda1, bootDevice=true, durable=true}], hypervisor=xen,
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], userMetadata={owner=sergio.pena,
Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:45 - Starting LocalCommandId=ssh -v
-i /home/hiveptest/.ssh/hive-ptest-user-key  -l hiveptest 54.241.234.115 'pkill -f java':
{}1
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:60 - Finished LocalCommandId=1. ElapsedTime(seconds)=0
2016-04-20 13:02:35 ERROR CloudExecutionContextProvider:296 - Node {id=us-west-1/i-b245ef07,
providerId=i-b245ef07, name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE,
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, group=spena-hive-spark-ptest-slaves,
imageId=us-west-1/ami-1ac6dc5f, os={family=unrecognized, arch=paravirtual, version=, description=360379543683/hive-spark-ptest-7,
is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, privateAddresses=[10.236.128.180],
publicAddresses=[54.241.234.115], hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0,
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, bootDevice=false,
durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, bootDevice=false, durable=false},
{id=vol-df82d662, type=SAN, device=/dev/sda1, bootDevice=true, durable=true}], hypervisor=xen,
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], userMetadata={owner=sergio.pena,
Name=spena-hive-spark-ptest-slaves-b245ef07}} is bad on startup
java.lang.IllegalStateException: This stopwatch is already stopped.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:150) ~[guava-15.0.jar:?]
        at com.google.common.base.Stopwatch.stop(Stopwatch.java:177) ~[guava-15.0.jar:?]
        at org.apache.hive.ptest.execution.LocalCommand.getExitCode(LocalCommand.java:59)
~[LocalCommand.class:?]
        at org.apache.hive.ptest.execution.ssh.SSHCommandExecutor.execute(SSHCommandExecutor.java:72)
~[SSHCommandExecutor.class:?]
        at org.apache.hive.ptest.execution.context.CloudExecutionContextProvider$3.run(CloudExecutionContextProvider.java:293)
[CloudExecutionContextProvider$3.class:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_45]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_45]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[?:1.7.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[?:1.7.0_45]
        at java.lang.Thread.run(Thread.java:744) [?:1.7.0_45]
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:354 - Submitting termination for {id=us-west-1/i-b245ef07,
providerId=i-b245ef07, name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE,
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, group=spena-hive-spark-ptest-slaves,
imageId=us-west-1/ami-1ac6dc5f, os={family=unrecognized, arch=paravirtual, version=, description=360379543683/hive-spark-ptest-7,
is64Bit=true}, status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, privateAddresses=[10.236.128.180],
publicAddresses=[54.241.234.115], hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0,
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, bootDevice=false,
durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, bootDevice=false, durable=false},
{id=vol-df82d662, type=SAN, device=/dev/sda1, bootDevice=true, durable=true}], hypervisor=xen,
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], userMetadata={owner=sergio.pena,
Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:226 - Successfully created 0 nodes
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:233 - Pausing creation process for
60 seconds
2016-04-20 13:03:35 INFO  CloudExecutionContextProvider:213 - Attempting to create 12 nodes
{noformat}

As you can see, due to the error, 0 nodes were created (and the bad nodes are supposed to
be terminated), but for some reason Amazon is not terminating them, so this got stuck in a
loop for a long time.

I don't know what causes the error, but I reverted the patch locally on the ptest server,
and everything is working normally again.
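For context, the {{IllegalStateException: This stopwatch is already stopped}} in the trace comes from Guava's {{Stopwatch.stop()}} contract: calling it on an already-stopped stopwatch throws. A plausible failure mode is {{getExitCode()}} stopping the stopwatch on a path where it was already stopped when the command finished. The sketch below is hypothetical (it uses a minimal stand-in class with the same {{stop()}} contract, since the actual {{LocalCommand}} code isn't shown here); it reproduces the error and shows the usual guard of checking {{isRunning()}} before stopping:

```java
// Minimal stand-in for Guava's Stopwatch: stop() throws
// IllegalStateException if the stopwatch is already stopped,
// matching the behavior seen in the trace. Names are illustrative.
class SimpleStopwatch {
    private boolean running = true;
    private final long start = System.nanoTime();
    private long elapsed = -1;

    void stop() {
        if (!running) {
            throw new IllegalStateException("This stopwatch is already stopped.");
        }
        running = false;
        elapsed = System.nanoTime() - start;
    }

    boolean isRunning() { return running; }
}

public class StopwatchGuard {
    // Guarded stop: makes a second stop() call a no-op instead of throwing.
    static void safeStop(SimpleStopwatch sw) {
        if (sw.isRunning()) {
            sw.stop();
        }
    }

    public static void main(String[] args) {
        SimpleStopwatch sw = new SimpleStopwatch();
        sw.stop();        // first stop: fine
        safeStop(sw);     // guarded second stop: no exception

        boolean threw = false;
        try {
            sw.stop();    // unguarded second stop reproduces the error
        } catch (IllegalStateException e) {
            threw = true;
        }
        System.out.println("guarded second stop ok; unguarded threw=" + threw);
    }
}
```

Whether the real fix belongs in {{getExitCode()}} or in the caller depends on which path stops the stopwatch first, which the log alone doesn't show.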

> Improved logging for ptest
> --------------------------
>
>                 Key: HIVE-13507
>                 URL: https://issues.apache.org/jira/browse/HIVE-13507
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Sergio Peña
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13507.01.patch
>
>
> Include information about batch runtimes, outlier lists, host completion times, etc.
Try identifying tests which cause the build to take a long time while holding onto resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
