drill-user mailing list archives

From: Paul Rogers <par0...@yahoo.com.INVALID>
Subject: Re: Drill on YARN Questions
Date: Mon, 14 Jan 2019 15:46:50 GMT
Hi,

Can you reach the AM web UI? The Web UI URL is shown in your output below. It also should
have been printed when you started DoY.

I notice that you're using SSL/TLS access. Doing so requires the right certificates on the
AM host. Trying to connect via your browser may help you verify that this works.

If the Web UI works, compare the host name and port number in your browser with those shown
in the error message.

The resize command on the command line does little more than validation before sending a
request to the URL shown below. You can try entering that URL directly into your browser.
Again, if that fails, there is something amiss with your config. If it works, then we'll
have to figure out what might be wrong with the DoY command-line tool.
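
If it is easier to test from a shell, something like the following curl calls should exercise
the same endpoints (a sketch only: the host and CA bundle path are placeholders to replace
with your own values):

  # Check AM status over TLS, trusting your cluster's CA certificate
  curl --cacert /path/to/ca-cert.pem https://<am-host>:9048/rest/status

  # Ask the AM to remove one Drillbit
  curl --cacert /path/to/ca-cert.pem https://<am-host>:9048/rest/shrink/1

If curl complains about the certificate, the problem lies in the TLS setup rather than in
DoY itself.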

Please try out the above and let us know what you learn.

Thanks,
- Paul

 

    On Monday, January 14, 2019, 7:30:44 AM PST, Kwizera hugues Teddy <nbted2017@gmail.com> wrote:
 
 Hello all,

I am experiencing errors with the resize and status commands.
The errors come from the REST calls to the AM.

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
Result:
Application ID: xxxxxxxxxxxxxxxx
Application State: RUNNING
Host: xxxxxxxxxxxxxxxx
Queue: root.xxxxx.default
User: xxxxxxxx
Start Time: 2019-01-14 14:56:29
Application Name: Drill-on-YARN-cluster_01
Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Failed to get AM status
REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
Result:
Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1

I haven't found a way to resolve this issue; maybe someone can help me.

Thanks.



On Sat, Jan 12, 2019 at 8:30 AM Kwizera hugues Teddy <nbted2017@gmail.com> wrote:

> Hello,
>
> The other option works.
>
> As you say, the docs need an update to remove the wrong
> information.
>
> Thanks.
>
> On Sat, Jan 12, 2019, 08:10 Abhishek Girish <agirish@apache.org> wrote:
>
>> Hello Teddy,
>>
>> I don't recollect a restart option for the drill-on-yarn.sh script. I've
>> always used a combination of stop and start, like Paul mentions. Could you
>> please try that and let us know whether it works? We could certainly add a
>> minor enhancement to support restart; until then, I'll request Bridget to
>> update the documentation.
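>>
>> In the meantime, running a stop followed by a start from the same site
>> directory should serve as a manual restart:
>>
>>   $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE stop
>>   $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start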
>>
>> Regards,
>> Abhishek
>>
>> On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy
>> <nbted2017@gmail.com> wrote:
>>
>> > Hello Paul,
>> >
>> > Thank you for your response, with some interesting information (the
>> > files in /tmp).
>> >
>> > On my side, all the other command-line options work normally
>> > (start|stop|status|...), but not restart (that option is not
>> > recognized). I searched the source code and found that the restart
>> > command is not implemented, so I wonder why the documentation does not
>> > match the source code.
>> >
>> > Thanks. Teddy
>> >
>> >
>> > On Sat, Jan 12, 2019, 02:39 Paul Rogers <par0328@yahoo.com.invalid>
>> > wrote:
>> >
>> > > Let's try to troubleshoot. Does the combination of stop and start
>> > > work? If so, then there could be a bug in the restart command itself.
>> > >
>> > > If neither start nor stop works, it could be that you are missing the
>> > > application ID file created when you first started DoY. Some
>> > > background:
>> > >
>> > > When we submit an app to YARN, YARN gives us an app ID. We need this
>> > > in order to track down the app master for DoY so we can send it
>> > > commands later.
>> > >
>> > > When the command line tool starts DoY, it writes the YARN app ID to a
>> > > file. Can't remember the details, but it is probably in the
>> > > $DRILL_SITE directory. The contents are, as I recall, a long
>> > > hexadecimal string.
>> > >
>> > > When you invoke the command line, the tool reads this file to figure
>> > > out how to track down the DoY app master. The tool then sends commands
>> > > to the app master: in this case, a request to shut down. Then, for
>> > > restart, the tool will communicate with YARN to start a new instance.
>> > >
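>> > > If that file has gone missing, you may be able to recover the app ID
>> > > from YARN itself. A sketch using the standard YARN CLI:
>> > >
>> > >   # List running YARN apps; look for the Drill-on-YARN application name
>> > >   yarn application -list -appStates RUNNING
>> > >
>> > >   # Then query a specific application directly
>> > >   yarn application -status <application-id>
>> > >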
>> > > The tool is supposed to give detailed error messages. Did you get any?
>> > > That might tell us which of these steps failed.
>> > >
>> > > Can you connect to the DoY Web UI at the URL provided when you started
>> > > DoY? If you can, this means that the DoY App Master is up and running.
>> > >
>> > > Are you running the client from the same node on which you started it?
>> > > That file I mentioned is local to the "DoY client" machine; it is not
>> > > in DFS.
>> > >
>> > > Then, there is one more very obscure bug you can check. On some
>> > > distributions, the YARN task files are written to the /tmp directory.
>> > > Some Linux systems remove these files from time to time. Once the
>> > > files are gone, YARN can no longer control its containers: it won't be
>> > > able to stop the app master or the Drillbit containers. The fix has
>> > > two parts: first, kill all the processes by hand; then either move the
>> > > YARN state files out of /tmp or exclude YARN's files from the periodic
>> > > cleanup.
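>> > >
>> > > On systemd-based distributions, that periodic cleanup is often
>> > > systemd-tmpfiles. A sketch of an exclusion rule (the path is an
>> > > assumption; check yarn.nodemanager.local-dirs for the real location):
>> > >
>> > >   # /etc/tmpfiles.d/yarn.conf ('x' lines exclude paths from cleanup)
>> > >   x /tmp/hadoop-yarn*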
>> > >
>> > > Try some of the above and let us know what you find.
>> > >
>> > > Also, perhaps Abhishek can weigh in, as he tested the heck out of the
>> > > feature and may have additional suggestions.
>> > >
>> > > Thanks,
>> > > - Paul
>> > >
>> > >
>> > >
>> > >    On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy
>> > > <nbted2017@gmail.com> wrote:
>> > >
>> > >  Hello,
>> > >
>> > > Two weeks ago I began to explore DoY. Today, while reading the Drill
>> > > documentation
>> > > (https://drill.apache.org/docs/appendix-a-release-note-issues/), I saw
>> > > that we can restart the Drill cluster with:
>> > >
>> > >  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
>> > >
>> > > But it doesn't work when I tested it.
>> > >
>> > > Any ideas?
>> > >
>> > > Thanks.
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <par0328@yahoo.com.invalid>
>> > > wrote:
>> > >
>> > > > Hi Charles,
>> > > >
>> > > > Your engineers have identified a common need, but one which is very
>> > > > difficult to satisfy.
>> > > >
>> > > > TL;DR: DoY gets as close to the requirements as possible within the
>> > > > constraints of YARN and Drill. But, future projects could do more.
>> > > >
>> > > > Your engineers want resource segregation among tenants:
>> > > > multi-tenancy. This is very difficult to achieve at the application
>> > > > level. Consider Drill. It would need some way to identify users to
>> > > > know which tenant they belong to. Then, Drill would need a way to
>> > > > enqueue users whose queries would exceed the memory or CPU limit for
>> > > > that tenant. Plus, Drill would have to be able to limit memory and
>> > > > CPU for each query. Much work has been done to limit memory, but CPU
>> > > > is very difficult. Mature products such as Teradata can do this, but
>> > > > Teradata has 40 years of effort behind it.
>> > > >
>> > > > Since it is hard to build multi-tenancy in at the app level (not
>> > > > impossible, just very, very hard), the thought is to apply it at the
>> > > > cluster level. This is done in YARN by limiting the resources
>> > > > available to processes (typically map/reduce) and by limiting the
>> > > > number of running processes. This works for M/R because each map task
>> > > > uses disk to shuffle results to a reduce task, so map and reduce
>> > > > tasks can run asynchronously.
>> > > >
>> > > > For tools such as Drill, which do in-memory processing (really,
>> > > > across-the-network exchanges), both the sender and receiver have to
>> > > > run concurrently. This is much harder to schedule than async M/R
>> > > > tasks: it means that the entire Drill cluster (of whatever size) must
>> > > > be up and running to run a query.
>> > > >
>> > > > The start-up time for Drill is far, far longer than a query. So, it
>> > > > is not feasible to use YARN to launch a Drill cluster for each query
>> > > > the way you would do with Spark. Instead, under YARN, Drill is a
>> > > > long-running service that handles many queries.
>> > > >
>> > > > Obviously, this is not ideal: I'm sure your engineers want to use a
>> > > > tenant's resources for Drill when it is running queries, and for
>> > > > Spark, Hive, or maybe TensorFlow otherwise. If Drill has to be
>> > > > long-running, I'm sure they'd like to slosh resources between tenants
>> > > > as is done in YARN. As noted above, this is a hard problem that DoY
>> > > > did not attempt to solve.
>> > > >
>> > > > One might suggest that Drill grab resources from YARN when Tenant A
>> > > > wants to run a query, and release them when that tenant is done,
>> > > > grabbing new resources when Tenant B wants to run. Impala tried this
>> > > > with Llama and found it did not work. (This is why DoY is quite a bit
>> > > > simpler; no reason to rerun a failed experiment.)
>> > > >
>> > > > Some folks are looking to Kubernetes (K8s) as a solution. But, that
>> > > > just replaces YARN with K8s: Drill is still a long-running process.
>> > > >
>> > > > To solve the problem you identify, you'll need either:
>> > > >
>> > > > * A bunch of work to build multi-tenancy into Drill, or
>> > > > * A cloud-like solution in which each tenant spins up a Drill cluster
>> > > >   within its budget, spinning it down, or resizing it, to stay within
>> > > >   an overall budget.
>> > > >
>> > > > The second option can be achieved under YARN with DoY, assuming that
>> > > > DoY added support for graceful shutdown (or that the cluster is
>> > > > reduced in size only when no queries are active). Longer term, a more
>> > > > modern solution would be Drill-on-Kubernetes (DoK?), which Abhishek
>> > > > has started on.
>> > > >
>> > > > Engineering is the art of compromise. The question for your
>> > > > engineers is how to achieve the best result given the limitations of
>> > > > the software available today, while at the same time helping the
>> > > > Drill community improve the solutions over time.
>> > > >
>> > > > Thanks,
>> > > > - Paul
>> > > >
>> > > >
>> > > >
>> > > >    On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre
>> > > > <cgivre@gmail.com> wrote:
>> > > >
>> > > >  Hi Paul,
>> > > > Here’s what our engineers said:
>> > > >
>> > > > From Paul’s response, I understand that there is a slight confusion
>> > > > around how multi-tenancy has been enabled in our data lake.
>> > > >
>> > > > Some more details on this –
>> > > >
>> > > > Drill already has a concept of multi-tenancy, in that we can run
>> > > > multiple Drill clusters on the same data lake, separated by different
>> > > > ports and ZooKeeper. But all of this is launched through the same
>> > > > hard-coded YARN queue that we provide as a config parameter.
>> > > >
>> > > > In our data lake, each tenant has a certain amount of compute
>> > > > capacity allotted to them, which they can use for their project work.
>> > > > This is provisioned through individual YARN queues for each tenant
>> > > > (resource caging). This keeps a tenant from using cluster resources
>> > > > beyond a certain limit, and so from impacting other tenants.
>> > > >
>> > > > Access to these YARN queues is provisioned through ACL memberships.
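>> > > >
>> > > > As a sketch of what one such per-tenant queue might look like in a
>> > > > Fair Scheduler allocation file, where the queue name, limits, and
>> > > > group are hypothetical:
>> > > >
>> > > >   <queue name="tenantA">
>> > > >     <maxResources>100000 mb,50 vcores</maxResources>
>> > > >     <aclSubmitApps> tenantA-group</aclSubmitApps>
>> > > >   </queue>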
>> > > >
>> > > > ——
>> > > >
>> > > > Does this make sense? Is it possible to get Drill to work in this
>> > > > manner, or should we look into opening JIRAs and working on new
>> > > > capabilities?
>> > > >
>> > > >
>> > > >
>> > > > > On Dec 17, 2018, at 21:59, Paul Rogers <par0328@yahoo.com.INVALID>
>> > > > > wrote:
>> > > > >
>> > > > > Hi Kwizera,
>> > > > > I hope my answer to Charles gave you the information you need. If
>> > > > > not, please check out the DoY documentation or ask follow-up
>> > > > > questions.
>> > > > > Key thing to remember: Drill is a long-running YARN service;
>> > > > > queries DO NOT go through YARN queues; they go through Drill
>> > > > > directly.
>> > > > >
>> > > > > Thanks,
>> > > > > - Paul
>> > > > >
>> > > > >
>> > > > >
>> > > > >    On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues
>> > > > > Teddy <nbted2017@gmail.com> wrote:
>> > > > >
>> > > > > Hello,
>> > > > > Same question here:
>> > > > > I would like to know how Drill deals with this YARN functionality.
>> > > > > Cheers.
>> > > > >
>> > > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com> wrote:
>> > > > >
>> > > > >> Hello all,
>> > > > >> We are trying to set up a Drill cluster on our corporate data
>> > > > >> lake. Our cluster requires dynamic YARN queue allocation for a
>> > > > >> multi-tenant environment. Is this something that Drill supports,
>> > > > >> or is there a workaround?
>> > > > >> Thanks!
>> > > > >> —C
>> > > >
>> >
>>
>  