drill-user mailing list archives

From Abhishek Girish <agir...@apache.org>
Subject Re: Drill on YARN Questions
Date Sat, 12 Jan 2019 07:10:33 GMT
Hello Teddy,

I don't recall a restart option for the drill-on-yarn.sh script. I've
always used a combination of stop and start, as Paul mentions. Could you
please check whether that works and get back to us? We could certainly add
a minor enhancement to support restart; until then, I'll ask Bridget to
update the documentation.
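Until a restart command exists, the stop-then-start workaround can be wrapped in a small shell function. This is a minimal sketch, not something shipped with Drill. The function name `doy_restart` is hypothetical. It assumes `DRILL_HOME` and `DRILL_SITE` are set as in the DoY documentation, and it assumes `stop` returns only after the old cluster has shut down. If that is not the case, wait until `status` shows the cluster is gone before running `start`.

```shell
# Hypothetical helper (not shipped with Drill): emulate "restart" with the
# stop and start commands that drill-on-yarn.sh does support.
doy_restart() {
  # Abort with an error if either variable is unset or empty.
  local doy="${DRILL_HOME:?DRILL_HOME is not set}/bin/drill-on-yarn.sh"
  local site="${DRILL_SITE:?DRILL_SITE is not set}"

  # Stop first; if stop fails, do not start a second copy of the cluster.
  if ! "$doy" --site "$site" stop; then
    echo "drill-on-yarn.sh stop failed; not starting" >&2
    return 1
  fi

  # Assumption: stop has returned only after the old cluster shut down.
  "$doy" --site "$site" start
}
```

To use it, source the function into your shell, then run `doy_restart`. Refusing to start after a failed stop avoids launching a second cluster next to one that YARN still considers running.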

Regards,
Abhishek

On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy <nbted2017@gmail.com> wrote:

> Hello Paul,
>
> Thank you for your response and the interesting information (the files in
> /tmp).
>
> On my side, all the other commands work normally (start|stop|status|...),
> but restart does not (the option is not recognized). I searched the source
> code and found that the restart command is not implemented. So I wonder
> why the documentation does not match the source code.
>
> Thanks, Teddy
>
>
> On Sat, Jan 12, 2019, 02:39 Paul Rogers <par0328@yahoo.com.invalid> wrote:
>
> > Let's try to troubleshoot. Does the combination of stop and start work? If
> > so, then there could be a bug with the restart command itself.
> >
> > If neither start nor stop works, it could be that you are missing the
> > application ID file created when you first started DoY. Some background:
> >
> > When we submit an app to YARN, YARN gives us an app ID. We need this in
> > order to track down the app master for DoY so we can send it commands
> > later.
> >
> > When the command line tool starts DoY, it writes the YARN app ID to a
> > file. I can't remember the details, but it is probably in the $DRILL_SITE
> > directory. The contents are, as I recall, a long hexadecimal string.
> >
> > When you invoke the command line, the tool reads this file to track down
> > the DoY app master. The tool then sends commands to the app master: in
> > this case, a request to shut down. Then, for restart, the tool will
> > communicate with YARN to start a new instance.
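You can check these steps by hand. Read the saved app ID, then ask YARN whether it still knows that application, using the standard `yarn application -status` command. This is a hypothetical sketch: the function name is made up, and the ID file's exact name and location are not known. Paul only recalls that the file is probably under `$DRILL_SITE`, so pass in whatever file you find there.

```shell
# Hypothetical troubleshooting helper, not part of Drill.
# $1: path to the file in which DoY saved the YARN application ID
#     (exact name/location unknown; look under $DRILL_SITE).
check_doy_app() {
  local id_file="$1"

  # A missing or empty ID file means the client cannot find the app master.
  if [ ! -s "$id_file" ]; then
    echo "app ID file missing or empty: $id_file" >&2
    return 1
  fi

  # Remove all whitespace, including the trailing newline, around the ID.
  local app_id
  app_id="$(tr -d '[:space:]' < "$id_file")"
  echo "DoY app ID: $app_id"

  # Ask the ResourceManager for the application's current state.
  yarn application -status "$app_id"
}
```

Run this on the same node where DoY was started, because the file is local to that client machine and is not stored in DFS.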
> >
> > The tool is supposed to give detailed error messages. Did you get any? That
> > might tell us which of these steps failed.
> >
> > Can you connect to the DoY Web UI at the URL provided when you started
> > DoY? If you can, this means that the DoY App Master is up and running.
> >
> > Are you running the client from the same node on which you started it?
> > That file I mentioned is local to the "DoY client" machine; it is not in
> > DFS.
> >
> > Then, there is one more very obscure bug you can check. On some
> > distributions, the YARN task files are written to the /tmp directory. Some
> > Linux systems remove these files from time to time. Once the files are
> > gone, YARN can no longer control its containers: it won't be able to stop
> > the app master or the Drillbit containers. There are two fixes. First, go
> > kill all the processes by hand. Then, move the YARN state files out of
> > /tmp, or exclude YARN's files from the periodic cleanup.
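On systemd-based Linux distributions, periodic /tmp cleanup is often handled by systemd-tmpfiles. An `x` line in a tmpfiles.d file excludes a path, and for directories their contents, from age-based cleaning. The sketch below writes such a rule. It assumes the YARN files live under `/tmp/hadoop-*`, based on Hadoop's default `hadoop.tmp.dir` of `/tmp/hadoop-${user.name}`. The helper name and the rule file name are hypothetical. Systems that use tmpwatch or tmpreaper instead need a different fix. The other option, moving the files out of /tmp, means pointing `hadoop.tmp.dir` or `yarn.nodemanager.local-dirs` at a persistent directory.

```shell
# Hypothetical helper: exclude Hadoop/YARN working files in /tmp from
# systemd-tmpfiles age-based cleanup. Needs root when writing to /etc.
# $1: rule file to write (default /etc/tmpfiles.d/hadoop-yarn.conf)
# $2: path or glob to protect (default /tmp/hadoop-*, an assumption)
write_tmpfiles_exclude() {
  local dest="${1:-/etc/tmpfiles.d/hadoop-yarn.conf}"
  local pattern="${2:-/tmp/hadoop-*}"

  # "x" = ignore this path (and a directory's contents) during cleaning.
  # The pattern is quoted so the shell does not expand the glob here.
  printf 'x %s\n' "$pattern" > "$dest"
}
```

The rule applies from the next scheduled cleanup onward. It cannot bring back files that were already removed, so the kill-by-hand step is still needed first.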
> >
> > Try some of the above and let us know what you find.
> >
> > Also, perhaps Abhishek can offer some suggestions, as he tested the heck
> > out of the feature.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy <nbted2017@gmail.com> wrote:
> >
> > Hello,
> >
> > Two weeks ago I started exploring DoY. Today, while reading the Drill
> > documentation (https://drill.apache.org/docs/appendix-a-release-note-issues/),
> > I saw that we can restart a Drill cluster with:
> >
> > $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
> >
> > But it didn't work when I tested it.
> >
> > Any idea why?
> >
> > Thanks.
> >
> >
> >
> >
> > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <par0328@yahoo.com.invalid> wrote:
> >
> > > Hi Charles,
> > >
> > > Your engineers have identified a common need, but one which is very
> > > difficult to satisfy.
> > >
> > > TL;DR: DoY gets as close to the requirements as possible within the
> > > constraints of YARN and Drill. But, future projects could do more.
> > >
> > > Your engineers want resource segregation among tenants: multi-tenancy.
> > > This is very difficult to achieve at the application level. Consider
> > > Drill. It would need some way to identify users to know which tenant
> > > they belong to. Then, Drill would need a way to enqueue users whose
> > > queries would exceed the memory or CPU limit for that tenant. Plus, Drill
> > > would have to be able to limit memory and CPU for each query. Much work
> > > has been done to limit memory, but limiting CPU is very difficult. Mature
> > > products such as Teradata can do this, but Teradata has 40 years of
> > > effort behind it.
> > >
> > > Since it is hard to build multi-tenancy in at the app level (not
> > > impossible, just very, very hard), the thought is to apply it at the
> > > cluster level. YARN does this by limiting the resources available to
> > > processes (typically map/reduce) and by limiting the number of running
> > > processes. This works for M/R because each map task uses disk to shuffle
> > > results to a reduce task, so map and reduce tasks can run asynchronously.
> > >
> > > For tools such as Drill, which do in-memory processing (really,
> > > across-the-network exchanges), both the sender and receiver have to run
> > > concurrently. This is much harder to schedule than async M/R tasks: it
> > > means that the entire Drill cluster (of whatever size) must be up and
> > > running to run a query.
> > >
> > > The start-up time for Drill is far, far longer than a query. So, it is
> > > not feasible to use YARN to launch a Drill cluster for each query the way
> > > you would do with Spark. Instead, under YARN, Drill is a long-running
> > > service that handles many queries.
> > >
> > > Obviously, this is not ideal: I'm sure your engineers want to use a
> > > tenant's resources for Drill when running queries, and otherwise for
> > > Spark, Hive, or maybe TensorFlow. If Drill has to be long-running, I'm
> > > sure they'd like to slosh resources between tenants as is done in YARN.
> > > As noted above, this is a hard problem that DoY did not attempt to solve.
> > >
> > > One might suggest that Drill grab resources from YARN when Tenant A wants
> > > to run a query, and release them when that tenant is done, grabbing new
> > > resources when Tenant B wants to run. Impala tried this with Llama and
> > > found it did not work. (This is why DoY is quite a bit simpler; no reason
> > > to rerun a failed experiment.)
> > >
> > > Some folks are looking to Kubernetes (K8s) as a solution. But that just
> > > replaces YARN with K8s: Drill is still a long-running process.
> > >
> > > To solve the problem you identify, you'll need either:
> > >
> > > * A bunch of work in Drill to build multi-tenancy into Drill, or
> > > * A cloud-like solution in which each tenant spins up a Drill cluster
> > > within its budget, spinning it down or resizing it to stay within an
> > > overall budget.
> > >
> > > The second option can be achieved under YARN with DoY, assuming that DoY
> > > added support for graceful shutdown (or the cluster is reduced in size
> > > only when no queries are active). Longer-term, a more modern solution
> > > would be Drill-on-Kubernetes (DoK?), which Abhishek started on.
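Under DoY, the second option might look roughly like this. Each tenant gets its own site directory, and each directory holds its own drill-on-yarn.conf. That file names the tenant's YARN queue, plus its own ZooKeeper settings and ports, as Charles's engineers describe. Every cluster is then started with the same drill-on-yarn.sh script. This is a sketch under assumptions: the `/etc/drill/tenants` layout and the function name are hypothetical, and the exact queue setting should be taken from the DoY configuration reference. The sketch only covers startup. Resizing or stopping a tenant's cluster safely still depends on the graceful-shutdown caveat above.

```shell
# Hypothetical layout: one DoY site directory per tenant, e.g.
#   /etc/drill/tenants/tenant-a/drill-on-yarn.conf  (names tenant A's queue)
#   /etc/drill/tenants/tenant-b/drill-on-yarn.conf  (names tenant B's queue)
# Each site is a separate Drill cluster submitted to its own YARN queue.
start_tenant_clusters() {
  local root="${1:-/etc/drill/tenants}"
  local doy="${DRILL_HOME:?DRILL_HOME is not set}/bin/drill-on-yarn.sh"
  local site failed=0

  for site in "$root"/*/; do
    [ -d "$site" ] || continue   # no tenants: the glob did not match
    site="${site%/}"             # drop the trailing slash
    echo "starting Drill cluster for tenant $(basename "$site")"
    # Keep going if one tenant fails, but report failure at the end.
    "$doy" --site "$site" start || failed=1
  done
  return "$failed"
}
```

Because each tenant's cluster is a separate YARN application in its own queue, YARN's queue ACLs and capacity limits apply per tenant. Drill itself does not need to know about tenants.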
> > >
> > > Engineering is the art of compromise. The question for your engineers is
> > > how to achieve the best result given the limitations of the software
> > > available today, while at the same time helping the Drill community
> > > improve the solutions over time.
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre <cgivre@gmail.com> wrote:
> > >
> > >  Hi Paul,
> > > Here’s what our engineers said:
> > >
> > > From Paul’s response, I understand that there is some confusion around
> > > how multi-tenancy has been enabled in our data lake.
> > >
> > > Some more details on this –
> > >
> > > Drill already has the concept of multi-tenancy, where we can have
> > > multiple Drill clusters running on the same data lake, enabled through
> > > different ports and ZooKeeper. But all of this is launched through the
> > > same hard-coded YARN queue that we provide as a config parameter.
> > >
> > > In our data lake, each tenant has a certain amount of compute capacity
> > > allotted to it, which it can use for its project work. This is
> > > provisioned through individual YARN queues for each tenant (resource
> > > caging). This prevents tenants from using cluster resources beyond a
> > > certain limit, so that they do not impact other tenants.
> > >
> > > Access to these YARN queues is provisioned through ACL memberships.
> > >
> > > ——
> > >
> > > Does this make sense? Is it possible to get Drill to work in this
> > > manner, or should we look into opening JIRAs and working on new
> > > capabilities?
> > >
> > >
> > >
> > > > On Dec 17, 2018, at 21:59, Paul Rogers <par0328@yahoo.com.INVALID> wrote:
> > > >
> > > > Hi Kwizera,
> > > > I hope my answer to Charles gave you the information you need. If not,
> > > > please check out the DoY documentation or ask follow-up questions.
> > > > Key thing to remember: Drill is a long-running YARN service; queries DO
> > > > NOT go through YARN queues, they go through Drill directly.
> > > >
> > > > Thanks,
> > > > - Paul
> > > >
> > > >
> > > >
> > > > On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues Teddy <nbted2017@gmail.com> wrote:
> > > >
> > > > Hello,
> > > > Same question here.
> > > > I would like to know how Drill deals with this YARN functionality.
> > > > Cheers.
> > > >
> > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com> wrote:
> > > >
> > > >> Hello all,
> > > >> We are trying to set up a Drill cluster on our corporate data lake. Our
> > > >> cluster requires dynamic YARN queue allocation for a multi-tenant
> > > >> environment. Is this something that Drill supports, or is there a
> > > >> workaround?
> > > >> Thanks!
> > > >> —C
> > >
>
