drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Drill on YARN Questions
Date Sat, 12 Jan 2019 01:39:39 GMT
Let's try to troubleshoot. Does the combination of stop and start work? If so, then there could
be a bug with the restart command itself.

If neither start nor stop works, it could be that you are missing the application ID file
created when you first started DoY. Some background:

When we submit an app to YARN, YARN gives us an app ID. We need this in order to track down
the app master for DoY so we can send it commands later.

When the command line tool starts DoY, it writes the YARN app ID to a file. Can't remember
the details, but it is probably in the $DRILL_SITE directory. The contents are, as I recall,
a long hexadecimal string.
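
If you want to check, look for that file on the node where you ran the start command. A rough
sketch (the file name is a guess from memory, so adjust as needed):

 # Assumption: the app ID file sits in the site directory; the actual name may differ
 ls $DRILL_SITE/*.appid
 cat $DRILL_SITE/drillbits.appid

If the file is missing, the tool has no way to find the app master, and stop or restart will fail.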

When you invoke the command line tool, it reads this file to track down the DoY
app master. The tool then sends commands to the app master: in this case, a request to shut
down. Then, for restart, the tool will communicate with YARN to start a new instance.
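
In other words, a restart is roughly the following two commands run back to back, and you can
run them separately to see which half fails:

 $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE stop
 $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start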

The tool is supposed to give detailed error messages. Did you get any? That might tell us which
of these steps failed.

Can you connect to the DoY Web UI at the URL provided when you started DoY? If you can, this
means that the DoY App Master is up and running.
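
If you prefer to test from the command line, something like this should do (if I recall
correctly, 8048 is the default web UI port; use the host and port from your startup output):

 # Hypothetical host; substitute the URL printed when you started DoY
 curl -s http://<am-host>:8048/ | head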

Are you running the client from the same node on which you started it? That file I mentioned
is local to the "DoY client" machine; it is not in DFS.

Then, there is one more very obscure bug you can check. On some distributions, the YARN task
files are written to the /tmp directory. Some Linux systems remove these files from time to
time. Once the files are gone, YARN can no longer control its containers: it won't be able
to stop the app master or the Drillbit containers. The fix has two steps. First, kill all
the processes by hand. Then, either move the YARN state files out of /tmp or exclude YARN's
files from the periodic cleanup.
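
To find the stray processes for the first step, something like:

 # Lists running JVMs; look for the Drillbit and DoY app master classes
 jps -l | grep -i drill

For the second step, a sketch in yarn-site.xml (the property names are standard YARN settings,
but the paths are just examples; pick locations that suit your cluster):

 <!-- Keep NodeManager working files out of /tmp so periodic cleanup can't remove them -->
 <property>
   <name>yarn.nodemanager.local-dirs</name>
   <value>/var/hadoop/yarn/nm-local-dir</value>
 </property>
 <property>
   <name>yarn.nodemanager.log-dirs</name>
   <value>/var/hadoop/yarn/logs</value>
 </property>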

Try some of the above and let us know what you find.

Also, perhaps Abhishek can chime in; he tested the heck out of this feature and may have
additional suggestions.

Thanks,
- Paul

 

    On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy <nbted2017@gmail.com> wrote:

Hello,

Two weeks ago I began exploring DoY. Today, while reading the Drill documentation
(https://drill.apache.org/docs/appendix-a-release-note-issues/), I saw that
we can restart the Drill cluster with:

 $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart

But it didn't work when I tested it.

Any ideas?

Thanks.




On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <par0328@yahoo.com.invalid>
wrote:

> Hi Charles,
>
> Your engineers have identified a common need, but one which is very
> difficult to satisfy.
>
> TL;DR: DoY gets as close to the requirements as possible within the
> constraints of YARN and Drill. But, future projects could do more.
>
> Your engineers want resource segregation among tenants: multi-tenancy.
> This is very difficult to achieve at the application level. Consider Drill.
> It would need some way to identify users to know which tenant they belong
> to. Then, Drill would need a way to enqueue users whose queries would
> exceed the memory or CPU limit for that tenant. Plus, Drill would have to
> be able to limit memory and CPU for each query. Much work has been done to
> limit memory, but CPU is very difficult. Mature products such as Teradata
> can do this, but Teradata has 40 years of effort behind it.
>
> Since it is hard to build multi-tenancy in at the app level (not
> impossible, just very, very hard), the thought is to apply it at the
> cluster level. This is done in YARN by limiting the resources available to
> processes (typically map/reduce) and by limiting the number of running
> processes. This works for M/R because each map task uses disk to shuffle results
> to a reduce task, so map and reduce tasks can run asynchronously.
>
> For tools such as Drill, which do in-memory processing (really,
> across-the-network exchanges), both the sender and receiver have to run
> concurrently. This is much harder to schedule than async m/r tasks: it
> means that the entire Drill cluster (of whatever size) must be up and running to
> run a query.
>
> The start-up time for Drill is far, far longer than a query. So, it is not
> feasible to use YARN to launch a Drill cluster for each query the way you
> would do with Spark. Instead, under YARN, Drill is a long running service
> that handles many queries.
>
> Obviously, this is not ideal: I'm sure your engineers want to use a
> tenant's resources for Drill when running queries, else for Spark, Hive, or
> maybe TensorFlow. If Drill has to be long-running, I'm sure they'd like to
> slosh resources between tenants as is done in YARN. As noted above, this is
> a hard problem that DoY did not attempt to solve.
>
> One might suggest that Drill grab resources from YARN when Tenant A wants
> to run a query, and release them when that tenant is done, grabbing new
> resources when Tenant B wants to run. Impala tried this with Llama and
> found it did not work. (This is why DoY is quite a bit simpler; no reason
> to rerun a failed experiment.)
>
> Some folks are looking to Kubernetes (K8s) as a solution. But, that just
> replaces YARN with K8s: Drill is still a long-running process.
>
> To solve the problem you identify, you'll need either:
>
> * A bunch of work in Drill to build multi-tenancy into Drill, or
> * A cloud-like solution in which each tenant spins up a Drill cluster
> within its budget, spinning it down, or resizing it, to stay within an
> overall budget.
>
> The second option can be achieved under YARN with DoY, assuming that DoY
> added support for graceful shutdown (or the cluster is reduced in size only
> when no queries are active). Longer-term, a more modern solution would be
> Drill-on-Kubernetes (DoK?) which Abhishek started on.
>
> Engineering is the art of compromise. The question for your engineers is
> how to achieve the best result given the limitations of the software
> available today, while at the same time helping the Drill community improve
> the solutions over time.
>
> Thanks,
> - Paul
>
>
>
>    On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre <
> cgivre@gmail.com> wrote:
>
>  Hi Paul,
> Here’s what our engineers said:
>
> From Paul’s response, I understand that there is a slight confusion around
> how multi-tenancy has been enabled in our data lake.
>
> Some more details on this –
>
> Drill already has the concept of multi-tenancy, in that we can have multiple
> Drill clusters running on the same data lake, separated through different
> ports and ZooKeeper roots. But all of this is launched through the same
> hard-coded YARN queue that we provide as a config parameter.
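>
> For reference, this separation lives in each cluster's drill-override.conf, along
> these lines (values are illustrative):
>
>  drill.exec: {
>    cluster-id: "tenant-a-drillbits",          # unique per tenant cluster
>    zk.root: "drill-tenant-a",                 # separate ZooKeeper root per cluster
>    zk.connect: "zk1:2181,zk2:2181,zk3:2181"
>  }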
>
> In our data lake, each tenant has a certain amount of compute capacity
> allotted to them which they can use for their project work. This is
> provisioned through individual YARN queues for each tenant (resource
> caging). This keeps tenants from using cluster resources beyond a
> certain limit, so they do not impact other tenants.
>
> Access to these YARN queues is provisioned through ACL memberships.
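>
> As an illustration (queue names and capacities invented; these are standard
> Capacity Scheduler properties, shown here as property/value pairs from
> capacity-scheduler.xml):
>
>  yarn.scheduler.capacity.root.queues = tenant_a,tenant_b
>  yarn.scheduler.capacity.root.tenant_a.capacity = 40
>  yarn.scheduler.capacity.root.tenant_b.capacity = 60
>  yarn.scheduler.capacity.root.tenant_a.acl_submit_applications = grp_tenant_a
>  yarn.scheduler.capacity.root.tenant_b.acl_submit_applications = grp_tenant_b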
>
> ——
>
> Does this make sense?  Is this possible to get Drill to work in this
> manner, or should we look into opening up JIRAs and working on new
> capabilities?
>
>
>
> > On Dec 17, 2018, at 21:59, Paul Rogers <par0328@yahoo.com.INVALID>
> wrote:
> >
> > Hi Kwizera,
> > I hope my answer to Charles gave you the information you need. If not,
> please check out the DoY documentation or ask follow-up questions.
> > Key thing to remember: Drill is a long-running YARN service; queries DO
> NOT go through YARN queues, they go through Drill directly.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >    On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues Teddy <
> nbted2017@gmail.com> wrote:
> >
> > Hello,
> > Same questions here.
> > I would like to know how Drill deals with this YARN functionality.
> > Cheers.
> >
> > On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com> wrote:
> >
> >> Hello all,
> >> We are trying to set up a Drill cluster on our corporate data lake.  Our
> >> cluster requires dynamic YARN queue allocation for a multi-tenant
> >> environment.  Is this something that Drill supports, or is there a
> >> workaround?
> >> Thanks!
> >> —C
>  