drill-user mailing list archives

From Kwizera hugues Teddy <nbted2...@gmail.com>
Subject Re: Drill on YARN Questions
Date Tue, 15 Jan 2019 10:39:17 GMT
Hello Paul,

Yes, I can access AM web UI.

I notice that the problem is caused by SSL/TLS access being enabled
(ssl-enabled: true).

- https://xxxxxxxxx:10048/rest/status : works fine in the browser

I think I have to deal with certificates on AM host. Do you have an idea?

Thanks.
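
One way to narrow this down is to probe the AM REST endpoint from the machine where the DoY client runs, outside both the browser and the DoY tool. The sketch below assumes curl is installed; the host name, the ports, and the CA file path are placeholders, not values from this thread, and the function names are made up for illustration. Note that the URL above uses port 10048 while the failed request quoted below used 9048, so it may be worth probing both.

```shell
#!/usr/bin/env bash
# Sketch only: probe the Drill-on-YARN AM REST endpoint with curl.
# Host, port and CA file are placeholders (assumptions), not values from the thread.

# Build an AM REST URL of the same shape as the one in the error message,
# e.g. https://<host>:<port>/rest/status
am_rest_url() {
  local host="$1" port="$2" path="$3"
  printf 'https://%s:%s/rest/%s\n' "$host" "$port" "$path"
}

# Request the URL, verifying the server certificate against the given CA file.
# -sS: no progress meter, but still print errors; --fail: non-zero exit on HTTP errors.
probe_am() {
  local url="$1" ca_file="$2"
  curl -sS --fail --cacert "$ca_file" "$url"
}

# Example calls (commented out so that sourcing this file has no side effects):
#   probe_am "$(am_rest_url am-host.example.com 9048 status)"  /path/to/ca.pem
#   probe_am "$(am_rest_url am-host.example.com 10048 status)" /path/to/ca.pem
```

If curl succeeds with --cacert but fails without it, the certificate is probably not trusted by default on that machine, which would point at the trust configuration rather than at the AM itself.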

On Mon, Jan 14, 2019 at 4:53 PM Paul Rogers <par0328@yahoo.com.invalid>
wrote:

> Hi,
>
> Can you reach the AM web UI? The Web UI URL was shown below. It also
> should have been given when you started DoY.
>
> I notice that you're using SSL/TLS access. Doing so requires the right
> certificates on the AM host. Again, trying to connect via your browser may
> help identify if that works.
>
> If the Web UI works, then check the host name and port number in your
> browser compared to that shown in the error message.
>
> The resize command on the command line does nothing other than some
> validation, then it sends the URL shown below. You can try entering the URL
> directly into your browser. Again, if that fails, there is something amiss
> with your config. If that works, then we'll have to figure out what might
> be wrong with the DoY command line tool.
>
> Please try out the above and let us know what you learn.
>
> Thanks,
> - Paul
>
>
>
>     On Monday, January 14, 2019, 7:30:44 AM PST, Kwizera hugues Teddy <
> nbted2017@gmail.com> wrote:
>
>  Hello all,
>
> I am getting errors from the resize and status commands.
> The errors are from the REST call on the AM.
>
> command : $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
> Result:
> Application ID: xxxxxxxxxxxxxxxx
> Application State: RUNNING
> Host: xxxxxxxxxxxxxxxx
> Queue: root.xxxxx.default
> User: xxxxxxxx
> Start Time: 2019-01-14 14:56:29
> Application Name: Drill-on-YARN-cluster_01
> Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Failed to get AM status
> REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status
>
> Command : $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
> Result :
>       Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>       Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1
>
>  I haven't found a way to resolve this issue. Maybe someone can help me.
>
> Thanks.
>
>
>
> On Sat, Jan 12, 2019 at 8:30 AM Kwizera hugues Teddy <nbted2017@gmail.com>
> wrote:
>
> > Hello ,
> >
> > The other option works.
> >
> > As you say, the docs need an update to remove the wrong information.
> >
> > Thanks.
> >
> > On Sat, Jan 12, 2019, 08:10 Abhishek Girish <agirish@apache.org> wrote:
> >
> >> Hello Teddy,
> >>
> >> I don't recollect a restart option for the drill-on-yarn.sh script. I've
> >> always used a combination of stop and start, like Paul mentions. Could you
> >> please try that and let us know whether it works? We could certainly add a
> >> minor enhancement to support restart - until then I'll ask Bridget to
> >> update the documentation.
> >>
> >> Regards,
> >> Abhishek
> >>
> >> On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy <
> >> nbted2017@gmail.com>
> >> wrote:
> >>
> >> > Hello Paul ,
> >> >
> >> > Thank you for your response and the interesting information (the files in
> >> > /tmp).
> >> >
> >> > On my side, all the other commands work normally (start|stop|status...)
> >> > but not restart (this option is not recognized). I searched the source
> >> > code and found that the restart command is not implemented. So I
> >> > wonder why the documentation does not match the source code.
> >> >
> >> > Thanks. Teddy
> >> >
> >> >
> >> > On Sat, Jan 12, 2019, 02:39 Paul Rogers <par0328@yahoo.com.invalid> wrote:
> >> >
> >> > > Let's try to troubleshoot. Does the combination of stop and start work? If
> >> > > so, then there could be a bug with the restart command itself.
> >> > >
> >> > > If neither start nor stop works, it could be that you are missing the
> >> > > application ID file created when you first started DoY. Some background.
> >> > > When we submit an app to YARN, YARN gives us an app ID. We need this in
> >> > > order to track down the app master for DoY so we can send it commands
> >> > > later.
> >> > >
> >> > > When the command line tool starts DoY, it writes the YARN app ID to a
> >> > > file. Can't remember the details, but it is probably in the $DRILL_SITE
> >> > > directory. The contents are, as I recall, a long hexadecimal string.
> >> > >
> >> > > When you invoke the command line, the tool reads this file to track down
> >> > > the DoY app master. The tool then sends commands to the app master: in
> >> > > this case, a request to shut down. Then, for restart, the tool will
> >> > > communicate with YARN to start a new instance.
> >> > >
> >> > > The tool is supposed to give detailed error messages. Did you get any? That
> >> > > might tell us which of these steps failed.
> >> > >
> >> > > Can you connect to the DoY Web UI at the URL provided when you started
> >> > > DoY? If you can, this means that the DoY App Master is up and running.
> >> > >
> >> > > Are you running the client from the same node on which you started it?
> >> > > That file I mentioned is local to the "DoY client" machine; it is not in
> >> > > DFS.
> >> > >
> >> > > Then, there is one more very obscure bug you can check. On some
> >> > > distributions, the YARN task files are written to the /tmp directory. Some
> >> > > Linux systems remove these files from time to time. Once the files are
> >> > > gone, YARN can no longer control its containers: it won't be able to stop
> >> > > the app master or the Drillbit containers. There are two fixes. First, go
> >> > > kill all the processes by hand. Then, move the YARN state files out of
> >> > > /tmp, or exclude YARN's files from the periodic cleanup.
> >> > >
> >> > > Try some of the above and let us know what you find.
> >> > >
> >> > > Also, perhaps Abhishek can offer some suggestions, as he tested the heck
> >> > > out of the feature.
> >> > >
> >> > > Thanks,
> >> > > - Paul
> >> > >
> >> > >
> >> > >
> >> > >    On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy <
> >> > > nbted2017@gmail.com> wrote:
> >> > >
> >> > >  hello,
> >> > >
> >> > >  Two weeks ago, I started exploring DoY. Today, reading the Drill documentation
> >> > > ( https://drill.apache.org/docs/appendix-a-release-note-issues/ ), I saw that
> >> > > we can restart the Drill cluster with:
> >> > >
> >> > >  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
> >> > >
> >> > > But it didn't work when I tested it.
> >> > >
> >> > > Any idea why?
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <par0328@yahoo.com.invalid>
> >> > > wrote:
> >> > >
> >> > > > Hi Charles,
> >> > > >
> >> > > > Your engineers have identified a common need, but one which is very
> >> > > > difficult to satisfy.
> >> > > >
> >> > > > TL;DR: DoY gets as close to the requirements as possible within the
> >> > > > constraints of YARN and Drill. But, future projects could do more.
> >> > > >
> >> > > > Your engineers want resource segregation among tenants: multi-tenancy.
> >> > > > This is very difficult to achieve at the application level. Consider Drill.
> >> > > > It would need some way to identify users to know which tenant they belong
> >> > > > to. Then, Drill would need a way to enqueue users whose queries would
> >> > > > exceed the memory or CPU limit for that tenant. Plus, Drill would have to
> >> > > > be able to limit memory and CPU for each query. Much work has been done to
> >> > > > limit memory, but CPU is very difficult. Mature products such as Teradata
> >> > > > can do this, but Teradata has 40 years of effort behind it.
> >> > > >
> >> > > > Since it is hard to build multi-tenancy in at the app level (not
> >> > > > impossible, just very, very hard), the thought is to apply it at the
> >> > > > cluster level. This is done in YARN by limiting the resources available to
> >> > > > processes (typically map/reduce) and limiting the number of running
> >> > > > processes. This works for M/R because each map task uses disk to shuffle
> >> > > > results to a reduce task, so map and reduce tasks can run asynchronously.
> >> > > >
> >> > > > For tools such as Drill, which do in-memory processing (really,
> >> > > > across-the-network exchanges), both the sender and receiver have to run
> >> > > > concurrently. This is much harder to schedule than async M/R tasks: it
> >> > > > means that the entire Drill cluster (of whatever size) must be up and
> >> > > > running to run a query.
> >> > > >
> >> > > > The start-up time for Drill is far, far longer than a query. So, it is not
> >> > > > feasible to use YARN to launch a Drill cluster for each query the way you
> >> > > > would do with Spark. Instead, under YARN, Drill is a long-running service
> >> > > > that handles many queries.
> >> > > >
> >> > > > Obviously, this is not ideal: I'm sure your engineers want to use a
> >> > > > tenant's resources for Drill when running queries, else for Spark, Hive, or
> >> > > > maybe TensorFlow. If Drill has to be long-running, I'm sure they'd like to
> >> > > > slosh resources between tenants as is done in YARN. As noted above, this is
> >> > > > a hard problem that DoY did not attempt to solve.
> >> > > >
> >> > > > One might suggest that Drill grab resources from YARN when Tenant A wants
> >> > > > to run a query, and release them when that tenant is done, grabbing new
> >> > > > resources when Tenant B wants to run. Impala tried this with Llama and
> >> > > > found it did not work. (This is why DoY is quite a bit simpler; no reason
> >> > > > to rerun a failed experiment.)
> >> > > >
> >> > > > Some folks are looking to Kubernetes (K8s) as a solution. But, that just
> >> > > > replaces YARN with K8s: Drill is still a long-running process.
> >> > > >
> >> > > > To solve the problem you identify, you'll need either:
> >> > > >
> >> > > > * A bunch of work in Drill to build multi-tenancy into Drill, or
> >> > > > * A cloud-like solution in which each tenant spins up a Drill cluster
> >> > > > within its budget, spinning it down, or resizing it, to stay within an
> >> > > > overall budget.
> >> > > >
> >> > > > The second option can be achieved under YARN with DoY, assuming that DoY
> >> > > > added support for graceful shutdown (or the cluster is reduced in size only
> >> > > > when no queries are active). Longer-term, a more modern solution would be
> >> > > > Drill-on-Kubernetes (DoK?) which Abhishek started on.
> >> > > >
> >> > > > Engineering is the art of compromise. The question for your engineers is
> >> > > > how to achieve the best result given the limitations of the software
> >> > > > available today, while at the same time helping the Drill community improve
> >> > > > the solutions over time.
> >> > > >
> >> > > > Thanks,
> >> > > > - Paul
> >> > > >
> >> > > >
> >> > > >
> >> > > >    On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre <
> >> > > > cgivre@gmail.com> wrote:
> >> > > >
> >> > > >  Hi Paul,
> >> > > > Here’s what our engineers said:
> >> > > >
> >> > > > From Paul’s response, I understand that there is a slight confusion around
> >> > > > how multi-tenancy has been enabled in our data lake.
> >> > > >
> >> > > > Some more details on this –
> >> > > >
> >> > > > Drill already has the concept of multi-tenancy where we can have multiple
> >> > > > Drill clusters running on the same data lake, enabled through different
> >> > > > ports and ZooKeeper. But, all of this is launched through the same
> >> > > > hard-coded YARN queue that we provide as a config parameter.
> >> > > >
> >> > > > In our data lake, each tenant has a certain amount of compute capacity
> >> > > > allotted to them which they can use for their project work. This is
> >> > > > provisioned through individual YARN queues for each tenant (resource
> >> > > > caging). This restricts the tenants from using cluster resources beyond a
> >> > > > certain limit, so that they do not impact other tenants.
> >> > > >
> >> > > > Access to these YARN queues is provisioned through ACL memberships.
> >> > > >
> >> > > > ——
> >> > > >
> >> > > > Does this make sense?  Is it possible to get Drill to work in this
> >> > > > manner, or should we look into opening up JIRAs and working on new
> >> > > > capabilities?
> >> > > >
> >> > > >
> >> > > >
> >> > > > > On Dec 17, 2018, at 21:59, Paul Rogers <par0328@yahoo.com.INVALID> wrote:
> >> > > > >
> >> > > > > Hi Kwizera,
> >> > > > > I hope my answer to Charles gave you the information you need. If not,
> >> > > > > please check out the DoY documentation or ask follow-up questions.
> >> > > > > Key thing to remember: Drill is a long-running YARN service; queries DO
> >> > > > > NOT go through YARN queues, they go through Drill directly.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > - Paul
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >    On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues Teddy <
> >> > > > > nbted2017@gmail.com> wrote:
> >> > > > >
> >> > > > > Hello,
> >> > > > > Same question here.
> >> > > > > I would like to know how Drill deals with this YARN functionality.
> >> > > > > Cheers.
> >> > > > >
> >> > > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com> wrote:
> >> > > > >
> >> > > > >> Hello all,
> >> > > > >> We are trying to set up a Drill cluster on our corporate data lake. Our
> >> > > > >> cluster requires dynamic YARN queue allocation for a multi-tenant
> >> > > > >> environment.  Is this something that Drill supports, or is there a
> >> > > > >> workaround?
> >> > > > >> Thanks!
> >> > > > >> —C
> >> > > >
> >> >
> >>
> >
