drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Drill on YARN Questions
Date Fri, 18 Jan 2019 19:21:21 GMT
Hi,

Sorry for the delay, was traveling.

If you can access the web UI from a browser, this means that 1) your server has the right
certificates, and 2) your browser has the correct files to validate the certificates.

If TLS does not work from Java, it may simply mean that you have to configure TLS for
the Java install used to run the DoY command line. I'm not an expert on this, but many articles
exist. Basically, you need to ensure that the certificate for your signing authority is available
to Java. If you are using self-signed certificates, you need to make the authority for your
internal certificates available as well.
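As a minimal sketch of that last point: the standard JSSE system properties control which
truststore the JVM consults, so you can point the JVM that runs the DoY client at a store
containing your CA certificate. The path and password below are placeholders, not Drill
defaults; for self-signed certificates, first import your internal CA into that store
(for example with keytool).

```java
public class TrustStoreCheck {
    public static void main(String[] args) {
        // Placeholder path and password: point these at the truststore that
        // holds your signing authority's certificate.
        System.setProperty("javax.net.ssl.trustStore", "/etc/drill/certs/truststore.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit");

        // Confirm which truststore the JVM will actually use.
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

Equivalently, you can pass -Djavax.net.ssl.trustStore=... on the java command line that
launches the DoY client.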

I'm skating on thin ice here as I've not done this in a while. Abhishek or Sorabh, can you
provide more details?

The way to check this is to write a very simple Java program that sends that resize URL to
the DoY app master. If you get that to work in your own test app, then it will work in the
DoY client. All the DoY client does for resize is parse the command, work through some
config settings, and issue a REST call to the AM.
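A sketch of such a test program (the host, port, and path below are placeholders based on
the error messages in this thread; substitute the AM address from your status output). An
SSLHandshakeException here points at the truststore configuration, while a clean HTTP
status code means the DoY client should work too:

```java
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLHandshakeException;

public class AmRestProbe {
    public static void main(String[] args) {
        // Placeholder AM address; use the host and port from the DoY status output.
        String target = "https://amhost.example.com:9048/rest/status";
        try {
            URL url = new URL(target);
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            // A certificate problem surfaces on the next line.
            System.out.println("HTTP " + conn.getResponseCode());
        } catch (SSLHandshakeException e) {
            System.out.println("TLS trust problem: " + e.getMessage());
        } catch (Exception e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```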

Thanks,
- Paul

 

    On Tuesday, January 15, 2019, 2:39:40 AM PST, Kwizera hugues Teddy <nbted2017@gmail.com> wrote:

Hello Paul,

Yes, I can access AM web UI.

I noticed that the problem is caused by SSL/TLS access being enabled (ssl-enabled:
true).

- https://xxxxxxxxx:10048/rest/status : works fine in the browser

I think I have to deal with certificates on the AM host. Do you have an idea?

Thanks.

On Mon, Jan 14, 2019 at 4:53 PM Paul Rogers <par0328@yahoo.com.invalid>
wrote:

> Hi,
>
> Can you reach the AM web UI? The Web UI URL is shown below. It should
> also have been given when you started DoY.
>
> I notice that you're using SSL/TLS access. Doing so requires the right
> certificates on the AM host. Again, trying to connect via your browser may
> help identify if that works.
>
> If the Web UI works, then check the host name and port number in your
> browser compared to that shown in the error message.
>
> The resize command on the command line does nothing other than some
> validation, then it sends the URL shown below. You can try entering the URL
> directly into your browser. Again, if that fails, there is something amiss
> with your config. If that works, then we'll have to figure out what might
> be wrong with the DoY command line tool.
>
> Please try out the above and let us know what you learn.
>
> Thanks,
> - Paul
>
>
>
>    On Monday, January 14, 2019, 7:30:44 AM PST, Kwizera hugues Teddy <
> nbted2017@gmail.com> wrote:
>
>  Hello all,
>
> I am experiencing errors on the resize and status commands.
> The errors come from the REST call to the AM.
>
> Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
> Result:
>     Application ID: xxxxxxxxxxxxxxxx
>     Application State: RUNNING
>     Host: xxxxxxxxxxxxxxxx
>     Queue: root.xxxxx.default
>     User: xxxxxxxx
>     Start Time: 2019-01-14 14:56:29
>     Application Name: Drill-on-YARN-cluster_01
>     Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>     Failed to get AM status
>     REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status
>
> Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
> Result:
>     Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>     Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1
>
> I didn't find out how to resolve this issue. Maybe someone can help me.
>
> Thanks.
>
>
>
> On Sat, Jan 12, 2019 at 8:30 AM Kwizera hugues Teddy <nbted2017@gmail.com>
> wrote:
>
> > Hello,
> >
> > The other option works.
> >
> > As you say, an update is needed in the docs, and the wrong information
> > should be removed.
> >
> > Thanks.
> >
> > On Sat, Jan 12, 2019, 08:10 Abhishek Girish <agirish@apache.org> wrote:
> >
> >> Hello Teddy,
> >>
> >> I don't recollect a restart option for the drill-on-yarn.sh script.
> >> I've always used a combination of stop and start, like Paul mentions.
> >> Could you please check whether that works and get back to us? We could
> >> certainly make a minor enhancement to support restart - until then,
> >> I'll request Bridget to update the documentation.
> >>
> >> Regards,
> >> Abhishek
> >>
> >> On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy
> >> <nbted2017@gmail.com> wrote:
> >>
> >> > Hello Paul,
> >> >
> >> > Thank you for your response with some interesting information
> >> > (files in /tmp).
> >> >
> >> > On my side, all the other command-line options work normally
> >> > (start|stop|status|...), but not restart (that option is not
> >> > recognized). I searched the source code and found that the restart
> >> > command is not implemented. So I wonder why the documentation does
> >> > not match the source code?
> >> >
> >> > Thanks. Teddy
> >> >
> >> >
> >> > On Sat, Jan 12, 2019, 02:39 Paul Rogers <par0328@yahoo.com.invalid>
> >> > wrote:
> >> >
> >> > > Let's try to troubleshoot. Does the combination of stop and
> >> > > start work? If so, then there could be a bug with the restart
> >> > > command itself.
> >> > >
> >> > > If neither start nor stop works, it could be that you are missing
> >> > > the application ID file created when you first started DoY. Some
> >> > > background.
> >> > >
> >> > > When we submit an app to YARN, YARN gives us an app ID. We need
> >> > > this in order to track down the app master for DoY so we can
> >> > > send it commands later.
> >> > >
> >> > > When the command line tool starts DoY, it writes the YARN app ID
> >> > > to a file. I can't remember the details, but it is probably in
> >> > > the $DRILL_SITE directory. The contents are, as I recall, a long
> >> > > hexadecimal string.
> >> > >
> >> > > When you invoke the command line, the tool reads this file to
> >> > > figure out how to track down the DoY app master. The tool then
> >> > > sends commands to the app master: in this case, a request to
> >> > > shut down. Then, for restart, the tool will communicate with
> >> > > YARN to start a new instance.
> >> > >
> >> > > The tool is supposed to give detailed error messages. Did you
> >> > > get any? That might tell us which of these steps failed.
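A sketch of the client side of that flow. The file name used here is an assumption for
illustration only; check your $DRILL_SITE directory for the actual file DoY writes:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppIdReader {
    // "drill-on-yarn-appid" is a hypothetical name, not necessarily what DoY uses.
    static String readAppId(Path siteDir) throws IOException {
        return Files.readString(siteDir.resolve("drill-on-yarn-appid")).trim();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a site directory holding a saved YARN application ID.
        Path site = Files.createTempDirectory("drill-site");
        Files.writeString(site.resolve("drill-on-yarn-appid"),
            "application_1547000000000_0001\n");
        // The client would use this ID to locate the AM and send it commands.
        System.out.println(readAppId(site));
    }
}
```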
> >> > >
> >> > > Can you connect to the DoY Web UI at the URL provided when you
> >> > > started DoY? If you can, this means that the DoY App Master is
> >> > > up and running.
> >> > >
> >> > > Are you running the client from the same node on which you
> >> > > started it? That file I mentioned is local to the "DoY client"
> >> > > machine; it is not in DFS.
> >> > >
> >> > > Then, there is one more very obscure bug you can check. On some
> >> > > distributions, the YARN task files are written to the /tmp
> >> > > directory. Some Linux systems remove these files from time to
> >> > > time. Once the files are gone, YARN can no longer control its
> >> > > containers: it won't be able to stop the app master or the
> >> > > Drillbit containers. There are two fixes. First, go kill all the
> >> > > processes by hand. Then, move the YARN state files out of /tmp,
> >> > > or exclude YARN's files from the periodic cleanup.
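If your distribution keeps NodeManager container files under /tmp, one way to move them
(the path below is just an example; `yarn.nodemanager.local-dirs` is the standard Hadoop
property that controls where container files live) is to set the location explicitly in
yarn-site.xml on each node:

```xml
<!-- yarn-site.xml: keep container files out of /tmp so periodic
     cleanup jobs cannot delete them. Example path; choose a disk
     with enough space on each NodeManager host. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/var/hadoop/yarn/local</value>
</property>
```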
> >> > >
> >> > > Try some of the above and let us know what you find.
> >> > >
> >> > > Also, perhaps Abhishek can offer some suggestions, as he tested
> >> > > the heck out of the feature.
> >> > >
> >> > > Thanks,
> >> > > - Paul
> >> > >
> >> > >
> >> > >
> >> > >    On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues
> >> > > Teddy <nbted2017@gmail.com> wrote:
> >> > >
> >> > > Hello,
> >> > >
> >> > > Two weeks ago, I began to explore DoY. Today, reading the Drill
> >> > > documentation (
> >> > > https://drill.apache.org/docs/appendix-a-release-note-issues/ ),
> >> > > I saw that we can restart the Drill cluster with:
> >> > >
> >> > >  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
> >> > >
> >> > > But it doesn't work when I tested it.
> >> > >
> >> > > Any idea?
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <par0328@yahoo.com.invalid>
> >> > > wrote:
> >> > >
> >> > > > Hi Charles,
> >> > > >
> >> > > > Your engineers have identified a common need, but one which is
> >> > > > very difficult to satisfy.
> >> > > >
> >> > > > TL;DR: DoY gets as close to the requirements as possible within
> >> > > > the constraints of YARN and Drill. But future projects could do
> >> > > > more.
> >> > > >
> >> > > > Your engineers want resource segregation among tenants:
> >> > > > multi-tenancy. This is very difficult to achieve at the
> >> > > > application level. Consider Drill. It would need some way to
> >> > > > identify users to know which tenant they belong to. Then, Drill
> >> > > > would need a way to enqueue users whose queries would exceed
> >> > > > the memory or CPU limit for that tenant. Plus, Drill would have
> >> > > > to be able to limit memory and CPU for each query. Much work
> >> > > > has been done to limit memory, but CPU is very difficult.
> >> > > > Mature products such as Teradata can do this, but Teradata has
> >> > > > 40 years of effort behind it.
> >> > > >
> >> > > > Since it is hard to build multi-tenancy in at the app level
> >> > > > (not impossible, just very, very hard), the thought is to apply
> >> > > > it at the cluster level. This is done in YARN by limiting the
> >> > > > resources available to processes (typically map/reduce) and by
> >> > > > limiting the number of running processes. This works for M/R
> >> > > > because each map task uses disk to shuffle results to a reduce
> >> > > > task, so map and reduce tasks can run asynchronously.
> >> > > >
> >> > > > For tools such as Drill, which do in-memory processing
> >> > > > (really, across-the-network exchanges), both the sender and
> >> > > > receiver have to run concurrently. This is much harder to
> >> > > > schedule than async m/r tasks: it means that the entire Drill
> >> > > > cluster (of whatever size) must be up and running to run a
> >> > > > query.
> >> > > >
> >> > > > The start-up time for Drill is far, far longer than a query.
> >> > > > So, it is not feasible to use YARN to launch a Drill cluster
> >> > > > for each query the way you would do with Spark. Instead, under
> >> > > > YARN, Drill is a long-running service that handles many
> >> > > > queries.
> >> > > >
> >> > > > Obviously, this is not ideal: I'm sure your engineers want to
> >> > > > use a tenant's resources for Drill when running queries, and
> >> > > > otherwise for Spark, Hive, or maybe TensorFlow. If Drill has to
> >> > > > be long-running, I'm sure they'd like to slosh resources
> >> > > > between tenants as is done in YARN. As noted above, this is a
> >> > > > hard problem that DoY did not attempt to solve.
> >> > > >
> >> > > > One might suggest that Drill grab resources from YARN when
> >> > > > Tenant A wants to run a query, and release them when that
> >> > > > tenant is done, grabbing new resources when Tenant B wants to
> >> > > > run. Impala tried this with Llama and found it did not work.
> >> > > > (This is why DoY is quite a bit simpler; no reason to rerun a
> >> > > > failed experiment.)
> >> > > >
> >> > > > Some folks are looking to Kubernetes (K8s) as a solution. But
> >> > > > that just replaces YARN with K8s: Drill is still a long-running
> >> > > > process.
> >> > > >
> >> > > > To solve the problem you identify, you'll need either:
> >> > > >
> >> > > > * A bunch of work in Drill to build multi-tenancy into Drill, or
> >> > > > * A cloud-like solution in which each tenant spins up a Drill
> >> > > > cluster within its budget, spinning it down, or resizing it, to
> >> > > > stay within an overall budget.
> >> > > >
> >> > > > The second option can be achieved under YARN with DoY, assuming
> >> > > > that DoY added support for graceful shutdown (or the cluster is
> >> > > > reduced in size only when no queries are active). Longer-term,
> >> > > > a more modern solution would be Drill-on-Kubernetes (DoK?),
> >> > > > which Abhishek started on.
> >> > > >
> >> > > > Engineering is the art of compromise. The question for your
> >> > > > engineers is how to achieve the best result given the
> >> > > > limitations of the software available today, while at the same
> >> > > > time helping the Drill community improve the solutions over
> >> > > > time.
> >> > > >
> >> > > > Thanks,
> >> > > > - Paul
> >> > > >
> >> > > >
> >> > > >
> >> > > >    On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre
> >> > > > <cgivre@gmail.com> wrote:
> >> > > >
> >> > > >  Hi Paul,
> >> > > > Here’s what our engineers said:
> >> > > >
> >> > > > From Paul’s response, I understand that there is a slight
> >> > > > confusion around how multi-tenancy has been enabled in our data
> >> > > > lake.
> >> > > >
> >> > > > Some more details on this –
> >> > > >
> >> > > > Drill already has the concept of multi-tenancy, where we can
> >> > > > have multiple Drill clusters running on the same data lake,
> >> > > > enabled through different ports and ZooKeeper. But all of this
> >> > > > is launched through the same hard-coded YARN queue that we
> >> > > > provide as a config parameter.
> >> > > >
> >> > > > In our data lake, each tenant has a certain amount of compute
> >> > > > capacity allotted to them which they can use for their project
> >> > > > work. This is provisioned through individual YARN queues for
> >> > > > each tenant (resource caging). This restricts the tenants from
> >> > > > using cluster resources beyond a certain limit, without
> >> > > > impacting other tenants.
> >> > > >
> >> > > > Access to these YARN queues is provisioned through ACL
> >> > > > memberships.
> >> > > >
> >> > > > ——
> >> > > >
> >> > > > Does this make sense? Is it possible to get Drill to work in
> >> > > > this manner, or should we look into opening up JIRAs and
> >> > > > working on new capabilities?
> >> > > >
> >> > > >
> >> > > >
> >> > > > > On Dec 17, 2018, at 21:59, Paul Rogers <par0328@yahoo.com.INVALID>
> >> > > > > wrote:
> >> > > > >
> >> > > > > Hi Kwizera,
> >> > > > > I hope my answer to Charles gave you the information you
> >> > > > > need. If not, please check out the DoY documentation or ask
> >> > > > > follow-up questions.
> >> > > > > Key thing to remember: Drill is a long-running YARN service;
> >> > > > > queries DO NOT go through YARN queues, they go through Drill
> >> > > > > directly.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > - Paul
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >    On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera
> >> > > > > hugues Teddy <nbted2017@gmail.com> wrote:
> >> > > > >
> >> > > > > Hello,
> >> > > > > Same questions here.
> >> > > > > I would like to know how Drill deals with this YARN
> >> > > > > functionality.
> >> > > > > Cheers.
> >> > > > >
> >> > > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hello all,
> >> > > > >> We are trying to set up a Drill cluster on our corporate
> >> > > > >> data lake. Our cluster requires dynamic YARN queue
> >> > > > >> allocation for a multi-tenant environment. Is this something
> >> > > > >> that Drill supports, or is there a workaround?
> >> > > > >> Thanks!
> >> > > > >> —C
> >> > > >
> >> >
> >>
> >  