spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Thrift JDBC server - why only one per machine and only yarn-client
Date Sat, 02 Jul 2016 07:49:16 GMT
This is probably because the current thrift-server implementation holds a
`SparkContext` inside the server process
(See:
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34
).
To support yarn-cluster, we would need to add a lot of functionality to deploy
the thrift-server itself in a cluster.
However, it seems to me there are many technical issues around this.

// maropu

On Fri, Jul 1, 2016 at 1:38 PM, Egor Pahomov <pahomov.egor@gmail.com> wrote:

> What about yarn-cluster mode?
>
> 2016-07-01 11:24 GMT-07:00 Egor Pahomov <pahomov.egor@gmail.com>:
>
>> Separate bad users with bad queries from good users with good queries.
>> Spark does not provide any scope separation out of the box.
>>
>> 2016-07-01 11:12 GMT-07:00 Jeff Zhang <zjffdu@gmail.com>:
>>
>>> I think so. Any reason you want to deploy multiple thrift servers on one
>>> machine?
>>>
>>> On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov <pahomov.egor@gmail.com>
>>> wrote:
>>>
>>>> Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
>>>> Jeff, thanks, I will try, but from your answer I get the feeling
>>>> that I'm attempting a very rare case?
>>>>
>>>> 2016-07-01 10:54 GMT-07:00 Jeff Zhang <zjffdu@gmail.com>:
>>>>
>>>>> This is not a bug: these 2 processes use the same SPARK_PID_DIR, which
>>>>> is /tmp by default. Although you can resolve this by using a different
>>>>> SPARK_PID_DIR, I suspect you would still have other issues such as port
>>>>> conflicts. I would suggest you deploy one spark thrift server per
>>>>> machine for now. If you stick to deploying multiple spark thrift servers
>>>>> on one machine, then define different SPARK_CONF_DIR, SPARK_LOG_DIR and
>>>>> SPARK_PID_DIR for your 2 instances of spark thrift server. Not sure if
>>>>> there are other conflicts, but please try first.
>>>>>
>>>>>
>>>>> On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov <pahomov.egor@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I get
>>>>>>
>>>>>> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
>>>>>> process 28989.  Stop it first."
>>>>>>
>>>>>> Is it a bug?
>>>>>>
>>>>>> 2016-07-01 10:10 GMT-07:00 Jeff Zhang <zjffdu@gmail.com>:
>>>>>>
>>>>>>> I don't think the one-instance-per-machine limit is true. As long as
>>>>>>> you resolve conflicts such as ports, pid files, log files, etc., you
>>>>>>> can run multiple instances of spark thrift server.
>>>>>>>
>>>>>>> On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov <pahomov.egor@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, I'm using the Spark Thrift JDBC server and 2 limitations really
>>>>>>>> bother me -
>>>>>>>>
>>>>>>>> 1) One instance per machine
>>>>>>>> 2) Yarn client only (not yarn cluster)
>>>>>>>>
>>>>>>>> Are there any architectural reasons for such limitations? For
>>>>>>>> yarn-client I can understand it in theory - the master is the same
>>>>>>>> process as the server, so it makes some sense, but it's really
>>>>>>>> inconvenient - I need a lot of memory on my driver machine. The
>>>>>>>> reasons for one instance per machine I do not understand.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> *Sincerely yours, Egor Pakhomov*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> *Sincerely yours, Egor Pakhomov*
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> *Sincerely yours, Egor Pakhomov*
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>>
>>
>> *Sincerely yours, Egor Pakhomov*
>>
>
>
>
> --
>
>
> *Sincerely yours, Egor Pakhomov*
>
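
For the multi-instance setup Jeff describes in the quoted thread, here is a rough Python sketch of how the per-instance settings could be kept apart. The variable names (SPARK_CONF_DIR, SPARK_LOG_DIR, SPARK_PID_DIR, HIVE_SERVER2_THRIFT_PORT) come from the thread; the instance names, ports, base directory and helper functions are hypothetical, and the sketch only builds and checks the environments, it does not launch anything.

```python
import os


def thrift_server_env(name, port, base_dir="/tmp/spark-thrift"):
    """Return environment overrides for one Thrift server instance.

    `name`, `port` and `base_dir` are hypothetical illustration values;
    the variable names are the ones mentioned in the thread.
    """
    root = os.path.join(base_dir, name)
    return {
        "SPARK_CONF_DIR": os.path.join(root, "conf"),
        "SPARK_LOG_DIR": os.path.join(root, "logs"),
        "SPARK_PID_DIR": os.path.join(root, "pids"),
        "HIVE_SERVER2_THRIFT_PORT": str(port),
    }


def check_no_conflicts(envs):
    """Raise ValueError if two instances share a value for the same variable."""
    for key in envs[0]:
        values = [env[key] for env in envs]
        if len(set(values)) != len(values):
            raise ValueError(f"instances share {key}: {values}")


envs = [
    thrift_server_env("team-a", 10000),
    thrift_server_env("team-b", 10001),
]
check_no_conflicts(envs)

for env in envs:
    # Each dict would be merged into the process environment before
    # starting that instance's Thrift server (launch step not shown).
    print(env["HIVE_SERVER2_THRIFT_PORT"], env["SPARK_PID_DIR"])
```

Note that each SPARK_CONF_DIR would need to exist and contain a usable configuration before starting the server, and as Jeff says, there may be other conflicts this does not cover.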



-- 
---
Takeshi Yamamuro
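
P.S. On the "running as process 28989. Stop it first." message quoted above: a simplified Python model of a pid-file guard shows why two instances sharing one SPARK_PID_DIR collide while instances with separate directories do not. This is only a sketch under assumptions: the pid file naming pattern, the `egor` ident and the second pid are illustrative, and the real start script does more than this (as far as I know it also checks that the recorded process is still alive).

```python
import os
import tempfile

CMD = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"


def pid_file_path(pid_dir, ident, command, instance=1):
    # Assumed naming pattern for illustration; not taken from the thread.
    return os.path.join(pid_dir, f"spark-{ident}-{command}-{instance}.pid")


def try_start(pid_dir, ident, command, pid):
    """Simplified guard: refuse to start if a pid file already exists."""
    path = pid_file_path(pid_dir, ident, command)
    if os.path.exists(path):
        with open(path) as f:
            running = f.read().strip()
        return f"{command} running as process {running}.  Stop it first."
    os.makedirs(pid_dir, exist_ok=True)
    with open(path, "w") as f:
        f.write(str(pid))
    return "started"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        shared = os.path.join(tmp, "shared")
        first = try_start(shared, "egor", CMD, 28989)
        # Second instance, same pid directory: the guard refuses it.
        second = try_start(shared, "egor", CMD, 29001)
        # Second instance, its own pid directory: no collision.
        third = try_start(os.path.join(tmp, "other"), "egor", CMD, 29001)
    return first, second, third


print(demo())
```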
