spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shixiong Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5124) Standardize internal RPC interface
Date Fri, 09 Jan 2015 09:35:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270809#comment-14270809
] 

Shixiong Zhu commented on SPARK-5124:
-------------------------------------

{quote}
1. For DAGScheduler, we are probably OK to just use an event loop instead of an actor. Just
put some messages into a queue, and have a loop that processes the queue. Otherwise we are
making every call to DAGScheduler going through a socket and that can severely impact scheduler
throughput. (Although I haven't looked closely at your change so maybe you are doing a different
thing here)
{quote}

In my current implementation, DAGScheduler still uses Actor. A LocalActorRef should not pass
through the socket. However, for better performance, we can use a Multi-Producer-Single-Consumer(MPSC)
queue to bypass Akka.

{quote}
2. Is it really that expensive to listen to network events that warrant a separate NetworkRpcEndpoint?
{quote}

I created a NetworkRpcEndpoint to avoid to do the following pattern matching checks for every
message for RpcEndpoints not interested in them.

{code}
case AssociatedEvent(_, remoteAddress, _) =>
  ...
case DisassociatedEvent(_, remoteAddress, _) =>
  ...
case AssociationErrorEvent(cause, localAddress, remoteAddress, inbound, _) =>
  ...
{code}

{quote}
Thread-safe contract when processing messages (similar to Akka).
{quote}

The thread-safe contract means there is only one method of RpcEndpoint will be called at the
same time, just like Actor. Without this property, RpcEndpoint will need a lock to protect
its data. However, considering the complex logical of so many RpcEndpoints, it may lead to
dead-lock.

{quote}
A simple fault tolerance(if a RpcEndpoint is crashed, restart it or stop it).
{quote}

It means "Any error thrown by `onStart`, `receive` and `onStop` will be sent to `onError`.
If onError throws an error, it will force RpcEndpoint to
 restart by creating a new one."

"restart" maybe not a proper way. But an `onError` which is used to handle all errors is better
than requiring RPCEndPoint never have uncaught exceptions (need to write many try-catch codes)

> Standardize internal RPC interface
> ----------------------------------
>
>                 Key: SPARK-5124
>                 URL: https://issues.apache.org/jira/browse/SPARK-5124
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Reynold Xin
>            Assignee: Shixiong Zhu
>         Attachments: Pluggable RPC - draft 1.pdf
>
>
> In Spark we use Akka as the RPC layer. It would be great if we can standardize the internal
RPC interface to facilitate testing. This will also provide the foundation to try other RPC
implementations in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message