tez-dev mailing list archives

From Hitesh Sharma <hit...@microsoft.com.INVALID>
Subject Re: Custom routing in EdgeManager
Date Wed, 20 Sep 2017 01:12:33 GMT
Fixing the formatting and resending..

I'm looking to add a custom edge manager that lets me route events between two vertices
using a custom protocol. For instance, I want DataMovementEvent(s) from the
tasks in the source vertex to be routed to the tasks in the destination vertex based
on whether the tasks are in the same rack (or, for that matter, on some other
key that relates the tasks in the two stages). To do this I implemented my own
EdgeManagerPluginOnDemand derivative, but I see it has two APIs for routing the events:

routeDataMovementEventToDestination(int sourceTaskIndex, int sourceOutputIndex, int destinationTaskIndex)
routeDataMovementEventToDestination(DataMovementEvent event, int sourceTaskIndex, int sourceOutputIndex,
Map<Integer,List<Integer>> destinationTaskAndInputIndices)
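
To make the rack-based routing idea concrete, here is a minimal sketch of the decision I have in mind, shaped like the three-int overload. It is plain Java with no Tez dependency: the RackAwareRouter class, the rack arrays, and the int return value (-1 meaning "not routed") are all hypothetical stand-ins, and it does not show how the rack information would actually reach the plugin.

```java
/**
 * Hypothetical, Tez-independent sketch of a rack-aware routing decision.
 * The rack tables are assumed to be known up front; nothing here is a Tez API.
 */
final class RackAwareRouter {
    // Rack name of each task, indexed by task index (assumed inputs).
    private final String[] sourceTaskRacks;
    private final String[] destinationTaskRacks;

    RackAwareRouter(String[] sourceTaskRacks, String[] destinationTaskRacks) {
        this.sourceTaskRacks = sourceTaskRacks.clone();
        this.destinationTaskRacks = destinationTaskRacks.clone();
    }

    /**
     * Same parameter shape as the three-int overload. Returns the destination
     * input index the event would be delivered to, or -1 if the event is not
     * routed to this destination task. sourceOutputIndex is unused because
     * this sketch assumes one physical output per source task.
     */
    int route(int sourceTaskIndex, int sourceOutputIndex, int destinationTaskIndex) {
        String rack = destinationTaskRacks[destinationTaskIndex];
        if (!sourceTaskRacks[sourceTaskIndex].equals(rack)) {
            return -1; // different rack: do not route
        }
        // Input index = rank of this source task among source tasks on the same rack.
        int inputIndex = 0;
        for (int i = 0; i < sourceTaskIndex; i++) {
            if (sourceTaskRacks[i].equals(rack)) {
                inputIndex++;
            }
        }
        return inputIndex;
    }

    public static void main(String[] args) {
        RackAwareRouter router = new RackAwareRouter(
                new String[] {"r1", "r2", "r1"},   // source tasks 0..2
                new String[] {"r1", "r2"});        // destination tasks 0..1
        for (int src = 0; src < 3; src++) {
            for (int dst = 0; dst < 2; dst++) {
                System.out.println("src " + src + " -> dst " + dst
                        + ": input index " + router.route(src, 0, dst));
            }
        }
    }
}
```

The point of the sketch is that this overload only sees indices, so anything beyond index-derived information (such as the rack tables above) has to come from somewhere other than the event itself.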

My questions are:

- What's the difference between the two APIs, and which one should be used? The API that takes a
DataMovementEvent doesn't seem to get called with the ScatterGather edge manager and others.
- If that API is deprecated, the remaining one is not sufficient for my routing, as I need
some more metadata, which I could have got from the DataMovementEvent payload, for example. So
what options do I have here?

Thanks,
Hitesh


From: Hitesh Sharma <hitesh@microsoft.com.INVALID>
Sent: Tuesday, September 19, 2017 4:24 PM
To: dev@tez.apache.org
Subject: Custom routing in EdgeManager
    
Hello,


I'm looking to add a custom edge manager that lets me route events between two vertices
using a custom protocol. For instance, I want DataMovementEvent(s) from the
tasks in the source vertex to be routed to the tasks in the destination vertex based
on whether the tasks are in the same rack (or, for that matter, on some other
key that relates the tasks in the two stages). To do this I implemented my own
EdgeManagerPluginOnDemand derivative, but I see it has two APIs for routing the events:


routeDataMovementEventToDestination(int sourceTaskIndex, int sourceOutputIndex, int destinationTaskIndex)

routeDataMovementEventToDestination(DataMovementEvent event, int sourceTaskIndex, int sourceOutputIndex,
Map<Integer,List<Integer>> destinationTaskAndInputIndices)

My questions are:


  *   What's the difference between the two APIs, and which one should be used? The API that takes a
DataMovementEvent doesn't seem to get called with the ScatterGather edge manager and others.
  *   If that API is deprecated, the remaining one is not sufficient for my routing,
as I need some more metadata, which I could have got from the DataMovementEvent payload, for
example. So what options do I have here?


Thanks,

Hitesh
    