atlas-dev mailing list archives

From "Ashutosh Mestry (Jira)" <j...@apache.org>
Subject [jira] [Updated] (ATLAS-3762) Entity Creation: Improve Edges Fetch Between Vertices
Date Fri, 01 May 2020 01:07:00 GMT

     [ https://issues.apache.org/jira/browse/ATLAS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Mestry updated ATLAS-3762:
-----------------------------------
    Attachment: ATLAS-3762-Edge-fetch-improvement-gremlin.patch

> Entity Creation: Improve Edges Fetch Between Vertices
> -----------------------------------------------------
>
>                 Key: ATLAS-3762
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3762
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>         Attachments: ATLAS-3762-Edge-fetch-improvement-gremlin.patch, ATLAS-3762-Improve-Edge-creator-using-Genuine-iterat.patch
>
>
> *Background*
> One of the earlier commits replaced the vertex and edge fetch with _StreamSupport.stream_.
This uses _collect(toList())_, which causes all contents to be fetched.
> As a result, a large amount of data is fetched.
> *Solution*
> Switch to iterators that will use lazy loading.
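The difference between the two fetch styles can be illustrated with a minimal sketch. It is not the Atlas code: _LazyFetchSketch_, _countingSource_, _findEager_ and _findLazy_ are hypothetical names, and the counting source is only a stand-in for a graph query result so that the number of elements actually pulled can be observed.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class LazyFetchSketch {
    /** Hypothetical stand-in for a query result; counts how many elements are pulled. */
    static Iterable<Integer> countingSource(int size, AtomicInteger pulled) {
        return () -> new Iterator<Integer>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                pulled.incrementAndGet();
                return cursor++;
            }
        };
    }

    /** Eager: collect(toList()) materializes every element before the search starts. */
    static Integer findEager(Iterable<Integer> source, int target) {
        List<Integer> all = StreamSupport.stream(source.spliterator(), false)
                                         .collect(Collectors.toList());
        for (Integer value : all) {
            if (value == target) {
                return value;
            }
        }
        return null;
    }

    /** Lazy: the iterator is advanced only until the target is found. */
    static Integer findLazy(Iterable<Integer> source, int target) {
        for (Integer value : source) {
            if (value == target) {
                return value;
            }
        }
        return null;
    }
}
```

With a source of 1000 elements and a target near the start, the eager variant pulls all 1000 elements while the lazy variant stops as soon as the match is returned.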
> *Edge Fetch Refactoring*
> Change _getEdge_ to iterate over the smaller dataset.
> Here are the scenarios:
> - _fromVertex_ is _hive_table_, _toVertex_ is _hive_column_. This means that outgoing
edges from _fromVertex_ will be many more than incoming edges to _toVertex_.
> - _fromVertex_ is _hive_process_execution_, _toVertex_ is _hive_table_. This means
that outgoing edges from _fromVertex_ will be fewer than incoming edges to _toVertex_.
> Approach:
>  * Since the search is linear, it is more efficient to iterate over fewer items
than over more items.
>  * Fetch the edge counts for _fromVertex_ and _toVertex_. If either count
is 0, return NULL, since the search cannot find anything.
>  * If both counts are non-zero, take the side with fewer edges and perform the search there.
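The approach above can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: _EdgeLookupSketch_, _Vertex_, _Edge_ and _connect_ are hypothetical in-memory types rather than the Atlas graph API, and the counts here cover all edges of a vertex regardless of label.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class EdgeLookupSketch {
    /** Hypothetical in-memory vertex; not the Atlas graph API. */
    static final class Vertex {
        final String name;
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }
    }

    /** Hypothetical directed, labelled edge. */
    static final class Edge {
        final Vertex from;
        final Vertex to;
        final String label;

        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    /** Helper for building test graphs: registers the edge on both endpoints. */
    static Edge connect(Vertex from, Vertex to, String label) {
        Edge edge = new Edge(from, to, label);
        from.out.add(edge);
        to.in.add(edge);
        return edge;
    }

    /** Finds the edge from -> to with the given label by scanning the smaller side. */
    static Edge getEdge(Vertex from, Vertex to, String label) {
        int outCount = from.out.size();
        int inCount = to.in.size();

        // If either side has no edges, no edge can connect the two vertices.
        if (outCount == 0 || inCount == 0) {
            return null;
        }

        // Linear search, so iterate over whichever side has fewer edges.
        Iterator<Edge> candidates = (outCount <= inCount) ? from.out.iterator()
                                                          : to.in.iterator();
        while (candidates.hasNext()) {
            Edge edge = candidates.next();
            if (edge.from == from && edge.to == to && edge.label.equals(label)) {
                return edge;
            }
        }
        return null;
    }
}
```

In the _hive_table_ to _hive_column_ scenario, the column has a single incoming edge, so the search inspects one edge instead of walking every outgoing edge of the table.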
> [~sidharthkmishra] Thanks for this simple but effective fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
