atlas-dev mailing list archives

From Madhan Neethiraj <mad...@apache.org>
Subject Re: Review Request 73010: Re-Indexing Implemented as JAVA_PATCH
Date Wed, 11 Nov 2020 07:13:09 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73010/#review222198
-----------------------------------------------------------




graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphManagement.java
Lines 364 (patched)
<https://reviews.apache.org/r/73010/#comment311256>

    JanusGraph documentation (https://docs.janusgraph.org/index-management/index-reindexing/)
    suggests that an entire index can be rebuilt with the following steps. If you haven't
    considered this option, please take a look.
    
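        // Gremlin console: rebuild the 'names' graph index in place
        m = graph.openManagement()
        i = m.getGraphIndex('names')
        // updateIndex() returns a future; get() blocks until the reindex job completes
        m.updateIndex(i, SchemaAction.REINDEX).get()
        m.commit()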



intg/src/main/java/org/apache/atlas/AtlasConfiguration.java
Lines 77 (patched)
<https://reviews.apache.org/r/73010/#comment311252>

    Consider renaming the property: atlas.patch.reindex.enabled => atlas.rebuild.index



repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java
Lines 58 (patched)
<https://reviews.apache.org/r/73010/#comment311253>

    Consider executing this patch before the other patches listed above. This can help avoid
    rebuilding the index for attributes updated in those patches.



repository/src/main/java/org/apache/atlas/repository/patches/ReIndexPatch.java
Lines 116 (patched)
<https://reviews.apache.org/r/73010/#comment311254>

    Consider changing the type: Object => AtlasEdge



repository/src/main/java/org/apache/atlas/repository/patches/ReIndexPatch.java
Lines 147 (patched)
<https://reviews.apache.org/r/73010/#comment311255>

    Consider marking the following members as final, as they are assigned only in the constructor:
     - list
     - graph
     - indexNames
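A minimal sketch of the suggestion above. Only the member names `list`, `graph` and `indexNames` come from the review; their types, the `Graph` interface and the class names are hypothetical placeholders, not the actual ReIndexPatch code.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: members assigned only in the constructor are declared final,
// so the compiler rejects any later reassignment. Field types are placeholders.
public class FinalFieldsSketch {
    // Stand-in for the graph handle used by the patch (hypothetical).
    public interface Graph {
        String name();
    }

    public static final class ReIndexConsumerSketch {
        private final List<Object> list;       // work items (placeholder type)
        private final Graph graph;             // graph handle (placeholder type)
        private final Set<String> indexNames;  // names of indexes to rebuild

        public ReIndexConsumerSketch(List<Object> list, Graph graph, Set<String> indexNames) {
            this.list = list;
            this.graph = graph;
            this.indexNames = indexNames;
        }

        public int itemCount() { return list.size(); }

        public String graphName() { return graph.name(); }

        public Set<String> indexNames() { return indexNames; }
    }

    public static void main(String[] args) {
        ReIndexConsumerSketch consumer = new ReIndexConsumerSketch(
            List.of("v1", "v2"), () -> "janus", Set.of("vertex_index", "edge_index"));
        System.out.println(consumer.itemCount() + " items on " + consumer.graphName());
        // consumer.graph = null;  // would not compile: graph is final
    }
}
```

Besides documenting intent, final fields assigned in the constructor are safely published to other threads once construction completes, which matters when consumers run concurrently.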


- Madhan Neethiraj


On Nov. 10, 2020, 5:48 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/73010/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2020, 5:48 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-4015
>     https://issues.apache.org/jira/browse/ATLAS-4015
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Background**
> Please see JIRA.
> Until now, re-indexing in Atlas was implemented as an external tool. Using this tool posed a number of challenges, the biggest being its throughput: for a medium-sized Atlas repository, the tool could take days to finish.
> 
> This implementation addresses these problems (see results below).
> 
> **Approach**
> Re-indexing is now implemented as a JAVA_PATCH that is applied only when the property _atlas.patch.reindex.enabled_ is set to true.
> 
> *Modified* AtlasJanusGraphManagement: new method _reindex_ implements the re-indexing logic.
> *New* _ReIndexPatch_: a JAVA_PATCH that drives the re-indexing. It uses the PC (producer-consumer) framework to enumerate vertices and edges, and logs messages indicating progress while the patch is applied.
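The enumerate-and-process approach above can be sketched with a generic producer-consumer loop built on `java.util.concurrent`. This is a minimal sketch under stated assumptions, not Atlas's PC framework or the actual ReIndexPatch: plain `Long` ids stand in for vertices and edges, and `reindexBatch` is a hypothetical callback where a batch would be re-indexed. `numWorkers` and `batchSize` mirror the `atlas.patch.numWorkers` and `atlas.patch.batchSize` settings used in the volume tests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Generic producer-consumer sketch (not Atlas's PC framework): one producer groups
// element ids into batches of batchSize; numWorkers consumers process the batches
// through the hypothetical reindexBatch callback and log progress.
public class ReIndexPcSketch {
    // Unique end-of-input marker, compared by identity.
    private static final List<Long> POISON = new ArrayList<>();

    // Returns the number of elements in batches that were processed without error.
    public static long run(Iterable<Long> elementIds, int numWorkers, int batchSize,
                           Consumer<List<Long>> reindexBatch) throws InterruptedException {
        BlockingQueue<List<Long>> queue = new ArrayBlockingQueue<>(numWorkers * 2);
        AtomicLong processed = new AtomicLong();
        ExecutorService workers = Executors.newFixedThreadPool(numWorkers);

        // Consumers: take batches until the end-of-input marker arrives.
        for (int i = 0; i < numWorkers; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        List<Long> batch = queue.take();
                        if (batch == POISON) {
                            return;
                        }
                        try {
                            reindexBatch.accept(batch);
                            long done = processed.addAndGet(batch.size());
                            System.out.println("re-indexed elements: " + done); // progress log
                        } catch (RuntimeException e) {
                            // Keep the worker alive so the producer never blocks forever.
                            System.err.println("batch failed: " + e.getMessage());
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: group ids into batches; put() blocks when consumers fall behind.
        List<Long> pending = new ArrayList<>(batchSize);
        for (Long id : elementIds) {
            pending.add(id);
            if (pending.size() == batchSize) {
                queue.put(pending);
                pending = new ArrayList<>(batchSize);
            }
        }
        if (!pending.isEmpty()) {
            queue.put(pending);
        }
        // One marker per consumer so every worker exits.
        for (int i = 0; i < numWorkers; i++) {
            queue.put(POISON);
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < 2_500; id++) {
            ids.add(id);
        }
        long total = run(ids, 4, 1000, batch -> { /* re-index the batch here */ });
        System.out.println("total: " + total);
    }
}
```

The bounded queue gives back-pressure: the producer cannot enumerate far ahead of the workers, which keeps memory use flat regardless of repository size.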
> 
> **Configuration**
> _atlas.patch.reindex.enabled=true_
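A minimal sketch of gating a patch on this flag and reading the related worker and batch settings. It uses `java.util.Properties` only to stay self-contained (Atlas reads its settings through its own classes such as AtlasConfiguration), and the default values passed to the hypothetical `intProperty` helper are illustrative assumptions, not Atlas's defaults.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: the patch runs only when the flag is "true"; defaults are illustrative.
public class ReIndexConfigSketch {
    public static final String REINDEX_ENABLED = "atlas.patch.reindex.enabled";
    public static final String NUM_WORKERS = "atlas.patch.numWorkers";
    public static final String BATCH_SIZE = "atlas.patch.batchSize";

    // Absent, or any value other than "true" (case-insensitive), means disabled.
    public static boolean isReindexEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(REINDEX_ENABLED, "false").trim());
    }

    public static int intProperty(Properties props, String key, int defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse(
            "atlas.patch.reindex.enabled=true\n"
            + "atlas.patch.numWorkers=14\n"
            + "atlas.patch.batchSize=1000\n");
        if (isReindexEnabled(props)) {
            System.out.println("re-index patch enabled: workers="
                + intProperty(props, NUM_WORKERS, 1)
                + ", batchSize=" + intProperty(props, BATCH_SIZE, 1000));
        } else {
            System.out.println("re-index patch skipped");
        }
    }
}
```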
> 
> 
> Diffs
> -----
> 
>   graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphManagement.java f7d2e273c
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphManagement.java 2a2ef92a7
>   intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 1c7915859
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java b142a2a4a
>   repository/src/main/java/org/apache/atlas/repository/patches/ConcurrentPatchProcessor.java c6f0e6438
>   repository/src/main/java/org/apache/atlas/repository/patches/ReIndexPatch.java PRE-CREATION
> 
> 
> Diff: https://reviews.apache.org/r/73010/diff/2/
> 
> 
> Testing
> -------
> 
> **Test Setup**
> Start with an Atlas setup containing known data. Ascertain that basic search yields results.
> 
> Use these curl commands to delete the Solr index entries:
> 
> curl "http://<host>:8983/solr/vertex_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>b2d_t:*</query></delete>'
> 
> curl "http://<host>:8983/solr/edge_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>1151_t:*</query></delete>'
> 
> curl "http://<host>:8983/solr/fulltext_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>14at_t:*</query></delete>'
> 
> This deletes the Solr index entries. A basic search performed from the web UI will then return no results.
> 
> Now set the configuration parameter _atlas.patch.reindex.enabled=true_ and restart Atlas.
> 
> The server-side logs will indicate that the patch has run.
> 
> **Volume Testing**
> __Test 1__
> 
> Vertices: ~16M, duration: ~5 hrs.
> Edges: ~122M, duration: ~6 hrs.
> 
> Configuration parameters:
> atlas.patch.reindex.enabled=true
> atlas.patch.numWorkers=14
> atlas.patch.batchSize=1000
> 
> Node configuration:
> Atlas: Heap size: 6 GB.
> Solr: Heap size: 12 GB.
> 
> __Test 2__
> 
> Vertices: ~21M, duration: ~5 hrs.
> Edges: ~31M, duration: ~4 hrs.
> 
> Configuration parameters:
> atlas.patch.reindex.enabled=true
> atlas.patch.numWorkers=44
> atlas.patch.batchSize=1000
> 
> Node configuration:
> Atlas: Heap size: 6 GB.
> Solr: Heap size: 12 GB.
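A back-of-the-envelope conversion of the reported figures into approximate elements per second. The inputs are the "~" values from the two tests, so the results are rough; they describe only these runs and say nothing about other hardware or data shapes.

```java
// Approximate throughput: count / (hours * 3600), rounded to the nearest element per second.
public class ReIndexThroughput {
    public static long perSecond(double count, double hours) {
        return Math.round(count / (hours * 3600.0));
    }

    public static void main(String[] args) {
        System.out.println("Test 1 vertices/s: " + perSecond(16e6, 5));   // 889
        System.out.println("Test 1 edges/s:    " + perSecond(122e6, 6));  // 5648
        System.out.println("Test 2 vertices/s: " + perSecond(21e6, 5));   // 1167
        System.out.println("Test 2 edges/s:    " + perSecond(31e6, 4));   // 2153
    }
}
```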
> 
> 
> **PC Build**
> https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/177/
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>

