atlas-dev mailing list archives

From Ashutosh Mestry via Review Board <nore...@reviews.apache.org>
Subject Review Request 73010: Re-Indexing Implemented as JAVA_PATCH
Date Mon, 09 Nov 2020 21:45:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73010/
-----------------------------------------------------------

Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Bugs: ATLAS-4015
    https://issues.apache.org/jira/browse/ATLAS-4015


Repository: atlas


Description
-------

**Background**
Please see the JIRA.
Re-indexing within Atlas has so far been implemented as an external tool. Using this tool posed a number
of challenges, the biggest being its throughput: for a medium-sized Atlas repository,
the tool could take days to finish.

This implementation addresses these problems (see results below).

**Approach**
Re-indexing is now implemented as a JAVA_PATCH that is applied only when the property _atlas.patch.reindex.enabled_
is set to true.

*Modified* AtlasJanusGraphManagement: the new method _reindex_ implements the re-indexing logic.
*New* _ReIndexPatch_ is a JAVA_PATCH that performs the re-indexing. It uses the PC (producer-consumer)
framework to enumerate vertices and edges, and logs messages indicating progress while the patch
is applied.
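The enumeration described above has a producer-consumer shape: one thread walks the vertex (or edge) ids and hands batches to worker threads that re-index them, while a shared counter drives the progress log. The sketch below illustrates that pattern under stated assumptions: it uses plain `java.util.concurrent` instead of Atlas's own PC classes, `Reindexer` is a hypothetical callback standing in for the new `reindex` method, and the batch size, worker count and logging interval are illustrative parameters, not values taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

public class ReindexPipelineSketch {
    private static final Logger LOG = Logger.getLogger(ReindexPipelineSketch.class.getName());

    // Sentinel batch (compared by identity) that tells a consumer to stop.
    private static final List<Long> POISON = new ArrayList<>();

    /** Hypothetical stand-in for the graph-level re-index call. */
    public interface Reindexer {
        void reindex(List<Long> elementIds);
    }

    /** Enumerates element ids, batches them, and re-indexes the batches on worker threads. */
    public static long run(Iterable<Long> elementIds, Reindexer reindexer,
                           int batchSize, int numConsumers, long logInterval)
            throws InterruptedException {
        if (batchSize <= 0 || numConsumers <= 0 || logInterval <= 0) {
            throw new IllegalArgumentException("all sizes must be positive");
        }
        // Bounded queue: the producer blocks instead of running far ahead of the consumers.
        BlockingQueue<List<Long>> queue = new ArrayBlockingQueue<>(numConsumers * 2);
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);

        // Consumers: take batches off the queue until the sentinel arrives.
        for (int i = 0; i < numConsumers; i++) {
            pool.submit(() -> {
                while (true) {
                    List<Long> work = queue.take();
                    if (work == POISON) {
                        return null;
                    }
                    try {
                        reindexer.reindex(work);
                    } catch (RuntimeException e) {
                        // Report and skip a failed batch so the pipeline never stalls.
                        LOG.warning("batch failed: " + e);
                        continue;
                    }
                    long done = processed.addAndGet(work.size());
                    // Log whenever the running total crosses a multiple of logInterval.
                    if (done / logInterval > (done - work.size()) / logInterval) {
                        LOG.info("re-indexed " + done + " elements");
                    }
                }
            });
        }

        // Producer: enumerate ids and hand them off in fixed-size batches.
        List<Long> pending = new ArrayList<>(batchSize);
        for (Long id : elementIds) {
            pending.add(id);
            if (pending.size() == batchSize) {
                queue.put(pending);
                pending = new ArrayList<>(batchSize);
            }
        }
        if (!pending.isEmpty()) {
            queue.put(pending);
        }
        // One sentinel per consumer, then wait for every consumer to finish.
        for (int i = 0; i < numConsumers; i++) {
            queue.put(POISON);
        }
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        return processed.get();
    }
}
```

Bounding the queue at twice the worker count keeps memory flat even when tens of millions of ids are enumerated. Skipping failed batches is a choice made for this sketch only; a real patch might instead record failures for a later retry.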

**Configuration**
_atlas.patch.reindex.enabled=true_
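Because the patch is gated on a single boolean property, the check reduces to reading that property with a disabled default. A minimal sketch, assuming the settings are available as a `java.util.Properties` object (Atlas itself reads settings through its `AtlasConfiguration` class, which this change modifies); `ReindexGate` and its methods are hypothetical names:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ReindexGate {
    // Property name from the review request; absent means "disabled".
    public static final String REINDEX_ENABLED = "atlas.patch.reindex.enabled";

    /** True only when the property is present and equals "true" (case-insensitive). */
    public static boolean shouldApply(Properties props) {
        // Properties.load keeps trailing whitespace in values, so trim before parsing.
        return Boolean.parseBoolean(props.getProperty(REINDEX_ENABLED, "false").trim());
    }

    /** Parses properties-file text, e.g. the contents of an application properties file. */
    public static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse("atlas.patch.reindex.enabled=true\n");
        System.out.println(shouldApply(props)); // prints "true"
    }
}
```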


Diffs
-----

  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphManagement.java f7d2e273c
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphManagement.java 2a2ef92a7
  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 1c7915859
  repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java b142a2a4a
  repository/src/main/java/org/apache/atlas/repository/patches/ConcurrentPatchProcessor.java c6f0e6438
  repository/src/main/java/org/apache/atlas/repository/patches/ReIndexPatch.java PRE-CREATION



Diff: https://reviews.apache.org/r/73010/diff/1/


Testing
-------

**Test Setup**
Start with an Atlas instance loaded with known data. Verify that basic search returns results.

Use these curl commands to delete the Solr indexes:

curl "http://<host>:8983/solr/vertex_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>b2d_t:*</query></delete>'

curl "http://<host>:8983/solr/edge_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>1151_t:*</query></delete>'

curl "http://<host>:8983/solr/fulltext_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>14at_t:*</query></delete>'
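For scripted test setups, the same three delete-by-query requests can be sent from Java 11+ with `java.net.http.HttpClient`. This is a hedged sketch: `SolrIndexWipe` is a hypothetical helper, the port and collection names mirror the curl commands, and the `*_t` field names are the ones used in this test setup, which are likely specific to that installation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SolrIndexWipe {
    /** XML body that deletes every document having a value for the given field. */
    public static String deleteBody(String field) {
        return "<delete><query>" + field + ":*</query></delete>";
    }

    /** Equivalent of the curl call: POST to /solr/<collection>/update?commit=true. */
    public static HttpRequest deleteRequest(String host, String collection, String field) {
        URI uri = URI.create("http://" + host + ":8983/solr/" + collection + "/update?commit=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(deleteBody(field)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Collection and field names copied from the curl commands in the test setup.
        String[][] targets = {
            {"vertex_index", "b2d_t"}, {"edge_index", "1151_t"}, {"fulltext_index", "14at_t"}};
        HttpClient client = HttpClient.newHttpClient();
        for (String[] t : targets) {
            HttpResponse<String> resp = client.send(deleteRequest(host, t[0], t[1]),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(t[0] + " -> HTTP " + resp.statusCode());
        }
    }
}
```

Run it with `java SolrIndexWipe.java <host>`. Like the curl commands, it removes indexed documents, so it should only be pointed at a disposable test instance.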

This deletes the documents in the Solr indexes. A basic search performed from the web UI will
then not return any results.

Now set the configuration property _atlas.patch.reindex.enabled=true_ and restart Atlas.

The server-side logs will show that the patch is being applied, along with its progress.

**Volume Testing**
Vertices: ~16M, duration: ~5 hrs.
Edges: ~122M, duration: ~6 hrs.
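Assuming each duration covers only its own phase, the rounded figures above correspond to roughly the following average throughput:

$$
\frac{16\times10^{6}\ \text{vertices}}{5 \times 3600\ \text{s}} \approx 889\ \text{vertices/s},
\qquad
\frac{122\times10^{6}\ \text{edges}}{6 \times 3600\ \text{s}} \approx 5648\ \text{edges/s}
$$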

**Pre-commit Build**
https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/177/


Thanks,

Ashutosh Mestry

