jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Oakathon November 2017" by MattRyan
Date Tue, 14 Nov 2017 08:19:38 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Oakathon November 2017" page has been changed by MattRyan:
https://wiki.apache.org/jackrabbit/Oakathon%20November%202017?action=diff&rev1=34&rev2=35

  = Prep Work =
  
  = Notes from the Oakathon =
+ == Providing JCR Node information to DataStore ==
+ There are two main cases to consider:  Creating a new blob and accessing an existing blob.
+  * Blob creation is done in the DataStore interface via addRecord(InputStream).  Options discussed (in order of preference) were:
+   * Creating a new InputStream implementation that carries node information along with the input stream.  When the DataStore reads the stream, if it is of the new implementation type it can extract the node information and then pass the remaining bytes along unchanged.
+   * Extend the Jackrabbit DataStore API to also support addRecord() with additional node information.  This would not replace the addRecord(InputStream) method, but would be additive (and, we admit, not strictly compliant with the JCR spec).
+   * Use an existing deprecated method that might suit this purpose.  We dislike this, obviously, because we would be knowingly using a deprecated method.
+  * Accessing an existing blob is ''usually'' but not always via a DataIdentifier.
+   * We considered encoding additional information into the DataIdentifier, but we are leaning away from that for a few reasons:
+    * Encoding node information into the identifier means that blob ids would have to change if any of the node information were ever to change, like adding a property or moving the blob to a different path.  It also presents complications for supporting binary deduplication (the same blob may be stored at two different paths).
+    * Encoding a data store identifier into the blob id has similar issues if the blob were to be moved from one data store to another.
+    * Taking this step also creates a data migration issue for existing users.
+   * Instead we discussed that the CompositeDataStore can assume the responsibility for mapping DataIdentifiers to delegate data stores.  This was basically considered a requirement for CompositeDataStore anyway (via Bloom filters).  The CompositeDataStore would need to load existing identifiers at startup time to do this.  We might be able to get the DataIdentifiers via the blob tracker.
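The first (and preferred) option above, an InputStream implementation that carries node information, could be sketched roughly as follows.  All class and method names here are hypothetical illustrations, not part of the Jackrabbit API:

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the preferred option: a stream wrapper that carries
// JCR node information alongside the bytes.  Names are illustrative only.
class NodeInfoInputStream extends FilterInputStream {
    private final Map<String, String> nodeInfo;

    NodeInfoInputStream(InputStream delegate, Map<String, String> nodeInfo) {
        super(delegate);
        this.nodeInfo = Collections.unmodifiableMap(nodeInfo);
    }

    // A DataStore receiving this stream can check
    // "in instanceof NodeInfoInputStream" in addRecord(InputStream),
    // extract the node information, and read the bytes as usual.
    Map<String, String> getNodeInfo() {
        return nodeInfo;
    }
}
```

Because the wrapper still satisfies the addRecord(InputStream) signature, existing DataStore implementations that do not recognize the type would simply read the bytes and ignore the extra information.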
  
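Under these assumptions, the DataIdentifier-to-delegate mapping discussed above amounts to a lookup table populated at startup (e.g. from the blob tracker), possibly fronted by per-delegate Bloom filters.  A minimal sketch with hypothetical names, not the actual Oak API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of CompositeDataStore-style routing of identifiers
// to delegate data stores.  Names are illustrative only.
class DelegateRouter {
    // identifier -> delegate name, loaded once at startup
    private final Map<String, String> idToDelegate = new HashMap<>();

    // Called while loading existing identifiers (e.g. from the blob tracker).
    void register(String identifier, String delegateName) {
        idToDelegate.put(identifier, delegateName);
    }

    // Resolve which delegate should serve a read for this identifier.
    String resolve(String identifier) {
        return idToDelegate.getOrDefault(identifier, "primary");
    }
}
```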
+ During the meeting we also brought up a number of issues related to this topic that need to be verified with CompositeDataStore:
+  * We need to check initialization of CompositeDataStore in the system and make sure that caching gets initialized correctly.
+  * We need to check DSGC in the production system/test system use case.  Since the test system accesses the production data store read-only, does it also participate in the mark phase?  If binaries are marked for delete in production, do they end up getting deleted from the production data store?
+  * In the production system/test system use case, can the test system reuse or share the index segments from the production system, or is the test system required to rebuild the indexes for its own use?  Rebuilding may take so long that it limits the usefulness of the test system, so this needs to be understood.  How would this work if the production instance is doing active deletion of Lucene indexes?
+   * Can we clone an instance and also clone the index segments if active deletion is being used?  Since the clone only happens from the node store's head state, would the clone care about other information not at the head state?
+   * Is it acceptable for the two systems to have separate index segments (rebuilding them on the test system), to copy them and update them for the local system, or would it be better to try to share the index segments?
+ 
+ Finally, we discussed what we may consider to be the first use case of this capability in Oak.  Initially Matt proposed that allowing the CompositeDataStore to select a delegate based on path information may be the first use case.  Another suggestion (Amit? Vikas?) was that a smaller use case might exist just within Oak: use a CompositeDataStore to store only index segments in one delegate and everything else in the other.  In that case this would happen entirely within Oak, and the user would not be aware that a CompositeDataStore was being used.
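Both proposals reduce to a selection rule applied when a blob is written.  A hypothetical sketch of path-based delegate selection (the delegate names and the index path prefix are assumptions, not Oak configuration):

```java
// Hypothetical sketch of path-based delegate selection for a
// CompositeDataStore.  Names and paths are illustrative only.
class PathBasedDelegateSelector {
    // Route Lucene index blobs to one delegate, everything else to the other.
    String selectDelegate(String jcrPath) {
        if (jcrPath.startsWith("/oak:index/")) {
            return "indexDelegate";
        }
        return "defaultDelegate";
    }
}
```

Note that a rule like this is exactly what requires node (path) information to reach the DataStore at addRecord() time, tying this use case back to the options discussed at the start of these notes.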
+ 
