jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Created] (OAK-6671) Enable support for custom types in ExternalSort
Date Fri, 15 Sep 2017 04:36:00 GMT
Chetan Mehrotra created OAK-6671:
------------------------------------

             Summary: Enable support for custom types in ExternalSort
                 Key: OAK-6671
                 URL: https://issues.apache.org/jira/browse/OAK-6671
             Project: Jackrabbit Oak
          Issue Type: Technical task
          Components: commons
            Reporter: Chetan Mehrotra
            Assignee: Chetan Mehrotra
             Fix For: 1.8


ExternalSort currently sorts the file content as strings. For some cases we need to sort the
content in a custom way, which is currently facilitated via Comparator support. However, in this
mode the comparator has to deserialize each line into the required format before it can compare,
which adds overhead.

For example, consider a file with the following content:
{noformat}
/apps|{"8":"dat:2016-07-01T15:14:37.241+05:30","71":["nam:rep:AccessControllable"],"9":"admin","0":"nam:sling:Folder"}
/apps/assets|{"8":"dat:2016-07-01T15:37:38.598+05:30","9":"admin","0":"nam:nt:folder"}
{noformat}

This needs to be sorted by path, comparing the paths element by element. Currently,
sorting a 50Gb file having 130M lines takes 30 for a batch of 8M. Most of the time is spent
extracting the path structure. This can be avoided if ExternalSort supports mapping each line to
a custom type and retains that type for the sorting phase.
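
The idea can be illustrated with a minimal in-memory sketch. It is not the ExternalSort API: the class `TypedSortSketch`, the holder `PathLine`, the `parseCount` counter, and the assumed ordering (elements compared as strings, parent before child) are all hypothetical and only serve to show the difference between parsing inside the comparator and parsing each line once up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Oak's ExternalSort.
class TypedSortSketch {

    /** Counts how often a line is parsed, to make the overhead visible. */
    static int parseCount = 0;

    /** Hypothetical custom type: the original line plus its pre-split path. */
    static final class PathLine {
        final String line;
        final String[] elements;

        PathLine(String line) {
            this.line = line;
            this.elements = pathElements(line); // parsed exactly once
        }
    }

    /** Takes the text before the first '|' as the path and splits it on '/'. */
    static String[] pathElements(String line) {
        parseCount++;
        int sep = line.indexOf('|');
        String path = sep < 0 ? line : line.substring(0, sep);
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    /** Assumed ordering: compare element by element, shorter path first on a tie. */
    static int compareElements(String[] a, String[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** Comparator mode: both lines are re-parsed on every comparison. */
    static List<String> sortWithComparator(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort((l1, l2) -> compareElements(pathElements(l1), pathElements(l2)));
        return copy;
    }

    /** Mapped mode: each line is parsed once and the typed form is sorted. */
    static List<String> sortWithMappedType(List<String> lines) {
        List<PathLine> typed = new ArrayList<>();
        for (String l : lines) {
            typed.add(new PathLine(l));
        }
        typed.sort((p1, p2) -> compareElements(p1.elements, p2.elements));
        List<String> out = new ArrayList<>();
        for (PathLine p : typed) {
            out.add(p.line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("/a-b|3", "/a/b|2", "/a|1");
        parseCount = 0;
        System.out.println(sortWithComparator(lines) + " parses=" + parseCount);
        parseCount = 0;
        System.out.println(sortWithMappedType(lines) + " parses=" + parseCount);
    }
}
```

Both methods produce the same order, but in the mapped mode the number of parses equals the number of lines, while in the comparator mode it grows with the number of comparisons. The price is that every `PathLine` keeps its split path in memory alongside the original line.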

This would add a slight memory overhead for cases where this feature is used. For the normal
case no overhead would be present.

I will come up with a patch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
