directory-dev mailing list archives

From "Aaron Burgemeister (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DIRSTUDIO-1174) Directory Studio startup very slow due to schema LDIF processing
Date Sun, 26 Aug 2018 06:15:00 GMT

    [ https://issues.apache.org/jira/browse/DIRSTUDIO-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592805#comment-16592805
] 

Aaron Burgemeister edited comment on DIRSTUDIO-1174 at 8/26/18 6:14 AM:
------------------------------------------------------------------------

[EDIT: I see I missed a few comments while typing this; I'm building from source with the
latest commits to see how it performs now]

 

The replaceAll that matches a newline followed by a carriage return followed by a space is,
in my opinion, an invalid case, since no OS uses that line ending (or, as far as I know, ever has):
 s = s.replaceAll( "\n\r ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
The carriage-return-only line ending is valid for Mac OS 9 and earlier, but I am guessing almost
nobody runs that anymore since OS X debuted in 2001, and anyone who does probably cannot run
Directory Studio there.  Still, it is theoretically possible that somebody could send an old file
from such a system to somebody else who uses Directory Studio.  If that is deemed too unlikely
a scenario, then we can take out this line:
 s = s.replaceAll( "\r ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
That leaves the Windows carriage-return-followed-by-newline abomination, and the Linux/Unix/Mac OS X/etc.
case of a simple newline.  Since String is immutable, each of these calls creates a new String,
and while the regex matching is probably the slow part, re-creating strings of this size
probably does not help either.
It would be interesting to see which of the following performs best:
 s = s.replaceAll( "\r?\n ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
 vs.
 s = s.replaceAll( "\r", "" ); //$NON-NLS-1$ //$NON-NLS-2$
 s = s.replaceAll( "\n ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
 vs.
 s = s.replaceAll( "(?:\r\n|\n) ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
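
One way to make this comparison is sketched below. It is not code from Directory Studio: the class UnfoldVariants and its method names are hypothetical, the input is synthetic, and the timing is only a rough indication rather than a proper benchmark. It contrasts a precompiled form of the first regex (String.replaceAll compiles its pattern on every call) with a single manual pass over the characters that does not use the regex engine at all.

```java
import java.util.regex.Pattern;

class UnfoldVariants {
    // Compiled once and reused; String.replaceAll would compile it on each call.
    private static final Pattern FOLD = Pattern.compile("\r?\n ");

    // Regex variant: removes LF+space and CRLF+space folds.
    static String unfoldRegex(String s) {
        return FOLD.matcher(s).replaceAll("");
    }

    // Manual variant: one pass, one output buffer, no regex engine.
    static String unfoldManual(String s) {
        int n = s.length();
        StringBuilder sb = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (c == '\r' && i + 2 < n && s.charAt(i + 1) == '\n' && s.charAt(i + 2) == ' ') {
                i += 3; // skip CR LF SPACE
            } else if (c == '\n' && i + 1 < n && s.charAt(i + 1) == ' ') {
                i += 2; // skip LF SPACE
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Synthetic input (an assumption): 20,000 folded lines, alternating CRLF and LF folds.
        StringBuilder in = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            in.append("attributeTypes: ( 1.2.3.").append(i).append(" NAME 'example")
              .append(i % 2 == 0 ? "\r\n " : "\n ")
              .append("Attribute' )\n");
        }
        String s = in.toString();

        long t0 = System.nanoTime();
        String viaRegex = unfoldRegex(s);
        long t1 = System.nanoTime();
        String viaManual = unfoldManual(s);
        long t2 = System.nanoTime();

        System.out.println("same result: " + viaRegex.equals(viaManual));
        System.out.println("regex  ms: " + (t1 - t0) / 1_000_000.0);
        System.out.println("manual ms: " + (t2 - t1) / 1_000_000.0);
    }
}
```

A single run like this includes JIT warm-up, so the numbers only hint at the relative cost; a real measurement would use the actual schema files and repeated runs.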
  

Also, is there a reason we fold the lines in the schema files written by Directory Studio?
If we stopped folding, those files could be read by a method that does not try to unfold them.
I suspect folding for internal use is not that helpful (I personally think folding
is not that helpful in general unless you really hate long lines, and these files are meant
for computers more than for humans), though I am sure we still need the current methods to
properly unfold files that come from outside Directory Studio.
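
For files that still need unfolding, a regex over the whole file content is not the only option. The following is a minimal sketch, assuming the file is read line by line; LdifLineUnfolder and readLogicalLines are hypothetical names, not Directory Studio code. In LDIF, a physical line that starts with a single space continues the previous line, so each continuation can be appended to a buffer as the lines are read:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LdifLineUnfolder {
    // Returns logical lines: a physical line starting with a space is
    // appended (without that space) to the line before it.
    static List<String> readLogicalLines(Reader in) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder current = null;
        BufferedReader reader = new BufferedReader(in);
        String line;
        // readLine() accepts LF, CR, and CRLF line endings.
        while ((line = reader.readLine()) != null) {
            if (current != null && line.startsWith(" ")) {
                current.append(line, 1, line.length());
            } else {
                if (current != null) {
                    lines.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample content, folded once with LF and once with CRLF.
        String sample = "dn: cn=schema\n"
                + "attributeTypes: ( 1.2.3 NAME 'exam\n"
                + " ple' )\r\n"
                + "objectClasses: ( 1.2.4 NAME\r\n"
                + "  'demo' )\n";
        for (String logical : readLogicalLines(new StringReader(sample))) {
            System.out.println(logical);
        }
    }
}
```

This builds each logical line once instead of copying the whole file content for every replaceAll call, and unfolded files pass through unchanged.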

I went into my ~/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core
directory and unfolded the schema files manually, and the load time decreased a little (two
(2) to three (3) seconds).  Since the replaceAll calls are still in there, I would expect
even better performance with the changes suggested above.  This is the command I used to
unfold the files:

 for onefile in *.ldif; do sed -i -n '1 {h; $ !d}; $ {x; s/\n //g; p}; /^ / {H; d}; /^ /! {x; s/\n //g; p}' "${onefile}"; done



> Directory Studio startup very slow due to schema LDIF processing
> ----------------------------------------------------------------
>
>                 Key: DIRSTUDIO-1174
>                 URL: https://issues.apache.org/jira/browse/DIRSTUDIO-1174
>             Project: Directory Studio
>          Issue Type: Bug
>          Components: studio-connection
>    Affects Versions: 2.0.0-M13
>         Environment: openSUSE Linux (installed on my laptop)
> Sun/Oracle Java 1.8.0_111 (previously 1.7 with same issue)
> Apache Directory Studio 2.0.0 M12 and M13, plus earlier milestones too
>            Reporter: Aaron Burgemeister
>            Priority: Major
>              Labels: LDIF, schema, startup-time
>         Attachments: 20180415-no-load-schema-ldif-by-default.patch, 20180416-dirstudio-1174-fix-a.patch,
20180821-schema-analysis-a.csv.bz2, 20180821-schema-analysis-b.csv.bz2, schema-9060594b-7c28-4123-b574-35fe09727283.ldif.bz2
>
>
> For the past couple of years, startup of Apache Directory Studio has slowed down to the point
where it takes more than a minute on my not-a-slouch laptop to start.  Other systems, VMs
with new installs, start much faster, even on the same laptop, implying something other than
the base product is at fault.  As a result, I had suspected that Directory Studio slowed
down precipitously due to the number of stored connections, but never confirmed it.
> Today I connected strace to the 'java' process as it started and noticed the following:
>  
> [pid 30108] *1521902717*.154740 open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-ba001fb7-4b83-4dca-be44-517c14139f4b.ldif",
O_RDONLY) = *-1 ENOENT (No such file or directory)*
> [pid 30108] *1521902717*.154906 stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
{st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902717*.154948 open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-95e1202e-9a67-418c-afe9-b02f4e7c06df.ldif",
O_RDONLY) = *-1 ENOENT (No such file or directory)*
> [pid 30108] *1521902717*.155019 stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
{st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902717*.155053 open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-687f43f6-9d05-4d08-b159-35b0e76dc95a.ldif",
O_RDONLY) = *-1 ENOENT (No such file or directory)*
> [pid 30108] *1521902717*.155120 stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
{st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902717*.155154 open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-d62d0e10-c81e-4477-81a2-ac2c9e5c7169.ldif",
O_RDONLY) = *121*
> [pid 30108] *1521902718*.698702 stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
{st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902718*.698800 open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-7b6a9a7c-2192-4b24-8874-1378e5b1b30c.ldif",
O_RDONLY) = *126*
> [pid 30108] *1521902719*.770570 stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
{st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902719*.770660 open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-b3b02838-067f-4f24-bf92-6bf3fccdbc52.ldif",
O_RDONLY) = *127*
> [pid 30108] *1521902721*.198417 stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
{st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
>  
> Notice the timestamps (bolded near beginning of line) and how they change based on whether
or not a schema LDIF file was found (bolded near end of line) and, presumably, processed. 
When a file is not found, subsequent files are sought immediately without significantly delaying
startup.
> These schema files are all under 1 MiB in size, but most of them are several hundred
KiB, approaching 1 MiB, so depending on what Directory Studio does as it reads
and processes these files, it would seem that this processing introduces the slowness when
a file is found.
> Looking for an existing issue I found DIRSTUDIO-1027 which may be related.  During startup
of Directory Studio one of my laptop's eight cores is fully utilized, which makes me think
this may be more about processing the LDIF than just swapping memory due to inefficient data
structures, but I am not a memory management expert, so I only mention the possibility here
in case it helps find the root cause quickly.
> My Directory Studio's total startup time: sixty-one (61) seconds.
> Time spent (per strace) reading schema files: fifty-five (55) seconds.
> Estimated non-schema startup time: six (6) seconds.
>  
> Steps to duplicate:
> Have many (e.g. 100) stored schema LDIF files from previous connections.
> Start up Apache Directory Studio.
> Expected results: Quick startup.  Processing old schema LDIFs, when most of them will
not be used at any given time, seems like a waste of time in general.  Perhaps this can be
done only when a connection is accessed in some way rather than at startup.
> Actual results: Slow startup.
> Reproducible: I think so, but am not sure why my system has these schema LDIFs when others
may not.
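
The idea in the expected results, processing a schema only when its connection is first accessed, could look roughly like the following. This is a sketch under stated assumptions, not how Directory Studio is implemented: LazyValue is a hypothetical helper, and the loader passed to it stands in for whatever parses a cached schema LDIF file.

```java
import java.util.function.Supplier;

class LazyValue<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyValue(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader on first access only; later calls return the cached result.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        // Stand-in for parsing a cached schema LDIF file (hypothetical).
        LazyValue<String> schema = new LazyValue<>(() -> {
            System.out.println("parsing schema file...");
            return "parsed schema";
        });
        System.out.println("startup done, loaded=" + schema.isLoaded());
        System.out.println(schema.get());
        System.out.println(schema.get());
    }
}
```

With one such holder per stored connection, startup would only create the holders, and the parsing cost would be paid once, on the first access to a given connection's schema.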



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
