jackrabbit-oak-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Solomon Rutzky (Jira)" <j...@apache.org>
Subject [jira] [Commented] (OAK-8977) RDBBlobStore performance improvement for SQL Server (MSSQL)
Date Tue, 07 Apr 2020 07:26:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077000#comment-17077000
] 

Solomon Rutzky commented on OAK-8977:
-------------------------------------

[~reschke] I finally cleaned up that test script but I am unable to attach it to this issue.
So, I posted it on PasteBin:

[JackRabbitOAK8977.sql|https://pastebin.com/csh4fy42]

> RDBBlobStore performance improvement for SQL Server (MSSQL)
> -----------------------------------------------------------
>
>                 Key: OAK-8977
>                 URL: https://issues.apache.org/jira/browse/OAK-8977
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: rdbmk
>            Reporter: Solomon Rutzky
>            Priority: Major
>
> The performance test done in OAK-8926 indicates that one or more aspects of using SQL
Server as the database need to be addressed (also MySQL, but this issue only concerns SQL
Server).
>  
> OAK-8963 (specify a binary collation in the {{CREATE TABLE}} statement) will improve
performance a little, but not nearly enough to get its performance up to the range of the
other non-MySQL databases.
>  
> I did some testing and found one area that can be improved, and that should be feasible
to improve. The current DB access pattern is to submit parameterized dynamic SQL. JDBC handles
this as a "prepared" statement. I did some research four years ago on prepared statements
(  [What is the sense and benefit of using SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859]
on DBA.StackExchange ) and found that:
> # prepared statements, even when parameterized, don't cache the execution plan between
connections (even if the connection is pooled and does not need to re-authenticate)
> # prepared statements only re-use the execution plan when executing the same prepared
statement with different parameters in the _same session_ (the Java code opens and closes
the connection per each call)
> One way to improve performance is to:
> # put the T-SQL code into a stored procedure (that way SQL Server does not need to parse
it per each call and can re-use an execution plan), _and_
> # call the stored procedure using a "callable statement": [Using a stored procedure with
input parameters|https://docs.microsoft.com/en-us/sql/connect/jdbc/using-a-stored-procedure-with-input-parameters]
>  
> I tested this by first copying the basic sequence of T-SQL statements found in the [storeBlockInDatabase|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java?revision=1874570&view=markup#l307]
method (assuming that everything always worked / happy path / no error handling) and simulating
the new [testInsertSmallBlobs()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/blob/RDBBlobStoreTest.java?view=markup&pathrev=1874693#l308]
test, first as dynamic SQL, and then as a stored procedure:
> {code:sql}
> ALTER PROCEDURE dbo.[StoreBlockInDatabase]
> (
> 	@Digest VARBINARY(32),
> 	@Level INT,
> 	@Data VARBINARY(MAX)
> )
> AS
> SET NOCOUNT ON;
> DECLARE @Id VARCHAR(64) = CONVERT(VARCHAR(64), @Digest, 2),
> 		@Now BIGINT;
> SELECT @Now = [cpu_ticks]
> FROM   sys.dm_os_sys_info;
> BEGIN TRY
> 	UPDATE dbo.[DATASTORE_META]
> 	SET    [LASTMOD] = @Now
> 	WHERE  [ID] = @Id;
> 	IF (@@ROWCOUNT = 0)
> 	BEGIN
> 		INSERT INTO dbo.[DATASTORE_DATA] ([ID], [DATA]) VALUES (@Id, @Data);
> 		INSERT INTO dbo.[DATASTORE_META] ([ID], [LVL], [LASTMOD]) VALUES (@Id, @Level, @Now);
> 	END;
> END TRY
> BEGIN CATCH
> 	;THROW;
> END CATCH;
> GO
> {code}
> I tested against both the {{Latin1_General_CI_AS}} collation and the {{Latin1_General_BIN2}}
collation:
> * The stored procedure inserts 2.5 times more rows than the dynamic SQL version
> * The binary collation is slightly better than the non-binary collation per each test
> I will attach the test script once I get it cleaned up.
> P.S. I also tested with another pattern to see if doing a {{SELECT}} first (rather than
assuming the {{UPDATE}}) would be better (based on my testing for a similar case: [UPDATE
performance where no data changes|https://dba.stackexchange.com/a/114369/30859] on DBA.StackExchange),
but it wasn't better.
> P.P.S. I figure the {{CREATE PROCEDURE}} statement could be placed in the vendor-specific
area of [RDBBlobStoreDB|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStoreDB.java?revision=1874271&view=markup#l60]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message