jackrabbit-oak-issues mailing list archives

From "Ian Boston (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-2920) RDBDocumentStore: fail init when database config seems to be inadequate
Date Fri, 26 Feb 2016 12:46:18 GMT

    [ https://issues.apache.org/jira/browse/OAK-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168925#comment-15168925 ]

Ian Boston commented on OAK-2920:
---------------------------------

If the DB config is broken, content that uses non-ASCII (UTF-8) characters in its paths will
fail to import, because the mangled IDs will be rejected as duplicates. For instance, any
application that stores i18n content in the repository and has to support a language whose
characters take more than one byte in UTF-8 (e.g. the umlauts in German) will fail. Duplicate
IDs are easy to detect. Much harder to detect is data corruption within JCR properties: a user
working with Oak through a web UI could suspect any of the links between the browser and the
database as the source of the UTF-8 corruption.
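To illustrate why mis-encoded paths show up as duplicate IDs, here is a minimal Java sketch. It assumes, purely for illustration, a connection whose effective character set is US-ASCII. `String.getBytes(Charset)` replaces every character the charset cannot represent with that charset's replacement byte (`?` for US-ASCII), so two distinct German paths collapse to the same key. The class name and paths are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: a lossy character set maps distinct paths to the same ID. */
final class LossyIdDemo {

    /** Simulates writing a string through a charset and reading it back. */
    static String throughCharset(String s, Charset cs) {
        // getBytes substitutes unmappable characters with the charset's
        // replacement byte ('?' for US-ASCII); decoding keeps that '?'.
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String a = "/content/de/M\u00fcller"; // u-umlaut
        String b = "/content/de/M\u00f6ller"; // o-umlaut
        String storedA = throughCharset(a, StandardCharsets.US_ASCII);
        String storedB = throughCharset(b, StandardCharsets.US_ASCII);
        System.out.println(storedA);                 // /content/de/M?ller
        System.out.println(storedA.equals(storedB)); // true: a duplicate key
    }
}
```

With `StandardCharsets.UTF_8` instead, both paths survive unchanged and stay distinct.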

Taking MySQL as an example: without utf8, many characters in common use in EU countries can't
be stored in JCR properties (see http://www.periodni.com/unicode_utf-8_encoding.html). Without
utf8mb4, supplementary characters (code points above U+FFFF, which take four bytes in UTF-8)
can't be stored in JCR properties (see http://www.i18nguy.com/unicode/supplementary-test.html).
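The difference between MySQL's `utf8` (at most three bytes per character) and `utf8mb4` (up to four) follows directly from UTF-8 byte lengths. This small Java sketch, with hypothetical class and method names, reports the encoded length of a few characters. It also flags strings containing supplementary code points, which are exactly the characters that need four bytes and therefore do not fit a three-byte-per-character column.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: which strings would not fit a 3-byte-per-character column? */
final class Utf8Width {

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if any code point lies outside the BMP, i.e. needs 4 bytes in UTF-8. */
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String[] samples = {
            "a",            // U+0061, ASCII: 1 byte
            "\u00e4",       // U+00E4, German a-umlaut: 2 bytes
            "\u20ac",       // U+20AC, Euro sign: 3 bytes
            "\ud840\udc0b"  // U+2000B, supplementary CJK ideograph: 4 bytes
        };
        for (String s : samples) {
            System.out.printf("U+%04X: %d bytes, needs utf8mb4: %b%n",
                    s.codePointAt(0), utf8Bytes(s), needsUtf8mb4(s));
        }
    }
}
```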

For those reasons, a misconfigured database or JDBC connection is likely to cause considerable
problems in production, and probably won't work with most modern applications that have been
internationalised or simply need to display the Euro sign (€, U+20AC).

One approach to detecting this is to write a row containing supplementary characters to the
nodes table, commit it, read the same row back, and verify that the data survived the round
trip. Finally, delete the row. The row ID can be something Oak would never use, with a low
probability of collision with other Oak instances in the same cluster (e.g. a millisecond
timestamp plus a suffix, such as 21313412313:utf8test). If there is a concern about tables other
than the nodes table, those can be tested in the same way.
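A minimal JDBC sketch of that round-trip check follows. It assumes a table with a string `ID` key and a text `DATA` column; the real RDBDocumentStore tables have more (possibly NOT NULL) columns, so the table layout, SQL, and class name are hypothetical. The probe mixes 2-, 3- and 4-byte UTF-8 characters, commits the row, reads it back, compares, and always deletes it afterwards.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical start-up probe: does text survive a write/read round trip? */
final class Utf8RoundTripProbe {

    /** Mix of 2-byte, 3-byte and 4-byte (supplementary) UTF-8 characters. */
    static final String PROBE_VALUE = "\u00e4\u00f6\u00fc \u20ac \ud840\udc0b";

    /** An ID Oak would never generate, e.g. "21313412313:utf8test". */
    static String probeId(long nowMillis) {
        return nowMillis + ":utf8test";
    }

    /**
     * Returns true if PROBE_VALUE is read back unchanged. The table name must come
     * from trusted configuration, because it is concatenated into the SQL.
     */
    static boolean check(Connection con, String table) throws SQLException {
        String id = probeId(System.currentTimeMillis());
        boolean autoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try {
            try (PreparedStatement ins = con.prepareStatement(
                    "INSERT INTO " + table + " (ID, DATA) VALUES (?, ?)")) {
                ins.setString(1, id);
                ins.setString(2, PROBE_VALUE);
                ins.executeUpdate();
            }
            con.commit();
            try (PreparedStatement sel = con.prepareStatement(
                    "SELECT DATA FROM " + table + " WHERE ID = ?")) {
                sel.setString(1, id);
                try (ResultSet rs = sel.executeQuery()) {
                    return rs.next() && PROBE_VALUE.equals(rs.getString(1));
                }
            }
        } finally {
            try {
                // Discard any half-finished work (e.g. a failed INSERT) before cleanup.
                con.rollback();
                try (PreparedStatement del = con.prepareStatement(
                        "DELETE FROM " + table + " WHERE ID = ?")) {
                    del.setString(1, id);
                    del.executeUpdate();
                }
                con.commit();
            } finally {
                con.setAutoCommit(autoCommit);
            }
        }
    }
}
```

Comparing with `String.equals` catches both replaced characters (`?`) and truncated supplementary characters, which are the two failure modes described above.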

A switch should be provided so that those who have managed to run Oak in production on a
misconfigured database can at least keep running while they correct the issue. For MySQL, the
fix might be as simple as correcting the JDBC URL to use utf8mb4 encoding.
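The fail-by-default behaviour with an opt-out switch could look roughly like the Java sketch below. The property name `example.rdb.allowBrokenEncoding` and the class are hypothetical, not actual Oak configuration. `java.util.logging` is used only to keep the sketch dependency-free.

```java
import java.util.logging.Logger;

/** Hypothetical policy: fail init on a bad encoding probe unless explicitly overridden. */
final class EncodingCheckPolicy {

    private static final Logger LOG = Logger.getLogger(EncodingCheckPolicy.class.getName());

    /** Hypothetical override switch; not an actual Oak property. */
    static final String OVERRIDE_PROPERTY = "example.rdb.allowBrokenEncoding";

    /**
     * Called during store initialization with the result of the round-trip probe.
     * Throws unless the override is set, in which case it only logs a warning.
     */
    static void enforce(boolean probePassed, boolean overrideSet) {
        if (probePassed) {
            return;
        }
        String msg = "Database did not round-trip UTF-8 test data; check the database/table "
                + "character set and the JDBC connection encoding";
        if (overrideSet) {
            LOG.warning(msg + " (continuing because " + OVERRIDE_PROPERTY + " is set)");
        } else {
            throw new IllegalStateException(msg);
        }
    }
}
```

A caller might pass `Boolean.getBoolean(EncodingCheckPolicy.OVERRIDE_PROPERTY)` as `overrideSet`. For MySQL with Connector/J, the connection encoding is normally set through the `characterEncoding` URL property; whether `UTF-8` actually maps to `utf8mb4` depends on the driver and server versions, so that should be checked against the Connector/J documentation.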

> RDBDocumentStore: fail init when database config seems to be inadequate
> -----------------------------------------------------------------------
>
>                 Key: OAK-2920
>                 URL: https://issues.apache.org/jira/browse/OAK-2920
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: rdbmk
>            Reporter: Julian Reschke
>            Priority: Minor
>              Labels: resilience
>
> It has been suggested that the implementation should fail to start (rather than warn)
> when it detects a DB configuration that is likely to cause problems (such as wrt character
> encoding or collation sequences)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
