jena-dev mailing list archives

From "Bernhard Stiftner (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (JENA-1553) Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
Date Fri, 28 Sep 2018 11:19:00 GMT

    [ https://issues.apache.org/jira/browse/JENA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631709#comment-16631709 ]

Bernhard Stiftner edited comment on JENA-1553 at 9/28/18 11:18 AM:
-------------------------------------------------------------------

I experienced the same problem with Jena 3.8.0. At some point, TDB node tables became corrupted
under a combined, concurrent read/write workload, which then led to various exceptions being
thrown in or around NodeLib.decode. The same underlying problem showed up in several forms.

Various RiotParseExceptions when attempting to access the corrupted TDB node tables:

{noformat}
org.apache.jena.riot.RiotParseException: [line: 1, col: 1 ] Failed to find a prefix name or keyword: ^@(0;0x0000)
        at org.apache.jena.riot.tokens.TokenizerText$ErrorHandlerTokenizer.error(TokenizerText.java:65)
        at org.apache.jena.riot.tokens.TokenizerText.error(TokenizerText.java:1244)
        at org.apache.jena.riot.tokens.TokenizerText.readPrefixedNameOrKeyword(TokenizerText.java:536)
        at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:445)
        at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:99)
        at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:127)
        at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)
{noformat}

{noformat}
org.apache.jena.riot.RiotParseException: [line: 1, col: 3 ] Malformed double: 2e
        at org.apache.jena.riot.tokens.TokenizerText$ErrorHandlerTokenizer.error(TokenizerText.java:65)
        at org.apache.jena.riot.tokens.TokenizerText.error(TokenizerText.java:1244)
        at org.apache.jena.riot.tokens.TokenizerText.exponent(TokenizerText.java:1011)
        at org.apache.jena.riot.tokens.TokenizerText.readNumber(TokenizerText.java:916)
        at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:421)
        at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:99)
        at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:127)
        at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)
{noformat}


Or a TDBException like this one:

{noformat}
org.apache.jena.tdb.TDBException: Not a node: if/stmt/6da980f15dedf35826cf3a4354525ded8efde37b>
        at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:133)
        at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)
{noformat}
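The ^@(0;0x0000) in the first RiotParseException is a NUL character at column 1, meaning the decoder was handed zero bytes where node text was expected. One way this can happen is a reader decoding a record whose bytes have not been written yet. That is an interpretation, not something these traces prove. The sketch below uses a hypothetical, simplified append-only node file (a 4-byte length followed by UTF-8 text, with the node id being the record's byte offset); this layout and the class and method names are assumptions for illustration, not TDB's actual NodecSSE or object-file code. It simulates the reader/writer interleaving in a single thread so the outcome is deterministic.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified model of an append-only node file (NOT TDB's real code):
 * each record is a 4-byte length followed by the node's UTF-8 text, and a node id
 * is the byte offset where its record starts. Unwritten space is zero-filled.
 */
public class TornReadDemo {
    private final ByteBuffer file = ByteBuffer.allocate(1024); // allocate() zero-fills

    /** Step 1 of a two-step append: write only the record's length header. */
    public int writeHeader(int length) {
        int offset = file.position();
        file.putInt(length);
        return offset;
    }

    /** Step 2 of the append: write the payload bytes right after the header. */
    public void writePayload(byte[] payload) {
        file.put(payload);
    }

    /** Reads a record with absolute gets, so it never moves the writer's position. */
    public String read(int offset) {
        int length = file.getInt(offset);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = file.get(offset + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TornReadDemo nodes = new TornReadDemo();
        byte[] uri = "<http://example.org/s>".getBytes(StandardCharsets.UTF_8);

        int id = nodes.writeHeader(uri.length);
        // A reader running between the two steps sees a valid length but only zero bytes.
        String torn = nodes.read(id);
        System.out.println("torn read starts with U+" + String.format("%04X", (int) torn.charAt(0)));

        nodes.writePayload(uri);
        System.out.println("complete read: " + nodes.read(id));
    }
}
```

In a real concurrent setting the interleaving would be nondeterministic; the usual remedy for this class of bug is to make the write and its publication atomic with respect to readers (locking or transactions), but whether that is what the JENA-1581 fix does is not stated here.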

I also got "Illegal UTF-8" errors, just like the one in the stack trace of the original problem report:

{noformat}
org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Illegal UTF-8: 0xFFFFFF97
        at org.apache.jena.atlas.io.IO.exception(IO.java:254)
        at org.apache.jena.atlas.io.BlockUTF8.exception(BlockUTF8.java:275)
        at org.apache.jena.atlas.io.BlockUTF8.toCharsBuffer(BlockUTF8.java:150)
        at org.apache.jena.atlas.io.BlockUTF8.toChars(BlockUTF8.java:73)
        at org.apache.jena.atlas.io.BlockUTF8.toString(BlockUTF8.java:95)
        at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:101)
        at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)
{noformat}
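The value in "Illegal UTF-8: 0xFFFFFF97" (and 0xFFFFFFB1 in the original report) is what a single byte looks like after Java widens a signed byte to an int and prints it in hex: byte 0x97 is -105, which widens to 0xFFFFFF97. Both 0x97 and 0xB1 match the UTF-8 continuation-byte pattern 10xxxxxx, so they are only legal directly after a multi-byte lead byte. Finding one where a character should start is a plausible reason for the error, though which check in BlockUTF8 actually fired is not confirmed here. The stdlib-only sketch below reproduces that arithmetic; the strict JDK CharsetDecoder stands in for BlockUTF8 as an assumption and is not Jena's decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class IllegalUtf8Demo {

    /** Widens a byte to int (sign-extending) and prints it as 8 hex digits, e.g. 0x97 -> 0xFFFFFF97. */
    public static String signExtendedHex(byte b) {
        int widened = b; // byte -> int keeps the sign, so 0x97 (-105) becomes 0xFFFFFF97
        return String.format("0x%08X", widened);
    }

    /** True for UTF-8 continuation bytes (bit pattern 10xxxxxx), which can never start a character. */
    public static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** Decodes UTF-8 strictly: malformed input throws instead of becoming U+FFFD. */
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte fromComment = (byte) 0x97; // as in "Illegal UTF-8: 0xFFFFFF97"
        byte fromReport = (byte) 0xB1;  // as in "Illegal UTF-8: 0xFFFFFFB1"
        System.out.println(signExtendedHex(fromComment) + " continuation=" + isContinuationByte(fromComment));
        System.out.println(signExtendedHex(fromReport) + " continuation=" + isContinuationByte(fromReport));

        try {
            decodeStrict(new byte[] {'a', fromComment, 'b'});
        } catch (CharacterCodingException e) {
            System.out.println("strict decode failed: " + e);
        }
    }
}
```

A stray continuation byte like this is what you would expect if bytes from the middle of a multi-byte character (or from unrelated data) are read as the start of a node's text, which fits the corruption described above but does not by itself prove the cause.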

All of these errors disappeared after we patched Jena (we use our own fork of 3.8.0) with
the proposed fix for JENA-1581 (upcoming Jena 3.9.0) and rebuilt the TDB stores from scratch.
Existing data is probably corrupted and cannot be recovered, but so far I believe that the
JENA-1581 fix prevents TDB corruption from happening in the first place.


> Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
> ------------------------------------------------------------------
>
>                 Key: JENA-1553
>                 URL: https://issues.apache.org/jira/browse/JENA-1553
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Jena
>         Environment: Ubuntu 16.04 running Docker. Running stain/jena-fuseki from the official Docker Hub.
>            Reporter: Brian Mullen
>            Priority: Major
>
> Attempting to back up a TDB dataset of 500M+ triples through Fuseki fails with this error:
>  
> {code:java}
> [2018-06-01 13:25:46] Log4jLoggerAdapter WARN  Exception in backup
> org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
>         at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>         at org.apache.jena.atlas.io.BlockUTF8.exception(BlockUTF8.java:275)
>         at org.apache.jena.atlas.io.BlockUTF8.toCharsBuffer(BlockUTF8.java:150)
>         at org.apache.jena.atlas.io.BlockUTF8.toChars(BlockUTF8.java:73)
>         at org.apache.jena.atlas.io.BlockUTF8.toString(BlockUTF8.java:95)
>         at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:101)
>         at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:105)
>         at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:81)
>         at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
>         at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
>         at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
>         at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
>         at org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
>         at org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
>         at org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>         at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:107)
>         at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:84)
>         at org.apache.jena.tdb.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:54)
>         at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>         at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>         at org.apache.jena.atlas.iterator.Iter.next(Iter.java:891)
>         at org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:140)
>         at org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:62)
>         at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:45)
>         at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:91)
>         at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:208)
>         at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:165)
>         at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:112)
>         at org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:149)
>         at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1269)
>         at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1162)
>         at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1153)
>         at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:115)
>         at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:75)
>         at org.apache.jena.fuseki.mgt.ActionBackup$BackupTask.run(ActionBackup.java:58)
>         at org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:55)
>         at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
>         ... 40 more
> [2018-06-01 13:25:46] Log4jLoggerAdapter INFO  Backup(/fuseki/backups/PDE_PROD_2018-06-01_13-24-00):2{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
