jena-dev mailing list archives

From "Rob Vesse (JIRA)"
Subject [jira] Created: (JENA-12) Turtle Files with a UTF-8 BOM fail to parse
Date Sat, 18 Dec 2010 13:06:00 GMT
Turtle Files with a UTF-8 BOM fail to parse

                 Key: JENA-12
             Project: Jena
          Issue Type: Bug
          Components: RIOT
         Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4
            Reporter: Rob Vesse

If a Turtle file has a UTF-8 BOM at the start, Jena refuses to parse it and throws the following
exception:

Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at
line 1, column 2.  Encountered: "@" (64), after : "\ufeff"
    at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(
    at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(
    at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(
    at TurtleWithBOM.main(

The code I used to produce this error was as follows:

import java.io.InputStream;

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;

public class TurtleWithBOM
{
    public static void main(String[] args)
    {
        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        // open the input file; FileManager returns null if it cannot be found
        InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
        if (in == null)
            throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");

        // read the Turtle file (this is the call that throws when a BOM is present)
        model.read(in, "", "TTL");

        // write it to standard out
        model.write(System.out);
    }
}

A sample Turtle file for use with the above code is attached to the original report sent
to the Jena Users mailing list.

The data files come from my software, which is written entirely in .NET. When writing
UTF-8, .NET's default behaviour is to include the BOM at the start of the file. The BOM is
not required for UTF-8, but it is not forbidden either, so I think this should be fixed (if
possible) in future releases. I will be modifying my software so that my users can disable
output of the BOM if desired.
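
Until the parser handles a leading BOM itself, the calling code can skip it before handing
the stream to Jena. What follows is only a minimal sketch of that workaround, using nothing
but java.io. The class name BomSkipper and the method skipUtf8Bom are hypothetical names
chosen for illustration; they are not part of Jena, and this is not necessarily how the
parser would be fixed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Jena.
public class BomSkipper {

    /**
     * Returns a stream positioned just after a leading UTF-8 BOM (bytes EF BB BF),
     * or at the very start of the data if no BOM is present.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: put back whatever was read so the caller sees the data unchanged
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // A BOM followed by the start of a Turtle "@prefix" directive
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // first character after the BOM
    }
}
```

In the test program above, the stream returned by skipUtf8Bom(in) would be passed to the
reader in place of in. Files without a BOM pass through unchanged.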

Judging by the error message, I expect the same problem also affects N3 files, since as far
as I can tell from the stack trace they use the same reader.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
