commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <>
Subject Re: [CSV] Release again
Date Sat, 15 Jun 2019 18:59:17 GMT

> On 15 Jun 2019, at 18:57, Gary Gregory <> wrote:
> We've fixed some issues immediately after 1.7. How does everyone feel about
> releasing again?
> What else needs to be addressed in the short term?
> Gary

- Bug (picked up by FindBugs):

CSVRecord is no longer serialisable as it stores the CSVParser and that is not serialisable.
This can be fixed by making the parser transient and documenting the getParser() method as
invalid after deserialisation. Otherwise the cascade of fields that must be serialisable eventually
includes the original BufferedReader used to read the data. The parser is required for the
header map functionality and the getParser() method. So the class can store the header map
(easy), or overload the serialisation methods to record and load the header map (more effort),
or not support the parser and the header map functionality after deserialisation, or something

This should have been picked up by FindBugs during the release cycle but for unknown reasons
the FindBugs report on the site does not show this. So anyone who deserialises this class
will get an error as the serialVersionUID has not been updated and the new parser field is
not serialisable.

I also raised a few issues about the handling of column headers in my recent fix PR:

- Bug:

If the settings are not allowing empty columns headers you can currently use a single empty
header. This is because column headers are only checked for empty when they are duplicates.
So it is the second empty header (the first duplicate) that raises an error. This test should
pass but does not:

// This fails to throw an exception.
// Only 1 column name is missing and this is not recognised as a problem.
@Test(expected = IllegalArgumentException.class)
public void testHeadersMissing1ColumnException() throws Exception {
    final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");

- Documentation:

If you are using the map of the header to column index it should be noted that under certain
circumstances where you allow duplicate column headers that the map cannot represent the one-to-many
relationship. The map is only guaranteed to be valid if the record was created with a format
that does not allow duplicate or empty header names.

- Bug / Documentation

The CSVParser only stores headers names in a list of header names if they are not null. So
the list can be shorter than the number of columns if you use a format that allows empty headers.

If the purpose of the list is to provide the correct order of the column headers then it should
contain empty entries too. Otherwise I don’t see the purpose of storing this list unless
it is documented what it will contain. This appears to be a list of non-null header names
in the order they occur in the header and thus represent keys that can be used in the header
map. But you can get that (without the ordering) using the map ketSet.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message