commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Javen O'Neal" <one...@apache.org>
Subject [collections] a list class with fast indexOf(Object) and contains(Object)
Date Mon, 12 Sep 2016 05:14:35 GMT
On the Apache POI project (Java library to read and write Microsoft
Office files), we are using a List to store the fonts (XSSFFont) used
in an Excel workbook's style table (XSSFWorkbook.getStylesSource() ->
StyleTable) [1].

To avoid bloating the style table with duplicate fonts, we check if
the font is in the style table and return the index. Otherwise we add
the font and return the index. However, we do allow the list to
contain duplicate fonts.

The problem is ArrayList.indexOf(Object) and
ArrayList.contains(Object) are slow on large lists, described in bug
58740 [2]. Archie Cobbs' patch fixed the slowness by adding a reverse
mapping of list entry to list index (HashSetValuedHashMap<XSSFFont,
Integer> [3]) in addition to the list (List<XSSFFont>). This works,
but I felt it would be better to capture this complexity in a data
structure to keep StylesTable as simple as possible.

I originally looked at the collections4.1 BidiMap [4] implementations,
but these did not offer the same iteration order as a list and do not
allow duplicate XSSFFonts.
I also saw ListOrderedMap [5], but this does not allow duplicates and
I would rather expose a List interface than a Map.
None of the collections4.1 List [6] implementations meet the
reverse-lookup need.

I am writing a class that implements the List interface that uses a
List and a HashSetValuedHashMap, but wanted to see if there are
similar or better existing implementations that can solve my problem.
All suggestions are appreciated.

Requirements:
1. Implements List interface
2. Iteration order is ascending list order
3. Fast (O(log n) or better) get(index), get(Object), add(Object),
indexOf(Object), lastIndexOf(Object), contains(Object), and
iterator().next() operations.
4. Not part of the list interface, but finding duplicate objects via
getIndices(Object)->Collection<Integer> would be helpful.
5. Constant-time (O(1)) size()

Nice to have for a general-purpose class, but not needed for my application
4. Fast (O(log n) or better) set(index, Object), add(index, Object),
remove(index), remove(Object)


[1] list data structures used in StylesTable
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xssf/model/StylesTable.java?revision=1748898&view=markup#l70

[2] Increase StylesTable performance
https://bz.apache.org/bugzilla/show_bug.cgi?id=58740

[3] HashSetValuedHashMap
https://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/multimap/HashSetValuedHashMap.html

[4] BidiMap https://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/BidiMap.html

[5] ListOrderedMap
https://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/map/ListOrderedMap.html

[6] List https://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/list/package-summary.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message