poi-dev mailing list archives

From bugzi...@apache.org
Subject [Bug 59432] createName is slow when there are many Names (10000)
Date Tue, 17 May 2016 14:35:23 GMT
https://bz.apache.org/bugzilla/show_bug.cgi?id=59432

--- Comment #1 from Javen O'Neal <onealj@apache.org> ---
Adding a name requires checking the name manager for existing names to avoid
defining the same name at the same scope (my guess is this would result in a
corrupt workbook). POI uses the naïve implementation shown in comment 0, a
linear scan that takes O(n) time per check. We could perform this check in
O(1) using a hash table with a hashable tuple (scope, name) as the key. A less
elegant, inferior solution that also runs in O(1) uses nested hash tables: the
first layer having scope (sheet name or global) keys and the second layer
having name keys (the inner and outer keys could be swapped). This comes at the
cost of higher memory consumption and more complex code (and therefore a
higher chance of bugs).
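
A minimal sketch of the (scope, name) hash key idea, to make the suggestion
concrete. NameRegistry and ScopedName are hypothetical names, not existing POI
classes. The sketch assumes scope is a sheet index with -1 meaning
workbook-global, and that names compare case-insensitively; both assumptions
would need checking against the real name manager before writing a patch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: NameRegistry and ScopedName are NOT POI classes.
final class NameRegistry {
    /** Assumed convention: -1 stands for workbook-global scope. */
    static final int GLOBAL_SCOPE = -1;

    /** Hashable (scope, name) tuple used as the lookup key. */
    private static final class ScopedName {
        private final int sheetIndex;
        private final String name;

        ScopedName(int sheetIndex, String name) {
            this.sheetIndex = sheetIndex;
            // Assumption: names are compared case-insensitively,
            // so the key stores a normalized lower-case form.
            this.name = name.toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ScopedName)) {
                return false;
            }
            ScopedName other = (ScopedName) o;
            return sheetIndex == other.sheetIndex && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sheetIndex, name);
        }
    }

    private final Set<ScopedName> used = new HashSet<>();

    /**
     * Registers a name at the given scope in expected O(1) time.
     * Returns false if the name is already defined at that scope.
     */
    boolean add(int sheetIndex, String name) {
        return used.add(new ScopedName(sheetIndex, name));
    }

    /** Must be called when a name is removed or renamed, to keep the set in sync. */
    boolean remove(int sheetIndex, String name) {
        return used.remove(new ScopedName(sheetIndex, name));
    }
}
```

The extra bookkeeping in remove() is the kind of added complexity mentioned
above: every code path that renames, rescopes, or deletes a name has to update
the set, or the duplicate check starts giving wrong answers.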

Given the code from comment 0, I'm not surprised that adding N names is slow,
as it is performing O(N²) operations.
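
To put a rough number on that, here is a small sketch that counts name
comparisons under the assumption that each insert compares the new name
against every name already present. LinearScanCost is a hypothetical helper
for illustration only, not POI code.

```java
// Hypothetical helper: counts comparisons for N inserts with a linear scan.
final class LinearScanCost {
    /**
     * The insert that finds k existing names performs k comparisons,
     * so N inserts perform 0 + 1 + ... + (N - 1) = N * (N - 1) / 2.
     */
    static long comparisons(int n) {
        long total = 0;
        for (int existing = 0; existing < n; existing++) {
            total += existing;
        }
        return total;
    }
}
```

For the 10000 names in the bug title, comparisons(10000) comes to 49,995,000
string comparisons, versus roughly 10000 hash lookups with a hash-based check.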

Here's what you could do:
1) Provide a patch with a speed-optimized implementation with 100% test
coverage.
2) Provide a patch with a non-validating version of setNameName (probably
called setNameNameUnsafe) [1]
3) Access the CT* classes yourself, via introspection, subclassing, or
forking POI, any of which gives you direct access to the CTName data
structure. This would complicate upgrading POI in the future.

[1] Relevant discussion on dev@poi mailing list
http://apache-poi.1045710.n5.nabble.com/Preventing-corrupt-workbooks-td5722973.html

