commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <>
Subject [jira] [Commented] (COLLECTIONS-406) ListUtils.subtract is very slow
Date Sun, 03 Jun 2012 10:35:23 GMT


Hudson commented on COLLECTIONS-406:

Integrated in commons-collections #20 (See [])
    [COLLECTIONS-406] Improved ListUtils.subtract to O(n) performance. Thanks to Adrian Nistor
for reporting and providing a patch. (Revision 1345644)

     Result = SUCCESS
tn :
Files : 
* /commons/proper/collections/trunk/src/main/java/org/apache/commons/collections/
* /commons/proper/collections/trunk/src/test/java/org/apache/commons/collections/

> ListUtils.subtract is very slow 
> --------------------------------
>                 Key: COLLECTIONS-406
>                 URL:
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>         Attachments:, patch.diff
> Hi,
> ListUtils.subtract is very slow when subtracting two large lists.  The
> root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
> complexity instead of linear complexity.
> I am encountering this problem in version 3.2.1 and also in revision
> 1342815 (May 25th 2012).  I have attached a test that exposes this
> problem and a simple patch.  On my machine, for the attached test,
> this patch provides a 95X speedup.
> To run the test, just do:
> $ java Test
> Currently, the code for ListUtils.subtract is:
> final ArrayList<E> result = new ArrayList<E>(list1);
> for (E e : list2) {
>     result.remove(e);
> }
> return result;
> which is quadratic, because result.remove(e) has linear complexity.
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag.  I use
> HashBag and not HashSet because ListUtils.subtract needs to respect
> the cardinality when there are repeated objects in the list.
> As mentioned in COLLECTIONS-328 discussion, for small lists, there is
> some overhead for creating the HashBag.  This can be fixed with a
> threshold, but I did not implement it in my patch because the
> COLLECTIONS-328 patch does not implement it.
> Unlike the patch for COLLECTIONS-328, my patch does not choose the
> list to iterate over based on size, because of the cardinality
> requirement in subtract.  This means the code could be made even
> faster if we could use something like a LinkedHashBag but neither
> Apache Collections nor standard Java libraries have such a class.
> Even so, this patch is still a lot faster than the original version.
> Is this truly a bug, or am I missing something here?  If so, can you
> please confirm if the patch is correct?
> Thanks,
> Adrian

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message