commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()
Date Fri, 14 Jun 2019 11:57:06 GMT

On 14/06/2019 12:01, sebb wrote:
> I meant that the iterator would use the shuffled and/or selected
> indices to return the appropriate entry from the original array.
>
> No need to modify the original array.

To shuffle an array requires storage as elements are swapped. Either 
store an index or store the data. So for primitive types with 4-bytes or 
less storage it is more efficient (memory wise) to copy the array and 
shuffle in place. For those with size of 8-bytes then you can create an 
int array of indices and shuffle that. The index can be used to take the 
sample from the original array.

In terms of the RNG library the functionality to shuffle an entire array 
once (as per ArrayUtils) would be the same as the functions from [lang] 
ArrayUtils. So a decision should be made on whether to copy that 
functionality from [lang] and mark it as deprecated there.

The idea of unlimited shuffling would be done as a sampler so this would 
go into the rng-sampling module. A single shuffle creates a permutation 
of the sequence. The sample would do this dynamically and repeatedly 
pass over the array.

The sort of code would be:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng = ...;
DoubleContinuousPermutationSampler sampler = new 
DoubleContinuousPermutationSampler(rng, array);
for (int i = 0; i < 100; i++) {
     double sample = sampler.next();
}

The output would be the n values of the array in a random order without 
replacement. Once the entire set n is produced then the output would 
repeat in a new random order, and so on:

2.2, 1.1, 3.3, 3.3, 2.2, 1.1, 2.2, 3.3, 1.1, ...

Currently the library does not specialise for primitives so a more 
generic sampler would only output a stream of int indices:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng = ...;
ContinuousPermutationSampler sampler = new 
ContinuousPermutationSampler(rng, array.length);
for (int i = 0; i < 100; i++) {
     double sample = array[sampler.next()];
}

If you are only interested in a complete single pass then creating a 
shuffled int[] of indices can be done using the existing 
PermutationSampler. This returns an int[] permutation from its sample 
method. The array can be used in place of an iterator:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng = ...;
for (int index : new PermutationSampler(rng, array.length, 
array.length).sample()) {
     double sample = array[index];
}

I'm not opposed to the continuous permutation sampler idea. I just can't 
think of a use case not already satisfied by the existing 
PermutationSampler, i.e. when you want to sample permutations using each 
index one at a time and not in chunks of indices. This type of idea:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng;
// The PermutationSampler will validate the length is >0
final PermutationSampler sampler = new PermutationSampler(rng, 
array.length, array.length);
IntSupplier supplier = new IntSupplier() {
     int i;
     int[] sample;
     @Override
     public int getAsInt() {
         if (i == 0) {
             sample = sampler.sample();
             i = sample.length;
         }
         return sample[--i];
     }
};
for (;;) {
     double sample = array[supplier.getAsInt()];
}


>
> On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <ericbarnhill@gmail.com> wrote:
>> An iterator that dynamically shuffles as you go along. That's really nice,
>> I had never even thought of that. Thanks.
>>
>> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <alex.d.herbert@gmail.com>
>> wrote:
>>
>>> On 13/06/2019 17:56, Eric Barnhill wrote:
>>>> On Thu, Jun 13, 2019 at 9:36 AM sebb <sebbaz@gmail.com> wrote:
>>>>
>>>>> Rather than shuffle etc in place, how about various
>>>>> iterators/selectors to return entries in randomised order?
>>>>> [Or does that already exist?]
>>>>>
>>>> I am pretty sure random draws, and shuffling, are implemented with
>>>> different algorithms. Though sampling without replacement the full length
>>>> of the set would yield a shuffled set, I think there are more efficient
>>>> ways to shuffle a set.
>>> Iterators to return a random draw *without* replacement over the full
>>> length of the array? The iterator would dynamically shuffle the array on
>>> each call to next() so could be stopped early or can be called
>>> infinitely as if a continuous stream. Is that your idea?
>>>
>>> UniformRandomProvider rng = ...;
>>> int[] big = new int[1000000];
>>> //
>>> // Fill big with lots of data
>>> //
>>> IntIterator iter = ShuffleIterators.create(rng, big);
>>> int x = iter.next();
>>> int y = iter.next();
>>> int z = iter.next();
>>>
>>> This doesn't exist but it is easy to do. Memory requirements would
>>> require a copy of the data, or it could be marked as destructive to the
>>> input array order and shuffle in place.
>>>
>>> If you want a random draw *with* replacement then you can just call
>>> nextInt(int) with the size of the array to pick something.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message