drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Drill synthetic log generator
Date Sat, 13 Jul 2013 00:04:52 GMT

There is no command line parameter.

In LogGenerator, this line controls how users are invented:

  private LongTail<User> userGenerator = new LongTail<User>(50000, 0) {
      protected User createThing() {
          return new User(ipGenerator.sample(), geo, terms);
      }
  };

The two constructor arguments here (50000 and 0) control how the number of
users grows.  The first is called alpha and the second is the discount.
When discount == 0, as in this code, the users are generated using a
Dirichlet process and the number of unique users grows roughly as
alpha * log(n), where n is the number of samples drawn from the generator.
If discount > 0, then the fraction of users with a single transaction is
asymptotically equal to the discount, and the user population grows roughly
as alpha * n^discount.
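
To get a feel for these growth rates, here is a small simulation sketch.
It assumes the standard Pitman-Yor urn rule, in which sample number i+1
creates a new user with probability (alpha + discount * k) / (alpha + i),
where k is the number of distinct users seen so far.  That rule, the class
name UserGrowthSketch and the method uniqueUsers are assumptions made for
illustration; none of this is code from LongTail, and alpha is assumed to
be positive.

```java
import java.util.Random;

public class UserGrowthSketch {

    // Counts the distinct users that appear in n samples, assuming the
    // Pitman-Yor urn rule (hypothetical helper, not part of LogGenerator).
    // discount == 0 reduces to a Dirichlet process.
    static int uniqueUsers(double alpha, double discount, int n, long seed) {
        Random rng = new Random(seed);
        int unique = 0;
        for (int i = 0; i < n; i++) {
            // Probability that sample i+1 comes from a brand-new user.
            double pNew = (alpha + discount * unique) / (alpha + i);
            if (rng.nextDouble() < pNew) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double[] alphas = {1000, 5000};
        double[] discounts = {0.0, 0.3};
        for (double alpha : alphas) {
            for (double discount : discounts) {
                int users = uniqueUsers(alpha, discount, n, 42L);
                System.out.printf("alpha=%.0f discount=%.1f samples=%d users=%d%n",
                        alpha, discount, n, users);
            }
        }
    }
}
```

Only the count of distinct users is tracked, so the sketch needs constant
memory no matter how many users it invents.  That makes it a cheap way to
estimate the user count for a given alpha before running the real
generator.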

There will be a real problem if the number of users increases, however,
because each user requires a lot of memory.  This happens because the
language model for each user is cloned from a common base instead of
sharing that base.  I have been looking into using a better kind of hash
table to allow sharing of mutable tables (a hash array mapped trie, or
HAMT, actually), but this definitely isn't ready.  If and when it is ready,
we should see at least one and possibly three orders of magnitude decrease
in the memory cost of each user after the first few.

This all means that the simplest and safest thing to do is increase the
value of alpha from 50,000 and watch your memory usage.

On Fri, Jul 12, 2013 at 4:42 PM, peter he <rmjlxj@gmail.com> wrote:

> ...
> One quick follow-up question: is there any way to change the number of
> users generated using a parameter?
