metron-dev mailing list archives

From cestella <>
Subject [GitHub] incubator-metron pull request #428: METRON-678: Multithread the flat file lo...
Date Fri, 27 Jan 2017 23:28:04 GMT
GitHub user cestella opened a pull request:

    METRON-678: Multithread the flat file loader

    Currently, the flat file loader writes to HBase from a single thread. We could make
this considerably faster by multithreading the HBase puts.
    Running this on single-node Vagrant with a batch size of 128 and between 1 and 6 threads,
loading a 2-column CSV enrichment, gave a reasonable speedup (roughly 3x at 5 threads, with
no further gain at 6):
    1 thread:  91.019 seconds
    2 threads: 76.07 seconds
    3 threads: 39.974 seconds
    4 threads: 35.039 seconds
    5 threads: 30.531 seconds
    6 threads: 30.559 seconds
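The patch itself is not reproduced in this message, so the following is only a minimal sketch of the general idea (split the parsed rows into fixed-size batches and hand each batch to a fixed pool of writer threads), not the actual SimpleEnrichmentFlatFileLoader code. The class `ParallelLoaderSketch`, the methods `load` and `writeBatch`, and the `TABLE` map are hypothetical; the in-memory `ConcurrentHashMap` stands in for an HBase table, where a real loader would presumably issue a batched write such as HBase's `Table.put(List<Put>)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoaderSketch {

    /** Hypothetical stand-in for an HBase table: row key -> enrichment value. */
    static final Map<String, String> TABLE = new ConcurrentHashMap<>();

    /** Hypothetical stand-in for one batched HBase put; runs on a worker thread. */
    static void writeBatch(List<String[]> batch) {
        for (String[] row : batch) {
            TABLE.put(row[0], row[1]);
        }
    }

    /**
     * Splits rows into batches of batchSize and submits each batch to a fixed
     * pool of numThreads workers. Returns the number of batches written.
     */
    static int load(List<String[]> rows, int batchSize, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += batchSize) {
                // Copy the slice so each task owns an independent list.
                List<String[]> batch =
                    new ArrayList<>(rows.subList(start, Math.min(start + batchSize, rows.size())));
                pending.add(pool.submit(() -> writeBatch(batch)));
            }
            // Block until every batch is written; surface any worker failure.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for batches", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch write failed", e.getCause());
                }
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(new String[] {"host" + i, "value" + i});
        }
        int batches = load(rows, 128, 4);
        // 1000 rows at 128 per batch -> 8 batches
        System.out.println(batches + " batches, " + TABLE.size() + " rows");
    }
}
```

The pool size bounds how many batches are in flight at once, which corresponds to the thread count varied in the timings above; the batch size controls how many rows each write carries.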

You can merge this pull request into a Git repository by running:

    $ git pull parallel_extractor

Alternatively, you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #428
commit 47d814ef95d67738d20ce5dc530ba7b05d418a96
Author: cstella <>
Date:   2017-01-27T23:15:44Z

    Multithreading the SimpleEnrichmentFlatFileLoader

commit 918d4ce4aea5d7dfde992f32bf049c70f35dd182
Author: cstella <>
Date:   2017-01-27T23:23:19Z

    doc changes.


