metron-dev mailing list archives

From cestella <...@git.apache.org>
Subject [GitHub] incubator-metron pull request #428: METRON-678: Multithread the flat file lo...
Date Sat, 28 Jan 2017 18:33:12 GMT
GitHub user cestella reopened a pull request:

    https://github.com/apache/incubator-metron/pull/428

    METRON-678: Multithread the flat file loader

    Currently the flat file loader writes to HBase from a single thread. Multithreading
the HBase puts should make it considerably faster.
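
As a rough illustration of the idea (not the code in this pull request), the sketch below splits the input into fixed-size batches and hands each batch to a worker thread. Everything in it is an assumption made for illustration: the class name `ParallelBatchLoader`, its `partition` and `load` methods, and the `sink` callback, which stands in for the batched HBase put and would have to be thread-safe. Only standard JDK classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: the names here are illustrative and not taken from the PR.
class ParallelBatchLoader {

  // Split the input into consecutive batches of at most batchSize items.
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      int end = Math.min(i + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(i, end)));
    }
    return batches;
  }

  // Submit every batch to a fixed-size thread pool and wait for all writes.
  // The sink stands in for the batched HBase put and must be thread-safe.
  // Returns the number of batches written.
  static <T> int load(List<T> items, int batchSize, int numThreads,
                      Consumer<List<T>> sink) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (List<T> batch : partition(items, batchSize)) {
        pending.add(pool.submit(() -> sink.accept(batch)));
      }
      for (Future<?> f : pending) {
        f.get(); // surfaces any failure raised inside a worker
      }
      return pending.size();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while loading", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a batch write failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

With a batch size of 128, a call such as `ParallelBatchLoader.load(rows, 128, 4, batch -> writeBatch(batch))` would spread the batches over four threads, where `writeBatch` is a placeholder for whatever performs the actual puts.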
    
    I ran a 100k 2-column CSV enrichment import on single-node Vagrant with the following
configuration:
    * a batch size of 128
    * number of threads varying between 1 and 6
    
    Going from 1 to 5 threads cut the load time by roughly 3x (91.0 s to 30.5 s); a sixth thread gave no further improvement:
    
    | Number of Threads | Time (in seconds) |
    |-------------------|-------------------|
    | 1                 | 91.019            |
    | 2                 | 76.07             |
    | 3                 | 39.974            |
    | 4                 | 35.039            |
    | 5                 | 30.531            |
    | 6                 | 30.559            |
    
    ![chart](https://cloud.githubusercontent.com/assets/540359/22392190/af852618-e4c4-11e6-9c03-a68b66e330ad.png)
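
To make the scaling in the table above easier to read, the following small sketch derives speedup (single-thread time divided by N-thread time) and parallel efficiency (speedup divided by N) from the reported timings. The class name `SpeedupTable` and its methods are made up for this illustration; the only data used is the six timings from the table.

```java
// Hypothetical helper: derives speedup and efficiency from the timings above.
class SpeedupTable {

  // Wall-clock times in seconds for 1..6 threads, copied from the table.
  static final double[] SECONDS = {91.019, 76.07, 39.974, 35.039, 30.531, 30.559};

  // Speedup relative to the single-threaded run.
  static double speedup(int threads) {
    return SECONDS[0] / SECONDS[threads - 1];
  }

  // Parallel efficiency: speedup divided by the number of threads.
  static double efficiency(int threads) {
    return speedup(threads) / threads;
  }

  public static void main(String[] args) {
    for (int t = 1; t <= SECONDS.length; t++) {
      System.out.printf("%d thread(s): speedup %.2fx, efficiency %.0f%%%n",
          t, speedup(t), 100.0 * efficiency(t));
    }
  }
}
```

On these numbers the speedup levels off at just under 3x from five threads onward, which suggests that something other than the number of writer threads becomes the limit on a single-node setup.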


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron parallel_extractor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #428
    
----
commit 47d814ef95d67738d20ce5dc530ba7b05d418a96
Author: cstella <cestella@gmail.com>
Date:   2017-01-27T23:15:44Z

    Multithreading the SimpleEnrichmentFlatFileLoader

commit 918d4ce4aea5d7dfde992f32bf049c70f35dd182
Author: cstella <cestella@gmail.com>
Date:   2017-01-27T23:23:19Z

    doc changes.

commit c6ca3a86881eb77bc9598a61e3c0cf8280ccb03f
Author: cstella <cestella@gmail.com>
Date:   2017-01-27T23:39:56Z

    Updating docs.

commit 8c9a79cdfa38ea2fbd161095d5e346147558ec5f
Author: cstella <cestella@gmail.com>
Date:   2017-01-28T03:36:31Z

    Investigating integration tests.

commit 315bd181aa634290ab987441d81c28addb7952e2
Author: cstella <cestella@gmail.com>
Date:   2017-01-28T04:09:28Z

    Update integration test to be a proper integration test.

commit 004c6f41b6c1cc3ecea70513e1a468501bd32e3c
Author: cstella <cestella@gmail.com>
Date:   2017-01-28T04:49:37Z

    Adding spliterator unit test for completeness

commit f8dd48ef920c948e1fc5ff736e386f641e551b2b
Author: cstella <cestella@gmail.com>
Date:   2017-01-28T05:01:42Z

    Updating test to use a proper file

commit 9b04f9723d442c8f4fb7a8bcaa1d733fc1305dc4
Author: cstella <cestella@gmail.com>
Date:   2017-01-28T05:17:12Z

    Updating docs and renaming a few things.

commit eb5b82cc35bd767a169f548ea8144dd9ae165f84
Author: cstella <cestella@gmail.com>
Date:   2017-01-28T05:23:25Z

    Update one more test case.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
