crunch-dev mailing list archives

From "Steven Ruppert (JIRA)" <>
Subject [jira] [Created] (CRUNCH-604) Avoid expensive Writables.reloadWritableComparableCodes where possible
Date Mon, 18 Apr 2016 19:45:25 GMT
Steven Ruppert created CRUNCH-604:

             Summary: Avoid expensive Writables.reloadWritableComparableCodes where possible
                 Key: CRUNCH-604
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.13.0
            Reporter: Steven Ruppert
            Assignee: Josh Wills

Every time `setConf` is called on a TupleWritable, it calls `Writables.reloadWritableComparableCodes(conf)`.
Unfortunately, `SequenceFile$Reader.readValue` calls `setConf` on every value it reads, so the
codes are reloaded once per record. This burns a regrettable amount of CPU time.

Attached is a patch that prevents a given TupleWritable instance from reloading the codes more
than once, as well as a patch to cache (keyed by hashCode) the read from the actual Hadoop config,
which has to run regexes and so on. I can contrive situations where this would break (somehow,
you modify the configuration in between reading two values), but nothing actually sane
comes to mind.
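
For illustration, the two ideas can be sketched roughly as follows. This is not the attached
patch, only a minimal stand-alone sketch: `FakeConf`, `Codes`, and `TupleLike` are hypothetical
stand-ins for the Hadoop configuration, the static code registry in `Writables`, and
`TupleWritable`, and it assumes the cache is keyed on the configuration object's `hashCode()`.

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadGuardSketch {

    /** Hypothetical stand-in for a Hadoop Configuration; counts costly lookups. */
    public static class FakeConf {
        private final Map<String, String> props = new HashMap<>();
        public int expensiveGets = 0;

        public void set(String key, String value) {
            props.put(key, value);
        }

        public String get(String key) {
            expensiveGets++; // stands in for the costly lookup on the real config
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for the static code registry. */
    public static class Codes {
        private static boolean cached = false;
        private static int lastConfHash;

        public static synchronized void reload(FakeConf conf) {
            // FakeConf does not override hashCode, so this is an identity hash.
            int h = conf.hashCode();
            if (cached && h == lastConfHash) {
                return; // same hash as the last config seen: skip the lookup
            }
            conf.get("codes"); // the expensive read; parsing is omitted in this sketch
            lastConfHash = h;
            cached = true;
        }
    }

    /** Hypothetical stand-in for TupleWritable with a per-instance guard. */
    public static class TupleLike {
        private boolean codesLoaded = false;

        public void setConf(FakeConf conf) {
            if (conf == null || codesLoaded) {
                return; // this instance has already triggered a reload
            }
            Codes.reload(conf);
            codesLoaded = true;
        }
    }

    public static void main(String[] args) {
        FakeConf conf = new FakeConf();
        conf.set("codes", "1:example");
        TupleLike value = new TupleLike();
        // Mimics a reader calling setConf once per value read.
        for (int i = 0; i < 1000; i++) {
            value.setConf(conf);
        }
        System.out.println("expensive lookups: " + conf.expensiveGets);
    }
}
```

Both guards trade freshness for speed: a configuration that is mutated after the first
`setConf` is not re-read, which is the break scenario described above, and a cache keyed only
on a hash code could in principle also be fooled by a collision.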

This message was sent by Atlassian JIRA
