crunch-dev mailing list archives

From "Steven Ruppert (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-603) Cache constituent Writables inside TupleWritable `readField` call
Date Mon, 18 Apr 2016 19:35:25 GMT


Steven Ruppert updated CRUNCH-603:
    Attachment: 0001-TupleWritable-reuse-Writable-instances-where-possibl.patch

> Cache constituent Writables inside TupleWritable `readField` call
> -----------------------------------------------------------------
>                 Key: CRUNCH-603
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.13.0
>            Reporter: Steven Ruppert
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: 0001-TupleWritable-reuse-Writable-instances-where-possibl.patch
> Currently, `TupleWritable.readFields` creates, for every field in the tuple, a new
> Writable of that field's type using reflection (`WritableFactories.newInstance`, called
> through `TupleWritable.getWritable`) in order to deserialize that field. This burns up
> an unfortunate amount of CPU time.
> I've got a patch for this that caches the Writables to be reused (just as the TupleWritable
> itself is reused throughout Hadoop). It appears to work, at least for our cases. I think it
> will break if you ever have heterogeneous tuple types, but that seems like a bad idea, if
> not already proscribed in the documentation somewhere.
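
For readers without the patch at hand, here is a minimal sketch of the caching idea described above. It is not the attached patch and not Crunch's actual `TupleWritable`: the `Writable` interface is a local stand-in that mirrors Hadoop's so the example compiles on its own, and `IntField`, `CachingTuple`, and the `allocations` counter are hypothetical names used only for illustration. The sketch assumes a fixed per-position schema and re-checks the cached instance's class before reuse, which is one possible way to cope with the heterogeneous case the reporter mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical field type, playing the role of something like IntWritable.
class IntField implements Writable {
    int value;
    public IntField() {}
    IntField(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Hypothetical simplified tuple that reuses its field instances across reads.
class CachingTuple implements Writable {
    private final List<Class<? extends Writable>> types;
    private final Writable[] fields;   // doubles as the instance cache
    int allocations = 0;               // counts reflective instantiations

    CachingTuple(List<Class<? extends Writable>> types) {
        this.types = types;
        this.fields = new Writable[types.size()];
    }

    Writable get(int i) { return fields[i]; }
    void set(int i, Writable w) { fields[i] = w; }

    public void write(DataOutput out) throws IOException {
        for (Writable w : fields) {
            w.write(out);
        }
    }

    public void readFields(DataInput in) throws IOException {
        for (int i = 0; i < fields.length; i++) {
            Class<? extends Writable> type = types.get(i);
            // Instantiate only when nothing usable is cached for this position.
            if (fields[i] == null || fields[i].getClass() != type) {
                fields[i] = newInstance(type);
                allocations++;
            }
            // Otherwise overwrite the cached instance in place.
            fields[i].readFields(in);
        }
    }

    private static Writable newInstance(Class<? extends Writable> type) throws IOException {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("cannot instantiate " + type.getName(), e);
        }
    }
}

class CachingTupleDemo {
    public static void main(String[] args) throws IOException {
        // Serialize three two-field records back to back.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        CachingTuple src = new CachingTuple(List.of(IntField.class, IntField.class));
        for (int r = 0; r < 3; r++) {
            src.set(0, new IntField(r));
            src.set(1, new IntField(r * 10));
            src.write(out);
        }

        // Read them all into one reused tuple.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        CachingTuple dst = new CachingTuple(List.of(IntField.class, IntField.class));
        for (int r = 0; r < 3; r++) {
            dst.readFields(in);
        }
        System.out.println("reflective instantiations: " + dst.allocations);
    }
}
```

In this sketch the reflective constructor runs once per field position rather than once per field per record. The trade-off is the usual one for Hadoop-style object reuse: a field object obtained from the tuple is overwritten by the next `readFields` call, so callers that need to keep a value across records must copy it first.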
