[ https://issues.apache.org/jira/browse/CRUNCH-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Ruppert updated CRUNCH-603:
----------------------------------
Attachment: 0001-TupleWritable-reuse-Writable-instances-where-possibl.patch
> Cache constituent Writables inside TupleWritable `readField` call
> -----------------------------------------------------------------
>
> Key: CRUNCH-603
> URL: https://issues.apache.org/jira/browse/CRUNCH-603
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.13.0
> Reporter: Steven Ruppert
> Assignee: Josh Wills
> Priority: Minor
> Attachments: 0001-TupleWritable-reuse-Writable-instances-where-possibl.patch
>
>
> Currently, for every field in the tuple, `TupleWritable.readFields` creates a new Writable
> of that field's type by reflection (through `TupleWritable.getWritable`, which calls
> `WritableFactories.newInstance`) and then deserializes the field into it. Because this
> happens on every record, it burns an unfortunate amount of CPU time.
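The allocate-per-field pattern described above can be sketched in plain Java. This is a minimal, self-contained sketch, not Crunch's `TupleWritable`: `Field`, `IntField`, and `AllocatingTuple` are hypothetical stand-ins (for `Writable`, `IntWritable`, and the tuple), the field types are passed in up front rather than read from the stream, and plain `Class.getDeclaredConstructor().newInstance()` stands in for `WritableFactories.newInstance`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Writable, so the sketch runs without Hadoop.
interface Field {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical field type, analogous to IntWritable.
class IntField implements Field {
  int value;
  public IntField() {}
  IntField(int value) { this.value = value; }
  public void write(DataOutput out) throws IOException { out.writeInt(value); }
  public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Mirrors the pattern described above: every readFields call creates a new
// instance of each field type by reflection, then deserializes into it.
class AllocatingTuple {
  private final List<Class<? extends Field>> types;
  private Field[] values;

  AllocatingTuple(List<Class<? extends Field>> types) { this.types = types; }

  void readFields(DataInput in) throws IOException {
    values = new Field[types.size()];
    for (int i = 0; i < values.length; i++) {
      values[i] = newInstance(types.get(i)); // one reflective allocation per field per record
      values[i].readFields(in);
    }
  }

  Field get(int i) { return values[i]; }

  static Field newInstance(Class<? extends Field> type) {
    try {
      return type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + type, e);
    }
  }

  // Demo helpers: serialize fields to bytes, and wrap bytes as a DataInput.
  static byte[] encode(Field... fields) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (Field f : fields) f.write(out);
    out.flush();
    return bytes.toByteArray();
  }

  static DataInput input(byte[] bytes) {
    return new DataInputStream(new ByteArrayInputStream(bytes));
  }
}
```

Each record read allocates fresh field objects, so nothing carries over between records; the price is one reflective construction per field per record.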
> I've got a patch for this that caches the Writables so they can be reused (just as the
> TupleWritable itself is reused throughout Hadoop). It appears to work, at least for our
> cases. I think it will break if you ever have heterogeneous tuple types (records whose
> field types differ at the same position), but that seems like a bad idea anyway, if it is
> not already proscribed in the documentation somewhere.
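The caching idea can be sketched the same way. This is an illustrative sketch under the same assumptions (hypothetical `Field`, `IntField`, `TextField`, and `CachingTuple` classes), not the attached patch. It keeps one instance per tuple position and, as a design choice of this sketch, checks the class before reusing it, so a position whose type changes between records gets a fresh instance instead of a Writable of the wrong class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Writable.
interface Field {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical field types, analogous to IntWritable and Text.
class IntField implements Field {
  int value;
  public IntField() {}
  IntField(int value) { this.value = value; }
  public void write(DataOutput out) throws IOException { out.writeInt(value); }
  public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class TextField implements Field {
  String value = "";
  public TextField() {}
  TextField(String value) { this.value = value; }
  public void write(DataOutput out) throws IOException { out.writeUTF(value); }
  public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

// Caches one Field per tuple position and deserializes into it on later records.
class CachingTuple {
  private Field[] cache = new Field[0];

  // 'types' lists each field's class for the record being read; how a real
  // tuple learns these (e.g. from the stream) is out of scope for this sketch.
  void readFields(List<Class<? extends Field>> types, DataInput in) throws IOException {
    if (cache.length != types.size()) {
      cache = new Field[types.size()]; // arity changed: start a fresh cache
    }
    for (int i = 0; i < cache.length; i++) {
      Class<? extends Field> type = types.get(i);
      if (cache[i] == null || cache[i].getClass() != type) {
        cache[i] = newInstance(type); // first record, or the type at this position changed
      }
      cache[i].readFields(in); // overwrite the cached instance in place
    }
  }

  Field get(int i) { return cache[i]; }

  static Field newInstance(Class<? extends Field> type) {
    try {
      return type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + type, e);
    }
  }

  // Demo helpers: serialize fields to bytes, and wrap bytes as a DataInput.
  static byte[] encode(Field... fields) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (Field f : fields) f.write(out);
    out.flush();
    return bytes.toByteArray();
  }

  static DataInput input(byte[] bytes) {
    return new DataInputStream(new ByteArrayInputStream(bytes));
  }
}
```

Because cached instances are overwritten in place, a caller holding a field from the previous record will see its value change on the next `readFields` call. That matches Hadoop's usual object-reuse contract: values that must outlive the current record have to be copied.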
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)