sqoop-dev mailing list archives

From "Attila Szabo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-2906) Optimization of AvroUtil.toAvroIdentifier
Date Tue, 17 May 2016 19:28:12 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287354#comment-15287354 ]

Attila Szabo commented on SQOOP-2906:

First of all,

Joeri, thank you so much for your help. It's much clearer now! If you would be interested in having
one more contributor, I would be more than glad to provide pull requests for this project in my
free time (e.g. a solution for the direct CM investigation).

The other thing:
After running a few tests, I'm quite sure that the current version Joeri provided works
quite well and would provide the required performance boost. It is possible that later
we will be able to identify some synergies between his changes and the changes related to the
class writer, but I do think we should release it right now.

[~jarcec], [~abrahamfine],
Could you please also review the changeset and provide your feedback or, if it also looks
alright on your side, commit it upstream?

Thanks in advance!

> Optimization of AvroUtil.toAvroIdentifier
> -----------------------------------------
>                 Key: SQOOP-2906
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2906
>             Project: Sqoop
>          Issue Type: Improvement
>            Reporter: Joeri Hermans
>            Assignee: Joeri Hermans
>              Labels: avro, hadoop, optimization
>         Attachments: diff.txt
> Hi all
> Our distributed profiler indicated some inefficiencies in the AvroUtil.toAvroIdentifier
> method, more specifically, the use of Regex patterns. This can be directly observed from the
> FlameGraph generated by this profiler (https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg).
> We implemented an optimization, and compared this with the original method. On our testing
> machine, the optimization by itself is about 500% (on average) more efficient compared to
> the original implementation. We have yet to test how this optimization will influence the
> performance of user jobs.
> Any suggestions or remarks are welcome.
> Kind regards,
> Joeri
> https://github.com/apache/sqoop/pull/18
> Writeup:
> https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction
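
For readers without the attachment at hand, the kind of change described in the quoted issue (replacing per-call regex work with a single pass over the characters) can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Sqoop code or the attached patch: the class name IdentifierSketch, both method names, the "AVRO_" prefix, and the exact sanitization rule (runs of characters outside [A-Za-z0-9_] collapse to one underscore, and a prefix is added when the first character is not a letter or underscore) are all assumed here.

```java
// Hypothetical sketch; names and the sanitization rule are assumptions,
// not taken from AvroUtil or from the patch attached to SQOOP-2906.
final class IdentifierSketch {
    // Assumed prefix for identifiers that do not start with a letter or '_'.
    private static final String PREFIX = "AVRO_";

    // Regex-style variant: String.replaceAll and String.matches each
    // compile a pattern on every call, which is costly in a hot path.
    static String regexBased(String candidate) {
        if (candidate.isEmpty()) {
            throw new IllegalArgumentException("candidate must not be empty");
        }
        String formatted = candidate.replaceAll("\\W+", "_");
        if (formatted.substring(0, 1).matches("[a-zA-Z_]")) {
            return formatted;
        }
        return PREFIX + formatted;
    }

    // Loop-based variant: one pass over the characters, no regex engine.
    static String loopBased(String candidate) {
        if (candidate.isEmpty()) {
            throw new IllegalArgumentException("candidate must not be empty");
        }
        StringBuilder sb = new StringBuilder(candidate.length() + PREFIX.length());
        boolean inReplacedRun = false;
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (isWordChar(c)) {
                sb.append(c);
                inReplacedRun = false;
            } else if (!inReplacedRun) {
                // Collapse a run of non-word characters into a single '_'.
                sb.append('_');
                inReplacedRun = true;
            }
        }
        char first = sb.charAt(0);
        if (isAsciiLetter(first) || first == '_') {
            return sb.toString();
        }
        return sb.insert(0, PREFIX).toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Mirrors Java's default \w class: ASCII letters, digits and '_'.
    private static boolean isWordChar(char c) {
        return isAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }

    public static void main(String[] args) {
        String[] samples = {"order_id", "1 col-name", "a--b", "first name"};
        for (String s : samples) {
            System.out.println(s + " -> " + loopBased(s)
                + " (same as regex variant: " + regexBased(s).equals(loopBased(s)) + ")");
        }
    }
}
```

Whether such a loop reproduces the speedup reported in the issue would depend on the real method body and on the workload, so it should be measured rather than assumed.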

This message was sent by Atlassian JIRA