sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ruslan Dautkhanov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
Date Mon, 12 Sep 2016 19:56:20 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485109#comment-15485109
] 

Ruslan Dautkhanov commented on SQOOP-2981:
------------------------------------------

{quote}Hi Ruslan, 

This is a courtesy follow-up to remind you that this Service Request (SR) remains open and
active with our team, waiting for your update. 

Please provide us with an update at your earliest possible opportunity to let us know the
current status of the issue and to inform us of how we can be of further assistance to you
with respect to this particular issue. If there is no longer any requirement for this Service
Request (SR) to remain open, please advise. 

Thank you and best regards, 
Rosmari 
{quote}

> sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
> ------------------------------------------------------------------------------
>
>                 Key: SQOOP-2981
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2981
>             Project: Sqoop
>          Issue Type: Bug
>          Components: codegen, connectors/oracle, sqoop2-jdbc-connector
>    Affects Versions: 1.4.5, 1.4.6
>         Environment: sqoop import from Oracle; saves as parquet file
>            Reporter: Ruslan Dautkhanov
>              Labels: decimal, import, jdbc, oracle, parquet
>
> Majority of time spent of sqoop import from Oracle was on converting Decimal. It was
2.5x times more than following most cpu consumer (snappy compression). 
> Sqoop was 100% on cpu in total, Oracle side was pretty bored.
> {quote}
>  JvmTop 0.8.0 alpha - 11:56:45,  amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
>  http://code.google.com/p/jvmtop
>  Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20
>   38.78% (     7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
>   14.27% (     2.82s) org.xerial.snappy.SnappyNative.rawCompress()
>   12.67% (     2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
>   10.28% (     2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    4.80% (     0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
>    3.69% (     0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
>    2.51% (     0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
>    2.30% (     0.46s) parquet.column.impl.ColumnWriterV1.write()
>    1.90% (     0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>    1.31% (     0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
>    1.27% (     0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    1.22% (     0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
>    0.65% (     0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
>    0.64% (     0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
>    0.64% (     0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
>    0.63% (     0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
>    0.62% (     0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
>    0.58% (     0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
>    0.49% (     0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
>    0.44% (     0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
> {quote}
> DDL of the table on Oracle side:
> {quote}
> CREATE TABLE someschema.sometable
> (
>   ID             NUMBER                         NOT NULL,
>   psn  VARCHAR2(50 BYTE),
>   MERCHID     NUMBER,
>   RECENCY        NUMBER,
>   FREQUENCY      NUMBER,
>   SPEND          NUMBER,
>   SPENDING       NUMBER,
>   FREQ           NUMBER
> )
> {quote}
> Sqoop parameters:
> {quote}
> sqoop import 
>         -Dmapred.job.name="sqoop import into out_table" 
>         --connect "jdbc:oracle:thin:@jdbc_tns" 
>         --username username --password password
>         --direct 
>         --compress --compression-codec snappy 
>         --as-parquetfile 
>         --target-dir hdfs_dir
>         --num-mappers num_mappers
>         --query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message