kudu-user mailing list archives

From Andrey Kuznetsov <Andrey_Kuznet...@epam.com>
Subject RE: [kudu] import from hdfs
Date Thu, 17 Aug 2017 09:27:12 GMT
Yep, I’ve added a few GB ☺
But it has had minimal effect on import performance.

Best regards,
ANDREY KUZNETSOV
Software Engineering Team Leader, Assessment Global Discipline Head (Java)

Office: +7 482 263 00 70 x 42766   Cell: +7 920 154 05 72   Email: andrey_kuznetsov@epam.com
Tver, Russia   epam.com

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or entity(ies) to which it
is addressed and contains information that is legally privileged and confidential. If you
are not the intended recipient, or the person responsible for delivering the message to the
intended recipient, you are hereby notified that any dissemination, distribution or copying
of this communication is strictly prohibited. All unintended recipients are obliged to delete
this message and destroy any printed copies.

From: Jean-Daniel Cryans [mailto:jdcryans@apache.org]
Sent: Wednesday, August 16, 2017 9:39 PM
To: user@kudu.apache.org
Cc: Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
Subject: Re: [kudu] import from hdfs

Huh this is confusing, how much memory did you say you have per node? You mentioned 256GB
but I'm not sure what it relates to anymore because I see you gave 400GB to Kudu in there.

Also, why a single disk? Is HDFS using more than one?

On Tue, Aug 15, 2017 at 9:40 AM, Andrey Kuznetsov <Andrey_Kuznetsov@epam.com> wrote:
Hi Jean-Daniel,
No problem, you can find a screenshot in the attachment.
I cannot provide the log for security reasons, sorry…

Best regards,
ANDREY KUZNETSOV

From: Jean-Daniel Cryans [mailto:jdcryans@apache.org]
Sent: Thursday, August 10, 2017 6:55 PM
To: user@kudu.apache.org
Cc: Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
Subject: Re: [kudu] import from hdfs

Hi Andrey,

Can you double check how much memory is actually given to Kudu? That's --memory_limit_hard_bytes.
Providing us with a full kudu-tserver log could be useful, as long as it starts with this
line "Tablet server non-default flags".
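
To illustrate the check requested above, here is a minimal Python sketch that pulls the non-default flags out of a kudu-tserver log and reports --memory_limit_hard_bytes in GiB. It assumes the flags appear one per line as --name=value directly after the "Tablet server non-default flags" line; that layout, the function names, and the sample log text are assumptions for illustration, not taken from this thread.

```python
import re

MARKER = "Tablet server non-default flags"


def non_default_flags(log_text):
    """Return {flag_name: value} for the --name=value lines after the marker.

    Assumption: flags are listed one per line immediately after the marker
    line, and the first line that is not a flag ends the list.
    """
    flags = {}
    seen_marker = False
    for line in log_text.splitlines():
        if not seen_marker:
            seen_marker = MARKER in line
            continue
        match = re.match(r"--([A-Za-z0-9_]+)=(.*)$", line.strip())
        if match is None:
            break
        flags[match.group(1)] = match.group(2)
    return flags


def memory_limit_gib(flags):
    """Return memory_limit_hard_bytes in GiB, or None if the flag is absent."""
    raw = flags.get("memory_limit_hard_bytes")
    return None if raw is None else int(raw) / 2**30


# Hypothetical log excerpt, made up for this example.
sample_log = (
    "I0810 12:00:00.000000 1234 example.cc:1] Tablet server non-default flags:\n"
    "--fs_wal_dir=/data/wal\n"
    "--memory_limit_hard_bytes=4294967296\n"
    "I0810 12:00:00.100000 1234 example.cc:2] some later log line\n"
)
```

Calling non_default_flags(sample_log) and passing the result to memory_limit_gib gives the limit the server was actually started with, which may differ from what the node has installed.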

Without more data about your situation it's going to be really hard to help you.

Thx,

J-D

On Thu, Aug 10, 2017 at 4:46 AM, Andrey Kuznetsov <Andrey_Kuznetsov@epam.com> wrote:
Hi Jean-Daniel,
Nice to hear from you :)

I use Kudu 1.3, and I hope Kudu has enough memory (about 256 GB on each node).
I have played with the threads parameter, but it makes little difference:
the import is still extremely slow…

Best regards,
ANDREY KUZNETSOV

From: Jean-Daniel Cryans [mailto:jdcryans@apache.org]
Sent: Wednesday, August 9, 2017 10:52 PM
To: user@kudu.apache.org
Cc: Special SBER-BPOC Team <SpecialSBER-BPOCTeam@epam.com>
Subject: Re: [kudu] import from hdfs

Hi Andrey,

Which version of Kudu and Impala are you using? Just that can make a huge difference.

Apart from that, make sure Kudu has enough memory (no memory back pressure), you have enough
maintenance manager threads (1/3 or 1/4 the number of disks), and that your partitioning favors
good load distribution.
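
The thread-count rule of thumb above can be worked through with a small Python sketch. It assumes the relevant tablet server setting is --maintenance_manager_num_threads and that fractions are rounded down with a floor of one thread; the rounding choice and the function name are assumptions for illustration.

```python
def maintenance_thread_range(num_data_disks):
    """Return (low, high) thread counts: 1/4 and 1/3 of the disk count.

    Assumption: round down, but never suggest fewer than one thread.
    """
    if num_data_disks < 1:
        raise ValueError("num_data_disks must be at least 1")
    low = max(1, num_data_disks // 4)
    high = max(1, num_data_disks // 3)
    return low, high


# Example: a node with 12 data disks suggests 3 to 4 threads,
# while a single-disk node stays at 1.
low, high = maintenance_thread_range(12)
print(f"--maintenance_manager_num_threads between {low} and {high}")
```

On a single-disk node the rule gives one thread either way, so changing the thread count there is unlikely to help much.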

But TBH writing to Parquet will remain faster than writing to Kudu, because Kudu isn't just
dropping the rows into a file and has to do more than that.

Hope this helps,

J-D

On Wed, Aug 9, 2017 at 9:05 AM, Andrey Kuznetsov <Andrey_Kuznetsov@epam.com> wrote:
Hi folks,
I have a problem with HDFS-to-Kudu import performance. I created an external table over CSV data
and ran “insert as select” from it into a Kudu table and into a Parquet table.
Importing into the Parquet table is 3x faster than into Kudu. Do you know any tips/tricks to increase
import performance?
I am actually importing 8 TB of data, so this is critical for me.

Best regards,
ANDREY KUZNETSOV



