kudu-user mailing list archives

From Alexey Serbin <aser...@cloudera.com>
Subject Re: [Benchmarking]
Date Tue, 14 Mar 2017 19:31:03 GMT
On Tue, Mar 14, 2017 at 11:25 AM, Alexey Serbin <aserbin@cloudera.com>
wrote:

> Hi,
>
> It seems that sort of benchmark is not a trivial undertaking.  I'm sure
> there is a lot to consider while doing that sort of benchmark.  Probably,
> more senior members of the Kudu team could suggest something else, but
> right away I can suggest the following:
>
> 1. Consider using real hardware machines while doing the benchmark, not
> VMs.  Make sure the databases store their data on the same media when doing
> the comparison.
>
> 2. Make sure your benchmark schema is supported by both Kudu and
> PostgreSQL.  To perform the benchmark you would probably need to tweak
> your existing schema a little bit.  Kudu supports a subset of the types
> available in PostgreSQL.  Also, pay attention to primary keys/indices and
> partitions if you are running read/scan comparisons.  Overall, in this
> context it's worth reading this document first:
> https://kudu.apache.org/docs/schema_design.html
>
> 3. Kudu is supposed to shine when working with huge amounts of data spread
> across multiple machines in a cluster.  Are you planning to use a clustered
> setup for PostgreSQL as well?  It may be worth trying one.
>
> 4. While creating Kudu tables, use just a single replica -- additional
> replicas add some latency for write operations because the write operation
> is considered successful only when by majority of existing replicas.  Also,
> since I didn't see
>

Oops, something happened with those words.  I meant

... only when acknowledged by the majority of existing replicas.  I suggest
using just a single replica since I didn't see anything mentioned about
replication for PostgreSQL.

> 5. Consider placing WAL for both Kudu and PostgreSQL on an SSD -- this
> lowers latencies for DML operations.  I know that's so at least for Kudu,
> and I would expect that's true for PostgreSQL as well.
>
> 6. Pay some attention to run-time resource limits in effect while running
> those benchmarks:
>   https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html
>   https://kudu.apache.org/docs/configuration_reference.html (search for
> flags containing 'memory' and 'cache_size' in their names)
>
>
> As for inserting your existing data into Kudu, consider using Impala:
> https://kudu.apache.org/docs/kudu_impala_integration.html
>
>
> Best regards,
>
> Alexey
>
> On Tue, Mar 14, 2017 at 8:01 AM, paulo faria <zikoco2@hotmail.com> wrote:
>
>> HI
>>
>>
>> I'm doing a benchmark of Kudu (and other time-series DBs) versus PostgreSQL
>> 9.6.
>> I have already done your VM demo tutorial.
>>
>>
>> But now I would like to compare those two. I already have the PostgreSQL
>> environment set up (with some tables + data (1GB per table to test)) on a
>> remote server.
>> 1) What is your advice for comparing query (read) performance?
>> 2) Is there any way to convert (or migrate) the Postgres structure to Kudu? I
>> have my database in HUE Impala, so I can query it there and also download
>> the data from there.
>>
>>
>> Any tips are appreciated.
>>
>> Best Regards
>>
>> Paulo Faria
>>
>>
>
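
To make points 2 and 4 and the Impala suggestion above more concrete, here is a minimal sketch (not from the original thread) of how one might turn a PostgreSQL column list into an Impala CREATE TABLE statement for a Kudu table with a single replica.  The type mapping, the function name kudu_ddl, the default of 4 hash partitions, and the example table are all assumptions for illustration; the mapping covers only a few simple types, and anything outside it is rejected so the schema can be adjusted by hand.

```python
# Hypothetical, deliberately incomplete mapping from PostgreSQL column types
# to Impala types usable in Kudu tables (an assumption for illustration).
PG_TO_IMPALA = {
    "smallint": "SMALLINT",
    "integer": "INT",
    "bigint": "BIGINT",
    "real": "FLOAT",
    "double precision": "DOUBLE",
    "boolean": "BOOLEAN",
    "text": "STRING",
    "varchar": "STRING",
}


def kudu_ddl(table, columns, primary_key, hash_partitions=4, replicas=1):
    """Build an Impala CREATE TABLE statement for a Kudu-backed table.

    columns is a list of (name, postgres_type) pairs; primary_key is a
    list of column names.  Raises ValueError for unmapped types.
    """
    names = [name for name, _ in columns]
    missing = [key for key in primary_key if key not in names]
    if missing:
        raise ValueError(f"primary key columns not in schema: {missing}")

    # Primary key columns must be listed first in a Kudu schema; the sort
    # is stable, so the remaining columns keep their original order.
    def position(column):
        name = column[0]
        return primary_key.index(name) if name in primary_key else len(primary_key)

    lines = []
    for name, pg_type in sorted(columns, key=position):
        impala_type = PG_TO_IMPALA.get(pg_type.lower())
        if impala_type is None:
            raise ValueError(f"no mapping for PostgreSQL type {pg_type!r} (column {name})")
        lines.append(f"  {name} {impala_type}")

    keys = ", ".join(primary_key)
    lines.append(f"  PRIMARY KEY ({keys})")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table} (\n{body}\n)\n"
        f"PARTITION BY HASH ({keys}) PARTITIONS {hash_partitions}\n"
        f"STORED AS KUDU\n"
        # A single replica, as suggested in point 4 of the reply.
        f"TBLPROPERTIES ('kudu.num_tablet_replicas' = '{replicas}')"
    )


if __name__ == "__main__":
    # Hypothetical time-series table; names and types are made up.
    print(kudu_ddl(
        "readings",
        [("value", "double precision"), ("sensor_id", "integer"), ("ts", "bigint")],
        ["sensor_id", "ts"],
    ))
```

The generated statement would still need to be reviewed against the schema design document before use, since the right primary key and partitioning depend on the queries being benchmarked.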
