trafodion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anoop Sharma <anoop.sha...@esgyn.com>
Subject RE: Question about trafodion insert
Date Sat, 28 Apr 2018 04:45:44 GMT
For each insert statement, sql engine need to:
- compile the query
- begin transaction
- send data to hbase server
- commit transaction.

For simple queries like an insert, most of the cost is in the 3rd step where values are sent
to hbase server.
This is a process hop from mxosrvr to hbase region server.

When the insert query has multiple values, then all those values are buffered up and sent
to hbase.
That will reduce the number of process sends that have to be done between mxosrvr and region
server.

This is the same improvement that is seen if values are sent from client to server using rowsets.
In that mode, all rows that are included in the rowset are shipped in one call from client
to server.
That will make it run much faster compared to executing the query with one row.
(there are limits of rowset size and buffer size sent to hbase. Values are internally buffered
up to
the max buffer/rowset size).

Compile time query cache will cache the insert statement.
 Which means that the statement ' insert into song values(i)' when issued multiple times will
use
the cached plan.
But that will not reduce the buffer being sent from mxosrvr to region server (step #3 above).
That will
be done whenever an insert stmt is issued.

In addition to use of rowsets or multiple values in one insert, there are other clauses that
can be done to make inserts 
run faster. Those are use of upsert to avoid duplicate detection, use of 'upsert using load'
to eliminate transaction
being started/committed. Some of them depend on application need and its usage.

another note: using multiple values in one insert statement (insert ... values (i), (i+1)...)
is ok if the number
of values are 'small' but it is better to use rowsets if large number of values are to be
inserted.

anoop

-----Original Message-----
From: Song, Hao-Lin <haolin.song@esgyn.cn> 
Sent: Friday, April 27, 2018 7:50 PM
To: dev@trafodion.apache.org
Subject: Question about trafodion insert

Hi all

I have a question about trafodion insert that if trafodion executes query ‘insert into table
values(i),(i+1)…,(i+100)’ N/100 times faster than executing query ‘insert into table
values(i) N times. So I made a simple test:
I used jdbcT4 to execute query ‘insert into song values(i)’ 10K times (i from 1 to 10K,
10k rows)and it cost about 89 seconds. Then I executed query ‘insert into song values(i),(I
+1),…,(i+100)’ 100 times(total of 10,000 rows) and it cost about 8.6 seconds.
I am not sure if it only because simple 1 needs to send buffer to mxosrvr 10k times. But I
am more concerned about if query ‘insert into table values(i),(),(),(I + N)’ will be executed
faster in sql engine than executing ‘insert into table values(i)’ N times.

Besides, how will the engine handle query ‘insert into table values(a),(b),(c)’? Will
engine use execute plan cache?

Best,
宋昊霖 (Haolin(Leo) Song)


Mime
View raw message