spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From tigerquoll <tigerqu...@outlook.com>
Subject Re: [Discuss] Datasource v2 support for manipulating partitions
Date Tue, 18 Sep 2018 01:26:36 GMT
Hi Jayesh,
I get where you are coming from - partitions are just an implementation
optimisation that we really shouldn’t be bothering the end user with. 
Unfortunately that view is like saying RPC is like a procedure call, and
details of the network transport should be hidden from the end user. CORBA
tried this approach for RPC and failed for the same reason that no major
vendor of DBMS systems that support partitions try to hide them from the end
user.  They have a substantial real world effect that is impossible to hide
from the user (in particular when writing/modifying the data source).  Any
attempt to “take care” of partitions automatically invariably guesses wrong
and ends up frustrating the end user (as “substantial real world effect”
turns to “show stopping performance penalty” if the user attempts to fight
against a partitioning scheme she has no idea exists)

So if we are not hiding them from the user, we need to allow users to
manipulate them. Either by representing them generically in the API,
allowing pass-through commands to manipulate them, or by some other means.

Regards,
Dale.




--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Mime
View raw message