db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Features: Tablepartitioning, Tablespaces and replication and Loadbalancing
Date Tue, 26 Apr 2005 18:01:46 GMT
just my opinion, but for what it's worth:

improved online backup - this seems like a good addition to derby.  The
     current state is that you can take a backup while the system is
     running, but updating transactions will block until the backup is
     finished.  The recently implemented rollforward recovery makes a
     full non-blocking online backup the next logical step.

table partitioning-
       The question here is why you want to partition the table.  If it
       is just to spread I/O randomly across disks, I don't think it is
       a very useful feature.  The same thing can easily be accomplished
       on most modern hardware/OS's at a lower level, while presenting
       the disk farm as one disk to the JVM/derby.

       Now if you are talking about key partitioning then that may be
       useful, but only if accompanying work is done to partition
       query execution in parallel against those partitions.  Below
       I will describe one approach that I think is the easiest and most
       maintainable first step towards this.
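As an illustrative sketch of what key partitioning (as opposed to random I/O striping) could look like, the following maps each key to a partition by range, so a range scan only touches the partitions it intersects. The class name and bounds are hypothetical, not anything that exists in Derby:

```java
import java.util.Arrays;

// Illustrative sketch of range-based key partitioning.  A sorted list
// of upper bounds maps each key to a partition index; keys above every
// bound fall into a final overflow partition.
public class RangePartitioner {
    private final int[] upperBounds; // sorted ascending

    public RangePartitioner(int[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
    }

    /** Partition index for a key: the first partition whose bound exceeds it. */
    public int partitionFor(int key) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length; // overflow partition
    }
}
```

A query planner that knows these bounds can then run a parallel scan against only the partitions whose range overlaps the query's predicate.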

replication -
       For this I can see a few directions:

       1) master/offline slave(s), hot standby - Again, with the
          recently completed rollforward recovery work, it would not be
          too hard to set up a secondary system ready to take over when
          the primary fails.  Basically copy the db, then stream the
          logs across and apply them using the existing recovery
          algorithms when you want to bring the system online.  Once the
          first slave-initiated update is applied, no further updates
          from the master can be applied using this algorithm.

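The copy-then-stream-logs step above could be sketched roughly as follows. The log directory layout and "*.dat" file naming here are assumptions for illustration, not Derby's actual on-disk format:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch of log shipping: after the initial database copy,
// periodically copy new transaction log files from the master's log
// directory to the slave's, where existing recovery code can replay
// them.  File naming is a made-up convention for this example.
public class LogShipper {
    /** Copy any log files not yet present in slaveLogDir; return count shipped. */
    public static int shipNewLogs(Path masterLogDir, Path slaveLogDir)
            throws IOException {
        int shipped = 0;
        try (DirectoryStream<Path> logs =
                Files.newDirectoryStream(masterLogDir, "*.dat")) {
            for (Path log : logs) {
                Path target = slaveLogDir.resolve(log.getFileName());
                if (!Files.exists(target)) {
                    Files.copy(log, target, StandardCopyOption.COPY_ATTRIBUTES);
                    shipped++;
                }
            }
        }
        return shipped;
    }
}
```

A real implementation would also need to ship only log files the master has finished writing, which is where the log-archiving behavior from rollforward recovery helps.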
       2) master/read-only slave(s), hot standby with read-only access
          - Building on 1, this would again not be too hard.  Some work
          is needed to guarantee read access while applying recovery
          logic online rather than during boot.  Same caveats as above.

       3) master/(read/write slave(s)), very hard - the usual problems:
          what do you do with conflicts?  Such a system may be better
          handled by doing higher-level update/conflict tracking than
          the log - maybe something like mysql does.

Load Balancing - I don't know what you are looking for here.  I would
     be interested in more detail.

An approach to a more scalable Derby database (again just an opinion;
note that I have neither the plans nor the expertise to actually build
the following, but it seems like a good project for someone interested
in building distributed optimizer technology):

Taking a shared-nothing approach to scalability, the following seems
like a good first step toward a more scalable Derby database.
Rather than partitioning tables within a single derby database, use the
existing derby database software as a single node in a multi-node
distributed database.  To do this, build a new piece of software that
glues a network of derby databases together; each piece of the database
could be on the same machine or on different machines.

The new software would handle the following:
1) Some new set of ddl which could partition a single distributed table
across multiple local derby databases.
2) Handle dml, sending it to the appropriate local database.
3) optimizer/execution - this is the interesting part.  It needs to
partition queries, sending/receiving data to/from the local dbs in
parallel.
4) For extra credit, one could build a fault-tolerant system by applying
    RAID-like algorithms to the local dbs.  Lose one local db and it
    could be rebuilt from the other replicated dbs.
5) Probably a lot else I haven't mentioned.
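Point 2 above, routing dml to the appropriate local database, could start as simply as hashing the partitioning key onto the list of local nodes' JDBC URLs. The URLs here are hypothetical examples:

```java
import java.util.List;

// Illustrative sketch of the dml-routing piece of the glue layer: pick
// which local derby database owns a given partitioning key.  In a first
// cut each node is just a standard derby database reached over plain
// jdbc, so the "route" is simply a jdbc URL.
public class DmlRouter {
    private final List<String> nodeUrls; // one jdbc URL per local derby db

    public DmlRouter(List<String> nodeUrls) {
        this.nodeUrls = nodeUrls;
    }

    /** jdbc URL of the node that owns this partitioning key. */
    public String urlFor(Object partitionKey) {
        // floorMod keeps the index non-negative for negative hash codes.
        int idx = Math.floorMod(partitionKey.hashCode(), nodeUrls.size());
        return nodeUrls.get(idx);
    }
}
```

The caller would then open a connection with DriverManager.getConnection(router.urlFor(key)) and execute the statement against that node.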
Some benefits of doing this in derby:
1) If multiple partitioned dbs are local to the distributed server, then
all communication can easily use the embedded derby interfaces - making
it go fast.  For a first implementation I would suggest just using
standard jdbc between the distributed derby server and the local nodes
as the easiest way to get it all working.
2) If using jdbc, the same exact code will work to access local and
networked local dbs.
3) It seems that, using the same kind of "driver" trick as the network
server, applications could use this new distributed db with no code
changes (apart from the ddl to set the system up).
4) Using derby modules, one can probably reuse derby code for some
pieces (like the sql parser), while not slowing down the core
non-networked derby version.  If done right, a local system can be
configured that includes no networked-code overhead, while a distributed
version can also be built from the same codeline.
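The "driver" trick in point 3 could be sketched as a thin java.sql.Driver that claims a new URL prefix, so applications switch to the distributed layer by changing only their connection URL. The jdbc:derby:distributed: prefix and the class itself are inventions for illustration:

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

// Illustrative sketch of the "driver trick": a jdbc Driver that accepts
// a hypothetical distributed URL prefix.  Per the jdbc contract,
// connect() returns null for URLs the driver does not handle, so it can
// coexist with the normal derby drivers in the DriverManager.
public class DistributedDerbyDriver implements Driver {
    static final String PREFIX = "jdbc:derby:distributed:";

    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // not ours; let another driver try
        }
        // A real implementation would return a Connection that fans
        // statements out to the local derby nodes.
        throw new SQLException("sketch only: no distributed backend wired up");
    }

    public int getMajorVersion() { return 0; }
    public int getMinorVersion() { return 1; }
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }
    public boolean jdbcCompliant() { return false; }
    public Logger getParentLogger() { return Logger.getGlobal(); }
}
```

An application would register this driver once and change its URL; everything downstream of DriverManager stays untouched.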

I like this approach to a distributed derby database rather than trying
to make one set of code handle both the local and network paths.  An
optimizer is hard enough without making a single optimizer handle both
local and distributed decisions.  It also means that local user
performance does not suffer from code paths added for unused distributed
features.

apoc9009@yahoo.de wrote:
> Hi all,
> These days there are some fine databases out there, but none has the
> features of Java and its scalability.  For example, I cannot mix a
> MySQL 32-bit and 64-bit database on my given hardware; I can only use
> 32-bit or 64-bit systems exclusively.
> MySQL runs fast on Linux and poorly on FreeBSD and other UNIX systems
> without some modifications (for example, the threadlib issue) or
> dirty tricks.  Sometimes a feature will be available on Windows only
> and not on Linux, etc.
> With a Java SQL database like Derby there is a real chance to have
> all the cool features of the database on any system at any time, as
> long as a J2SE JVM is present.
> Derby is the right way, but are there any plans to make it enterprise
> ready (replication, load balancing of connections, online backup,
> table partitioning)?
> Josh Carpenter
