gora-dev mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: Operations (Deletes) in Gora
Date Thu, 02 Aug 2012 10:28:03 GMT
Hi Ed,

I agree that the "overwritten" state should be introduced into Gora. For
example, from an API perspective it is currently difficult to do the
following: clear a map and put some new entries in it. The way to do this
now is to make sure that the client loads the entire map, so that it
explicitly iterates over all old entries and deletes them (followed by
adding the new entries). The current clear() implementation of
StatefulHashMap actually does this explicit delete-marking of all entries.
Of course I agree that setting an "overwritten" flag for that map field
would be much easier.
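
To make the difference concrete, here is a minimal Java sketch of a map that records an "overwritten" flag on clear() instead of delete-marking every old entry. The class OverwriteTrackingMap and all of its method names are hypothetical; this is not Gora's StatefulHashMap or any existing Gora API, and how a data store would act on the flag (for example by wiping the whole field before writing) is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of per-field state tracking; not part of Gora. */
class OverwriteTrackingMap<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Set<K> dirtyKeys = new HashSet<>();   // keys with a pending write
    private final Set<K> deletedKeys = new HashSet<>(); // keys with a pending delete
    private boolean overwritten = false;                // whole field is to be replaced

    public V put(K key, V value) {
        deletedKeys.remove(key);
        dirtyKeys.add(key);
        return entries.put(key, value);
    }

    public V get(K key) {
        return entries.get(key);
    }

    public V remove(K key) {
        dirtyKeys.remove(key);
        // Once the field is marked overwritten, the store is assumed to wipe
        // all old entries, so per-key delete markers are redundant.
        if (!overwritten) {
            deletedKeys.add(key);
        }
        return entries.remove(key);
    }

    /** Marks the whole field as overwritten instead of delete-marking each entry. */
    public void clear() {
        entries.clear();
        dirtyKeys.clear();
        deletedKeys.clear();
        overwritten = true;
    }

    /** Resets all tracking state, e.g. after the record has been flushed. */
    public void clearStates() {
        dirtyKeys.clear();
        deletedKeys.clear();
        overwritten = false;
    }

    public boolean isOverwritten() {
        return overwritten;
    }

    public Set<K> getDirtyKeys() {
        return Collections.unmodifiableSet(dirtyKeys);
    }

    public Set<K> getDeletedKeys() {
        return Collections.unmodifiableSet(deletedKeys);
    }
}
```

With a structure like this, a client could call clear() and then put() the new entries without ever having loaded the old ones: the store would see the overwritten flag plus the dirty keys, and would not need a delete marker for each old entry.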

Can you verify that this is what you mean by "overwritten"?

Ferdy.

On Mon, Jul 30, 2012 at 11:19 PM, Ed Kohlwey <ekohlwey@gmail.com> wrote:

> Hi Renato,
> When I say mutable, I mean mutable in memory, i.e. in the same sense that
> strings are immutable in Java. This is just to prevent aliasing errors in
> people's code from creating changes in objects that won't be marked by the
> state tracking system I'm working on. Most Avro types are actually
> immutable; the only exceptions being records, maps, and lists.
>
> Let's say you have a data schema that looks like this:
>
> {
>   "type":"record",
>   "name":"Parent",
>   "fields":[
>     {"name": "child",
>      "type": {"name":"Child","type":"record",
>      ...
>     }
>   ]
> }
>
> So if your mapping in your store provider takes the field "child" and
> serializes it and stores it as an immutable blob, that's just how things
> will work; you won't look at the fine-grained state tracking information.
> If you wanted to use a more sophisticated mapping, where perhaps you use
> something like an xpath expression as the key for the "child" field, then
> you could do that too and make full use of the fine-grained state tracking.
>
> There's a number of scenarios where this might be desirable. You can, as I
> mentioned above, use a more complicated mapping mechanism like xpath
> expressions to denote record fields (and nested structure) rather than flat
> serialization into a particular column family or qualifier. You could also
> use features like column families on some data stores to represent two
> entities that are related but that you might want to sometimes access at
> the same time and other times not at the same time, so you would use some
> of the dirty metadata but not always all of it.
>
> So in summary, the level of sophistication would really depend on the
> particular data store and what features the maintainers of that code want
> to expose via its mapping.
>
> The dirty metadata itself shouldn't be persisted into the data store; it is
> only for keeping track of changes that occur to records to make sure that
> you always have enough information in the gora objects to clean up
> key-values that might be left in the data store.
>
>
> On Mon, Jul 30, 2012 at 2:22 PM, Renato Marroquín Mogrovejo <
> renatoj.marroquin@gmail.com> wrote:
>
> > Hi Ed,
> >
> > I have a couple of questions w.r.t. this. I am in the middle of
> > implementing the DynamoDB data store for Gora, and there are some severe
> > differences in the Gora API between disk-based data stores and web
> > service ones.
> > You are proposing to classify fields into:
> >
> > - Mutable. These will have the four states: clean, dirty,
> > deleted, and overwritten.
> > - Immutable.
> >
> > Where most Avro-based objects will be mutable. How do you think we
> > could model the Gora API to deal with web-service-based data stores (e.g.
> > DynamoDB, GAE)? In these cases, the objects we are talking about are
> > immutable objects because they are all primitive objects, and most of
> > the transactional methods are handled inside service providers. Do you
> > think we should create these attributes as well? Or what kind of
> > attributes should immutable objects have?
> > Thanks in advance!
> >
> >
> > Renato M.
> >
> > 2012/7/29 Ed Kohlwey <ekohlwey@gmail.com>:
> > > What I'm talking about is not specific to the Avro store. The issue is
> > > that state information can be lost during the mutation process. For
> > > example, one record has another record as a field. In this regard the
> > > sub-record represents a map. But deletion state in a record is not
> > > tracked; to have enough information to make sure you can go back and
> > > delete the kvs in the original store, you need to know what the
> > > original value was (depending on how the store does mappings) or do a
> > > range delete. Maps also do not retain enough information to be
> > > expressive in this regard; they maintain deleted state but do not
> > > describe in a granular fashion the original state of the object.
> > >
> > > My current thinking is to strictly define four states for fields:
> > > clean, which means no mutation is pending for a record; dirty, which
> > > means a write is pending on a record; deleted, which means that a
> > > delete mutation is pending; and overwritten, which is equivalent to
> > > dirty and delete. Fields will be strictly separated into two
> > > categories: mutable (maps, arrays, and records) and immutable (bytes,
> > > strings, and other primitives). All non-immutable fields should have
> > > the original state of any mutated fields stored either via a tombstone
> > > object or dirty bits. Tombstone objects will be used to describe the
> > > original state of a mutable object that needs to be deleted, and dirty
> > > bits will be used to signal that the current state of the object is
> > > not yet persistent.
> > >
> > > Sent from my smartphone. Please excuse any typos or shorthand.
> > > On Jul 29, 2012 1:49 PM, "Lewis John Mcgibbney" <lewis.mcgibbney@gmail.com>
> > > wrote:
> > >
> > >> Hi Ed,
> > >>
> > >> Yeah I actually noticed that deletes are not available/supported in
> > >> the Avro store in trunk or in your 84 patch. As I'm more or less
> > >> coming into the Avro stuff blind... does Avro do deletes or is it just
> > >> that we don't yet support them in Gora?
> > >>
> > >> Best
> > >> Lewis
> > >>
> > >> On Sun, Jul 29, 2012 at 6:17 PM, Ed Kohlwey <ekohlwey@gmail.com> wrote:
> > >> > I've found the apparent semantics of deletes to be pretty
> > >> > inconsistent through my work on the Avro port. I don't think enough
> > >> > state information is actually stored to implement it reliably. I'm
> > >> > currently working on adding this on top of my Gora 84 work.
> > >> >
> > >> > Sent from my smartphone. Please excuse any typos or shorthand.
> > >> > On Jul 29, 2012 11:27 AM, "Lewis John Mcgibbney" <lewis.mcgibbney@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> What kind of conversation needs to be kicked off here?
> > >> >> Currently as it stands deletes (and some other operations) in Gora
> > >> >> seem to be shrouded in mystery... :0|
> > >> >>
> > >> >> Deletes seem to be implemented in gora-accumulo fine, maybe Keith
> > >> >> can confirm? Also some of the semantics about what Accumulo expects
> > >> >> deletes to be like and whether or not it is working OK for your use
> > >> >> case?
> > >> >> Ferdy provided important input into GORA-155 stating that there
> > >> >> needs to be more clarity w.r.t semantics for versions of operations
> > >> >> on a general level before we begin to implement functionality willy
> > >> >> nilly at datastore level.
> > >> >>
> > >> >> Through Hector we can do many alternative delete operations for
> > >> >> Cassandra and this is great but I think it is important for us to
> > >> >> establish some general rules about what we wish the Gora API to
> > >> >> offer/achieve.
> > >> >>
> > >> >> Any comments?
> > >> >> Thanks
> > >> >> Lewis
> > >> >>
> > >> >> --
> > >> >> Lewis
> > >> >>
> > >>
> > >>
> > >>
> > >> --
> > >> Lewis
> > >>
> >
>
