On Wed, Dec 8, 2010 at 5:35 AM, Alex Baranau wrote:
> Hello,
>
> Please correct my following assumptions if they are wrong.
>
> HTable.delete(List<Delete>) works a bit faster than delete(Delete) (if we need to
> delete multiple records) because the former causes a single RPC request
> (single from the client, but chunks of deletes are still sent to the respective
> region servers). And it looks like the deleting itself is the same in the current
> implementation (as of 0.20.6 at least, what about trunk?): records are deleted
> one-by-one.
>

In 0.20, yes. In TRUNK, batching has been recast but in essence works the same way; i.e. once the 'Action' gets to the server, we add the edits one at a time.

> If it's true, then it might make sense to accept List<Delete> (some
> Writable form of it) in TableOutputFormat (along with the currently accepted Put
> and Delete): this could improve performance?
>

Yes. It could. Or, redo the TOF to take multiple Actions (in TRUNK):
http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/xref/org/apache/hadoop/hbase/client/Action.html

Or fix our write buffer in HTable so it does Actions -- Puts and Deletes -- rather than just Puts.

> Btw, it looks like with Puts we don't have this problem in case client-side
> write buffer is used: put(Put) and put(List<Put>) are equivalent.

That's what I'd think... haven't tested it.

> Well, almost:
> the first one creates an extra ArrayList instance internally. Btw-2: it seems
> like this instance is created for the sake of a bit better code
> design/style, but it doesn't look like it's worth it IMHO.
>

Smile.

Thanks for looking into this Alex.
St.Ack
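
For anyone reading the archive later, here is a rough toy model of the point discussed above: a batched delete saves client round trips, while the server still applies the edits one at a time. This is only a sketch under stated assumptions. `BatchDeleteSketch`, `FakeRegionServer`, `serverFor` and the two-server routing rule are all made up for illustration and are not HBase classes or HBase behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are HBase classes. */
public class BatchDeleteSketch {

    /** Hypothetical stand-in for a region server; it just counts calls and edits. */
    static final class FakeRegionServer {
        int rpcCalls = 0;
        int editsApplied = 0;

        // One call may carry many deletes, but each edit is still applied singly.
        void deleteBatch(List<String> rowKeys) {
            rpcCalls++;
            for (int i = 0; i < rowKeys.size(); i++) {
                editsApplied++;
            }
        }
    }

    /** Creates the two fake servers used by this sketch. */
    static FakeRegionServer[] newServers() {
        return new FakeRegionServer[] { new FakeRegionServer(), new FakeRegionServer() };
    }

    /** Assumed routing rule: keys starting before 'n' go to server 0, the rest to server 1. */
    static int serverFor(String rowKey) {
        return rowKey.charAt(0) < 'n' ? 0 : 1;
    }

    /** Models calling delete(Delete) in a loop: one call per row. */
    static void deleteOneByOne(List<String> rowKeys, FakeRegionServer[] servers) {
        for (String rowKey : rowKeys) {
            servers[serverFor(rowKey)].deleteBatch(Collections.singletonList(rowKey));
        }
    }

    /** Models a list delete: group rows by server, then one call per server. */
    static void deleteBatched(List<String> rowKeys, FakeRegionServer[] servers) {
        Map<Integer, List<String>> byServer = new LinkedHashMap<>();
        for (String rowKey : rowKeys) {
            byServer.computeIfAbsent(serverFor(rowKey), k -> new ArrayList<>()).add(rowKey);
        }
        for (Map.Entry<Integer, List<String>> entry : byServer.entrySet()) {
            servers[entry.getKey()].deleteBatch(entry.getValue());
        }
    }

    static int totalRpcs(FakeRegionServer[] servers) {
        int total = 0;
        for (FakeRegionServer s : servers) {
            total += s.rpcCalls;
        }
        return total;
    }

    static int totalEdits(FakeRegionServer[] servers) {
        int total = 0;
        for (FakeRegionServer s : servers) {
            total += s.editsApplied;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("apple", "banana", "melon", "pear", "zebra");

        FakeRegionServer[] single = newServers();
        deleteOneByOne(rows, single);
        System.out.println("one-by-one: " + totalRpcs(single) + " calls, "
                + totalEdits(single) + " edits");

        FakeRegionServer[] batched = newServers();
        deleteBatched(rows, batched);
        System.out.println("batched: " + totalRpcs(batched) + " calls, "
                + totalEdits(batched) + " edits");
    }
}
```

With the five sample rows, the one-by-one path makes five calls and the batched path makes two (one per fake server), and both apply five edits. In this model the saving is entirely in round trips, not in server-side work.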