phoenix-dev mailing list archives

From "Siddhi Mehta (JIRA)" <>
Subject [jira] [Created] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch
Date Wed, 04 Nov 2015 07:04:27 GMT
Siddhi Mehta created PHOENIX-2367:

             Summary: Change PhoenixRecordWriter to use execute instead of executeBatch
                 Key: PHOENIX-2367
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Siddhi Mehta
            Assignee: Siddhi Mehta

Hey All,

I wanted to add a notion of skipping invalid rows to PhoenixHbaseStorage, similar to how the
CSVBulkLoad tool has an option of ignoring bad rows. I did some work on the Apache Pig
code that allows Storers to have a notion of customizable/configurable errors (PIG-4704).

I wanted to plug this behavior into PhoenixHbaseStorage and propose certain changes toward that end.

Current Behavior/Problem:

PhoenixRecordWriter uses executeBatch() to process rows once the batch size is reached.
If there are any client-side validation or syntactic errors, such as data not fitting the column
size, executeBatch() throws an exception and there is no way to retrieve the valid rows from
the batch and retry them. We either discard the whole batch or fail the job, with no error handling.
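The all-or-nothing failure mode above can be sketched with a small simulation (plain Java, no Phoenix dependency; the Row type, the length-based validation rule, and flushBatch are all hypothetical stand-ins, not Phoenix API):

```java
import java.util.List;

public class BatchSketch {
    // Hypothetical stand-in for a row headed into an UPSERT batch.
    record Row(String key, String value) {}

    // Client-side validation, e.g. data must fit the column size
    // (the 10-character limit here is an arbitrary illustration).
    static boolean isValid(Row r) {
        return r.value().length() <= 10;
    }

    // executeBatch()-style semantics: one invalid row fails the whole
    // flush, and the valid rows cannot be recovered and retried.
    static List<Row> flushBatch(List<Row> batch) {
        for (Row r : batch) {
            if (!isValid(r)) {
                throw new IllegalStateException("batch failed at key " + r.key());
            }
        }
        return batch; // all rows written
    }
}
```

With a batch of [good, bad, good], flushBatch throws and both good rows are lost along with the bad one, which is the problem being described.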

With auto-commit set to false, execute() also serves the purpose of not making any RPC calls:
it performs client-side validation and adds the row to the client's cache of mutations.

The RPC call is made on conn.commit().
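With auto-commit off, the same flow can validate row by row, keep the rows that pass in a client-side mutation cache, and flush them in one go on commit. A minimal sketch of that per-row shape (again plain Java; the names execute, commit, and mutationCache mirror the description above but are illustrative, not the Phoenix API):

```java
import java.util.ArrayList;
import java.util.List;

public class PerRowSketch {
    record Row(String key, String value) {}

    // Same hypothetical validation rule as before.
    static boolean isValid(Row r) { return r.value().length() <= 10; }

    static final List<Row> mutationCache = new ArrayList<>();
    static final List<Row> rejected = new ArrayList<>();

    // execute()-style semantics: validate a single row and cache it
    // locally, with no RPC yet. An invalid row can be diverted to
    // error handling instead of failing everything.
    static void execute(Row r) {
        if (isValid(r)) {
            mutationCache.add(r);
        } else {
            rejected.add(r);
        }
    }

    // conn.commit()-style semantics: one flush of the cached mutations.
    static List<Row> commit() {
        List<Row> written = List.copyOf(mutationCache);
        mutationCache.clear();
        return written;
    }
}
```

Feeding the same [good, bad, good] input through this shape writes the two good rows and leaves only the bad one for the error handler, instead of discarding all three.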

Proposed Change

To be able to use configurable error handling and ignore only the failed records instead of
discarding the whole batch, I want to propose changing the behavior in PhoenixRecordWriter
from executeBatch() to execute(), or adding a configuration to toggle between the two behaviors.

This message was sent by Atlassian JIRA
