phoenix-dev mailing list archives

From Siddhi Mehta <sm26...@gmail.com>
Subject PhoenixHbaseStorage to Skip invalid rows
Date Tue, 03 Nov 2015 00:31:43 GMT
Hey All,

I would like to add support for skipping invalid rows to PhoenixHbaseStorage,
similar to the CSVBulkLoad tool's option to ignore bad rows. I did some work
on the Apache Pig code that lets Storers handle errors in a
customizable/configurable way: PIG-4704
<https://issues.apache.org/jira/browse/PIG-4704>.

I want to plug this behavior into PhoenixHbaseStorage and propose some
changes to make that possible.

*Current Behavior/Problem:*

PhoenixRecordWriter uses executeBatch() to process rows once the batch
size is reached. If there are any client-side validation or syntax errors,
such as data not fitting the column size, executeBatch() throws an exception
and there is no way to retrieve the valid rows from the batch and retry
them. We either discard the whole batch or fail the job, with no error
handling.

With auto-commit set to false, execute() also avoids making any RPC calls:
it performs a number of validations on the client side and adds the row to
the client-side mutation cache.

On conn.commit() we make an RPC call.

*Proposed Change:*

To be able to use configurable error handling and ignore only the failed
records instead of discarding the whole batch, I propose changing
PhoenixRecordWriter from executeBatch() to execute(), or adding a
configuration option to toggle between the two behaviors.

Thoughts?
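
To make the per-row idea concrete, here is a minimal sketch using only the
standard java.sql interfaces. It is not the actual PhoenixRecordWriter code:
the class SkipInvalidRowsSketch, the RowErrorHandler callback and the
writeRows method are hypothetical names made up for illustration, and the
sketch assumes the caller has already set auto-commit to false on the
connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class SkipInvalidRowsSketch {

    /** Hypothetical callback; stands in for whatever error handler the Storer is configured with. */
    public interface RowErrorHandler {
        void onError(Object[] row, SQLException e);
    }

    /**
     * Executes one statement per row so that a bad row fails on its own.
     * Assumes auto-commit is off, so execute() only buffers the row on the
     * client and commit() flushes the buffered rows.
     *
     * @return the number of rows accepted by execute()
     */
    public static int writeRows(Connection conn, PreparedStatement stmt,
                                List<Object[]> rows, int batchSize,
                                RowErrorHandler handler) throws SQLException {
        int accepted = 0;
        int pending = 0; // rows buffered since the last commit
        for (Object[] row : rows) {
            try {
                for (int i = 0; i < row.length; i++) {
                    stmt.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                stmt.execute(); // client-side validation happens here
                accepted++;
                pending++;
            } catch (SQLException e) {
                // Only this row is reported and skipped; earlier rows stay buffered.
                handler.onError(row, e);
                continue;
            }
            if (pending >= batchSize) {
                conn.commit();
                pending = 0;
            }
        }
        if (pending > 0) {
            conn.commit(); // flush the final partial batch
        }
        return accepted;
    }
}
```

A configuration flag could select between this loop and the existing
executeBatch() path, so jobs that prefer failing fast keep the current
behavior.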
