hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime
Date Thu, 27 Aug 2020 02:03:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=475100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475100 ]

ASF GitHub Bot logged work on HIVE-16352:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Aug/20 02:02
            Start Date: 27/Aug/20 02:02
    Worklog Time Spent: 10m 
      Work Description: gabrywu opened a new pull request #1436:
URL: https://github.com/apache/hive/pull/1436


   ### What changes were proposed in this pull request?
   1. Add AvroGenericRecordReader.nextRecord.
   2. Optimize AvroGenericRecordReader.next by adding the ability to skip invalid sync blocks.
   3. Add the enum value AVRO_SERDE_ERROR_SKIP to AvroSerdeUtils.AvroTableProperties.
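
A minimal sketch of the skip-on-error idea behind changes 1 and 2, not the code in the pull request. `RecordSource`, `FakeBlockSource`, `SkippingReader`, and `skipToNextBlock` are hypothetical names introduced here for illustration; none of them is a Hive or Avro API. The `skipOnError` flag only mirrors the role the description gives AVRO_SERDE_ERROR_SKIP: when it is false, the original error still propagates.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical block-oriented source; a stand-in, not a Hive or Avro interface. */
interface RecordSource<T> {
    /** May throw IOException("Invalid sync!") when the current block is corrupt. */
    boolean hasNext() throws IOException;

    T next() throws IOException;

    /** Hypothetical resync: abandon the current block and move to the next one. */
    void skipToNextBlock() throws IOException;
}

/** In-memory fake for illustration only: a null entry models a corrupt block. */
final class FakeBlockSource implements RecordSource<String> {
    private final List<List<String>> blocks;
    private int block = 0;
    private int pos = 0;

    FakeBlockSource(List<List<String>> blocks) {
        this.blocks = blocks;
    }

    @Override
    public boolean hasNext() throws IOException {
        while (block < blocks.size()) {
            List<String> current = blocks.get(block);
            if (current == null) {
                throw new IOException("Invalid sync!");
            }
            if (pos < current.size()) {
                return true;
            }
            block++; // current block exhausted, advance to the next one
            pos = 0;
        }
        return false;
    }

    @Override
    public String next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return blocks.get(block).get(pos++);
    }

    @Override
    public void skipToNextBlock() {
        block++;
        pos = 0;
    }
}

/** Wraps a source and optionally skips blocks that fail to read. */
final class SkippingReader<T> {
    private final RecordSource<T> source;
    private final boolean skipOnError; // assumed default in the PR description: false
    private int skippedBlocks = 0;

    SkippingReader(RecordSource<T> source, boolean skipOnError) {
        this.source = source;
        this.skipOnError = skipOnError;
    }

    /** Returns the next readable record, or null at end of input. */
    T nextRecord() throws IOException {
        while (true) {
            try {
                return source.hasNext() ? source.next() : null;
            } catch (IOException e) {
                if (!skipOnError) {
                    throw e; // original behavior: fail on the first bad block
                }
                skippedBlocks++;
                source.skipToNextBlock();
            }
        }
    }

    /** Drains the source into a list, applying the same skip policy. */
    List<T> readAll() throws IOException {
        List<T> out = new ArrayList<>();
        for (T r = nextRecord(); r != null; r = nextRecord()) {
            out.add(r);
        }
        return out;
    }

    int skippedBlocks() {
        return skippedBlocks;
    }
}
```

With skipping enabled, the reader returns the records from the healthy blocks and counts the blocks it dropped, so a caller can log how much data was lost instead of discarding it silently.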
   
   ### Why are the changes needed?
   
   When Hive reads an Avro file with a corrupted file format, we want to simply skip the blocks that raise invalid sync errors.
   https://issues.apache.org/jira/browse/HIVE-16352
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. AVRO_SERDE_ERROR_SKIP defaults to false, which keeps the original logic.
   
   ### How was this patch tested?
   
   Added unit test cases in TestAvroGenericRecordReader.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 475100)
    Time Spent: 0.5h  (was: 20m)

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -----------------------------------------------------------------
>
>                 Key: HIVE-16352
>                 URL: https://issues.apache.org/jira/browse/HIVE-16352
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Navdeep Poonia
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error java.io.IOException: Invalid sync!
>  Can we have some functionality to skip or repair such blocks at runtime, to make Avro more error resilient in case of data corruption?
>  Error: java.io.IOException: java.io.IOException: java.io.IOException: While processing file s3n://<bucket>/navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_000042. java.io.IOException: Invalid sync!
>  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
