hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <>
Subject [jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime
Date Wed, 26 Aug 2020 13:41:00 GMT


ASF GitHub Bot logged work on HIVE-16352:

                Author: ASF GitHub Bot
            Created on: 26/Aug/20 13:40
            Start Date: 26/Aug/20 13:40
    Worklog Time Spent: 10m 
      Work Description: gabrywu opened a new pull request #1434:

   ### What changes were proposed in this pull request?
   1. Add AvroGenericRecordReader.nextRecord.
   2. Add the ability to skip blocks that fail with invalid sync errors.
   3. Add the enum value AVRO_SERDE_ERROR_SKIP to AvroSerdeUtils.AvroTableProperties.
   ### Why are the changes needed?
   When Hive reads a corrupted Avro file, we want a simple way to skip the blocks that raise invalid sync errors.
   ### Does this PR introduce _any_ user-facing change?
   No. AVRO_SERDE_ERROR_SKIP defaults to false, which preserves the original behavior.
   ### How was this patch tested?
   Added unit test cases in TestAvroGenericRecordReader.
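   The skip behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Hive's or Avro's code: it assumes a simplified block layout (a 4-byte big-endian payload length, the payload, then a 16-byte sync marker; real Avro blocks use variable-length zig-zag counts and sizes), and the hypothetical class SyncSkippingReader and its skipErrors flag stand in for the record reader and for AVRO_SERDE_ERROR_SKIP. With the flag off (the default, as in this PR) a block whose trailing marker does not match fails the read with "Invalid sync!"; with it on, the reader scans forward to the next intact sync marker and resumes after it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy reader; NOT Hive's AvroGenericRecordReader or Avro's DataFileReader.
// Assumed block layout: [4-byte big-endian payload length][payload][16-byte sync marker].
final class SyncSkippingReader {

    // Writes each payload as one block followed by the shared sync marker.
    static byte[] encode(List<byte[]> payloads, byte[] sync) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : payloads) {
            byte[] len = ByteBuffer.allocate(4).putInt(p.length).array();
            out.write(len, 0, len.length);
            out.write(p, 0, p.length);
            out.write(sync, 0, sync.length);
        }
        return out.toByteArray();
    }

    // True if the sync marker occurs at data[pos .. pos + sync.length).
    static boolean matchesAt(byte[] data, int pos, byte[] sync) {
        if (pos < 0 || pos > data.length - sync.length) return false;
        for (int j = 0; j < sync.length; j++) {
            if (data[pos + j] != sync[j]) return false;
        }
        return true;
    }

    // Index of the first sync marker at or after 'from', or -1 if none remains.
    static int findSync(byte[] data, byte[] sync, int from) {
        for (int i = Math.max(from, 0); i <= data.length - sync.length; i++) {
            if (matchesAt(data, i, sync)) return i;
        }
        return -1;
    }

    // skipErrors plays the role of AVRO_SERDE_ERROR_SKIP (default false in the PR).
    static List<String> readAll(byte[] data, byte[] sync, boolean skipErrors) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int len = (pos <= data.length - 4) ? ByteBuffer.wrap(data, pos, 4).getInt() : -1;
            int syncAt = pos + 4 + len; // where this block's marker should begin
            if (len >= 0 && matchesAt(data, syncAt, sync)) {
                records.add(new String(data, pos + 4, len, StandardCharsets.UTF_8));
                pos = syncAt + sync.length;
                continue;
            }
            if (!skipErrors) {
                // Strict mode: fail the whole read with "Invalid sync!".
                throw new UncheckedIOException("Invalid sync!", new IOException("Invalid sync!"));
            }
            // Lenient mode: resume after the next intact sync marker, dropping the bad block.
            int next = findSync(data, sync, pos + 1);
            if (next < 0) break; // no marker left: drop the remainder of the file
            pos = next + sync.length;
        }
        return records;
    }
}
```

   Resynchronizing on the marker means every record between the corruption and the next intact marker is lost (if the marker itself is damaged, the following block goes too), so skipping trades completeness for availability. Avro's Java DataFileReader offers a comparable forward scan through its sync(long position) method.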

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

Issue Time Tracking

            Worklog Id:     (was: 474808)
    Remaining Estimate: 0h
            Time Spent: 10m

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -----------------------------------------------------------------
>                 Key: HIVE-16352
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Navdeep Poonia
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
> When a file is corrupted, reading it raises the error "Invalid sync!" with
>  Can we have functionality to skip or repair such blocks at runtime, making Avro more resilient to data corruption?
>  Error: While processing file s3n://<bucket>/navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_000042. Invalid sync!
>  at
>  at
>  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(

This message was sent by Atlassian Jira
