hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside
Date Thu, 20 Aug 2015 18:14:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705461#comment-14705461 ]

Prasanth Jayachandran commented on HIVE-11595:
----------------------------------------------

1) Can you rename the reader API from getSerializedFileMetadata() to getSerializedFooter()?
2) Also, can you rename the variable fullFooterBuffer to serializedFooter?
3) Can you revert the signature of getAndCheckPostScript() to use Path instead of Object?
I am assuming the metastore has enough information about where the ByteBuffer came from (i.e.,
the path that the ByteBuffer belongs to). It would be good to throw an exception with the path
information instead of just "Byte buffer", which gives us no clue about the source. You can add
another helper for extractMetaInfoFromFooter that accepts a Path as a parameter.
4) Rename getAndCheckPostScript() to extractPostScript() to be in line with the other extract methods?
5) What is the purpose of the FooterInfo class? Apart from the serialized footer and the metadata
(serialized or non-serialized?), what other information is stored in the metastore?
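
To illustrate point 3, here is a minimal sketch of a Path-aware helper. It is not the code from the patch: the class name FooterSketch, the error messages, and the use of java.nio.file.Path (instead of Hadoop's Path, to keep the example self-contained) are assumptions made for illustration. It relies only on the ORC layout in which the last byte of the file holds the postscript length, and it does not parse the protobuf postscript itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Path;

/** Hypothetical sketch; not the actual ORC reader code. */
final class FooterSketch {
  private FooterSketch() {}

  /**
   * Returns a read-only view of the serialized postscript at the tail of
   * the buffer. The path is used only to make error messages traceable.
   */
  static ByteBuffer extractPostScript(ByteBuffer buffer, Path path) throws IOException {
    int size = buffer.remaining();
    if (size < 1) {
      throw new IOException("Malformed ORC file " + path + ": empty footer buffer");
    }
    // The last byte of an ORC file stores the postscript length (unsigned).
    int psLen = buffer.get(buffer.limit() - 1) & 0xff;
    if (psLen > size - 1) {
      throw new IOException("Malformed ORC file " + path + ": postscript length "
          + psLen + " exceeds the " + (size - 1) + " bytes available");
    }
    // Work on a duplicate so the caller's position and limit are untouched.
    ByteBuffer ps = buffer.duplicate();
    ps.position(buffer.limit() - 1 - psLen);
    ps.limit(buffer.limit() - 1);
    return ps.slice().asReadOnlyBuffer();
  }
}
```

With a signature like this, a caller that reads the footer bytes from a cache can still pass along the originating path, so a failure reports the file name rather than just "Byte buffer".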

> refactor ORC footer reading to make it usable from outside
> ----------------------------------------------------------
>
>                 Key: HIVE-11595
>                 URL: https://issues.apache.org/jira/browse/HIVE-11595
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-10595.patch, HIVE-11595.01.patch
>
>
> If the ORC footer is read from cache, we want to parse it without having the reader, opening
> a file, etc. I thought it would be as simple as protobuf parseFrom bytes, but apparently there's
> a bunch of stuff going on there. It needs to be accessible via something like parseFrom(ByteBuffer),
> or similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
