hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12307) Streaming API TransactionBatch.close() must abort any remaining transactions in the batch
Date Tue, 24 Nov 2015 19:28:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025157#comment-15025157
] 

Alan Gates commented on HIVE-12307:
-----------------------------------

HiveEndPoint.TransactionBatchImpl:
* I'm +1 on making this package level, but does it do any good to make the class non-private
and leave the constructor private?
* Why did you make the isClosed value volatile?  Are you assuming users will use this in a
multi-threaded way?  I think this class is thread safe in as much as users can have per-thread
instances, but I don't think the class is re-entrant itself.  I see in other comments you're
expecting heartbeating to be done in a separate thread.  Is that the only case you think this
will be used?  If so, it seems we should comment this well so people know what is and isn't
safe to do multi-threaded.
* write()
** I don't understand why when the code gets a SerializationError it should still declare
success and re-throw.  I see re-throwing so the user can see the error, but don't we still
need to mark the batch a failure and abort?  How can the caller recover here?
** Since the logic here is getting more complex I would suggest that we refactor write(byte)
to be simply: write(Collections.singleList(byte)).  This way we only have the logic in one
place.
** A thought on the produceFault() calls.  Would it be better to create a special RecordWriter
implementation that wraps other RecordWriters and throws when you want it to?  That way you
don't have to pollute this code with produceFault() calls and you can eventually test errors
at more locations.
* abortImpl()
** We should look at adding a call to rollback a group of transactions, so that large aborted
batches don't cause a storm of metastore calls.  That obviously doesn't need to be part of
this patch.



> Streaming API TransactionBatch.close() must abort any remaining transactions in the batch
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-12307
>                 URL: https://issues.apache.org/jira/browse/HIVE-12307
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Transactions
>    Affects Versions: 0.14.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-12307.patch
>
>
> When the client of TransactionBatch API encounters an error it must close() the batch
and start a new one.  This prevents attempts to continue writing to a file that may damaged
in some way.
> The close() should ensure to abort the any txns that still remain in the batch and close
(best effort) all the files it's writing to.  The batch should also put itself into a mode
where any future ops on this batch fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message