hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12307) Streaming API TransactionBatch.close() must abort any remaining transactions in the batch
Date Tue, 24 Nov 2015 20:15:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025266#comment-15025266 ]

Eugene Koifman commented on HIVE-12307:
---------------------------------------

bq. I'm +1 on making this package level, but does it do any good to make the class non-private
and leave the constructor private?
The class is made package level for testing only.  The private constructor ensures that it is
constructed only via the factory methods, as originally implemented.
bq. Why did you make the isClosed value volatile?
Heartbeating is commonly done from a separate thread; for example, Storm does it this way.
Also, it's not unusual for application clean-up logic to come from a different thread (for
example, calling close() as a form of cancel).  So this is volatile to make sure this works
properly regardless of how the client is implemented.
I didn't try to address any more general thread safety issues in this patch.  Judging by https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest#StreamingDataIngest-Example–Non-secureMode
the original intent was NOT to have multiple threads in a StreamingConnection.  It's worthwhile
to do a thread safety review, but that was not my goal here.
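
To illustrate the visibility point, here is a minimal sketch, not the code in the patch. SketchBatch, heartbeat(), heartbeatCount() and checkIsClosed() are hypothetical names made up for this example; only the volatile isClosed flag comes from the discussion above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a transaction batch; NOT the Hive class.
class SketchBatch {
    // volatile: once close() has run on one thread, a heartbeat issued
    // afterwards on another thread is guaranteed to see the flag set.
    private volatile boolean isClosed = false;
    private final AtomicInteger heartbeats = new AtomicInteger();

    void heartbeat() {
        checkIsClosed();
        heartbeats.incrementAndGet();
    }

    // May be called from any thread, e.g. as a form of cancel.
    void close() {
        isClosed = true;
    }

    int heartbeatCount() {
        return heartbeats.get();
    }

    private void checkIsClosed() {
        if (isClosed) {
            throw new IllegalStateException("batch is closed");
        }
    }
}
```

A client could call new Thread(batch::close).start() from its clean-up logic while a separate heartbeat thread is running. The flag only makes the closed state visible across threads; it does not make the rest of the class thread safe.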

bq. write()
I'll refactor this



bq. SerializationError
This was meant to indicate that a particular row is bad, for example because of missing columns.
This gives the client the ability to drop this row (or send it to a dead letter queue), since
replaying it won't help.  Unfortunately, without my changes here the client never sees SerializationError
- it gets wrapped in other exceptions.
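
A rough sketch of how a client might use a per-row error, assuming it surfaces unwrapped. BadRowException, SketchWriter and SketchClient are hypothetical stand-ins invented for this example; they are not the Hive streaming classes, and the column check is only a placeholder for whatever makes a row bad.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception marking a single bad row (stand-in, NOT the Hive class).
class BadRowException extends Exception {
    BadRowException(String msg) {
        super(msg);
    }
}

// Hypothetical writer: rejects rows that lack the expected number of columns.
class SketchWriter {
    private final int expectedColumns;
    final List<String> written = new ArrayList<>();

    SketchWriter(int expectedColumns) {
        this.expectedColumns = expectedColumns;
    }

    void write(String row) throws BadRowException {
        // limit -1 keeps trailing empty columns so they are counted too
        if (row.split(",", -1).length != expectedColumns) {
            throw new BadRowException("wrong column count: " + row);
        }
        written.add(row);
    }
}

class SketchClient {
    // Writes every row; a bad row is set aside instead of being replayed.
    static List<String> writeAll(SketchWriter writer, List<String> rows) {
        List<String> deadLetters = new ArrayList<>();
        for (String row : rows) {
            try {
                writer.write(row);
            } catch (BadRowException e) {
                deadLetters.add(row);
            }
        }
        return deadLetters;
    }
}
```

Only the row-level exception is caught here. Any other failure would propagate to the caller, which would then close the batch and start a new one.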
bq. abortImpl()
There is https://issues.apache.org/jira/browse/HIVE-12440 for that.

> Streaming API TransactionBatch.close() must abort any remaining transactions in the batch
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-12307
>                 URL: https://issues.apache.org/jira/browse/HIVE-12307
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Transactions
>    Affects Versions: 0.14.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-12307.patch
>
>
> When the client of the TransactionBatch API encounters an error, it must close() the batch
> and start a new one.  This prevents attempts to continue writing to a file that may be damaged
> in some way.
> close() should abort any txns that still remain in the batch and close
> (best effort) all the files it's writing to.  The batch should also put itself into a mode
> where any future ops on this batch fail.
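
The behaviour requested in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: SketchTxnBatch, commitNext() and the aborted list are hypothetical, and real file handling is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical batch of transaction ids; a stand-in, NOT Hive's TransactionBatch.
class SketchTxnBatch {
    private final Deque<Long> openTxns = new ArrayDeque<>();
    final List<Long> aborted = new ArrayList<>();
    private volatile boolean isClosed = false;

    SketchTxnBatch(List<Long> txnIds) {
        openTxns.addAll(txnIds);
    }

    // Commits the next open transaction; fails once the batch is closed.
    void commitNext() {
        if (isClosed) {
            throw new IllegalStateException("batch is closed");
        }
        if (openTxns.poll() == null) {
            throw new IllegalStateException("no transactions left in batch");
        }
    }

    // Aborts every transaction still open and rejects all later operations.
    void close() {
        if (isClosed) {
            return; // a second close() is a no-op
        }
        isClosed = true;
        Long txn;
        while ((txn = openTxns.poll()) != null) {
            aborted.add(txn);
        }
    }
}
```

With a batch of txns 1, 2 and 3, committing one and then calling close() leaves 2 and 3 recorded as aborted, and any further commitNext() fails.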



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
