hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <>
Subject [jira] [Commented] (HIVE-19205) Hive streaming ingest improvements (v2)
Date Fri, 13 Apr 2018 22:44:00 GMT


Prasanth Jayachandran commented on HIVE-19205:

[~vgarg] Can this be included in the 3.0.0 release?

> Hive streaming ingest improvements (v2)
> ---------------------------------------
>                 Key: HIVE-19205
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
> This is an umbrella JIRA to track Hive streaming ingest improvements. At a high level, the improvements are:
> - Support for dynamic partitioning
> - API changes (simple streaming connection builder)
> - Hide the transaction batches from clients (clients can tune the transaction batch size but don't have to know about it)
> - Support auto rollover to the next transaction batch (clients don't have to worry about closing a transaction batch and opening a new one)
> - All record writers will be strict, meaning the schema of the record has to match the table schema. This avoids multiple serialization/deserialization passes to re-order columns when there is a schema mismatch
> - Automatic distribution for non-bucketed tables so that the compactor can have more parallelism
> - Create delta files with all ORC overhead disabled (no index, no compression, no dictionary). The compactor will recreate the ORC files with index, compression and dictionary encoding.
> - Automatic memory management via auto-flushing (this will yield smaller stripes for delta files but is more scalable, and clients don't have to worry about distributing the data across
> - Support for more writers (Avro specifically; maybe an ORC passthrough format?)
> - Support for accepting an input stream instead of a record byte[]
> - Remove the HCatalog dependency (the old streaming API will remain in the hcatalog package for backward compatibility; the new streaming API will be in its own Hive module)
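The transaction-batch items above (a builder-style connection, batches hidden from clients, a tunable batch size, automatic rollover) can be illustrated with a small, self-contained Java simulation. This is a hypothetical sketch: `SketchConnection`, `TxnBatch`, `Builder`, `transactionsPerBatch`, `recordsPerTransaction` and the fixed records-per-transaction commit interval are invented for illustration and are not the Hive streaming API; no metastore or real transactions are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: transaction batches stay inside the connection and roll
// over automatically. Names are illustrative, not the Hive streaming API.
public class RolloverSketch {

    /** Simulated batch of pre-allocated, consecutive transaction ids. */
    static final class TxnBatch {
        private final long firstTxnId;
        private final int size;
        private int used = 0;

        TxnBatch(long firstTxnId, int size) {
            this.firstTxnId = firstTxnId;
            this.size = size;
        }

        boolean hasRemaining() { return used < size; }

        long beginNext() { return firstTxnId + used++; }
    }

    /** Clients only call write() and close(); batches never leave this class. */
    static final class SketchConnection implements AutoCloseable {
        private final int txnsPerBatch;    // tunable by the client via the builder
        private final int recordsPerTxn;   // simulated commit interval
        private long nextTxnId = 1;        // simulated transaction id allocator
        private TxnBatch batch;
        private long currentTxn;
        private boolean txnOpen = false;
        private int recordsInTxn = 0;
        final List<Long> committedTxns = new ArrayList<>();
        int batchesOpened = 0;

        private SketchConnection(int txnsPerBatch, int recordsPerTxn) {
            this.txnsPerBatch = txnsPerBatch;
            this.recordsPerTxn = recordsPerTxn;
        }

        void write(byte[] record) {
            if (!txnOpen) {
                beginTxn();
            }
            recordsInTxn++;                // a real writer would serialize the record here
            if (recordsInTxn == recordsPerTxn) {
                committedTxns.add(currentTxn);
                txnOpen = false;
            }
        }

        private void beginTxn() {
            // Auto rollover: open the next batch once the current one is used up.
            if (batch == null || !batch.hasRemaining()) {
                batch = new TxnBatch(nextTxnId, txnsPerBatch);
                nextTxnId += txnsPerBatch;
                batchesOpened++;
            }
            currentTxn = batch.beginNext();
            txnOpen = true;
            recordsInTxn = 0;
        }

        @Override
        public void close() {
            if (txnOpen) {                 // commit the partially filled last transaction
                committedTxns.add(currentTxn);
                txnOpen = false;
            }
        }
    }

    /** Simple builder in the spirit of a "streaming connection builder". */
    static final class Builder {
        private int txnsPerBatch = 10;
        private int recordsPerTxn = 1000;

        Builder transactionsPerBatch(int n) { txnsPerBatch = n; return this; }
        Builder recordsPerTransaction(int n) { recordsPerTxn = n; return this; }
        SketchConnection connect() { return new SketchConnection(txnsPerBatch, recordsPerTxn); }
    }

    /** Writes `records` rows, closes the connection and returns it for inspection. */
    static SketchConnection run(int txnsPerBatch, int recordsPerTxn, int records) {
        SketchConnection conn = new Builder()
                .transactionsPerBatch(txnsPerBatch)
                .recordsPerTransaction(recordsPerTxn)
                .connect();
        try (conn) {
            for (int i = 0; i < records; i++) {
                conn.write(("row-" + i).getBytes(StandardCharsets.UTF_8));
            }
        }
        return conn;
    }

    public static void main(String[] args) {
        SketchConnection conn = run(2, 3, 10);
        System.out.println("batches opened: " + conn.batchesOpened);
        System.out.println("committed txns: " + conn.committedTxns);
    }
}
```

Running `main` writes 10 records with 2 transactions per batch and 3 records per transaction, so it should print `batches opened: 2` and `committed txns: [1, 2, 3, 4]`. The client only tuned the batch size; it never opened or closed a batch itself.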
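The strict record writer and auto-flushing items can be sketched in the same hedged way: a writer that rejects any record whose field count differs from the table's column count (instead of re-ordering columns) and flushes its buffer once a byte threshold is reached. `StrictWriter`, the field-count check and the byte threshold are assumptions for illustration; a real writer would also validate types and write ORC stripes rather than just count rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a strict delimited writer with auto-flushing.
// Names are illustrative, not the Hive streaming API.
public class StrictWriterSketch {

    static final class StrictWriter {
        private final int columnCount;          // number of columns in the table schema
        private final Pattern delimiter;
        private final long flushThresholdBytes; // auto-flush once this much is buffered
        private final List<String[]> buffer = new ArrayList<>();
        private long bufferedBytes = 0;
        final List<Integer> rowsPerFlush = new ArrayList<>();

        StrictWriter(int columnCount, String delimiter, long flushThresholdBytes) {
            this.columnCount = columnCount;
            this.delimiter = Pattern.compile(Pattern.quote(delimiter));
            this.flushThresholdBytes = flushThresholdBytes;
        }

        void write(byte[] record) {
            String line = new String(record, StandardCharsets.UTF_8);
            // Limit -1 keeps trailing empty fields, so "a,b," has 3 fields.
            String[] fields = delimiter.split(line, -1);
            // Strict: no re-ordering or padding; the record must match the schema as-is.
            if (fields.length != columnCount) {
                throw new IllegalArgumentException(
                        "expected " + columnCount + " columns but got " + fields.length);
            }
            buffer.add(fields);
            bufferedBytes += record.length;
            if (bufferedBytes >= flushThresholdBytes) {
                flush();                        // bounded memory, at the cost of smaller flushes
            }
        }

        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            // A real writer would write a stripe to the delta file here.
            rowsPerFlush.add(buffer.size());
            buffer.clear();
            bufferedBytes = 0;
        }
    }

    public static void main(String[] args) {
        StrictWriter writer = new StrictWriter(3, ",", 20);
        for (int i = 0; i < 10; i++) {
            writer.write("a,b,c".getBytes(StandardCharsets.UTF_8)); // 5 bytes each
        }
        writer.flush();
        System.out.println("rows per flush: " + writer.rowsPerFlush);
        try {
            writer.write("a,b".getBytes(StandardCharsets.UTF_8));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a 20-byte threshold and 5-byte records, the buffer flushes every 4 rows, so `main` prints `rows per flush: [4, 4, 2]` and then rejects the two-field record. Lowering the threshold bounds memory further but produces more, smaller flushes, which mirrors the trade-off noted for delta-file stripes.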

This message was sent by Atlassian JIRA
