carbondata-user mailing list archives

From Ravindra Pesala <ravi.pes...@gmail.com>
Subject [ANNOUNCE] Apache CarbonData 1.5.0 release
Date Tue, 16 Oct 2018 11:49:45 GMT
Hi,

The Apache CarbonData community is pleased to announce the release of
version 1.5.0 in The Apache Software Foundation (ASF).

CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, streaming analytics, and so on. CarbonData has
been deployed in many enterprise production environments. In one of the
largest scenarios, it supports queries on a single table with 3PB of data
(more than 5 trillion records) with a response time of less than 3 seconds!

We encourage you to use the release
https://dist.apache.org/repos/dist/release/carbondata/1.5.0/, and to send
feedback through the CarbonData user mailing list
<user@carbondata.apache.org>!

This release note provides information on the new features, improvements,
and bug fixes of this release.
What’s New in CarbonData Version 1.5.0?

The intention of CarbonData 1.5.0 was to move closer to unified analytics.
We want to enable CarbonData files to be read from more engines/libraries
to support various use cases. In this regard, we have added support for
reading CarbonData files from C++ libraries. Additionally, CarbonData files
can be read using the Java SDK, the Spark FileFormat interface, Spark, and
Presto.

CarbonData added multiple optimisations to reduce the store size so that
queries can take advantage of reduced IO. Several enhancements have been
made to streaming support in CarbonData.

In this version of CarbonData, more than 150 JIRA tickets related to new
features, improvements, and bugs have been resolved. The following is a
summary.
Ecosystem Integration

Support Spark 2.3.2 ecosystem integration

CarbonData now supports Spark 2.3.2.

Spark 2.3.2 has many performance improvements in addition to critical bug
fixes. It also has many improvements related to streaming and the
unification of interfaces. In version 1.5.0, CarbonData is integrated with
Spark 2.3.2 so that future versions of CarbonData can add enhancements
based on Spark's new and improved capabilities.
Support Hadoop 3.1.1 ecosystem integration

CarbonData now supports Hadoop 3.1.1, which is the latest stable Hadoop
version and supports many new features (EC, federation cluster, etc.).
LightWeight Integration with Spark

CarbonData now supports the Spark FileFormat Data Source APIs so that
CarbonData can be integrated with Spark as an external file source. This
integration helps to query CarbonData tables from SparkSession. It also
helps applications that need standards compliance with respect to
interfaces.

Spark data source APIs support file format level operations such as read
and write. CarbonData’s enhanced features, namely IUD, Alter, Compaction,
Segment Management, and Streaming, are not available when CarbonData is
integrated as a Spark data source through the data source API.
CarbonData Core

Adaptive Encoding for Numeric Columns

CarbonData now supports adaptive encoding for numeric columns. Adaptive
encoding stores each value of a column as a delta from the Min/Max value
of that column, thereby reducing the effective bits required to store the
value. This results in a smaller store size, which improves query
performance due to reduced IO. Adaptive encoding for dictionary columns has
been supported since version 1.1.0; it is now supported for all numeric
columns.
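
The idea of storing values as deltas can be illustrated with a small Python
sketch. It is not CarbonData's codec: the function names are hypothetical,
the delta is taken from the column minimum only, and the input is assumed to
be a non-empty list of integers.

```python
def adaptive_delta_encode(values):
    """Store each value as an offset from the column minimum and report
    how many bits per value those offsets need (illustrative only)."""
    base = min(values)
    deltas = [v - base for v in values]
    # bit_length() of the largest offset is a width every offset fits in;
    # use at least 1 bit so a constant column is still representable.
    bits = max(max(deltas).bit_length(), 1)
    return base, deltas, bits


def adaptive_delta_decode(base, deltas):
    """Recover the original values from the base and the offsets."""
    return [base + d for d in deltas]


column = [1_000_000_007, 1_000_000_019, 1_000_000_003, 1_000_000_120]
base, deltas, bits = adaptive_delta_encode(column)
raw_bits = max(v.bit_length() for v in column)
print(f"base={base} deltas={deltas} bits/value={bits} (raw: {raw_bits})")
```

In this made-up column the raw values need 30 bits each, while the offsets
from the minimum fit in 7 bits.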

Performance improvement measurement is not complete in 1.5.0. The results
will be published along with the 1.5.1 release.
Configurable Column Size for Generating Min/Max

CarbonData generates a Min/Max index for all columns and uses it for
effective pruning of data while querying. Generating Min/Max for wide
columns (such as an address column) leads to increased storage size and
memory footprint, thereby reducing query performance. Moreover, either
filters are not applied on such columns, so there is no need to generate
the indexes, or filters on such columns are so rare that it is wiser to
accept lower query performance in those scenarios than to affect the
overall performance of other filter scenarios through an increased index
size. CarbonData now supports configuring a limit on the column width (in
terms of characters) beyond which Min/Max generation is skipped.

By default, Min/Max is generated for all string columns. Users who are
aware of their data schema and know which columns hold long values and will
not have filters applied on them can configure CarbonData to exclude such
columns. Alternatively, the maximum length up to which Min/Max is generated
can be specified, so that CarbonData skips Min/Max index generation if the
column character length crosses this configured threshold. By default,
string columns with more than 200 bytes are skipped from Min/Max index
generation. In Java each character occupies 2 bytes. Hence columns longer
than 100 characters are skipped from Min/Max index generation.
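
The threshold arithmetic above can be sketched in Python as follows. The
function and constant names are hypothetical, and treating the check as one
decision over a batch of values is an assumption. Only the 200-byte default
and the 2-bytes-per-character figure come from this announcement.

```python
MINMAX_ALLOWED_BYTE_COUNT = 200  # default limit stated in the text
BYTES_PER_CHAR = 2               # per-character size stated in the text


def should_write_minmax(column_values, allowed_bytes=MINMAX_ALLOWED_BYTE_COUNT):
    """Return True when the longest value fits within the byte limit,
    i.e. when Min/Max would still be generated (illustrative only)."""
    # len() counts characters; multiply to estimate the size in bytes.
    longest = max((len(v) * BYTES_PER_CHAR for v in column_values), default=0)
    return longest <= allowed_bytes


short_values = ["Pune", "Bengaluru", "x" * 100]   # longest value: 200 bytes
long_values = ["y" * 101]                          # longest value: 202 bytes
print(should_write_minmax(short_values), should_write_minmax(long_values))
```

Raising or lowering `allowed_bytes` mirrors changing the configured limit:
a value of 100 characters is still indexed by default, while 101 characters
crosses the threshold.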
Support for Map Complex Data Type

CarbonData has integrated map complex data type support. Map data schemas
defined in Avro can be stored in CarbonData tables. Map data types help
with efficient lookup of data. Map complex data type support in CarbonData
lets users store their Avro data directly, without writing logic to convert
it into CarbonData supported data types.
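
For readers unfamiliar with Avro maps, here is a minimal Python sketch of an
Avro record schema containing a map field. The record name, field names, and
sample data are hypothetical. Only the `{"type": "map", "values": ...}`
syntax comes from the Avro specification, in which map keys are always
strings.

```python
import json

# Hypothetical schema: a record with one int field and one map field.
avro_schema = {
    "type": "record",
    "name": "UserProfile",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
    ],
}

# A sample record matching the schema above.
record = {"id": 1, "attributes": {"city": "Pune", "tier": "gold"}}


def map_field_names(schema):
    """Return the names of the fields whose Avro type is a map."""
    return [
        f["name"]
        for f in schema["fields"]
        if isinstance(f["type"], dict) and f["type"].get("type") == "map"
    ]


print(json.dumps(avro_schema, indent=2))
print(map_field_names(avro_schema))
```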
Support for Byte and Float Data Types

CarbonData supports Byte and Float data types so that the data types
defined in Avro schema can be stored into CarbonData tables. Columns of
Byte data type can be included in sort columns.
ZSTD Compression

ZSTD compression is supported for compressing each page of a CarbonData
file. ZSTD offers a better compression ratio, thereby reducing the store
size. On average, ZSTD compression reduces store size by 20-30%. ZSTD
compression is also supported for compressing sort temp files written
during data loading.
CarbonData SDK

SDK Supports C++ Interfaces to read CarbonData files

To enable integration with non-Java-based execution engines, CarbonData
provides a C++ reader to read CarbonData files. These readers can be
integrated with any execution engine and used to query data stored in
CarbonData tables without a dependency on Spark or Hadoop.
Multi-Thread Safe Writer API in SDK

To improve write performance when using the SDK, CarbonData supports
multi-thread safe writer APIs. This enables applications to write data to a
single CarbonData file in parallel. Multi-thread safe writers help in
generating bigger CarbonData files, thereby avoiding the small files
problem faced in HDFS.
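
The general pattern of several threads feeding one writer can be sketched in
Python as below. This is a conceptual illustration only and does not use the
CarbonData SDK: `SharedWriter` and `produce` are hypothetical names, and an
in-memory list stands in for the output file.

```python
import threading


class SharedWriter:
    """Toy writer: many threads append rows to one in-memory 'file'."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = []

    def write(self, row):
        # The lock serialises appends so concurrent callers cannot interleave.
        with self._lock:
            self.rows.append(row)


def produce(writer, worker_id, count):
    """Each worker writes `count` rows tagged with its own id."""
    for i in range(count):
        writer.write((worker_id, i))


writer = SharedWriter()
threads = [
    threading.Thread(target=produce, args=(writer, w, 1000)) for w in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(writer.rows))
```

All four workers end up in one output rather than four small ones, which is
the effect the section describes for CarbonData files.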
Streaming

StreamSQL supports Kafka as streaming source

StreamSQL DDL now supports specifying Kafka as a streaming source. With
this support, users need not write a custom application to ingest streaming
data from Kafka into CarbonData. They can do so by specifying 'format' as
'kafka' in the CREATE TABLE DDL.
StreamSQL supports JSON records from Kafka/socket streaming sources

StreamSQL can now accept JSON as a data format in addition to CSV. This
spares users from writing custom applications to ingest streaming data into
CarbonData.
Min/Max Index Support for Streaming Segment

CarbonData supports generating Min/Max indexes for streaming segments so
that filter pruning is more efficient, which increases query performance.
CarbonData is able to serve queries faster due to the Min/Max indexes built
at various levels. Adding Min/Max index support to streaming segments
enables CarbonData to serve queries with the same performance as other
columnar segments.
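
How a Min/Max index lets a query skip data can be sketched in Python. The
`Blocklet` class, its fields, and the sample ranges are hypothetical, and
CarbonData's actual pruning works on its own index structures.

```python
from dataclasses import dataclass


@dataclass
class Blocklet:
    name: str
    min_value: int
    max_value: int


def prune_equals(blocklets, target):
    """Keep only blocklets whose [min, max] range could contain target."""
    return [b for b in blocklets if b.min_value <= target <= b.max_value]


def prune_range(blocklets, low, high):
    """Keep only blocklets whose range overlaps the filter range [low, high]."""
    return [b for b in blocklets if b.max_value >= low and b.min_value <= high]


blocklets = [
    Blocklet("b0", 1, 100),
    Blocklet("b1", 101, 200),
    Blocklet("b2", 150, 300),
]
print([b.name for b in prune_equals(blocklets, 175)])
print([b.name for b in prune_range(blocklets, 90, 120)])
```

A blocklet that survives pruning may still contain no matching row (a false
positive), but a pruned blocklet can never contain one, so skipping it is
safe.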
Debugging and Maintenance enhancements

Data Summary Tool

CarbonData provides a CLI tool to retrieve statistical information from
each CarbonData file. It can list various parameters such as the number of
blocklets, pages, encoding types, and Min/Max indexes. This tool is useful
for identifying the reason for a block/blocklet selection during pruning.
By looking at the Min/Max indexes, the user can easily decide the size of a
blocklet so as to avoid false positives. Scan performance benchmarking is
also supported by this tool. The user can use it to identify the time taken
to scan each blocklet for a particular column.
Other Improvements

   - Code optimized to avoid unnecessary listing of CarbonData files stored
   in S3, resulting in S3 performance enhancement.
   - Now SDK supports Varchar columns greater than 32K characters.
   - Now you can decide the sort_scope during CarbonData write operation
   from SDK.
   - Memory footprint for data loading with Local Dictionary is optimized
   to consume approximately 2x that of data loading with Global Dictionary.
   In earlier versions, the memory footprint was 10x.
   - SDK APIs have been simplified to accommodate new input types (for
   example, CSV, JSON, and so on) without modifying much of the business
   code.
   - Bloom filter quality has been further enhanced by fixing various bugs
   related to bloom index creation and clean up. Bloom filter scans for IN
   expressions have been optimised to scan once.
   - MV datamap quality has been enhanced by fixing numerous bugs related
   to MV selection logic and by supporting various sql constructs. Examples
   have been added to explain the usage of MV.
   - Fixed a compaction bug in which subsequent segments were ignored
   during compaction when the configuration is of the form (X,1).
   - SHOW SEGMENT command now displays the size of each segment. This helps
   the user to perform maintenance operations such as compaction and backup.
   - SDK has been enhanced to support long_string_columns, Map complex data
   type, sort_scope.

Behavioral Changes

Renaming of Table Names

Earlier, renaming a CarbonData table renamed it in the Hive metastore and
also renamed its folder on HDFS. Now, the table is renamed only in the Hive
metastore.

Changed Configuration Default Values

Configuration name | Old Value | New Value
bloom_size | 32000 | 640000
bloom_fpp | 0.01 | 0.00001
carbon.stream.parser | org.apache.carbondata.streaming.parser.CSVStreamParserImp | org.apache.carbondata.streaming.parser.RowStreamParserImp
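
To get a feel for what the new bloom defaults imply, the textbook Bloom
filter sizing formulas can be applied to the old and new values. This is a
rough sketch: it assumes bloom_size is the expected number of inserted
items and bloom_fpp the target false positive probability, and CarbonData's
actual bloom implementation may size its filters differently.

```python
import math


def bloom_parameters(expected_items, fpp):
    """Textbook sizing: bits m = -n*ln(p)/(ln 2)^2, hashes k = (m/n)*ln 2."""
    bits = math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


for label, size, fpp in [("old", 32000, 0.01), ("new", 640000, 0.00001)]:
    bits, hashes = bloom_parameters(size, fpp)
    kib = bits / 8 / 1024
    print(f"{label}: {bits} bits (~{kib:.0f} KiB), {hashes} hash functions")
```

Under these assumptions the new defaults spend roughly 24 bits per expected
item instead of roughly 10, trading a larger index for far fewer false
positives.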

New Configuration Parameters

Configuration name | Default Value | Range
carbon.minmax.allowed.byte.count | 200 bytes (100 characters) | 10-1000 bytes
carbon.insert.persist.enable | false | NA
carbon.insert.storage.level | MEMORY_AND_DISK | http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence
carbon.update.storage.level | MEMORY_AND_DISK | http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence
carbon.global.sort.rdd.storage.level | MEMORY_ONLY | http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence


Please find the detailed JIRA list:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12341006
Sub-task

   - [CARBONDATA-2512
   <https://issues.apache.org/jira/browse/CARBONDATA-2512>] - Support
   long_string_columns in sdk
   - [CARBONDATA-2633
   <https://issues.apache.org/jira/browse/CARBONDATA-2633>] - Bugs are
   found when bloomindex column is dictionary/sort/date column
   - [CARBONDATA-2634
   <https://issues.apache.org/jira/browse/CARBONDATA-2634>] - Provide more
   information about the datamap when showing datamaps
   - [CARBONDATA-2635
   <https://issues.apache.org/jira/browse/CARBONDATA-2635>] - Support
   different provider based index datamaps on same column
   - [CARBONDATA-2637
   <https://issues.apache.org/jira/browse/CARBONDATA-2637>] - Fix bugs for
   deferred rebuild for bloomfilter datamap
   - [CARBONDATA-2650
   <https://issues.apache.org/jira/browse/CARBONDATA-2650>] - explain query
   shows negative skipped blocklets for bloomfilter datamap
   - [CARBONDATA-2653
   <https://issues.apache.org/jira/browse/CARBONDATA-2653>] - Fix bugs in
   incorrect blocklet number in bloomfilter
   - [CARBONDATA-2654
   <https://issues.apache.org/jira/browse/CARBONDATA-2654>] - Optimize
   output for explaining query with datamap
   - [CARBONDATA-2655
   <https://issues.apache.org/jira/browse/CARBONDATA-2655>] - Support `in`
   operator for bloomfilter datamap
   - [CARBONDATA-2657
   <https://issues.apache.org/jira/browse/CARBONDATA-2657>] -
   Loading/Filtering empty value fails on bloom index columns
   - [CARBONDATA-2660
   <https://issues.apache.org/jira/browse/CARBONDATA-2660>] - Support
   filtering on longstring bloom index columns
   - [CARBONDATA-2675
   <https://issues.apache.org/jira/browse/CARBONDATA-2675>] - Support
   config long_string_columns when create datamap
   - [CARBONDATA-2681
   <https://issues.apache.org/jira/browse/CARBONDATA-2681>] - Fix loading
   problem using global/batch sort fails when table has long string columns
   - [CARBONDATA-2683
   <https://issues.apache.org/jira/browse/CARBONDATA-2683>] - Fix data
   convertion problem for Varchar
   - [CARBONDATA-2685
   <https://issues.apache.org/jira/browse/CARBONDATA-2685>] - make datamap
   rebuild for all segments in parallel
   - [CARBONDATA-2687
   <https://issues.apache.org/jira/browse/CARBONDATA-2687>] - update
   document for bloomfilter
   - [CARBONDATA-2693
   <https://issues.apache.org/jira/browse/CARBONDATA-2693>] - Fix bug for
   alter rename is renameing the existing table on which bloomfilter datamp
   exists
   - [CARBONDATA-2694
   <https://issues.apache.org/jira/browse/CARBONDATA-2694>] - show
   long_string_columns in desc table command
   - [CARBONDATA-2702
   <https://issues.apache.org/jira/browse/CARBONDATA-2702>] - Fix bugs in
   clear bloom datamap
   - [CARBONDATA-2706
   <https://issues.apache.org/jira/browse/CARBONDATA-2706>] - clear bloom
   index file after segment is deleted
   - [CARBONDATA-2708
   <https://issues.apache.org/jira/browse/CARBONDATA-2708>] - clear index
   file if dataloading is failed
   - [CARBONDATA-2790
   <https://issues.apache.org/jira/browse/CARBONDATA-2790>] - Optimize
   default parameter for bloomfilter datamap
   - [CARBONDATA-2811
   <https://issues.apache.org/jira/browse/CARBONDATA-2811>] - Add query
   test case using search mode on table with bloom filter
   - [CARBONDATA-2835
   <https://issues.apache.org/jira/browse/CARBONDATA-2835>] - Block MV
   datamap on streaming table
   - [CARBONDATA-2844
   <https://issues.apache.org/jira/browse/CARBONDATA-2844>] - SK AK not
   getting passed to executors for global sort
   - [CARBONDATA-2845
   <https://issues.apache.org/jira/browse/CARBONDATA-2845>] - Merge bloom
   index files of multi-shards for each index column
   - [CARBONDATA-2851
   <https://issues.apache.org/jira/browse/CARBONDATA-2851>] - support zstd
   as column compressor
   - [CARBONDATA-2852
   <https://issues.apache.org/jira/browse/CARBONDATA-2852>] - support zstd
   on legacy store
   - [CARBONDATA-2853
   <https://issues.apache.org/jira/browse/CARBONDATA-2853>] - Add min/max
   index for streaming segment
   - [CARBONDATA-2859
   <https://issues.apache.org/jira/browse/CARBONDATA-2859>] - add sdv test
   case for bloomfilter datamap
   - [CARBONDATA-2869
   <https://issues.apache.org/jira/browse/CARBONDATA-2869>] - SDK support
   for Map DataType
   - [CARBONDATA-2894
   <https://issues.apache.org/jira/browse/CARBONDATA-2894>] - Add support
   for complex map type through spark carbon file format API
   - [CARBONDATA-2922
   <https://issues.apache.org/jira/browse/CARBONDATA-2922>] - support long
   string columns with spark FileFormat and SDK with "long_string_columns"
   TableProperties
   - [CARBONDATA-2935
   <https://issues.apache.org/jira/browse/CARBONDATA-2935>] - Write
   is_sorted field in file footer
   - [CARBONDATA-2942
   <https://issues.apache.org/jira/browse/CARBONDATA-2942>] - Add read and
   write support for writing min max based on configurable bytes count
   - [CARBONDATA-2952
   <https://issues.apache.org/jira/browse/CARBONDATA-2952>] - Provide
   CarbonReader C++ interface for SDK
   - [CARBONDATA-2957
   <https://issues.apache.org/jira/browse/CARBONDATA-2957>] - update
   document about zstd support in carbondata

Bug

   - [CARBONDATA-1787
   <https://issues.apache.org/jira/browse/CARBONDATA-1787>] - Carbon 1.3.0-
   Global Sort: Global_Sort_Partitions parameter doesn't work, if specified in
   the Tblproperties, while creating the table.
   - [CARBONDATA-2418
   <https://issues.apache.org/jira/browse/CARBONDATA-2418>] - Presto can't
   query Carbon table when carbonstore is created at s3
   - [CARBONDATA-2478
   <https://issues.apache.org/jira/browse/CARBONDATA-2478>] - Add
   datamap-developer-guide.md file in readme
   - [CARBONDATA-2515
   <https://issues.apache.org/jira/browse/CARBONDATA-2515>] - Filter OR
   Expression not working properly in Presto integration
   - [CARBONDATA-2516
   <https://issues.apache.org/jira/browse/CARBONDATA-2516>] - Filter
   Greater-than for timestamp datatype not generating Expression in
   PrestoFilterUtil
   - [CARBONDATA-2528
   <https://issues.apache.org/jira/browse/CARBONDATA-2528>] - MV Datamap -
   When the MV is created with the order by, then when we execute the
   corresponding query defined in MV with order by, then the data is not
   accessed from the MV.
   - [CARBONDATA-2530
   <https://issues.apache.org/jira/browse/CARBONDATA-2530>] - [MV] Wrong
   data displayed when parent table data are loaded
   - [CARBONDATA-2531
   <https://issues.apache.org/jira/browse/CARBONDATA-2531>] - [MV] MV not
   hit when alias is in use
   - [CARBONDATA-2534
   <https://issues.apache.org/jira/browse/CARBONDATA-2534>] - MV Dataset -
   MV creation is not working with the substring()
   - [CARBONDATA-2539
   <https://issues.apache.org/jira/browse/CARBONDATA-2539>] - MV Dataset -
   Subqueries is not accessing the data from the MV datamap.
   - [CARBONDATA-2540
   <https://issues.apache.org/jira/browse/CARBONDATA-2540>] - MV Dataset -
   Unionall queries are not fetching data from MV dataset.
   - [CARBONDATA-2542
   <https://issues.apache.org/jira/browse/CARBONDATA-2542>] - MV creation
   is failed for other than default database
   - [CARBONDATA-2550
   <https://issues.apache.org/jira/browse/CARBONDATA-2550>] - [MV] Limit is
   ignored when data fetched from MV, Query rewrite is Wrong
   - [CARBONDATA-2560
   <https://issues.apache.org/jira/browse/CARBONDATA-2560>] - [MV]
   Exception in console during MV creation but MV registered successfully
   - [CARBONDATA-2568
   <https://issues.apache.org/jira/browse/CARBONDATA-2568>] - [MV] MV
   datamap is not hit when ,column is in group by but not in projection
   - [CARBONDATA-2576
   <https://issues.apache.org/jira/browse/CARBONDATA-2576>] - MV Datamap -
   MV is not working fine if there is more than 3 aggregate function in the
   same datamap.
   - [CARBONDATA-2610
   <https://issues.apache.org/jira/browse/CARBONDATA-2610>] - DataMap
   creation fails on null values
   - [CARBONDATA-2614
   <https://issues.apache.org/jira/browse/CARBONDATA-2614>] - There are
   some exception when using FG in search mode and the prune result is none
   - [CARBONDATA-2616
   <https://issues.apache.org/jira/browse/CARBONDATA-2616>] - Incorrect
   explain and query result while using bloomfilter datamap
   - [CARBONDATA-2629
   <https://issues.apache.org/jira/browse/CARBONDATA-2629>] - SDK carbon
   reader don't support filter in HDFS and S3
   - [CARBONDATA-2644
   <https://issues.apache.org/jira/browse/CARBONDATA-2644>] - Validation
   not present for carbon.load.sortMemory.spill.percentage parameter
   - [CARBONDATA-2658
   <https://issues.apache.org/jira/browse/CARBONDATA-2658>] - Fix bug in
   spilling in-memory pages
   - [CARBONDATA-2674
   <https://issues.apache.org/jira/browse/CARBONDATA-2674>] - Streaming
   with merge index enabled does not consider the merge index file while
   pruning.
   - [CARBONDATA-2703
   <https://issues.apache.org/jira/browse/CARBONDATA-2703>] - Fix bugs in
   tests
   - [CARBONDATA-2711
   <https://issues.apache.org/jira/browse/CARBONDATA-2711>] -
   carbonFileList is not initalized when updatetablelist call
   - [CARBONDATA-2715
   <https://issues.apache.org/jira/browse/CARBONDATA-2715>] - Failed to run
   tests for Search Mode With Lucene in Windows env
   - [CARBONDATA-2729
   <https://issues.apache.org/jira/browse/CARBONDATA-2729>] - Schema
   Compatibility problem between version 1.3.0 and 1.4.0
   - [CARBONDATA-2758
   <https://issues.apache.org/jira/browse/CARBONDATA-2758>] - selection on
   local dictionary fails when column having all null values more than default
   batch size.
   - [CARBONDATA-2769
   <https://issues.apache.org/jira/browse/CARBONDATA-2769>] - Fix bug when
   getting shard name from data before version 1.4
   - [CARBONDATA-2802
   <https://issues.apache.org/jira/browse/CARBONDATA-2802>] - Creation of
   Bloomfilter Datamap is failing after UID,compaction,pre-aggregate datamap
   creation
   - [CARBONDATA-2823
   <https://issues.apache.org/jira/browse/CARBONDATA-2823>] - Alter table
   set local dictionary include after bloom creation fails throwing incorrect
   error
   - [CARBONDATA-2854
   <https://issues.apache.org/jira/browse/CARBONDATA-2854>] - Release table
   status file lock before delete physical files when execute 'clean files'
   command
   - [CARBONDATA-2862
   <https://issues.apache.org/jira/browse/CARBONDATA-2862>] - Fix exception
   message for datamap rebuild command
   - [CARBONDATA-2866
   <https://issues.apache.org/jira/browse/CARBONDATA-2866>] - Should block
   schema when creating external table
   - [CARBONDATA-2874
   <https://issues.apache.org/jira/browse/CARBONDATA-2874>] - Support SDK
   writer as thread safe api
   - [CARBONDATA-2886
   <https://issues.apache.org/jira/browse/CARBONDATA-2886>] - select filter
   with int datatype is showing incorrect result in case of table created and
   loaded on old version and queried in new version
   - [CARBONDATA-2888
   <https://issues.apache.org/jira/browse/CARBONDATA-2888>] - Support multi
   level sdk read support for carbon tables
   - [CARBONDATA-2901
   <https://issues.apache.org/jira/browse/CARBONDATA-2901>] - Problem: Jvm
   crash in Load scenario when unsafe memory allocation is failed.
   - [CARBONDATA-2902
   <https://issues.apache.org/jira/browse/CARBONDATA-2902>] - Fix showing
   negative pruning result for explain command
   - [CARBONDATA-2908
   <https://issues.apache.org/jira/browse/CARBONDATA-2908>] - the option of
   sort_scope don't effects while creating table by data frame
   - [CARBONDATA-2910
   <https://issues.apache.org/jira/browse/CARBONDATA-2910>] - Support
   backward compatability in fileformat and support different sort colums per
   load
   - [CARBONDATA-2924
   <https://issues.apache.org/jira/browse/CARBONDATA-2924>] - Fix parsing
   issue for map as a nested array child and change the error message in sort
   column validation for SDK
   - [CARBONDATA-2925
   <https://issues.apache.org/jira/browse/CARBONDATA-2925>] - Wrong data
   displayed for spark file format if carbon file has mtuiple blocklet
   - [CARBONDATA-2926
   <https://issues.apache.org/jira/browse/CARBONDATA-2926>] -
   ArrayIndexOutOfBoundException if varchar column is present before
   dictionary columns along with empty sort_columns.
   - [CARBONDATA-2927
   <https://issues.apache.org/jira/browse/CARBONDATA-2927>] - Multiple
   issue fixes for varchar column and complex columns that grows more than 2MB
   - [CARBONDATA-2932
   <https://issues.apache.org/jira/browse/CARBONDATA-2932>] -
   CarbonReaderExample throw some exception: Projection can't be empty
   - [CARBONDATA-2933
   <https://issues.apache.org/jira/browse/CARBONDATA-2933>] - Fix errors in
   spelling
   - [CARBONDATA-2940
   <https://issues.apache.org/jira/browse/CARBONDATA-2940>] - Fix
   BufferUnderFlowException for ComplexPushDown
   - [CARBONDATA-2955
   <https://issues.apache.org/jira/browse/CARBONDATA-2955>] - bug for
   legacy store and compaction with zstd compressor and
   adaptiveDeltaIntegralCodec
   - [CARBONDATA-2956
   <https://issues.apache.org/jira/browse/CARBONDATA-2956>] - CarbonReader
   can't support use configuration to read S3 data
   - [CARBONDATA-2967
   <https://issues.apache.org/jira/browse/CARBONDATA-2967>] - Select is
   failing on pre-aggregate datamap when thrift server is restarted.
   - [CARBONDATA-2969
   <https://issues.apache.org/jira/browse/CARBONDATA-2969>] - Query on
   local dictionary column is giving empty data
   - [CARBONDATA-2974
   <https://issues.apache.org/jira/browse/CARBONDATA-2974>] - Bloomfilter
   not working when created bloom on multiple columns and queried
   - [CARBONDATA-2975
   <https://issues.apache.org/jira/browse/CARBONDATA-2975>] - DefaultValue
   choosing and removeNullValues on range filters is incorrect
   - [CARBONDATA-2979
   <https://issues.apache.org/jira/browse/CARBONDATA-2979>] - select count
   fails when carbondata file is written through SDK and read through
   sparkfileformat for complex datatype map(struct->array->map)
   - [CARBONDATA-2980
   <https://issues.apache.org/jira/browse/CARBONDATA-2980>] - clear
   bloomindex cache when dropping datamap
   - [CARBONDATA-2982
   <https://issues.apache.org/jira/browse/CARBONDATA-2982>] -
   CarbonSchemaReader don't support Array<string>
   - [CARBONDATA-2984
   <https://issues.apache.org/jira/browse/CARBONDATA-2984>] - streaming
   throw NPE when there is no data in the task of a batch
   - [CARBONDATA-2986
   <https://issues.apache.org/jira/browse/CARBONDATA-2986>] - Table
   Properties are lost when multiple driver concurrently creating table
   - [CARBONDATA-2990
   <https://issues.apache.org/jira/browse/CARBONDATA-2990>] - JVM crashes
   when rebuilding the datamap.
   - [CARBONDATA-2991
   <https://issues.apache.org/jira/browse/CARBONDATA-2991>] -
   NegativeArraySizeException during query execution
   - [CARBONDATA-2992
   <https://issues.apache.org/jira/browse/CARBONDATA-2992>] - Fixed Between
   Query Data Mismatch issue for timestamp data type
   - [CARBONDATA-2993
   <https://issues.apache.org/jira/browse/CARBONDATA-2993>] - Concurrent
   data load throwing NPE randomly.
   - [CARBONDATA-2994
   <https://issues.apache.org/jira/browse/CARBONDATA-2994>] - Unify
   property name for badrecords path in create and load.
   - [CARBONDATA-2995
   <https://issues.apache.org/jira/browse/CARBONDATA-2995>] - Queries slow
   down after some time due to broadcast issue

New Feature

   - [CARBONDATA-2896
   <https://issues.apache.org/jira/browse/CARBONDATA-2896>] - Adaptive
   encoding for primitive data types
   - [CARBONDATA-2916
   <https://issues.apache.org/jira/browse/CARBONDATA-2916>] - Support
   CarbonCli tool for data summary
   - [CARBONDATA-2919
   <https://issues.apache.org/jira/browse/CARBONDATA-2919>] - StreamSQL
   support ingest from Kafka
   - [CARBONDATA-2945
   <https://issues.apache.org/jira/browse/CARBONDATA-2945>] - Support JSON
   record in StreamSQL
   - [CARBONDATA-2965
   <https://issues.apache.org/jira/browse/CARBONDATA-2965>] - Support scan
   performance benchmark tool
   - [CARBONDATA-2976
   <https://issues.apache.org/jira/browse/CARBONDATA-2976>] - Support
   dumping column chunk meta in CarbonCli

Improvement

   - [CARBONDATA-2309
   <https://issues.apache.org/jira/browse/CARBONDATA-2309>] - Add strategy
   to generate bigger carbondata files in case of small amount of data
   - [CARBONDATA-2428
   <https://issues.apache.org/jira/browse/CARBONDATA-2428>] - Support Flat
   folder structure in carbon.
   - [CARBONDATA-2532
   <https://issues.apache.org/jira/browse/CARBONDATA-2532>] - Carbon to
   support spark 2.3 version
   - [CARBONDATA-2549
   <https://issues.apache.org/jira/browse/CARBONDATA-2549>] - Implement LRU
   cache in Bloom filter based on Carbon LRU cache interface
   - [CARBONDATA-2553
   <https://issues.apache.org/jira/browse/CARBONDATA-2553>] - support ZSTD
   compression for sort temp file
   - [CARBONDATA-2593
   <https://issues.apache.org/jira/browse/CARBONDATA-2593>] - Add an option
   'carbon.insert.storage.level' to support configuring the storage level when
   insert into data with 'carbon.insert.persist.enable'='true'
   - [CARBONDATA-2594
   <https://issues.apache.org/jira/browse/CARBONDATA-2594>] - Incorrect
   logic when set 'Encoding.INVERTED_INDEX' for each dimension column
   - [CARBONDATA-2599
   <https://issues.apache.org/jira/browse/CARBONDATA-2599>] - Use
   RowStreamParserImp as default value of config 'carbon.stream.parser'
   - [CARBONDATA-2656
   <https://issues.apache.org/jira/browse/CARBONDATA-2656>] - Presto Stream
   Readers performance Enhancement
   - [CARBONDATA-2686
   <https://issues.apache.org/jira/browse/CARBONDATA-2686>] - Implement
   left outer join in mv
   - [CARBONDATA-2801
   <https://issues.apache.org/jira/browse/CARBONDATA-2801>] - Add
   documentation for flat folder
   - [CARBONDATA-2815
   <https://issues.apache.org/jira/browse/CARBONDATA-2815>] - Add
   documentation for memory spill and rebuild datamap
   - [CARBONDATA-2837
   <https://issues.apache.org/jira/browse/CARBONDATA-2837>] - Add MV
   Example in examples module
   - [CARBONDATA-2857
   <https://issues.apache.org/jira/browse/CARBONDATA-2857>] - Improvement
   in "How to contribute to Apache CarbonData" page
   - [CARBONDATA-2876
   <https://issues.apache.org/jira/browse/CARBONDATA-2876>] - Support Avro
   datatype conversion to Carbon Format
   - [CARBONDATA-2879
   <https://issues.apache.org/jira/browse/CARBONDATA-2879>] - Support Sort
   Scope for SDK
   - [CARBONDATA-2884
   <https://issues.apache.org/jira/browse/CARBONDATA-2884>] - Should rename
   the methods of ByteUtil class to avoid the misuse
   - [CARBONDATA-2899
   <https://issues.apache.org/jira/browse/CARBONDATA-2899>] - Add MV
   modules to assembly JAR
   - [CARBONDATA-2900
   <https://issues.apache.org/jira/browse/CARBONDATA-2900>] - Add dynamic
   configuration support for some system properties
   - [CARBONDATA-2903
   <https://issues.apache.org/jira/browse/CARBONDATA-2903>] - Fix compiler
   warnings
   - [CARBONDATA-2905
   <https://issues.apache.org/jira/browse/CARBONDATA-2905>] - Should allow
   set stream property on streaming table
   - [CARBONDATA-2906
   <https://issues.apache.org/jira/browse/CARBONDATA-2906>] - Show segment
   data size in SHOW SEGMENT command
   - [CARBONDATA-2907
   <https://issues.apache.org/jira/browse/CARBONDATA-2907>] - Support
   setting blocklet size in table property
   - [CARBONDATA-2909
   <https://issues.apache.org/jira/browse/CARBONDATA-2909>] - Support
   Multiple User reading and writing through SDK.
   - [CARBONDATA-2911
   <https://issues.apache.org/jira/browse/CARBONDATA-2911>] - Remove unused
   BTree related code
   - [CARBONDATA-2915
   <https://issues.apache.org/jira/browse/CARBONDATA-2915>] - Updates to
   CarbonData documentation and structure
   - [CARBONDATA-2929
   <https://issues.apache.org/jira/browse/CARBONDATA-2929>] - Add block
   skipped info for explain command
   - [CARBONDATA-2938
   <https://issues.apache.org/jira/browse/CARBONDATA-2938>] - Update
   comment of blockletId in IndexDataMapRebuildRDD
   - [CARBONDATA-2947
   <https://issues.apache.org/jira/browse/CARBONDATA-2947>] - Adaptive
   encoding support for timestamp no dictionary and Refactor ColumnPageWrapper
   - [CARBONDATA-2948
   <https://issues.apache.org/jira/browse/CARBONDATA-2948>] - Support Float
   and Byte Datatypes for SDK and DataSource
   - [CARBONDATA-2961
   <https://issues.apache.org/jira/browse/CARBONDATA-2961>] - Simplify SDK
   API interfaces
   - [CARBONDATA-2963
   <https://issues.apache.org/jira/browse/CARBONDATA-2963>] - Add support
   to add byte column as a sort column
   - [CARBONDATA-2964
   <https://issues.apache.org/jira/browse/CARBONDATA-2964>] - Unsupported
   Float datatype exception for query with more than 1 page
   - [CARBONDATA-2966
   <https://issues.apache.org/jira/browse/CARBONDATA-2966>] - Update
   Documentation For Avro DataType conversion
   - [CARBONDATA-2972
   <https://issues.apache.org/jira/browse/CARBONDATA-2972>] - Debug Logs
   and a function for type of Adaptive Encoding
   - [CARBONDATA-2973
   <https://issues.apache.org/jira/browse/CARBONDATA-2973>] - Add
   Documentation for complex Columns for Local Dictionary Support
   - [CARBONDATA-2983
   <https://issues.apache.org/jira/browse/CARBONDATA-2983>] - Change bloom
   query model to proceed multiple filter values
   - [CARBONDATA-2985
   <https://issues.apache.org/jira/browse/CARBONDATA-2985>] - Fix issues in
   Table level compaction and TableProperties
   - [CARBONDATA-2989
   <https://issues.apache.org/jira/browse/CARBONDATA-2989>] - Upgrade spark
   integration version to 2.3.2

Task

   - [CARBONDATA-2756
   <https://issues.apache.org/jira/browse/CARBONDATA-2756>] - Add BSD
   license for ZSTD external dendency
   - [CARBONDATA-2839
   <https://issues.apache.org/jira/browse/CARBONDATA-2839>] - Add custom
   compaction example


-- 
Thanks & Regards,
Ravindra
