phoenix-dev mailing list archives

From "maghamravikiran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2088) Prevent splitting and recombining select expressions for MR integration
Date Fri, 03 Jul 2015 23:04:04 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613501#comment-14613501 ]

maghamravikiran commented on PHOENIX-2088:
------------------------------------------

[~giacomotaylor] Yes, we don't need the separator character. From what I remember, we used
it to avoid multiple calls to PhoenixRuntime.generateColumnInfo() from within
PhoenixConfigurationUtil.getSelectColumnMetadataList(). We stored the list as a String the
first time and, for all subsequent calls, merely deserialized the String back to ColumnInfo
and returned it. With the serialization process removed, we would be fetching the ColumnInfo
list every time.
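
To make the trade-off concrete, here is a rough sketch of the caching pattern described above. It is not the Phoenix code: a plain Map stands in for the job configuration, plain strings stand in for ColumnInfo, and the class name, the key name, the "|" separator and the lookupColumns() helper are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ColumnMetadataCacheSketch {
    // Hypothetical configuration key; not a real Phoenix property name.
    static final String CACHE_KEY = "example.select.column.metadata";

    // Counts how often the (assumed expensive) lookup actually runs.
    static int lookupCalls = 0;

    // Stand-in for the expensive metadata lookup; the values are invented.
    static List<String> lookupColumns() {
        lookupCalls++;
        List<String> columns = new ArrayList<>();
        columns.add("ID:4");
        columns.add("NAME:12");
        return columns;
    }

    // First call: run the lookup and store the result as a single String.
    // Later calls: deserialize the stored String instead of looking up again.
    static List<String> getColumns(Map<String, String> conf) {
        String cached = conf.get(CACHE_KEY);
        if (cached != null) {
            return new ArrayList<>(Arrays.asList(cached.split("\\|")));
        }
        List<String> columns = lookupColumns();
        conf.put(CACHE_KEY, String.join("|", columns));
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        getColumns(conf);
        getColumns(conf);
        System.out.println("lookups performed: " + lookupCalls);
    }
}
```

Dropping the serialization step in a sketch like this means every getColumns() call goes back to lookupColumns(), which is the cost mentioned above.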

I just noticed we are serializing the ColumnInfo to String from within the CSVBulkLoadTool
MR job. https://github.com/apache/phoenix/blob/7f6bf10b2cc54279b9210772323dc8f4d2939a19/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java#L205

We should be seeing a similar exception for index tables in Bulk Import as well. Am I right?


> Prevent splitting and recombining select expressions for MR integration
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2088
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2088
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: PHOENIX-2088-pig.patch, PHOENIX-2088-wip-v2.patch, PHOENIX-2088-wip.patch
>
>
> We currently send in the select expressions for the MR integration as a delimiter-separated
> string, split it on the delimiter, and then recombine the parts using a comma separator. This
> is problematic because the delimiter character may appear in a select expression, thus breaking
> this logic. Instead, we should use a comma as the delimiter and avoid splitting and recombining,
> as neither is necessary in that case: the entire string can be used as-is to form the select
> expressions.
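
The failure mode in the description can be illustrated with a small sketch. This is not the Phoenix implementation: the issue does not say which delimiter was used, so "|" is an assumption here, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SelectExpressionDemo {
    // Assumed delimiter, chosen only for illustration.
    static final String DELIMITER = "|";

    // Old approach: join with the delimiter, split on it, recombine with commas.
    static String splitAndRecombine(List<String> expressions) {
        String stored = String.join(DELIMITER, expressions);
        String[] parts = stored.split(Pattern.quote(DELIMITER));
        return String.join(",", parts);
    }

    // Proposed approach: join with commas once and use the string as-is.
    static String joinOnce(List<String> expressions) {
        return String.join(",", expressions);
    }

    public static void main(String[] args) {
        // The second expression uses the SQL concatenation operator "||",
        // which contains the assumed delimiter character.
        List<String> expressions = Arrays.asList("ID", "A || B");
        System.out.println(splitAndRecombine(expressions));
        System.out.println(joinOnce(expressions));
    }
}
```

Because the comma-joined string is never split again, commas inside an expression (for example in function arguments) do not cause the same problem.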



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
