phoenix-dev mailing list archives

From "Josh Mahonin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2088) Prevent splitting and recombining select expressions for MR integration
Date Tue, 21 Jul 2015 21:05:05 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635815#comment-14635815 ]

Josh Mahonin commented on PHOENIX-2088:
---------------------------------------

[~jamestaylor] If [~tdsilva] is comfortable with making the changes, I have no problem reviewing
them. It's a busy time for me right now, so unfortunately I don't think I'll get a chance to
dive into this until sometime tomorrow. The Spark integration tests are a pretty good indicator
of whether the changes are compatible.

The trouble with Spark is its attempt to serialize data to each worker, which it can do
in subtle and frustrating ways. If the 'ColumnInfoToStringEncoderDecoder' class is back, then
I would think the code in trunk should be pretty close to working. However, if we need to
derive the columns from the configuration object, then each partition needs to instantiate
its own version (as per the previous patch) to prevent serialization.
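As a rough illustration of the per-partition point, here is a minimal sketch using plain Java serialization (no Spark dependency). All names in it ('ColumnResolver', 'EagerTask', 'LazyTask') are hypothetical stand-ins and not Phoenix or Spark classes; the assumption is only that the object derived from the configuration is not serializable, so a task may carry the encoded string and rebuild the object where it runs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class PerPartitionColumnsDemo {

    // Hypothetical stand-in for something derived from a configuration object.
    // It deliberately does NOT implement Serializable.
    static class ColumnResolver {
        final List<String> columns;

        ColumnResolver(String encodedColumns) {
            this.columns = Arrays.asList(encodedColumns.split(","));
        }
    }

    // Holds the resolver directly, so shipping this task means serializing the resolver.
    static class EagerTask implements Serializable {
        final ColumnResolver resolver;

        EagerTask(String encodedColumns) {
            this.resolver = new ColumnResolver(encodedColumns);
        }
    }

    // Carries only the encoded string; the resolver is transient and rebuilt on first use.
    static class LazyTask implements Serializable {
        final String encodedColumns;
        transient ColumnResolver resolver;

        LazyTask(String encodedColumns) {
            this.encodedColumns = encodedColumns;
        }

        ColumnResolver resolver() {
            if (resolver == null) {
                resolver = new ColumnResolver(encodedColumns);
            }
            return resolver;
        }
    }

    // Returns true if Java serialization accepts the object, false if it hits a non-serializable field.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager task serializable: " + canSerialize(new EagerTask("ID,NAME")));
        System.out.println("lazy task serializable:  " + canSerialize(new LazyTask("ID,NAME")));
    }
}
```

In a real Spark job the same idea would presumably be applied inside the per-partition code, so that nothing non-serializable is captured by the closure.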

> Prevent splitting and recombining select expressions for MR integration
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2088
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2088
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2088-4.4-HBase-0.98-v2.patch, PHOENIX-2088-4.4-HBase-0.98.patch,
> PHOENIX-2088-pig.patch, PHOENIX-2088-wip-v2.patch, PHOENIX-2088-wip-v3.patch, PHOENIX-2088-wip.patch
>
>
> We currently send in the select expressions for the MR integration as a delimiter-separated
> string, split it on the delimiter, and then recombine the pieces using a comma separator. This
> is problematic because the delimiter character may appear in a select expression, thus breaking
> this logic. Instead, we should use a comma as the delimiter and avoid splitting and recombining,
> as it's not necessary in that case: the entire string can be used as-is to form the select
> expressions.
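
The failure mode described above can be sketched in a few lines of Java. This is an illustration only, not the Phoenix code: the class and method names are hypothetical, and the '|' delimiter is an assumption picked because it also appears in the SQL concatenation operator '||'.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SelectExpressionDelimiterDemo {

    // Hypothetical delimiter, assumed here only for illustration.
    static final String DELIMITER = "|";

    // Old approach, step 1: join the select expressions with the custom delimiter.
    static String encode(List<String> expressions) {
        return String.join(DELIMITER, expressions);
    }

    // Old approach, step 2: split on the delimiter, then recombine with commas.
    static String splitAndRecombine(String encoded) {
        String[] parts = encoded.split(Pattern.quote(DELIMITER));
        return String.join(",", parts);
    }

    // Proposed approach: join with commas once and use the string as-is.
    static String joinWithCommas(List<String> expressions) {
        return String.join(",", expressions);
    }

    public static void main(String[] args) {
        List<String> expressions = Arrays.asList("ID", "FIRST_NAME || LAST_NAME");

        // The '||' operator is torn apart by the split, leaving an invalid select list.
        System.out.println(splitAndRecombine(encode(expressions)));

        // The comma-joined string keeps each expression intact.
        System.out.println(joinWithCommas(expressions));
    }
}
```

With these inputs the round trip yields `ID,FIRST_NAME ,, LAST_NAME`, while the comma-joined form stays `ID,FIRST_NAME || LAST_NAME`.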



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
