phoenix-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1227) Upsert select of binary data doesn't always correctly coerce data into correct format
Date Mon, 01 Sep 2014 11:56:20 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117344#comment-14117344 ]

Gabriel Reid commented on PHOENIX-1227:
---------------------------------------

[~jamestaylor] do you have an opinion on the best way to approach this? I've looked at it
from a few different angles -- to me the one that makes the most sense is just to disallow
{code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code} due to datatype mismatch.

> Upsert select of binary data doesn't always correctly coerce data into correct format
> -------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1227
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1227
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>
> If you run an upsert select statement that selects a binary value and writes it to a
> numeric column (or probably to columns of other types as well), you can end up with
> invalid binary values stored in HBase.
> For example, in something like this if v is an {{INTEGER}} column:
> {code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code}
> the literal 16-byte binary values from the MD5 function will be added verbatim into the
> field v.
> This is a really big problem if v is the key field, as it can even lead to multiple keys
> with what appear to be the same value. This happens if there are multiple (invalid) row keys
> that begin with the same 4 bytes, as only the first 4 bytes of the key will be shown when
> selecting data from the column, but the different full-length values of the row keys will
> lead to multiple records.
> Somewhat related to this, a statement like the following (with a constant binary value)
> will fail immediately due to datatype mismatch:
> {code}UPSERT INTO MYTABLE (v) SELECT MD5(1) FROM MYTABLE{code}
> It seems that the first statement above should probably fail in the same way as the
> statement with the constant binary value (or neither of them should fail). Obviously there
> shouldn't be any invalid values being written into HBase.
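
The duplicate-looking keys described in the issue can be illustrated with a small
Python sketch. It is not Phoenix code: the helper displayed_int and the keys key_a
and key_b are hypothetical, and decoding the first 4 bytes as a big-endian signed
integer is a simplifying assumption that stands in for Phoenix's actual INTEGER
serialization.

```python
import hashlib
import struct


def displayed_int(row_key: bytes) -> int:
    """Decode only the first 4 bytes of a row key as a signed big-endian int.

    Assumption: this mimics a reader that treats the key as a fixed-width
    4-byte INTEGER and ignores any trailing bytes.
    """
    return struct.unpack(">i", row_key[:4])[0]


# An MD5 digest is always 16 bytes, four times the width of a 4-byte INTEGER.
digest = hashlib.md5(b"1").digest()

# Two hypothetical invalid row keys that share the same first 4 bytes
# but differ in the remaining 12 bytes.
prefix = digest[:4]
key_a = prefix + b"\x00" * 12
key_b = prefix + b"\xff" * 12

print(len(digest))                                   # width of the MD5 value
print(key_a == key_b)                                # distinct rows in storage
print(displayed_int(key_a) == displayed_int(key_b))  # same value when displayed
```

Under these assumptions the two keys are different byte strings, so they are stored
as separate rows, yet both decode to the same integer when only the first 4 bytes
are read.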



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
