[ https://issues.apache.org/jira/browse/SPARK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878431#comment-15878431 ]
Don Smith commented on SPARK-19692:
------------------------------------
An even more trivial example:
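{code}
// SQLRow is assumed to be an alias for Spark's Row.
import org.apache.spark.sql.{Row => SQLRow, SparkSession}
import org.apache.spark.sql.types.{BinaryType, StructField, StructType}

val sc = SparkSession.builder.appName("test").getOrCreate()
val schema = StructType(Seq(StructField("byte", BinaryType)))
// A single one-byte row: 0x8C is 140 unsigned, but -116 as a signed JVM Byte.
val byte = Seq(Array(0x8C.toByte))
val df = sc.createDataFrame(
  sc.sparkContext.parallelize(byte, 1).map { ip =>
    SQLRow(ip)
  }, schema
)
df.show() // show() prints to stdout and returns Unit, so there is nothing to log
val query = df
  .where(df("byte") >= Array(0x00.toByte))
  .where(df("byte") <= Array(0xFF.toByte))
query.explain(true) // explain() also prints directly and returns Unit
// Expected: 1 row, since 0x00 <= 0x8C <= 0xFF when bytes are read as unsigned.
val results = query.collect()
results.length mustEqual 1
{code}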
I'm having trouble believing this is the expected behavior. If it is, is it defined somewhere?
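
The one-byte case is consistent with a signed comparison. A minimal plain-Scala sketch (no Spark involved; the object and value names are illustrative only) of how 0x8C orders against 0x00 and 0xFF under signed and unsigned interpretations:

```scala
object SignedByteDemo {
  val b: Byte  = 0x8C.toByte // -116 as a signed JVM byte, 140 unsigned
  val lo: Byte = 0x00.toByte // 0 either way
  val hi: Byte = 0xFF.toByte // -1 as a signed JVM byte, 255 unsigned

  // Signed comparison: negative, so b sorts BELOW 0x00
  // and a filter like `byte >= 0x00` rejects the row.
  val signedVsLo: Int = java.lang.Byte.compare(b, lo)

  // Unsigned comparison: mask each byte to 0..255 before subtracting.
  val unsignedVsLo: Int = (b & 0xff) - (lo & 0xff) // positive: 0x8C >= 0x00
  val unsignedVsHi: Int = (b & 0xff) - (hi & 0xff) // negative: 0x8C <= 0xFF
}
```

Under the unsigned reading both range checks pass, which is the result the example above expects.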
> Comparison on BinaryType has incorrect results
> ----------------------------------------------
>
> Key: SPARK-19692
> URL: https://issues.apache.org/jira/browse/SPARK-19692
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Don Smith
>
> I believe there is an issue with comparisons on binary fields:
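> {code}
> import java.net.InetAddress
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types.{BinaryType, StructField, StructType}
>
> val sc = SparkSession.builder.appName("test").getOrCreate()
> val schema = StructType(Seq(StructField("ip", BinaryType)))
> // Each address is stored as its 4-byte, network-order representation.
> val ips = Seq("1.1.1.1", "2.2.2.2", "200.10.6.7").map(s => InetAddress.getByName(s).getAddress)
> val df = sc.createDataFrame(
>   sc.sparkContext.parallelize(ips, 1).map { ip =>
>     Row(ip)
>   }, schema
> )
> // Only 200.10.6.7 lies within 200.10.0.0 to 200.10.255.255.
> val query = df
>   .where(df("ip") >= InetAddress.getByName("200.10.0.0").getAddress)
>   .where(df("ip") <= InetAddress.getByName("200.10.255.255").getAddress)
> query.explain(true) // explain() prints the plans itself and returns Unit
> val results = query.collect()
> results.length mustEqual 1
> {code}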
> This returns no results.
> I believe the problem is that the bytes are compared as signed values in the call to compareTo here in TypeUtils:
> {code}
> def compareBinary(x: Array[Byte], y: Array[Byte]): Int = {
>   for (i <- 0 until x.length; if i < y.length) {
>     val res = x(i).compareTo(y(i))
>     if (res != 0) return res
>   }
>   x.length - y.length
> }
> {code}
> With some hacky testing I was able to get the desired results with:
> {code}
> val res = (x(i).toByte & 0xff) - (y(i).toByte & 0xff)
> {code}
> thanks!
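
For reference, the masking idea from the description can be written out as a complete function. This is only a sketch under stated assumptions: `compareBinaryUnsigned` and its enclosing object are hypothetical names, and this is not claimed to be the change made in Spark. It compares byte arrays lexicographically with each byte read as 0 to 255, and falls back to length when one array is a prefix of the other.

```scala
object UnsignedBinaryCompare {
  // Hypothetical helper: lexicographic comparison with unsigned bytes.
  def compareBinaryUnsigned(x: Array[Byte], y: Array[Byte]): Int = {
    val n = math.min(x.length, y.length)
    var i = 0
    while (i < n) {
      // `& 0xff` widens the byte to an Int in the range 0..255.
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
      i += 1
    }
    // All shared positions are equal: the shorter array sorts first.
    x.length - y.length
  }

  // The addresses from the report, written as raw bytes.
  val ip: Array[Byte]    = Array(200, 10, 6, 7).map(_.toByte)
  val lower: Array[Byte] = Array(200, 10, 0, 0).map(_.toByte)
  val upper: Array[Byte] = Array(200, 10, 255, 255).map(_.toByte)

  // True when ip falls inside [lower, upper] under unsigned ordering.
  val inRange: Boolean =
    compareBinaryUnsigned(ip, lower) >= 0 && compareBinaryUnsigned(ip, upper) <= 0
}
```

With this ordering, 200.10.6.7 falls inside the 200.10.0.0 to 200.10.255.255 range, matching the single row the original query expects.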