spark-issues mailing list archives

From "Don Smith (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19692) Comparison on BinaryType returns no results
Date Wed, 22 Feb 2017 02:25:44 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Don Smith  updated SPARK-19692:
-------------------------------
    Description: 
I believe there is an issue with comparisons on binary fields:
{code}
      val sc = SparkSession.builder.appName("test").getOrCreate()
      val schema = StructType(Seq(StructField("ip", BinaryType)))

      val ips = Seq("1.1.1.1", "2.2.2.2", "200.10.6.7").map(s => InetAddress.getByName(s).getAddress)

      val df = sc.createDataFrame(
        sc.sparkContext.parallelize(ips, 1).map { ip =>
          Row(ip)
        }, schema
      )

      val query = df
        .where(df("ip") >= InetAddress.getByName("200.10.0.0").getAddress)
        .where(df("ip") <= InetAddress.getByName("200.10.255.255").getAddress)

      query.explain(true) // explain prints the plans to stdout and returns Unit, so there is nothing to log
      val results = query.collect()
      results.length mustEqual 1
{code}

This returns no results.
I believe the problem is that the call to compareTo here in TypeUtils compares the bytes as
signed values, so a byte such as 0xff (255) is treated as -1 and sorts below 6:
{code}
  def compareBinary(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      val res = x(i).compareTo(y(i))
      if (res != 0) return res
    }
    x.length - y.length
  }
{code}
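
To make the failure concrete, here is a minimal sketch in plain Scala (no Spark required) that runs the same signed comparison on the address and upper bound from the reproduction above. The object name SignedByteDemo and the bytes helper are hypothetical and exist only for this illustration.

```scala
object SignedByteDemo {
  // Same logic as the compareBinary quoted above: bytes are compared as signed values.
  def compareSigned(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      val res = x(i).compareTo(y(i))
      if (res != 0) return res
    }
    x.length - y.length
  }

  // Hypothetical helper: build a byte array from unsigned octet values.
  def bytes(octets: Int*): Array[Byte] = octets.map(_.toByte).toArray

  def main(args: Array[String]): Unit = {
    val ip    = bytes(200, 10, 6, 7)      // 200.10.6.7
    val upper = bytes(200, 10, 255, 255)  // 200.10.255.255

    // The octet 255 is stored as the signed byte -1.
    println(upper(2))

    // A positive result means ip is treated as greater than upper, so
    // the "ip <= 200.10.255.255" filter rejects the row.
    println(compareSigned(ip, upper) > 0)
  }
}
```

The first two octets are equal, so the result is decided at the third octet, where 6 is compared with -1 instead of 255.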

With some hacky testing I was able to get the desired results with: {code} val res = (x(i).toByte & 0xff) - (y(i).toByte & 0xff) {code}
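
As a rough sketch of how that one-line change might look in a complete function, the following applies the 0xff mask inside a lexicographic comparison. The names UnsignedBinaryCompare, compareBinaryUnsigned and bytes are assumptions made for this example; this is not Spark's actual code.

```scala
object UnsignedBinaryCompare {
  // Hypothetical variant of compareBinary that compares each byte as an unsigned value.
  def compareBinaryUnsigned(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      // Masking with 0xff widens the byte to an Int in the range 0..255.
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
      i += 1
    }
    // All shared positions are equal: the shorter array sorts first.
    x.length - y.length
  }

  // Hypothetical helper: build a byte array from unsigned octet values.
  def bytes(octets: Int*): Array[Byte] = octets.map(_.toByte).toArray

  def main(args: Array[String]): Unit = {
    val ip    = bytes(200, 10, 6, 7)
    val lower = bytes(200, 10, 0, 0)
    val upper = bytes(200, 10, 255, 255)

    println(compareBinaryUnsigned(ip, lower) >= 0) // true: ip is at or above the lower bound
    println(compareBinaryUnsigned(ip, upper) <= 0) // true: ip is at or below the upper bound
  }
}
```

With both checks passing, the address 200.10.6.7 falls inside the range, which is the single row the reproduction expects.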

Thanks!

  was:
I believe there is an issue with comparisons on binary fields:
{code}
      val sc = SparkSession.builder.appName("test").getOrCreate()
      val schema = StructType(Seq(StructField("ip", BinaryType)))

      val ips = Seq("1.1.1.1", "2.2.2.2", "200.10.6.7").map(s => InetAddress.getByName(s).getAddress)

      val df = sc.createDataFrame(
        sc.sparkContext.parallelize(ips, 1).map { ip =>
          Row(ip)
        }, schema
      )

      val query = df
        .where(df("ip") >= InetAddress.getByName("200.10.0.0").getAddress)
        .where(df("ip") <= InetAddress.getByName("200.10.255.255").getAddress)

      logger.info(query.explain(true))
      val results = query.collect()
      results.length mustEqual 1
{code}

This returns no results.
I believe the problem is that the comparison treats the bytes as signed values in the
call to compareTo here in TypeUtils:
{code}
  def compareBinary(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      val res = x(i).compareTo(y(i))
      if (res != 0) return res
    }
    x.length - y.length
  }
{code}

With some hacky testing I was able to get the desired results with: {{ val res = (x(i).toByte & 0xff) - (y(i).toByte & 0xff) }}

Thanks!


> Comparison on BinaryType returns no results
> -------------------------------------------
>
>                 Key: SPARK-19692
>                 URL: https://issues.apache.org/jira/browse/SPARK-19692
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Don Smith 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


