spark-user mailing list archives

From Shixiong Zhu <zsxw...@gmail.com>
Subject Re: Nested Complex Type Data Parsing and Transforming to table
Date Wed, 12 Nov 2014 13:00:19 GMT
Could you give an example of your data?

This part is wrong:

p(1).trim.map(_.toString.split("\002")).map(s =>
s.map(_.toString.split("\003")).map(t => StructField1(

For example, p(1) is a String, so in p(1).trim.map(x => x.toString.split("\002")),
x is a Char, not a \002-separated field. That is probably not what you want.
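
To see the difference quickly (p1 below is a made-up p(1) value, just for illustration):

    // two records separated by "\002", fields within a record separated by "\003"
    val p1 = "1\0032\0033\0024\0035\0036"

    p1.map(_.toString.split("\003"))
    // maps over the *characters* of the String; the result is a
    // scala.collection.immutable.IndexedSeq[Array[String]]

    p1.split("\002").map(_.split("\003"))
    // splits the String itself; the result is an Array[Array[String]],
    // one inner Array per record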

If I understand your data format correctly:

Every line is an UltraComplexTextFileTable.
UltraComplexTextFileTable’s fields are separated by “\001”.
The elements of the Seq[StructField1] are separated by “\002”.
StructField1’s fields are separated by “\003”.
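
i.e., if I read that right, one line looks schematically like this (field names taken from your case classes, the control characters written out as \001/\002/\003):

    frameid \001 targetNum\003x\003y\003width\003height\003HM\003HS\003HT \002 targetNum\003x\003y\003width\003height\003HM\003HS\003HT \001 frame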

Here is the corrected code:

    val ultraComplexTextfile = sc.textFile("/examples/src/main/resources/aggrnew.text")
      .map(_.split("\001"))
      .map { p =>
        UltraComplexTextFileTable(
          p(0).trim.toInt,
          p(1).trim.split("\002").map(_.split("\003")).map { t =>
            StructField1(
              t(0).trim.toInt - 48,
              t(1).trim.toInt - 48,
              t(2).trim.toInt - 48,
              t(3).trim.toInt - 48,
              t(4).trim.toInt - 48,
              t(5).trim.toFloat - 48,
              t(6).trim.toFloat - 48,
              t(7).trim.toFloat - 48
            )
          },
          p(2))
      }
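
After that, if you want to query it with SQL, something like this should work (a sketch assuming the Spark 1.1 SchemaRDD API, since you already import sqlContext.createSchemaRDD; the table name is just an example):

    // registerTempTable comes from the implicit RDD -> SchemaRDD conversion
    ultraComplexTextfile.registerTempTable("ultra_complex")
    sqlContext.sql("SELECT frameid, frame FROM ultra_complex").collect().foreach(println)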


Best Regards,
Shixiong Zhu

2014-11-12 17:38 GMT+08:00 <luohui20001@sina.com>:

> Hi
>
>        I got a problem when reading a text file which contains nested
> complex type data and ran into a type mismatch problem. Any hint will be
> appreciated.
>
>        The problem takes place at "map(s => s.map", as "type mismatch;
> found: scala.collection.immutable.IndexedSeq[Array[com.redhadoop.bean.StructField1]]
> required: Seq[com.redhadoop.bean.StructField1]"
>
>
>
>      here is my code:
>
>
>
>     import sqlContext.createSchemaRDD
>
>     val ultraComplexTextfile =
>       spark.textFile("/examples/src/main/resources/aggrnew.text").map(_.split("\001")).map(p =>
>         UltraComplexTextFileTable(
>           p(0).trim.toInt,
>           p(1).trim.map(_.toString.split("\002")).map(s =>
>             s.map(_.toString.split("\003")).map(t => StructField1(
>               t(0).toString.trim.toInt - 48,
>               t(1).toString.trim.toInt - 48,
>               t(2).toString.trim.toInt - 48,
>               t(3).toString.trim.toInt - 48,
>               t(4).toString.trim.toInt - 48,
>               t(5).toString.trim.toFloat - 48,
>               t(6).toString.trim.toFloat - 48,
>               t(7).toString.trim.toFloat - 48
>             ))
>           ),
>           p(2))
>       )
>
> and my case class below:
>
> case class UltraComplexTextFileTable (
>     frameid:Int,
>     detaillist:Seq[StructField1],
>     frame:String
> )
>
>
>
> case class StructField1(
>     targetNum:Int,
>     x:Int,
>     y:Int,
>     width:Int,
>     height:Int,
>     HM:Float,
>     HS:Float,
>     HT:Float
> )
>
>
>
>
>
> --------------------------------
>
> Thanks & Best regards!
> San.Luo
> Redhadoop
>
