Thanks for your input, Soma, but I am actually looking to understand the differences, not only the performance.

---- On Sun, 05 Apr 2020 02:21:07 -0400 somplasticllc@gmail.com wrote ----

If you want to measure optimisation in terms of time taken, then here is an idea :)

public class MyClass {
    public static void main(String[] args) throws InterruptedException {
        // timestamp before the code under test
        long start = System.currentTimeMillis();

        // run the code to measure here, on enough data to measure

        // timestamp after the code under test
        long end = System.currentTimeMillis();

        // keep the elapsed time as a long; no narrowing cast to int is needed
        long timeTaken = end - start;

        System.out.println("Time taken " + timeTaken + " ms");
    }
}

On Sat, 4 Apr 2020, 19:07, <email@yeikel.com> wrote:

Dear Community,

Recently, I had to solve the following problem: “for every entry of a Dataset[String], concat a constant value”. To solve it, I used built-in functions:

val data = Seq("A","b","c").toDS

== Physical Plan ==

LocalTableScan [valueconcat#161]

As an alternative, a much simpler version of the program is to use map, but it adds a serialization step that does not seem to be present in the version above:

scala> data.map(e => s"$e concat").explain

== Physical Plan ==

*(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#92]
+- *(1) MapElements <function1>, obj#91: java.lang.String
   +- *(1) DeserializeToObject value#12.toString, obj#90: java.lang.String
      +- LocalTableScan [value#12]
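
For a side-by-side comparison, the two variants can be put into one small program. This is only a minimal sketch: the object name ConcatPlans, the method names, the local master setting, and the exact built-in expression (concat with the literal " concat") are assumptions, since the original built-in call is not shown in this message. The column name "value" is the one that appears in the plan above (value#12).

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

// Hypothetical helper object, for illustration only.
object ConcatPlans {

  // Built-in functions: the work is expressed as Catalyst expressions.
  // Assumption: the constant is appended with concat(...) and lit(" concat").
  def withBuiltins(data: Dataset[String]): DataFrame =
    data.select(concat(col("value"), lit(" concat")).as("valueconcat"))

  // Typed map: the closure works on JVM String objects, so rows have to be
  // converted to objects and back (the Deserialize/Serialize steps in the plan).
  def withMap(data: Dataset[String]): Dataset[String] = {
    import data.sparkSession.implicits._ // provides the Encoder[String]
    data.map(e => s"$e concat")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to inspect the plans.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-plans")
      .getOrCreate()
    import spark.implicits._

    val data = Seq("A", "b", "c").toDS()

    withBuiltins(data).explain() // plan for the built-in version
    withMap(data).explain()      // plan for the map version

    spark.stop()
  }
}
```

Running both explain calls in the same session makes it easier to see which operators each version adds to the physical plan.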

Is this over-optimization or is this the right way to go?

As a follow-up, is there any better API to get the one and only column available in a Dataset[String] when using built-in functions? “col(data.columns.head)” works but it is not ideal.

Thanks!
