spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Newbie question on how to extract column value
Date Tue, 07 Aug 2018 15:33:09 GMT
Hi James,

It is always advisable to use the latest Spark version. That said, could you
please give DataFrames and UDFs a try if possible? I think that would be a
much more scalable way to address the issue.

Also, where possible, it is advisable to apply the filter before fetching
the data into Spark, so less data is transferred from the database.
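To make that concrete, here is a minimal sketch of both routes: wrapping the plain Scala functions as UDFs so they work on Columns, and switching to a typed Dataset so the original filter/map style works unchanged. The bodies of isValid and pathOf below are hypothetical stand-ins for your helpers, and table_a is the table from your example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical stand-ins for your isValid/pathOf helpers
def isValid(url: String): Boolean = url.startsWith("http")
def pathOf(url: String): String = new java.net.URL(url).getPath

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Option 1: wrap the plain functions as UDFs so they accept Columns
val isValidUdf = udf(isValid _)
val pathOfUdf  = udf(pathOf _)

spark.sql("SELECT id, url FROM table_a WHERE col_b <> ''")
  .filter(isValidUdf($"url"))                    // Column-level filter via UDF
  .select($"id", pathOfUdf($"url").as("path"))
  .show()

// Option 2: convert to a typed Dataset and use plain Scala functions directly
spark.sql("SELECT id, url FROM table_a WHERE col_b <> ''")
  .as[(Long, String)]
  .filter { case (_, url) => isValid(url) }
  .map { case (id, url) => (id, pathOf(url)) }
  .show()
```

The UDF route keeps everything in the untyped DataFrame API; the typed route is closer to your plain-Scala snippet but deserializes each row into a tuple, which can be slower on large tables.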


Thanks and Regards,
Gourav

On Tue, Aug 7, 2018 at 4:09 PM, James Starks <suserft@protonmail.com.invalid
> wrote:

> I am very new to Spark. I have just successfully set up Spark SQL,
> connecting to a PostgreSQL database, and am able to display a table with
>
>     sparkSession.sql("SELECT id, url from table_a where col_b <> ''
> ").show()
>
> Now I want to perform filter and map function on col_b value. In plain
> scala it would be something like
>
>     Seq((1, "http://a.com/a"), (2, "http://b.com/b"), (3,
> "unknown")).filter { case (_, url) => isValid(url) }.map { case (id, url)
> => (id, pathOf(url)) }
>
> where filter removes invalid urls, and then map turns (id, url) into (id,
> path of url).
>
>
> However, when I apply this concept to Spark SQL with the code snippet
>     sparkSession.sql("...").filter(isValid($"url"))
>
> the compiler complains about a type mismatch, because $"url" is of type
> ColumnName. How can I extract the column value (i.e. http://...) for the
> url column in order to apply the filter function?
>
> Thanks
>
> Java 1.8.0
> Scala 2.11.8
> Spark 2.1.0
>
