Because of some legacy issues I can't upgrade the Spark version immediately. But I tried filtering the data before loading it into Spark, based on the suggestion, with

    val df = sparkSession.read.format("jdbc").option(...).option("dbtable", "(select .. from ... where url <> '') table_name")....load()

Then performing the custom operation does the trick.

    sparkSession.sql("select id, url from new_table").as[(String, String)].map { case (id, url) =>
      val derived_data = ... // operation on url
      (id, derived_data)
    }

Thanks for the advice, it's really helpful!

------- Original Message -------
On August 7, 2018 5:33 PM, Gourav Sengupta <> wrote:

Hi James,

It is always advisable to use the latest Spark version. That said, can you please give DataFrames and UDFs a try if possible? I think that would be a much more scalable way to address the issue.

Also, where possible, it is always advisable to apply the filter before fetching the data into Spark.
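A minimal sketch of the DataFrame-plus-UDF approach suggested above, assuming a local Spark session; the `pathOf` helper, the column names, and the sample rows are all hypothetical stand-ins for the real JDBC table:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfSketch {
  // Hypothetical helper: strip the "scheme://host" prefix, keeping the path.
  def pathOf(url: String): String =
    url.replaceFirst("^[a-z]+://[^/]*", "")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the table loaded over JDBC.
    val df = Seq((1, "http://a.example/x"), (2, ""), (3, "http://b.example/y/z"))
      .toDF("id", "url")

    val pathUdf = udf(pathOf _)

    df.filter($"url" =!= "")                // filter first, as suggested above
      .withColumn("path", pathUdf($"url")) // then derive the new column
      .show()

    spark.stop()
  }
}
```

The filter runs on the `url` column before the UDF is applied, so invalid rows never reach the derivation step.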

Thanks and Regards,

On Tue, Aug 7, 2018 at 4:09 PM, James Starks <> wrote:
I am very new to Spark. I have just successfully set up Spark SQL to connect to a PostgreSQL database, and am able to display a table with the code

    sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show()

Now I want to perform filter and map functions on the col_b value. In plain Scala it would be something like

    Seq((1, ""), (2, ""), (3, "unknown")).filter { case (_, url) => isValid(url) }.map { case (id, url)  => (id, pathOf(url)) }

where filter removes invalid urls, and map transforms each (id, url) into (id, path of url).
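For reference, a self-contained plain-Scala version of the filter/map above; the `isValid` and `pathOf` implementations here are hypothetical (the real ones are whatever your URL logic requires):

```scala
import java.net.URI
import scala.util.Try

object FilterMapSketch {
  // Hypothetical validity check: non-empty and parseable as a URI with a scheme.
  def isValid(url: String): Boolean =
    url.nonEmpty && Try(new URI(url)).toOption.exists(_.getScheme != null)

  // Hypothetical path extraction via java.net.URI.
  def pathOf(url: String): String = new URI(url).getPath

  def main(args: Array[String]): Unit = {
    val result = Seq((1, ""), (2, ""), (3, "http://example.com/a/b"))
      .filter { case (_, url) => isValid(url) }
      .map { case (id, url) => (id, pathOf(url)) }
    println(result) // List((3,/a/b))
  }
}
```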

However, when applying this concept to Spark SQL with a similar code snippet, the compiler complains about a type mismatch because $"url" is of ColumnName type. How can I extract the column value, i.e. http://..., for the column url in order to perform the filter function?


Java 1.8.0
Scala 2.11.8
Spark 2.1.0