spark-dev mailing list archives

From Marc Le Bihan <>
Subject May I report a special comparison of executions leading to issues on Spark JIRA?
Date Thu, 01 Oct 2020 08:27:01 GMT

I currently run a Spark project working on data about cities, local
authorities, enterprises, local communities, etc.
Ten Datasets, written in Java, perform operations ranging from simple joins
to elaborate ones.
Twenty integration tests run on the whole data set (20 GB) and take seven
hours.

*Everything works perfectly under Spark 2.4.6 - Scala 2.12 - Java 11 or 8.*
I remember it also worked well on Spark 2.4.5,
but I had many problems in the past with Spark 2.4.3 (often coming from the
LZ4 algorithms, if I remember correctly).

I attempted to run my integration tests on Spark 3.0.1. Many of them
failed with strange messages.
Something about lambdas, or about Maps that were no longer taken into
account when used in a Java Dataset, object, or schema?

I then went back, but to Spark 2.4.7, to give it a try. Spark 2.4.7 also
encounters problems that 2.4.6 didn't have.

My question:

May I create an issue on JIRA based on a comparison of my project's
executions under different versions of Spark, reporting the error messages
received and the call stacks, and showing the lines around the one that
failed when available,
even if I can't provide test cases for each problem?
Would this give you hints about what is going wrong?

I could then try a development version if needed (when asked) to see
whether my project returns to stability.
