spark-user mailing list archives

From Wei Chen <weic...@apache.org>
Subject Re: Set TimeOut and continue with other tasks
Date Wed, 10 Jul 2019 09:55:46 GMT
I am currently trying to use Future/Await to set a timeout inside the
map-reduce.
However, the tasks now fail instead of getting stuck, even though I have a
Try/match to catch the timeout.
Does anyone have an idea why?

The code looks like this:

```Scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

files.map { file =>
  Try {
    def tmpFunc(): Boolean = { /* file conversion on HDFS */ ??? }
    val tmpFuture = Future[Boolean] { tmpFunc() }
    // throws a TimeoutException if the conversion does not finish in time
    Await.result(tmpFuture, 600.seconds)
  } match {
    case Failure(e) => "F"
    case Success(r) => "S"
  }
}
```

The converter is created lazily inside a broadcast object,
which shouldn't be a problem.
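
Roughly, the setup is (a minimal sketch; Converter and makeConverter
stand in for the real 3rd-party classes):

```Scala
// sketch only: Converter and makeConverter are placeholders
class ConverterHolder extends Serializable {
  // lazy, so each executor constructs its own instance on first use
  // instead of shipping a live converter from the driver
  lazy val converter: Converter = makeConverter()
}

val holder = sc.broadcast(new ConverterHolder)
// inside the map: holder.value.converter
```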

Best Regards
Wei


On Wed, Jul 10, 2019 at 3:16 PM Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> Is there a way you can identify those patterns in a file or in its name
> and then just tackle them in separate jobs? I use the function
> input_file_name() to find the name of the input file for each record and
> then filter out certain files.
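>
> Something like this (a rough sketch; the input path and the "bad" file
> pattern below are just placeholders):
>
> ```Scala
> import org.apache.spark.sql.functions.{col, input_file_name}
>
> // tag each record with the name of the file it came from
> val df = spark.read.text("hdfs:///data/input/")
>   .withColumn("src", input_file_name())
> // drop records coming from files known to cause trouble
> val good = df.filter(!col("src").contains("bad_pattern"))
> ```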
>
> Regards,
> Gourav
>
> On Wed, Jul 10, 2019 at 6:47 AM Wei Chen <weichen@apache.org> wrote:
>
>> Hello All,
>>
>> I am using Spark to process some files in parallel.
>> While most files are processed within 3 seconds,
>> we occasionally get stuck on 1 or 2 files that never finish
>> (or take more than 48 hours).
>> Since it is a 3rd-party file conversion tool, we are not able to debug
>> why the converter gets stuck.
>>
>> Is it possible to set a timeout for the process, throw exceptions
>> for those stuck tasks,
>> and still continue with the other, successful tasks?
>>
>> Best Regards
>> Wei
>>
>
