spark-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: [Events] Events not fired for SaveAsTextFile (?)
Date Mon, 15 Oct 2018 17:28:16 GMT
Hi Fokko

Spark fires events for many other things. It does so for ML pipelines, and
it does make this information available for DataFrames.
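For DataFrame/Dataset actions, the hook this most likely refers to is `org.apache.spark.sql.util.QueryExecutionListener`, registered through `spark.listenerManager`. Below is a minimal sketch, assuming Spark 2.3's `onSuccess`/`onFailure` signatures. It keeps the analyzed logical plan of every successful action. For a file-based write, that plan is the write command, and it names the output directory. The names `PlanLoggingListener`, `PlanLoggingDemo` and the default `/tmp` path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

import scala.collection.mutable.ArrayBuffer

// Hypothetical listener: remembers the analyzed plan of every DataFrame action that succeeds.
class PlanLoggingListener extends QueryExecutionListener {
  val plans = ArrayBuffer.empty[String]

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    plans.synchronized { plans += s"[$funcName] ${qe.analyzed.treeString}" }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    System.err.println(s"[$funcName] failed: ${exception.getMessage}")
}

object PlanLoggingDemo {
  // Writes a small DataFrame to `outputDir` and returns the plans the listener saw.
  def run(outputDir: String): List[String] = {
    val spark = SparkSession.builder().master("local[1]").appName("plan-logging").getOrCreate()
    val listener = new PlanLoggingListener
    spark.listenerManager.register(listener)
    // A DataFrame write: the listener is handed the plan of the write command,
    // which includes the output location.
    spark.range(10).write.mode("overwrite").parquet(outputDir)
    // Newer Spark versions deliver these callbacks asynchronously; stopping flushes them.
    spark.stop()
    listener.plans.synchronized { listener.plans.toList }
  }

  def main(args: Array[String]): Unit =
    run(args.headOption.getOrElse("/tmp/plan-logging-demo")).foreach(println)
}
```

An RDD-only job such as `textFile(...).saveAsTextFile(...)` does not go through `QueryExecution`, so this listener would not be called for it.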

We use S3 in this case; I just simplified the example. It is important to
know which process took which action. Only Spark knows this, and it does
supply this information on other occasions.

So I don't think your comment makes sense?

Cheers
Bolke
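For reference, the RDD-level listener in the original question quoted below can be sketched with Spark's `SparkListener` developer API and `sc.addSparkListener`. This is a minimal sketch under stated assumptions. `RddNameListener`, `RddLineageDemo` and `run` are hypothetical names. It also relies on an implementation detail we believe holds in Spark 2.3: `sc.textFile(path)` names its RDD after `path`. So the input location shows up in the `RDDInfo` names of the job's stages, while the directory passed to `saveAsTextFile` does not.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import scala.collection.JavaConverters._

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Hypothetical listener: records the name of every RDD in every stage of every job that starts.
class RddNameListener extends SparkListener {
  val rddNames = new ConcurrentLinkedQueue[String]()

  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    for (stage <- jobStart.stageInfos; rdd <- stage.rddInfos)
      rddNames.add(rdd.name)
}

object RddLineageDemo {
  // Hypothetical helper: runs the copy from the original question on a local master
  // and returns the RDD names the listener observed.
  def run(input: String, output: String): List[String] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("rdd-lineage"))
    val listener = new RddNameListener
    sc.addSparkListener(listener)
    sc.textFile(input).saveAsTextFile(output)
    // Listener events are delivered asynchronously; stop() drains the queue first.
    sc.stop()
    listener.rddNames.asScala.toList
  }

  def main(args: Array[String]): Unit =
    run(args(0), args(1)).foreach(println)
}
```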

On Mon 15 Oct 2018 19:05, Driesprong, Fokko <fokko@driesprong.frl> wrote:

> Hi Bolke,
>
> I would argue that Spark is not the right level of abstraction for doing
> this. I would create a wrapper around the particular filesystem:
> http://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html
> That way you can write a wrapper around the LocalFileSystem if data will
> be written to local disk, or around DistributedFileSystem when it is written
> to HDFS, and many object stores also implement this interface. My 2¢
>
> Cheers, Fokko
>
> On Mon 15 Oct 2018 at 18:58, Bolke de Bruin <bdbruin@gmail.com> wrote:
>
>> Hi,
>>
>> Apologies up front if this should have gone to user@, but it seems like a
>> developer question, so here goes.
>>
>> We are trying to improve a listener to track lineage across our platform.
>> This requires tracking where data comes from and where it goes to. For example:
>>
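>> sc.setLogLevel("INFO")
>> // Read a gzipped text file from HDFS ...
>> val data = sc.textFile("hdfs://migration/staffingsec/Mydata.gz")
>> // ... and write its lines back out to another HDFS location
>> data.saveAsTextFile("hdfs://datalab/user/xxx")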
>>
>> In this case we would like to know that Spark picked up “Mydata.gz” and
>> wrote it to “xxx”. Of course more complex examples are possible.
>>
>> In the particular case above, Spark (2.3.2) does not seem to trigger
>> any events, or at least none that we know of, that give us the relevant
>> information.
>>
>> Is that a correct assessment? What can we do to get that information
>> without knowing the code upfront? Should we provide a patch?
>>
>> Thanks
>> Bolke
>>
>> Sent from my iPad
>>
>>
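Fokko's suggestion of wrapping the Hadoop `FileSystem` can be sketched with Hadoop's `FilterFileSystem`, which delegates every call to a wrapped instance. This is a minimal sketch, assuming the Hadoop 2.x/3.x `open`, seven-argument `create` and `rename` signatures. `AuditingFileSystem` is a hypothetical name. Renames are recorded too, because Hadoop output committers normally write under a `_temporary` directory and move files into place on commit. Having Spark actually route its I/O through such a class would also require registering it for a URI scheme (e.g. via a `spark.hadoop.fs.<scheme>.impl` setting) with a no-argument constructor, which this sketch leaves out.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import scala.collection.JavaConverters._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FSDataOutputStream, FileSystem, FilterFileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.util.Progressable

// Hypothetical wrapper: delegates to the wrapped FileSystem, but records which paths
// are opened for reading, created for writing, or renamed.
class AuditingFileSystem(underlying: FileSystem) extends FilterFileSystem(underlying) {
  val events = new ConcurrentLinkedQueue[String]()

  override def open(f: Path, bufferSize: Int): FSDataInputStream = {
    events.add(s"open $f")
    super.open(f, bufferSize)
  }

  // The convenience create(...) overloads on FileSystem funnel into this one.
  override def create(f: Path, permission: FsPermission, overwrite: Boolean, bufferSize: Int,
                      replication: Short, blockSize: Long, progress: Progressable): FSDataOutputStream = {
    events.add(s"create $f")
    super.create(f, permission, overwrite, bufferSize, replication, blockSize, progress)
  }

  override def rename(src: Path, dst: Path): Boolean = {
    events.add(s"rename $src -> $dst")
    super.rename(src, dst)
  }
}

object AuditingFileSystemDemo {
  def main(args: Array[String]): Unit = {
    val fs = new AuditingFileSystem(FileSystem.getLocal(new Configuration()))
    val dir = new Path(java.nio.file.Files.createTempDirectory("audit").toString)
    val file = new Path(dir, "data.txt")
    val out = fs.create(file)
    out.writeBytes("hello\n")
    out.close()
    fs.open(file).close()
    fs.rename(file, new Path(dir, "renamed.txt"))
    fs.events.asScala.foreach(println)
  }
}
```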
