flink-user mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: Retrieve written records of a sink after job
Date Wed, 14 Feb 2018 12:08:09 GMT
Technically yes, a subset of metrics is stored in the ExecutionGraph
when the job finishes (this is, for example, where the web UI gets its
values for finished jobs). However, these are at the task level and
will not contain the number of incoming records if your sink is chained
to another operator. Changing this would be a larger endeavor, and tbh I
don't see this happening soon.

I'm afraid for now you're stuck with the REST API for finished jobs. 
(Correction for my previous mail: The metrics REST API cannot be used 
for finished jobs)
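
As a rough sketch of what that could look like from a Java client: the
snippet below fetches the job details and pulls out the per-vertex
read-records values by vertex name, so the sink's vertex id does not have
to be known up front. Assumptions (not from this thread): the JobManager
REST endpoint is at http://localhost:8081, the /jobs/<jobid> response has
a "vertices" array whose entries list "id", "name" and a "metrics" object
with "read-records" in that order, and Java 11+ is available for
java.net.http. The class and method names are made up for illustration,
and the regex is only there to keep the example dependency-free; a real
JSON parser would be the more robust choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink.
class FinishedJobRecords {

    // Assumes each vertex object lists "id", then "name", and later a
    // "metrics" object containing "read-records" (assumed response layout).
    private static final Pattern VERTEX = Pattern.compile(
            "\"id\"\\s*:\\s*\"([^\"]+)\"\\s*,\\s*\"name\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\""
                    + ".*?\"read-records\"\\s*:\\s*(\\d+)",
            Pattern.DOTALL);

    // Builds the job details URL, e.g. http://localhost:8081/jobs/<jobid>.
    static URI jobDetailsUri(String baseUrl, String jobId) {
        return URI.create(baseUrl + "/jobs/" + jobId);
    }

    // Performs a plain GET and returns the response body as a string.
    static String fetchJobDetails(String baseUrl, String jobId) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request =
                HttpRequest.newBuilder(jobDetailsUri(baseUrl, jobId)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Maps vertex name -> task-level read-records found in the job details JSON.
    static Map<String, Long> readRecordsByVertexName(String jobDetailsJson) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher m = VERTEX.matcher(jobDetailsJson);
        while (m.find()) {
            result.put(m.group(2), Long.parseLong(m.group(3)));
        }
        return result;
    }
}
```

The job id could be taken from JobExecutionResult.getJobID().toString()
and passed to fetchJobDetails. The chaining caveat above still applies:
the values are per task, so a sink chained to another operator will not
show up with its own count.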

Alternatively, if you would rather work with files/JSON, you can enable
job archiving by configuring the |jobmanager.archive.fs.dir| directory.
When a job finishes, this directory will contain a big JSON file for that
job, containing all responses that the UI would return for finished jobs.
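
For illustration, the corresponding entry in flink-conf.yaml might look
like this. The path is only a placeholder (an assumption, not something
from this thread); it should point to a directory that the JobManager
can write to.

```
# flink-conf.yaml (sketch; the path below is a placeholder)
jobmanager.archive.fs.dir: hdfs:///completed-jobs
```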

On 14.02.2018 12:50, Flavio Pompermaier wrote:
> The problem here is that I don't know the vertex id of the sink. Would
> it be possible to access the sink info by id?
> And couldn't all this info be attached to the JobExecutionResult
> (avoiding having to set up the whole REST connection etc.)?
> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler
> <chesnay@apache.org> wrote:
>     The only way to access this info from the client is the REST API
>     <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#details-of-a-running-or-completed-job>
>     or the Metrics REST API
>     <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#rest-api-integration>.
>     On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>     Actually I'd like to get this number from my Java class in order
>>     to update some external dataset "catalog",
>>     so I'm asking if there's some programmatic way to access this
>>     info (from JobExecutionResult for example).
>>     On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler
>>     <chesnay@apache.org> wrote:
>>         Do you want to know how many records the sink received, or
>>         how many the sink wrote to the DB?
>>         If it's the first you're in luck because we measure that
>>         already, check out the metrics documentation.
>>         If it's the latter, then this issue is essentially covered by
>>         FLINK-7286 which aims at allowing functions
>>         to modify the numRecordsIn/numRecordsOut counts.
>>         On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>>         Hi to all,
>>>         I have a (batch) job that writes to 1 or more sinks.
>>>         Is there a way to retrieve, once the job has terminated, the
>>>         number of records written to each sink?
>>>         Is there any better way than using an accumulator for
>>>         each sink?
>>>         If that is the only way to do that, the Sink API could be
>>>         enriched in order to automatically create an accumulator
>>>         when required. E.g.
>>>         dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>>>                     .setDrivername(...)
>>>                     .setDBUrl(...)
>>>                     .setQuery(...)
>>>                     *.addRecordsCountAccumulator("some-name")*
>>>                     .finish())
>>>         Best,
>>>         Flavio
>>     -- 
>>     Flavio Pompermaier
>>     Development Department
>>     OKKAM S.r.l.
>>     Tel. +(39) 0461 041809
> -- 
> Flavio Pompermaier
> Development Department
> OKKAM S.r.l.
> Tel. +(39) 0461 041809
