mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 69082: Correctly propagated `close` failures in some instances.
Date Fri, 18 Jan 2019 17:49:50 GMT


> On Jan. 15, 2019, 6:22 p.m., Benjamin Mahler wrote:
> > Should we be surfacing a close EINTR as an error or let that be silent?
> > I think these errors need some message pre-fixing? E.g.
> > 
> > ```
> > Failed to close '3': Bad file number
> > ```
> > 
> > As it stands the error messages will only show "Bad file number" and it won't be
clear if the write or close produced this? It's also an issue here between open/write/flush
but would be great to resolve this now.
> 
> Benjamin Bannier wrote:
>     If I read the POSIX spec for `close` correctly, if `close` fails with `EINTR` the
passed file descriptor is left in an unspecified state so it e.g., would not be safe to assume
that pending data was successfully flushed. Am I missing something?

There's quite a rabbit hole of reading around this, but with respect to the POSIX close spec:

> If close() is interrupted by a signal that is to be caught, it shall
> return -1 with errno set to [EINTR] and the state of fildes is unspecified.
> If an I/O error occurred while reading from or writing to the file system
> during close(), it may return -1 with errno set to [EIO]; if this error is
> returned, the state of fildes is unspecified.

My interpretation so far from reading more about this is that the "state of fildes is unspecified"
is *only* referring to whether it's closed or open. Howerver, Linus says:

> The error return just tells you that soem error happened on the file: for
> example, in the case of EINTR, the close() may not have flushed all the
> pending data synchronously.

https://lkml.org/lkml/2005/9/10/129

But, even if close returns successfully we cannot assume pending data was flushed (although
some of our code in question here uses an explicit flush before closing as suggested), from
linux's close man page:

> A successful close does not guarantee that the data has been
> successfully saved to disk, as the kernel uses the buffer cache to
> defer writes.  Typically, filesystems do not flush buffers when a
> file is closed.  If you need to be sure that the data is physically
> stored on the underlying disk, use fsync(2).  (It will depend on the
> disk hardware at this point.)

Based on this, EINTR (which is more like EINPROGRESS in terms of semantics, see https://ewontfix.com/4/)
seems to provide the same guarantee as return code of 0:

* `0`: filedes is closed, previous `write()`s are in kernel buffer cache but data might not
make it to disk
* `EINTR`: filedes is closed, close call was interrupted by a signal, previous `write()`s
are in kernel buffer cache but data might not make it to disk

In the case of an `fsync()` before close, it seems to mean:

* `0`: filedes is closed, nothing in kernel buffer cache, data is already fsynced
* `EINTR`: filedes is closed, nothing in kernel buffer cache, data is already fsynced, close
call was interrupted by a signal (not sure if EINTR is possible with an `fsync()` beforehand..)

Looking through the linux source, it calls through these:
https://github.com/torvalds/linux/blob/d7393226d15add056285c8fc86723d54d7e0c77d/fs/open.c#L1152-L1169
https://github.com/torvalds/linux/blob/903b77c631673eeec9e9114e9524171cdf9a2646/fs/file.c#L617-L641
https://github.com/torvalds/linux/blob/ad1d69735878a6bf797705b5d2a20316d35e1113/fs/open.c#L1126-L1150
and the interruptible part appears to be the flush here:
https://github.com/torvalds/linux/blob/ad1d69735878a6bf797705b5d2a20316d35e1113/fs/open.c#L1140

Trying to figure out what this flush is, I see this: https://lwn.net/Articles/576478/

> In fact, it is difficult to even return EINTR from close() on Linux, according to Christoph
Hellwig. If the driver or filesystem's release() method returns an error, it is explicitly
ignored. The only path that would allow a driver to return EINTR is if it provides a flush()
method that does so. Hellwig plans to post a patch that would enforce a no-EINTR policy on
that path as well.

> If EINTR can never be returned, there is no real reason to map it to EINPROGRESS in glibc.
But, since glibc may be used on an older kernel that can return EINTR in some rare situations,
mapping it to something probably makes sense. That could be EINPROGRESS or, perhaps better
still, just zero for success, as suggested by Rich Felker. There really isn't much the application
programmer can do if close() returns an error.

I'm not sure under which circumstances flushing is occurring on close (not sure which drivers
provide the flush method), but this makes it sound rare / driver dependent, and it sounds
like the plan would make EINTR impossible.

So, I guess surfacing EINTR up to the caller sounds like the simplest thing to do, and we'll
have to see whether this actually occurs in practice.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69082/#review212026
-----------------------------------------------------------


On Jan. 16, 2019, 1:06 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69082/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2019, 1:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Alex Clemmer.
> 
> 
> Bugs: MESOS-9331
>     https://issues.apache.org/jira/browse/MESOS-9331
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When e.g., writing to disk, errors from write might only manifest at
> ::close time. We update some instances to correctly propagate these
> errors instead of dropping them silently. We only propagate the
> ::close error if the write operation succeeded, otherwise we just
> propagate the error from the write operation.
> 
> 
> Diffs
> -----
> 
>   3rdparty/stout/include/stout/os/posix/mktemp.hpp 63b3d1a7720d07f877fa1d4eb7f32a548916637a

>   3rdparty/stout/include/stout/os/write.hpp f7538f94f5a953a7a90a05bc1d2f138b6c17f814

>   3rdparty/stout/include/stout/protobuf.hpp eb4adef56f1701e3c101284e05e4e6c66eef9180

> 
> 
> Diff: https://reviews.apache.org/r/69082/diff/3/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message