subversion-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Fuhrmann <stefan.fuhrm...@wandisco.com>
Subject Making EXPANDED_SIZE reliable in FSFS [was: svn commit: r1673204 - /subversion/trunk/subversion/libsvn_fs_fs/dag.c]
Date Mon, 13 Apr 2015 23:04:51 GMT
On Mon, Apr 13, 2015 at 11:20 PM, Daniel Shahaf <d.s@daniel.shahaf.name>
wrote:

> rhuijben@apache.org wrote on Mon, Apr 13, 2015 at 14:28:39 -0000:
> > -      *has_props = (noderev->prop_rep->expanded_size > 4);
> > +      *has_props = (noderev->prop_rep->expanded_size > 4
> > +                    || (noderev->prop_rep->expanded_size == 0
> > +                        && noderev->prop_rep->size > 4));
>
> Having to remember, on every use of EXPANDED_SIZE, to check if it's zero
> is madness.


The pain has been increasing and is now at a level
where we should do something about it.


> Is there a good reason not to make the deserializer
> function handle this?  If needed, by inventing a getter or a new struct
> member that doesn't have this quirk?
>

The main problem is that we don't know whether SIZE is
correct, either, without looking at the actual representation:
It could be a self-delta of an empty fulltext. So, most of
these tests are contextual, we e.g. know that these are
property reps. Or we can allow for false positives or false
negatives etc.

One way to get around this problem is to basically use
the approach of svn_fs_fs__file_length(), where we compare
the contents checksum with with that of an empty rep.
We might also restrict that check to "meaningful" values
of "SIZE", i.e. those txdelta streams that produce an
empty output.

Assuming we find a truly reliable test (e.g. say txdelta is
always 4 bytes and we can enumerate all MD5s), we could
add it to the parser function in low_level.c.

Thoughts?

-- Stefan^2.

Mime
View raw message