subversion-users mailing list archives

From "Igor Varfolomeev" <...@mail.ru>
Subject RE: Almost repetitive repository corruption
Date Tue, 31 Dec 2013 02:07:12 GMT
> -----Original Message-----
> From: Bert Huijben [mailto:bert@qqmail.nl]
> Sent: 30 December, 2013 02:58
> To: 'Igor Varfolomeev'; users@subversion.apache.org
> Subject: RE: Almost repetitive repository corruption
> 
> 
> > -----Original Message-----
> > From: Igor Varfolomeev [mailto:i3v@mail.ru]
> > Sent: zondag 29 december 2013 23:00
> > To: users@subversion.apache.org
> > Subject: Almost repetitive repository corruption
> >
> > Hi all,
> >
> > I’ve just run into a weird bug which damaged my svn repository. I
> > still don’t understand what exactly was wrong, so, I don’t know how to
> > describe it in a clear and simple manner, sorry… I’ll just try to
> > describe all the symptoms I’ve experienced. I’ll use real file names,
> > since I wasn’t able to reproduce this bug on synthetic test repository.
> >
> > *SETUP*
> > Most simple single-user, single-PC setup. Local repository.
> > First svn version: “Subversion command-line client, version 1.8.5.”.
> > Windows 7 x64
> > Antivirus: Kaspersky Endpoint Security 10
> >
> > *THE STORY*
> > The story began when I ran into an error message while
> > trying to commit r3349.
> > After a bit of struggling, I realized that my repository had been
> > broken by the previous commit (r3348). The nasty thing is that the
> > previous commit finished without any error message.
> >
> > *SYMPTOMS*
> > **svnadmin verify**
> > Output ends like this:
> > <….>
> > * Verified revision 3346.
> > * Verified revision 3347.
> > svnadmin: E160004:
> > Corrupt node-revision '4d-610.2-2392.r3348/35659066'
> > svnadmin: E160004: Found malformed header '' in revision file
> >
> > **svn checkout**
> > When I try to checkout a new working copy, I receive similar
> > message:
> > <…>
> > W:\testCO\Binar\Matlab\deploy
> > W:\testCO\Binar\Matlab\deploy\x64
> > W:\testCO\Binar\Matlab\deploy\x64\Binar_x64.prj
> > W:\testCO\Binar\Matlab\deploy\x64\Binar_x64
> > W:\testCO\Binar\Matlab\deploy\x64\Binar_x64\distrib
> > Corrupt node-revision '4d-610.2-2392.r3348/35659066'
> > Found malformed header '' in revision file
> >
> > **svn Repository Browser**
> > When I navigate to
> > file:///V:/R_Matlab/Binar/trunk/Binar/Matlab/deploy/x64/Binar_x64
> >  in tortoise svn repository browser, I see the same error message:
> >
> > Corrupt node-revision '4d-610.2-2392.r3348/35659066'
> > Found malformed header '' in revision file
> >
> > Here’s a screenshot: http://sdrv.ms/1fJVuwa
> >
> > *ZEROS IN DATA FILE*
> > Luckily, I have a full backup (r3337). I’ve manually repeated all my
> > commits up to r3347 and verified that at this state repository is OK.
> >
> > Next, I’ve tried to reproduce the bug:
> >
> > 1.	Firstly (“try1”), I’ve repeated same Matlab commit script
> >         (Matlab simply calls svn, just like from cmd). And… «success»
> >         - same bug again!
> >
> > 2.	Secondly (“try3”), I’ve managed to reproduce the bug using
> >         only windows cmd commands.
> >
> > 3.	Thirdly (“try4” and “try5(0)”), I wrote a bat-script to
> >         reproduce the same actions.
> >
> > I’ve compared
> > R_Matlab\db\revs\3\3348
> > file for different “tries”:  (initial bug is designated as “try0”) and
> > discovered a single interesting thing:
> > each “3348” file has a long sequence of zero-bytes:
> >
> > •	try0: 0x2201B0A to 0x2201FFF
> >
> > •	try1: 0x2201000 to 0x2201FFF
> >        o	try0_vs_try1_p1: http://sdrv.ms/Ju7nev
> >        o	try0_vs_try1_p2: http://sdrv.ms/Ju7tmu
> >        o	try0_vs_try1_p3: http://sdrv.ms/Ju7AOI
> >
> > •	try3: 0x2201B11 to 0x2201FFF
> >        o	try0_vs_try3_p1: http://sdrv.ms/Ju7G9g
> >        o	try0_vs_try3_p2: http://sdrv.ms/Ju7HKd
> >
> > •	try4: 0x2201000 to 0x2201FFF
> >        o	try0_vs_try4_p1: http://sdrv.ms/Ju7OFE
> >        o	try0_vs_try4_p2: http://sdrv.ms/Ju86MJ
> >        o	try0_vs_try4_p3: http://sdrv.ms/Ju89ID
> >
> > •	try5(0): 0x2201000 to 0x2201FFF (just like try4).
> >        o	try0_vs_try5(0)_p1: http://sdrv.ms/1daKwjG
> >        o	try0_vs_try5(0)_p2: http://sdrv.ms/1daKxUx
> >        o	try0_vs_try5(0)_p3: http://sdrv.ms/Ju8iM5
> >
> >
> > Moreover, try4 and try5 have only one single difference, two zero-
> > bytes, starting from 0x21F9FFE (in case of “try5(0)”):
> > http://sdrv.ms/19jmBdm
> >
> > *BUG DISAPPEARED*
> > That’s all I have. 5 broken repositories. After that bug DISAPPEARED.
> > Just like a UFO :) . I’ve launched the SAME script, with the SAME
> > input data 10 more times (“try5(1)”,”try5(2)”…) – nothing – svn
> > correctly commits r3348, resulting repository is valid:
> 
> 	Hi,
> 
> Did you make sure you restored db\rep-cache.db in every step? (This
> may make more of a difference than you expect.)
> 
> The fact that you copy a single file two times in one commit makes me expect
> that this is relevant information.
> 
> Are all the drives in your test scenario local harddisk or are some network
> drives involved?
> 
> 	Bert
==============================================================
[Igor Varfolomeev] 


> Are all the drives in your test scenario local harddisk or are some network
> drives involved?

Only local HDDs.

> Did you make sure you restored db\rep-cache.db in every step? (This
> may make more of a difference than you expect.)

*COPY PROCESS*
Hm... I simply copied all files from the source (r3347) repository to a new folder
to create an "experimental repository" (independently for each "try").
The source repository was not accessed in any way during the copy.
For "try1".."try4" I did the copy manually; for "try5(0)" an
"xcopy" command was built into the bat-script:

http://sdrv.ms/JqqVRL

and xcopy finished as it should:

http://sdrv.ms/JqrlYf 

*SVNADMIN VERIFY*
Also, during "try4" and "try5(0)", before the real job, the "experimental repositories"
were verified with "svnadmin verify" (to be sure they were copied OK):
	
http://sdrv.ms/Jqrpav 
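
The copy-then-verify step above can be sketched in Python. This is a minimal sketch, not the bat-script actually used: the function name `copy_and_verify` and its `run` parameter are hypothetical, and it assumes `svnadmin` is on the PATH. It only relies on the `svnadmin verify REPOS_PATH` command and its exit status.

```python
import shutil
import subprocess


def copy_and_verify(source_repo, target_repo, run=subprocess.run):
    """Copy a repository directory, then check the copy with `svnadmin verify`.

    `run` is injectable (hypothetical parameter) so the function can be
    exercised without Subversion installed. Returns True if svnadmin
    exits with status 0.
    """
    # Plain file copy, mirroring the manual copy / xcopy approach.
    shutil.copytree(source_repo, target_repo)
    # `svnadmin verify` reads every revision and reports corruption it finds.
    result = run(["svnadmin", "verify", target_repo])
    return result.returncode == 0
```

Subversion also ships `svnadmin hotcopy` for copying a repository; the plain file copy here is kept only because it matches what was done for these tries.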

If "rep-cache.db" were damaged, would "svnadmin verify" detect it?

*REP-CACHE.DB DIFF*
I've just compared "db\rep-cache.db" for "try5(0)" and "try5(1)"
(i.e. last broken vs first valid) and they are identical...
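
A comparison like this can be repeated with a few lines of Python. This is a minimal sketch assuming both files are readable regular files; the helper names `sha256_of` and `first_difference` are hypothetical and not part of any Subversion tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the shared part: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference
            offset += len(a)
```

Matching digests show the two copies hold the same bytes; when they differ, `first_difference` gives the offset to look at in a hex viewer.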

*COPY LOG DIFF*

The only interesting thing I noticed when comparing the logs for
"try5(0)" http://sdrv.ms/JqrlYf  and
"try5(1)" http://sdrv.ms/1dlXwTv 

is that in the first case "db\rev-prop-atomics.mutex" was also copied.
http://sdrv.ms/1dlWV4o 

However, there's no such file in either the source or the target dir now...
It's temporary, isn't it?
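
Spotting an extra file between two copy logs can be automated. This is a rough sketch that assumes the logs were saved as plain text with one copied path per line and an English "N File(s) copied" summary at the end; the function names are hypothetical, and the paths in the usage lines are illustrative only.

```python
def copied_paths(log_text):
    """Collect the file paths listed in an xcopy-style log, one per line."""
    paths = set()
    for line in log_text.splitlines():
        line = line.strip()
        # Skip blank lines and the trailing "N File(s) copied" summary
        # (assumed English wording).
        if not line or line.lower().endswith("file(s) copied"):
            continue
        # Windows paths are case-insensitive, so compare in lower case.
        paths.add(line.lower())
    return paths


def only_in_first(log_a, log_b):
    """Return the sorted paths that appear in log_a but not in log_b."""
    return sorted(copied_paths(log_a) - copied_paths(log_b))


# Illustrative logs, not the real ones from the screenshots.
log_a = "V:\\repo\\db\\current\nV:\\repo\\db\\rev-prop-atomics.mutex\n2 File(s) copied"
log_b = "V:\\repo\\db\\current\n1 File(s) copied"
print(only_in_first(log_a, log_b))
```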

PS
Still, all files mentioned above are here: http://sdrv.ms/1jMN250 
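
For anyone inspecting the "3348" revision files, the zero-byte runs reported earlier in this thread can be located with a sketch like the following. It assumes the file fits in memory; the names `zero_runs` and `zero_runs_in_file` and the default threshold of 16 bytes are hypothetical choices.

```python
import re


def zero_runs(data, min_length=16):
    """Return (start, end) offsets, end inclusive, of runs of zero bytes
    that are at least min_length bytes long."""
    pattern = re.compile(rb"\x00{%d,}" % min_length)
    return [(m.start(), m.end() - 1) for m in pattern.finditer(data)]


def zero_runs_in_file(path, min_length=16):
    """Read a whole file and report its long zero-byte runs."""
    with open(path, "rb") as f:
        return zero_runs(f.read(), min_length)


def format_runs(runs):
    """Render runs in the same hex style used in this thread."""
    return ["0x%X to 0x%X" % (start, end) for start, end in runs]
```

For reference, a range such as 0x2201000 to 0x2201FFF spans 0x1000 = 4096 bytes.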

Best regards,
Varfolomeev Igor

