Fix corrupted txt file (NULL)
-
I have an interesting variation on a theme to report. Yesterday I worked most of the day on a set of text files, some xls files, and asp.net files loaded into Visual Studio. Most of the work was done by 9 pm, and I had saved those files many times during the day. At about 11:30 pm the PC became increasingly unresponsive and eventually I had to reset it. The PC booted normally. I left it on all night so that the daily Acronis backup, which seemed to have stalled in the evening (and may have been the source of the system failure), could do its thing.
This morning I found that all of those files (.net, .xls, .txt in Notepad++) were corrupted, with the text and .net files showing NULL everywhere. I don’t have Backup on Save turned on. Session snapshot and periodic backup are on, probably with default settings, as I had never known about the Notepad++ backup features before today, but there are no recent files in that directory. I have various types of backups to fall back on.
But the main reason I’m posting is that the NULL blowout also happened to other files that were not in use by Notepad++. I don’t recall ever having file data losses like this before, so it doesn’t seem to be an issue with Notepad++, or at least not only Notepad++.
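Since files untouched by Notepad++ were hit as well, it can help to scan a whole folder for files that now contain nothing but NUL bytes. A minimal Python sketch follows; the function names `is_all_nul` and `find_nul_files` and the chunk size are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import os


def is_all_nul(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte in it is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # Any byte other than NUL means the file still holds real data.
            if chunk.count(b"\x00") != len(chunk):
                return False
    return seen_data


def find_nul_files(root):
    """Walk `root` and return the sorted paths of files containing only NULs."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if is_all_nul(full):
                    hits.append(full)
            except OSError:
                # Unreadable files (locked, permission denied) are skipped.
                continue
    return sorted(hits)
```

Empty files are deliberately not reported, since a zero-length file is a different symptom from a file filled with NULs.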
-
You are right, this issue can happen with any software if the conditions are met.
Basically it is this: a thread has opened the file, and then something happens which forces
the operating system to kill the process. In this situation the thread might not be able to close
the file correctly, which then results in a corrupted file.
I’m able to reproduce this with a 32-bit npp version and a file of about 100 MB, with backup enabled and
the system under load. If, for example, I convert everything to uppercase
under these conditions, roughly 1 out of 10 times npp freezes and dies, and only the backup file is corrupted.
But if you restart npp, it looks as if the real file is corrupted, simply because npp loads the backup file, since the document wasn’t saved correctly at that time.
Unfortunately I wasn’t able to reproduce this behavior running npp under the debugger, and
I haven’t found out yet whether CPU, memory or I/O load, or a combination of those, is needed
to reproduce it. This happens, for me, only on a 32-bit version of npp with backup activated;
on a 64-bit version I was not able to reproduce this behavior, even with larger files (tested with 1 GB).
Cheers
Claudia
-
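One common general-purpose defense against a write being cut off halfway is to write the new content to a temporary file and only then swap it into place, so the existing file stays intact until the new copy is complete. The Python sketch below illustrates that idea only; nothing in this thread says Notepad++ saves files this way, and `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write `data` (bytes) to `path` via a temp file that is swapped in last."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        # Swap the finished file into place; if the process dies before
        # this line, the old file has not been touched.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

`os.replace` is documented as atomic on POSIX systems; how well this protects against a hard reset or power loss also depends on the filesystem and the disk, so treat it as a sketch rather than a guarantee.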
Hi @Claudia-Frank ,
Very good analysis indeed.
Can I ask you to test your failing scenario with a test build of mine?
It is the official v7.5.8 but with a fix I did on the locking mechanics of the file during backup. Here is the 32-bit binary:
https://ci.appveyor.com/api/buildjobs/bjaisvrte85g4o0a/artifacts/Notepad%2B%2B.Win32.Unicode Release.exe
Thank you for trying this and writing back, I appreciate it.
BR
-
Hi @pnedev
Thx for the build, I will do the test later today - need to go to work soon.
Cheers
Claudia
-
@Claudia-Frank thanks for that clear explanation.
-
I found this file in this issue thread, and it seems it can be used to confirm that the backup is the culprit.
Unfortunately your build also crashes when doing the trim trailing spaces action.
I’m currently running some tests, like doing it 100 times in a row and 100 times after restarting npp, to see if this is really the case.
The test file is a bit unusual, as I described in the issue thread, and needs to be investigated as well,
but at least it looks like there is a reliable way to crash npp and therefore a chance to find out what is causing it.
Whether or how NULL backup files are involved can’t be said at the moment.
Cheers
Claudia
-
Hi @Claudia-Frank ,
Thanks for trying and for pointing to that file.
I can do some tests on my own now.
BR
-
OK, I think I got what the problem is.
This is the cause of the problem in this issue as well.
Backup runs in a separate thread, as can be seen in Notepad_plus.cpp, void Notepad_plus::launchDocumentBackupTask().
What it does is periodically execute Buffer.cpp, bool FileManager::backupCurrentBuffer().
Now imagine that we launch a long-running operation in Notepad++ from the user thread.
While it is running and doing its processing on the Scintilla buffer, the periodic backup thread kicks in.
bool FileManager::backupCurrentBuffer() is called, which GETS THE RAW SCINTILLA BUFFER POINTER here:
    FILE *fp = UnicodeConvertor.fopen(fullpath, TEXT("wb"));
    if (fp)
    {
        int lengthDoc = _pNotepadPlus->_pEditView->getCurrentDocLen();
        char* buf = (char*)_pNotepadPlus->_pEditView->execute(SCI_GETCHARACTERPOINTER); //to get characters directly from Scintilla buffer
Now the backup can be written at once or in portions, depending on whether
WcharMbcsConvertor *wmc = WcharMbcsConvertor::getInstance()
conversions are needed, BUT in any case:
WHILE THE BACKUP FILE IS BEING WRITTEN FROM THE RAW SCINTILLA BUFFER,
THE SCINTILLA BUFFER ITSELF IS BEING UPDATED BY THE USER'S LONG-RUNNING OPERATION THREAD (THIS MIGHT EVEN INCLUDE BUFFER RELOCATION IF MORE MEMORY ALLOCATION IS NEEDED).
I’ll have to figure out a proper way to fix that now.
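The hazard described above can be imitated without real threads by interleaving the two activities by hand. In this Python sketch (all names are illustrative and unrelated to the Notepad++ sources), a "backup" copies a buffer in chunks while an "edit" upper-cases the buffer between two chunks; the resulting copy matches neither the old nor the new document.

```python
def chunked_backup(buffer, chunk_size, edit_after_chunk=None, edit=None):
    """Copy `buffer` chunk by chunk, optionally running `edit` mid-copy.

    `edit_after_chunk` is the index of the chunk after which `edit(buffer)`
    is called, standing in for another thread touching the buffer.
    """
    out = bytearray()
    index = 0
    offset = 0
    while offset < len(buffer):
        out += buffer[offset:offset + chunk_size]
        offset += chunk_size
        if edit is not None and index == edit_after_chunk:
            edit(buffer)  # the "long-running operation" interferes here
        index += 1
    return bytes(out)


def to_upper(buffer):
    """In-place edit, like a 'convert to uppercase' operation."""
    buffer[:] = bytes(buffer).upper()


original = b"hello world, hello backup"
doc = bytearray(original)
torn = chunked_backup(doc, chunk_size=8, edit_after_chunk=0, edit=to_upper)
# torn == b"hello woRLD, HELLO BACKUP": the first chunk is from the old
# text and the rest from the new text, so it equals neither version.
```

A real race is worse than this deterministic imitation, because a relocated buffer would leave the backup code reading from memory that is no longer valid.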
-
One update:
bool FileManager::backupCurrentBuffer() properly locks LongRunningOperation,
so concurrent Scintilla buffer access shouldn’t happen during backup, but it seems it still does. This means the LongRunningOperation
lock is not taken properly for Trim trailing spaces and Find and Replace in Files, and probably in other places.
-
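For a lock like this to help, every code path that modifies the buffer has to take it, not only the backup. A minimal Python sketch of that rule using `threading.Lock`; the class and function names are hypothetical and only mirror the two roles described here (an editing operation and a periodic backup), not the actual Notepad++ implementation.

```python
import threading


class GuardedBuffer:
    """A byte buffer whose edits and snapshots share one lock."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._lock = threading.Lock()

    def fill(self, value):
        """Editing operation: rewrite every byte while holding the lock."""
        with self._lock:
            for i in range(len(self._data)):
                self._data[i] = value

    def snapshot(self):
        """Backup: copy the buffer while holding the same lock."""
        with self._lock:
            return bytes(self._data)


def run_demo(rounds=50, size=2000):
    """Edit and snapshot concurrently; return True if every snapshot is uniform."""
    buf = GuardedBuffer(size)
    snapshots = []

    def edit_worker():
        for value in range(1, rounds + 1):
            buf.fill(value)

    def backup_worker():
        for _ in range(rounds):
            snapshots.append(buf.snapshot())

    threads = [
        threading.Thread(target=edit_worker),
        threading.Thread(target=backup_worker),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A snapshot taken in the middle of fill() would contain two different
    # byte values; with the shared lock each snapshot holds exactly one.
    return all(len(set(s)) == 1 for s in snapshots)
```

If `fill` skipped the `with self._lock:` line, which is the kind of omission suspected above for some operations, the backup's own locking would no longer protect it.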
Hi again @Claudia-Frank ,
Could you please try again with this binary:
https://ci.appveyor.com/api/buildjobs/3tfm2eeubr9hg9y9/artifacts/Notepad%2B%2B.Win32.Unicode Release.exe ?
It has a fix for Trim trailing spaces, and I cannot reproduce the issue on my side anymore.
Sorry to bother you again, I appreciate your help.
BR
-
Hi @pnedev
Sorry to bother you again, I appreciate your help.
No problem, if I can be of any help I will be, but same as yesterday, I will do the test later today.
I’m on late shift this week.
Cheers
Claudia
-
Hi again @Claudia-Frank ,
I have made an official pull request addressing the issue:
https://github.com/notepad-plus-plus/notepad-plus-plus/pull/4803 .
The corresponding x86 binary is:
https://ci.appveyor.com/api/buildjobs/p3t4vsd8lj202wjm/artifacts/Notepad%2B%2B.Win32.Unicode Release.exe .
This needs to be thoroughly tested though, as it changes the way backup is performed.
I hope it fixes the file corruption problems observed so far, but only time will tell.
Thank you.
BR
-
Hi @pnedev
I’m currently thinking of using the test file and adding additional data to meet
the requirements for all modification actions like find/replace, trim, convert case …
basically what can be done using the edit/search/view menus (I’m unsure about the encoding menu at the moment).
Then I run each action to see whether it crashes npp with the original npp binary and with your build.
Afterwards I’ll do a long-running test, which means each action gets executed 100 times
in a row, followed by a test of 1000 random actions in a row.
I will be using the Python Script plugin as the only additional plugin, and the test would be something like
    counter = 0
    while counter < 100:
        notepad.menuCommand(ACTION_ID)
        editor.undo()
        counter += 1
modified to run 1000 iterations with a random ACTION_ID.
I know, having an additional plugin loaded might have an impact on the test itself, but
if the result is a success then it should be OK. If not, I’ll do it manually with a minimal npp setup.
What do you think?
@all - any additional ideas?
Cheers
Claudia
-
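The 1000-iteration variant with random actions might be sketched as below. To keep it runnable outside Notepad++, the two plugin calls are passed in as plain callables; inside the Python Script plugin one would presumably pass `notepad.menuCommand` and `editor.undo`. The function name `run_random_actions` and the `seed` parameter are assumptions made for illustration.

```python
import random


def run_random_actions(action_ids, run_action, undo, iterations=1000, seed=None):
    """Run randomly chosen actions, undoing each one, and return the order used.

    `run_action` receives one action ID per call and `undo` takes no
    arguments. A fixed `seed` makes a failing sequence reproducible.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        action_id = rng.choice(action_ids)
        history.append(action_id)
        run_action(action_id)
        undo()
    return history
```

Returning the history means that, if npp crashes partway through, the same seed can be used to replay the exact order of actions.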
A test to see if the backup file gets created correctly is needed too.
Cheers
Claudia
-
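A check of this kind could compare a hash of the document with a hash of its backup file. A small Python sketch using `hashlib`; the helper names and the choice of SHA-256 are assumptions, since the thread does not say which hash is used.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(document_path, backup_path):
    """Return True if both files have identical content hashes."""
    return file_sha256(document_path) == file_sha256(backup_path)
```

Reading in chunks keeps memory use flat even for the 100 MB to 1 GB test files mentioned earlier in the thread.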
I can confirm that your build didn’t crash when running the trim_trailing_space test.
1st results - 100 successful runs
Run | compare document and backupfile hashes | runtime in sec | undo deleted backup file
========================================================================================
1 | ok | 31.40 | ok
2 | ok | 31.38 | ok
3 | ok | 31.60 | ok
4 | ok | 31.78 | ok
5 | ok | 32.28 | ok
6 | ok | 32.25 | ok
7 | ok | 33.00 | ok
8 | ok | 32.94 | ok
9 | ok | 31.59 | ok
10 | ok | 31.62 | ok
11 | ok | 31.56 | ok
12 | ok | 31.67 | ok
13 | ok | 31.66 | ok
14 | ok | 31.41 | ok
15 | ok | 31.32 | ok
16 | ok | 31.46 | ok
17 | ok | 31.68 | ok
18 | ok | 32.92 | ok
19 | ok | 33.97 | ok
20 | ok | 33.94 | ok
21 | ok | 32.90 | ok
22 | ok | 33.56 | ok
23 | ok | 34.08 | ok
24 | ok | 34.69 | ok
25 | ok | 38.68 | ok
26 | ok | 38.63 | ok
27 | ok | 38.36 | ok
28 | ok | 38.22 | ok
29 | ok | 39.75 | ok
30 | ok | 39.57 | ok
31 | ok | 39.72 | ok
32 | ok | 39.29 | ok
33 | ok | 37.99 | ok
34 | ok | 38.21 | ok
35 | ok | 39.12 | ok
36 | ok | 39.88 | ok
37 | ok | 39.55 | ok
38 | ok | 38.28 | ok
39 | ok | 39.21 | ok
40 | ok | 39.55 | ok
41 | ok | 38.15 | ok
42 | ok | 38.46 | ok
43 | ok | 38.50 | ok
44 | ok | 39.24 | ok
45 | ok | 39.05 | ok
46 | ok | 39.64 | ok
47 | ok | 39.05 | ok
48 | ok | 39.55 | ok
49 | ok | 38.99 | ok
50 | ok | 38.98 | ok
51 | ok | 37.86 | ok
52 | ok | 39.44 | ok
53 | ok | 39.90 | ok
54 | ok | 39.64 | ok
55 | ok | 40.21 | ok
56 | ok | 39.84 | ok
57 | ok | 40.28 | ok
58 | ok | 40.60 | ok
59 | ok | 39.33 | ok
60 | ok | 35.01 | ok
61 | ok | 40.56 | ok
62 | ok | 39.85 | ok
63 | ok | 39.12 | ok
64 | ok | 40.14 | ok
65 | ok | 39.19 | ok
66 | ok | 39.41 | ok
67 | ok | 39.36 | ok
68 | ok | 38.37 | ok
69 | ok | 38.43 | ok
70 | ok | 39.00 | ok
71 | ok | 38.87 | ok
72 | ok | 38.43 | ok
73 | ok | 39.46 | ok
74 | ok | 38.17 | ok
75 | ok | 39.24 | ok
76 | ok | 39.03 | ok
77 | ok | 39.01 | ok
78 | ok | 39.22 | ok
79 | ok | 36.97 | ok
80 | ok | 37.75 | ok
81 | ok | 39.70 | ok
82 | ok | 39.39 | ok
83 | ok | 39.58 | ok
84 | ok | 39.24 | ok
85 | ok | 39.40 | ok
86 | ok | 37.53 | ok
87 | ok | 32.78 | ok
88 | ok | 35.11 | ok
89 | ok | 31.46 | ok
90 | ok | 34.63 | ok
91 | ok | 34.72 | ok
92 | ok | 35.40 | ok
93 | ok | 34.99 | ok
94 | ok | 35.08 | ok
95 | ok | 34.97 | ok
96 | ok | 34.82 | ok
97 | ok | 34.84 | ok
98 | ok | 34.84 | ok
99 | ok | 34.80 | ok
100 | ok | 34.75 | ok
More tests will follow.
Cheers
Claudia
-
Hi Claudia,
Your test approach seems very good and should catch most functional issues pretty well.
I don’t think covering the encoding menu is necessary. The only potentially missing thing is macro recording/execution, but that cannot be automated.
I suppose you are testing with the pull request binary (https://notepad-plus-plus.org/community/topic/13302/fix-corrupted-txt-file-null/53), right? It should fix all potential backup concurrency issues found so far.
My main concern actually is the “power loss -> data loss” issue, but that is impossible to reproduce and is beyond our control.
Thank you very much for testing the changes.
BR
-
I suppose you are testing with the pull request binary
correct, and the x64 binary from the artifacts from here.
Concerning your concerns :-), for sure we can’t cover every kind of situation, but
I’m thinking of
Powerloss - VirtualBox/Qemu shutdown
Dataloss - kill the process when accessing real and backup file.
Macros - missed that totally - maybe some kind of sendkeys solution (!?) - need to think about it.
One side question, if allowed - testing 64-bit npp seems to run ~30% faster than 32-bit.
Any idea why this might be? I mean, I expected it to be faster, but ~30% ??
As far as I can tell I didn’t hit any limit when testing the 32-bit npp.
And most important - THANK YOU for that fix - even though I haven’t tested all functions yet,
it looks very promising.
Cheers
Claudia
-
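The "kill the process while it is accessing the file" idea can be tried on a small scale without Notepad++. In this Python sketch a child process writes the first part of a file, signals that it is mid-write, and is then killed by the parent; the file left behind is shorter than intended. The child script, the function name and the sizes are all illustrative assumptions.

```python
import os
import subprocess
import sys

# Child program: writes one chunk, reports "ready", then stalls before
# writing the rest, giving the parent time to kill it mid-write.
CHILD_SOURCE = """
import os, sys, time
path, total, first = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
with open(path, "wb") as f:
    f.write(b"x" * first)
    f.flush()
    os.fsync(f.fileno())
    print("ready", flush=True)
    time.sleep(60)
    f.write(b"x" * (total - first))
"""


def kill_writer_midway(path, total=1000, first=100):
    """Kill a writer after its first chunk and return the resulting file size."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, path, str(total), str(first)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        proc.stdout.readline()  # blocks until the child prints "ready"
    finally:
        proc.kill()
        proc.wait()
        proc.stdout.close()
    return os.path.getsize(path)
```

This only shows truncation caused by a killed process; it does not simulate an operating system crash or a power loss, which is what the VirtualBox/Qemu shutdown idea is aimed at.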
Hi @Claudia-Frank ,
Powerloss - VirtualBox/Qemu shutdown
Dataloss - kill the process when accessing real and backup file.
That’s worth trying. I don’t know about the macros, I have never actually used them.
Any idea why this might be?
I suppose it might be connected to memory buffer allocations in Scintilla and in general. Perhaps the pre-allocated memory chunk size is bigger in 64-bit, which leads to fewer buffer re-allocations and data moves, making things a lot more efficient.
Thanks,
BR
-
Hello guys. I have a solution for this issue.
In fact I had the same problem this morning: the file I was working on got corrupted and contained a series of NULL characters. I used Recuva, as mentioned by others, to solve the issue.
This is what I did:
- Don’t update the corrupted file. If you have updated it, don’t save it.
- Run Recuva, and in the wizard select the particular location where your file is stored.
- For the first attempt, there is no need to activate deep scan.
- On Recuva’s main search page, find advanced mode / Options.
- Click Options --> Actions.
- Check the option to scan for non-deleted files.
- Click OK.
- Start the scan.
- After the scan, check for your file in the list; most probably you will find it.
- Check the box to select the file --> click Restore.
- Restore to a different location.
Thank you. I hope it helps you solve your problem.
-
Hi guys
I randomly found the best and easiest solution. Just open the corrupted file in Chrome; instead of NULL you will see the right code. Then just use the classic Ctrl+U and you have the code again (if it is HTML/CSS etc.). For PHP, for example, this cannot work, for obvious reasons…