Fix corrupted txt file (NULL)



  • @Ben said in Fix corrupted txt file (NULL):

    To properly use NotePad++ I now understand that one should have a backup solution that constantly (every second) writes every open file to a rollback history and keep many histories of every single file, all day long, every day. This makes absolutely no sense.

    Sadly, this is somewhat close to my backup strategy.
    :-(



  • @Ben ,

    I understood your point just fine.

    My backup runs every 15 minutes, because that’s what my I.T. department decided, and it’s a reasonable compromise between “I lost 5 hours of work” (which is unacceptable) and “backup gigabytes every second” (which is hyperbolic nonsense). Losing 0-15min of work in the unlikely event of a power outage seems an acceptable compromise.

    Any modern incremental backup solution looks at the file, and if it has changed, backs up just the changed file. If you have gigabytes of critical data changing every second on your specific machine, then your I.T. department has not picked a reasonable backup solution. Personally, I would feel that a once-a-day backup of my most critical files is not sufficient. Maybe you should ask for once-a-day on standard files, and once-an-hour or once-a-15min on the more important files.
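
    As a rough illustration of the "back up just the changed file" idea above, here is a minimal Python sketch. The function name `incremental_backup` and the change test (file size plus modification time) are assumptions made for this example; real backup products use their own, more robust change detection.

```python
import shutil
from pathlib import Path


def incremental_backup(src_dir: Path, dst_dir: Path) -> list:
    """Copy only files that are new or changed since the last run.

    Hypothetical helper: "changed" here means size or modification
    time differs from the copy already in dst_dir.
    """
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Same size and same modification time: treat as unchanged.
            if s.st_size == d.st_size and s.st_mtime_ns == d.st_mtime_ns:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        copied.append(src)
    return copied
```

    Run on a schedule (every 15 minutes, say), a loop like this only moves the files that were actually touched, which is why the bandwidth stays small even with frequent runs.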

    If you use version control software (like Git or Subversion), then the bandwidth is significantly reduced, because commits only transfer a description of which bytes have changed in the files since the last commit (I use svn terminology and mental processes; sorry git users with different terms).

    just in case NotePad++ F*** up and decides to overwrite one of my files with NUL characters

    No, you have misunderstood the technical issue. WINDOWS messed up, not Notepad++. Notepad++ saved the file; WINDOWS decided it knew better and would cache to memory, instead of to disk. Then while WINDOWS was writing the file, as far as I understand it WINDOWS first buffered the file with the right number of NUL characters to match the bytes in your file, but before WINDOWS could overwrite those NULs with the actual data, WINDOWS crashed.

    @pnedev’s solution uses Win32 API calls which highly encourage Windows to perform the write to disk more often, to make sure that it doesn’t accidentally leave the disk with the NUL bytes.

    couldn’t NotePad++ simply discard such an overwrite operation?

    I don’t believe so; see my description above.

    Reminder

    As a reminder to you: this forum consists of fellow users of Notepad++, who want to help you have the best Notepad++ experience you can, as long as it is within our power. None of the regulars who have replied to you have a magic wand which can fix any problem in the Notepad++ codebase; very few have ever submitted code to the codebase, and none of the regulars can automatically approve a pull request to become part of the codebase. The regulars in the forum have been trying to get the NUL file problem fixed for literally years. There have been many attempted fixes over the years, some of which improved things (but didn’t solve every edge case), so it is definitely better than it was.

    Everyone involved in this thread wants the problem fixed. Yelling at us, getting mad at us, or Q*bert Cur$!ng us won’t help anyone, and likely won’t even help you feel better.

    Personally, in the more-than-10 years since I started using Notepad++ (I think in about 2007 around version 4.0), and definitely in the last 4+ years since I joined here, I’ve had roughly 0 bytes of data lost because of a Notepad++ bug. In that same timeframe, I’ve had dozens of multi-hour changes to huge Microsoft Excel spreadsheets and Microsoft Word documents lost because of Microsoft-induced bugs; since we switched to incremental backup every 15 minutes, I haven’t lost more than 15min of changes in those same Microsoft products – not because of bug fixes made by Microsoft, but because of using reasonable backup software with reasonable settings.

    It doesn’t matter whether you are using mom-and-pop freely downloaded software with one main unpaid developer and a handful of volunteer contributors, or a multi-billion dollar company with huge paid teams devoted to supporting and improving each product – one small or hard-to-fix bug anywhere in the chain from user to application to api to os to hardware can cause you problems, and it’s up to you to make sure that you don’t lose more critical data than you are willing to re-enter. This is the best advice that anyone can give you when using critical data with any software platform.



  • @PeterJones Thanks for explaining further what the current technical state of NP++ is. This is very interesting… and extremely concerning at the same time. I’m now reading your post while waiting for yesterday’s Recuva scan to complete (only 25 minutes left!!) BTW I did not know this forum was not read by the developers, so I’ll move on to GitHub. Perhaps it will get more meaningful attention there. If this doesn’t get fixed this summer, this will be the last time I ever use this software. I’ve never, ever had this problem (over the past 25 years using a computer every day) with any text editor except NotePad++.

    Losing 0-15min of work in the unlikely event of a power outage seems an acceptable compromise.

    IMO, it depends, because:

    1. Home users often do not have the luxury of using corporate backup tools like this (and really shouldn’t HAVE to for using a text editor anyway. I mean, have you ever got a MS notepad.exe file corrupted? Me neither. And it’s more than 25 years old technology.) My backup runs every midnight and I mean, even commercial grade cloud backup solutions don’t necessarily pick up the same files over and over every 15 minutes. I do use SVN too, but you still have to manually go and COMMIT the changed file for it to store in the base. When you work, you cannot interrupt every 15 minutes to go commit on SVN. This is absurd and counter-productive.
    2. As short a work shift as 15 minutes may seem, when editing a non-linear document (source code, in my case) you very often edit segments hundreds if not thousands of lines apart from each other. In just 15 minutes, one could very easily make small changes to hundreds of different lines scattered throughout the document. This is the case I’m facing right now and that’s why I’m still trying to restore a more recent version. I might introduce errors and/or forget about small code changes in various locations throughout the document when re-integrating all the changes I had made before NP++ decided to overwrite the entire document with NUL characters yesterday.

    but before WINDOWS could overwrite those NULs with the actual data, WINDOWS crashed.

    See, this is the problem with NotePad++. What it should do is ask Windows to create a new/separate file, then fill it with the RAM buffer’s contents, then verify it using a checksum comparison, then, if all is good and the file is not filled with NUL characters, ask Windows to replace the old file with the temp file. Can you imagine how much frustration and lost work would have been avoided if developers were on par with any other editor out there? And I mean for over the past quarter of a century! What other software does that (directly overwrites a file)? Take any other editor and watch how it saves. It first produces a temp file, fills it, then replaces the file handle in the OS. WHY ON EARTH NotePad++ is not doing this is beyond me.
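
    The save sequence proposed above (temp file, checksum verification, then replace) could look roughly like the following Python sketch. It is an illustration only, not how Notepad++ or any particular editor is implemented; the name `atomic_save`, the temp-file prefix, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Write data to a temp file, verify it, then swap it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the final rename
    # never has to cross volumes.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its cache to disk
        # Read the temp file back and compare checksums with the buffer.
        with open(tmp_path, "rb") as check:
            written = check.read()
        if hashlib.sha256(written).digest() != hashlib.sha256(data).digest():
            raise OSError("temp file does not match the buffer")
        # Until this call succeeds, the original file is untouched.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    One caveat on the checksum step: the read-back is very likely served from the OS cache, so it confirms what the OS accepted rather than what physically reached the disk. The durability in this sketch comes from the `os.fsync` call before the replace.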

    which highly encourage Windows to perform the write to disk more often

    I don’t see why this would even be attempted. It’s like trying to patch a 6 inch wide leak with 1 inch band-aids.

    As a reminder to you: this forum consists of fellow users of Notepad++

    I’m sorry, I didn’t know that. Thanks for trying to help, but there is nothing more to discuss on here. Also my Recuva deep scan is done. Unfortunately, due to security cameras and a bunch of other software running on that computer, I’m unlucky enough that at least one sector containing the file NP++ destroyed got overwritten. It’s over. Now I’m praying I can reproduce yesterday’s 5 hours of work exactly, without forgetting a single change scattered throughout the document. At least I only lost 5 hours of work, thanks to nightly backups. Pity the poor individuals using NP++ for their personal notes and such, without cloud backup, who lost everything for no good reason. This extremely serious issue needs to be fixed ASAP for them. Not for the corporate environments that use 15-minute incremental backups. That’s not where the real issue is. The issue is for the people at home, the poor people without 15-minute incremental backup solutions.

    Thanks again for trying to help. I’m not mad at any of you, btw, never been either. Not sure why this was involved in your reminder.

    I’ve had dozens of multi-hour changes to huge Microsoft Excel spreadsheets and Microsoft Word documents lost because of Microsoft-induced bugs

    This has never, ever happened to me over the last 25 years. Not even once. And if it ever happens one day, I’ll just restore the tmp file it created when I hit save.

    Hope my post serves a purpose and I’ll be onto GitHub BATTLING to get NotePad++ on par with modern editors for a couple months. Then if it’s not fixed, I’ll just move on with my life and never, ever use this software again, reminding as many people as possible that they are at big risk using it.



  • Because I wanted to understand how different editors have implemented the backup functionality,
    I did a test with Npp, Atom, SublimeText and VSCode and recorded them with ProcMon.
    The test always followed this pattern

    Step 1: Create the test file with content
    Step 2: Change the content in the editor but do not save the changes
    Step 3: Make further modifications and then save the changes

    Atom and SublimeText proceed differently here.
    Atom seems to implement its backup via a database entry (?),
    SublimeText does this via a json file. To what extent this is also the case for larger files, I have not tested.

    Npp and VSCode create a backup file.

    When it comes to data persistence, Npp, VSCode and SublimeText go the same way:
    update the backup file to the current state,
    update the test file and
    delete the backup file.

    Atom does an intermediate step of creating a temporary file here.

    But what is still noticeable is that VSCode and SublimeText call
    FlushBuffersFile after each WriteFile, which in turn triggers a further (non-cached) WriteFile.
    I am inclined to say that this might be the key to solving some (all?) reported problems.

    Let’s take a quick look at the last step:

    • Update the backup file to the current state
    • Update the test file
    • Delete the backup file

    The potential problem is that, when the backup file is deleted immediately after the test file is updated,
    the test file may NOT really have been updated, because Windows uses buffered I/O by default.
    That means I write the file and the system reports back “done”,
    but in reality the data is only in the system buffer.
    Now I delete the backup file, because the system said the test file was written,
    and puffff, the power is gone before the system could actually write the file.

    If the power is gone at the first step to update the backup file, then the test file still exists, but is obsolete.
    If the power failed during the second step of updating the test file, then the backup file is still there.

    So, in conclusion: if a FlushBuffersFile were issued after each WriteFile,
    then either the backup file or the test file would still exist in case of a power failure.
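
    A minimal Python sketch of that conclusion, for illustration only (it is not Notepad++ code, and the function names are made up for this example). Python documents `os.fsync` as calling the C runtime’s `_commit()` on Windows; the assumption here is that this plays the role of the FlushBuffersFile entries seen in the ProcMon log.

```python
import os


def write_and_flush(path: str, data: bytes) -> None:
    """Write data and return only after the OS was asked to flush it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> OS cache
        os.fsync(f.fileno())  # OS cache -> disk (as far as the OS honours it)


def save_with_backup(path: str, backup_path: str, data: bytes) -> None:
    """Three-step save where each write is flushed before the next step."""
    write_and_flush(backup_path, data)  # step 1: update the backup file
    write_and_flush(path, data)         # step 2: update the real file
    os.remove(backup_path)              # step 3: only now delete the backup
```

    With the flush in place, step 3 is only reached after the OS has been told to put the real file on disk, so a power failure at any point should leave at least one of the two files with the current content.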

    Some thoughts about it?

    Btw. Here are the results with excerpts of the relevant information from the ProcMon Log.

    [Notepad++]
    1. => Test file gets created
    CreateFile			D:\backup_test.txt		Desired Access: Generic Write, Read Attributes
    WriteFile			D:\backup_test.txt		Offset: 0, Length: 7
    CloseFile			D:\backup_test.txt
    
    2. => Create a backup file and store the current content
    CreateFile			D:\...\backup_test.txt@2020-06-17_153651	Desired Access: Generic Write, Read Attributes
    WriteFile			D:\...\backup_test.txt@2020-06-17_153651	Offset: 0, Length: 16
    CloseFile			D:\...\backup_test.txt@2020-06-17_153651
    
    3. 
    => Update the backup file
    CreateFile			D:\...\backup_test.txt@2020-06-17_153651	Desired Access: Generic Write, Read Attributes
    WriteFile			D:\...\backup_test.txt@2020-06-17_153651	Offset: 0, Length: 18
    CloseFile			D:\...\backup_test.txt@2020-06-17_153651
    CreateFile			D:\...\backup_test.txt@2020-06-17_153651	Desired Access: Generic Write, Read Attributes
    WriteFile			D:\...\backup_test.txt@2020-06-17_153651	Offset: 0, Length: 34
    CloseFile			D:\...\backup_test.txt@2020-06-17_153651
    => Update the test file
    CreateFile			D:\backup_test.txt							Desired Access: Generic Write, Read Attributes
    WriteFile			D:\backup_test.txt							Offset: 0, Length: 34
    CloseFile			D:\backup_test.txt
    => Delete the backup file
    CreateFile			D:\...\backup_test.txt@2020-06-17_153651	Desired Access: Read Attributes, Delete
    CloseFile			D:\...\backup_test.txt@2020-06-17_153651
    
    
    
    [Atom]
    1.
    CreateFile			D:\backup_test.txt		Desired Access: Generic Read/Write
    WriteFile			D:\backup_test.txt		Offset: 0, Length: 9
    CloseFile			D:\backup_test.txt
    
    2.
    WriteFile			C:\...000003.log		Offset: -1, Length: 7
    WriteFile			C:\...000003.log		Offset: -1, Length: 4.616
    FlushBuffersFile	C:\...000003.log
    WriteFile			C:\...000003.log		Offset: 2.211.840, Length: 8.192, I/O Flags: Non-cached
    WriteFile			C:\...000003.log		Offset: -1, Length: 7
    WriteFile			C:\...000003.log		Offset: -1, Length: 4.682
    FlushBuffersFile	C:\...000003.log
    WriteFile			C:\...000003.log		Offset: 2.215.936, Length: 8.192, I/O Flags: Non-cached
    
    3.
    WriteFile			C:\...000003.log		Offset: -1, Length: 7
    WriteFile			C:\...000003.log		Offset: -1, Length: 4.092
    WriteFile			C:\...000003.log		Offset: -1, Length: 7
    WriteFile			C:\...000003.log		Offset: -1, Length: 590
    FlushBuffersFile	C:\...000003.log
    WriteFile			C:\...000003.log		Offset: 2.220.032, Length: 12.288, I/O Flags: Non-cached
    
    CreateFile			D:\backup_test.txt				Desired Access: Generic Read
    CreateFile			C:\...backup_test-73c059.txt	Desired Access: Generic Write, Read Attributes
    ReadFile			D:\backup_test.txt				Offset: 0, Length: 9
    
    WriteFile			C:\...backup_test-73c059.txt	Offset: 0, Length: 9
    ReadFile			D:\backup_test.txt				Offset: 9, Length: 65.536
    CloseFile			D:\backup_test.txt
    CloseFile			C:\...backup_test-73c059.txt
    
    CreateFile			D:\backup_test.txt				Desired Access: Generic Read/Write
    WriteFile			D:\backup_test.txt				Offset: 0, Length: 36
    CloseFile			D:\backup_test.txt
    
    CreateFile			C:\...backup_test-73c059.txt	Desired Access: Read Attributes, Write Attributes, Delete, Synchronize
    CloseFile			C:\...backup_test-73c059.txt
    
    CreateFile			D:\backup_test.txt				Desired Access: Generic Read
    ReadFile			D:\backup_test.txt				Offset: 0, Length: 36
    ReadFile			D:\backup_test.txt				Offset: 36, Length: 8.192
    ReadFile			D:\backup_test.txt				Offset: 36, Length: 8.192
    CloseFile			D:\backup_test.txt
    
    WriteFile			C:\...000003.log		Offset: -1, Length: 7
    WriteFile			C:\...000003.log		Offset: -1, Length: 4.616
    FlushBuffersFile	C:\...000003.log
    WriteFile			C:\...000003.log		Offset: 2.228.224, Length: 8.192, I/O Flags: Non-cached
    
    
    [SublimeText]
    1.
    CreateFile			D:\backup_test.txt		Desired Access: Generic Write, Read Attributes
    WriteFile			D:\backup_test.txt		Offset: 0, Length: 7
    FlushBuffersFile	D:\backup_test.txt
    WriteFile			D:\backup_test.txt		Offset: 0, Length: 4.096, I/O Flags: Non-cached
    CloseFile			D:\backup_test.txt
    
    2.
    CreateFile			D:\...sublime_session		Desired Access: Generic Write, Read Attributes
    WriteFile			D:\...sublime_session		Offset: 0, Length: 8.192
    WriteFile			D:\...sublime_session		Offset: 8.192, Length: 1.027
    FlushBuffersFile	D:\...sublime_session
    WriteFile			D:\...sublime_session		Offset: 0, Length: 12.288, I/O Flags: Non-cached
    CloseFile			D:\...sublime_session
    
    3.
    CreateFile			D:\backup_test.txt			Desired Access: Generic Write, Read Attributes
    WriteFile			D:\backup_test.txt			Offset: 0, Length: 34
    FlushBuffersFile	D:\backup_test.txt
    WriteFile			D:\backup_test.txt			Offset: 0, Length: 4.096, I/O Flags: Non-cached
    CloseFile			D:\backup_test.txt
    CreateFile			D:\...sublime_session		Desired Access: Generic Write, Read Attributes
    WriteFile			D:\...sublime_session		Offset: 0, Length: 8.192
    WriteFile			D:\...sublime_session		Offset: 8.192, Length: 1.031
    FlushBuffersFile	D:\...sublime_session
    WriteFile			D:\...sublime_session		Offset: 0, Length: 12.288, I/O Flags: Non-cached
    CloseFile			D:\...sublime_session
    
    
    [Visual Studio Code]
    1.
    CreateFile			D:\backup_test.txt		Desired Access: Generic Read/Write
    WriteFile			D:\backup_test.txt		Offset: 0, Length: 7
    FlushBuffersFile	D:\backup_test.txt
    WriteFile			D:\backup_test.txt		Offset: 0, Length: 4.096, I/O Flags: Non-cached
    CloseFile			D:\backup_test.txt
    
    2.
    CreateFile			C:\...b647b231e6b7493c3c99ee04ce0956d6		Desired Access: Generic Write, Read Attributes
    WriteFile			C:\...b647b231e6b7493c3c99ee04ce0956d6		Offset: 0, Length: 137
    FlushBuffersFile	C:\...b647b231e6b7493c3c99ee04ce0956d6
    WriteFile			C:\...b647b231e6b7493c3c99ee04ce0956d6		Offset: 0, Length: 4.096, I/O Flags: Non-cached
    CloseFile			C:\...b647b231e6b7493c3c99ee04ce0956d6
    
    3.
    CreateFile			C:\...b647b231e6b7493c3c99ee04ce0956d6		Desired Access: Generic Read/Write
    WriteFile			C:\...b647b231e6b7493c3c99ee04ce0956d6		Offset: 0, Length: 155
    FlushBuffersFile	C:\...b647b231e6b7493c3c99ee04ce0956d6
    WriteFile			C:\...b647b231e6b7493c3c99ee04ce0956d6		Offset: 0, Length: 4.096, I/O Flags: Non-cached
    CloseFile			C:\...b647b231e6b7493c3c99ee04ce0956d6
    CreateFile			D:\backup_test.txt							Desired Access: Generic Read/Write
    WriteFile			D:\backup_test.txt							Offset: 0, Length: 34
    FlushBuffersFile	D:\backup_test.txt
    WriteFile			D:\backup_test.txt							Offset: 0, Length: 4.096, I/O Flags: Non-cached
    CloseFile			D:\backup_test.txt
    CreateFile			C:\...b647b231e6b7493c3c99ee04ce0956d6  	Desired Access: Read Attributes, Write Attributes, Delete, Synchronize
    CloseFile			C:\...b647b231e6b7493c3c99ee04ce0956d6
    


  • Hi @Ekopalypse ,

    Your observations on how Windows writes data to disk are correct and this is exactly what I had written before.

    Hi @PeterJones ,

    @pnedev’s solution uses Win32 API calls which highly encourage Windows to perform the write to disk more often, to make sure that it doesn’t accidentally leave the disk with the NUL bytes.

    My fix does not “encourage” Windows to write to disk more often - it literally instructs it to do so when the user saves the file. It works every time.

    The problem is that the standard C library file API (fopen(), fwrite(), fflush()) used by Notepad++ does not have the means to instruct Windows 10 to directly write data to disk without caching.

    Win32 API on the other hand uses the necessary OS syscall to immediately flush data to disk on write.
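
    The actual fix lives in a Notepad++ pull request and uses the Win32 API; it is not reproduced here. As a loose analogue of "flush to disk on write", the Python sketch below opens the file with `O_SYNC` where the platform offers it (POSIX) and otherwise falls back to an explicit `os.fsync` before closing. The name `write_through` is hypothetical.

```python
import os


def write_through(path: str, data: bytes) -> None:
    """Write data synchronously where the platform allows it."""
    # O_SYNC exists on POSIX only; O_BINARY exists on Windows only.
    sync_flag = getattr(os, "O_SYNC", 0)
    flags = (os.O_WRONLY | os.O_CREAT | os.O_TRUNC
             | sync_flag | getattr(os, "O_BINARY", 0))
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        if not sync_flag:
            os.fsync(fd)  # fallback: explicit flush before closing
    finally:
        os.close(fd)
```

    Either way, the point is the same as in the post above: a buffered write call returning successfully says nothing about the disk; something has to explicitly ask the OS to commit the data.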

    What the user can do is turn off the disk’s write caching in Windows settings - it can be done explicitly, but by default caching is ON. I’m not sure it is a good idea though.
    With the spread of SSDs, this write caching is perhaps meant to protect them and increase their lifetime, as they have a limited number of write cycles.

    ALL programs relying on the standard C library file API WILL have that problem when saving data to disk NO MATTER HOW MANY INTERMEDIATE BACKUP WRITES THEY DO.

    BR



  • @pnedev said in Fix corrupted txt file (NULL):

    it literally instructs it to do so when the user saves the file. It works every time.

    Thanks for the clarification.

    Now, if we could just find a way to convince Don to accept the PR…



  • @pnedev said in Fix corrupted txt file (NULL):

    and this is exactly what I had written before

    So I could have spent my time with something more productive,
    like watching TV and eating chips ;-)
    At least I learned something new :-)

    I thank you for your contributions to npp,
    hopefully @donho will reconsider his opinion.



  • So I’m back to this thread, hopefully without a pure garbage contribution this time…

    @Ekopalypse said in Fix corrupted txt file (NULL):

    So I could have spent my time with something more productive,
    like watching TV and eating chips

    I think your analysis and “market comparison” was valuable and an interesting read. Put the chips down and turn off the TV.

    So I did a little research and read about how the proposed fix to the issue was rejected because it had too much risk of introducing a regression.

    I find that a tad bit ironic because isn’t a text editor that can lose data by corrupting hours of someone’s work already in a “regressed” state?

    So what’s the future on this?

    Don’s rejected a fix once. AFAICT Don doesn’t keep his “finger on the pulse” of current user concerns (i.e., doesn’t monitor here, doesn’t communicate with people via email about Notepad++). With some of the limited back-and-forth I’ve seen with him in issue comments on github there seems to be a language-barrier problem as well when the communication is in English.

    I just don’t know…



  • @PeterJones ,

    You are more than welcome.

    @Ekopalypse ,

    I thank you too for your great analysis. I find your comparison very interesting.
    Next time just keep the TV ON and the chips within arm’s reach so you can combine your favourite activities ;) And make sure to save often ;))

    BR



  • Another user with this problem has been reported today on the “Live Support” channel:

    [screenshot]

    “the big problem with N++”…indeed.



  • I have the same problem. Recuva and other programs did not help. I think data recovery companies can help. What do you think? That file is very important to me.



  • Hi @ben and others
    Were you able to resolve this issue? Is there any software that can be used? I have the same issue and need to recover a file. Using Recuva, I had a similar experience to the others.
    Please help me if anybody has resolved this. Thanks.



  • I suppose I will just keep echoing these here as I happen to notice them, using this thread as a “rallying point”:

    [screenshot]



  • @Alan-Kilborn
    I don’t think there is a solution for this. I have been looking for one for the past week. Nobody has a working solution. I see NULNUL in my Notepad++; if I open it in Notepad I see a blank file. I am not sure of the original size of the file, but it shows as 13 KB, which might be smaller than or the same as the original. If I open it in the Sublime Text editor, it shows all 0000 0000 entries for about 1000 lines, so I guess it may be of the original size. There is no recover-from-backup-file option, and Recuva does not help. I am not sure whether it is due to a system crash or whether malware corrupted the file. This is the only file that got corrupted, as it was the last one saved before the crash; none of the other files open in Notepad++ in the same session got corrupted. I can see all the other files in the backup folder; this file, however, was saved in a different location than the Notepad++ backup location. If there is a working proven software I can pay for and will fix this, I will buy that. I do not see anything that is available for txt files.
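
    The symptom described above (a file of plausible size that contains nothing but NUL bytes) can be confirmed mechanically before spending time on recovery tools. A minimal Python sketch; the function name `is_all_nul` is made up for this example:

```python
def is_all_nul(path: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file is non-empty and every byte is 0x00."""
    seen_any = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_any = True
            # Any byte other than NUL means some real data survived.
            if chunk.count(b"\x00") != len(chunk):
                return False
    return seen_any
```

    A `True` result means the file as it now exists holds no recoverable text; any remaining hope lies in older copies elsewhere on the disk or in backups.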



  • @general-purpose said in Fix corrupted txt file (NULL):

    I don’t think there is a solution for this.

    From my somewhat limited knowledge, I agree.
    Well, except the solution is to prevent it in the future.
    But the change to the s/w for that has been declined, so…

    If there is a working proven software I can pay for and will fix this, I will buy that.

    Well, this would presume that the data still exists.
    If it doesn’t (which I would suspect to be true), sadly nothing can recover it.



  • I am not sure about that but I suspect the file contents might still exist somewhere on the disk.
    I had been thinking about the NUL issue before when I made the fix.
    Why is the file showing only NULs?
    You see, when you save the file, no matter if it is immediately written or partially written or not written to the disk at all, when opened after the crash the file should show some sane data + some corrupted portions. This is not the case.
    One reason would be that the file is wiped (written as all NULs) before the actual write to the disk. This is unlikely however because it contradicts the very reason why the actual content is not immediately written to the disk. Why first write NULs and then write again the new contents?

    The second reason is the file location.
    What we know about the file is its name. But from the system’s point of view, the name is just an address of the file content. The file system keeps a register: the mapping between the file name and where on the disk its content resides. Now when we save, the name is kept the same, the new content is supposedly written to the disk, BUT THE PLACE WHERE IT IS WRITTEN MIGHT DIFFER FROM THE PREVIOUS ONE. In other words, the address of the file is changed. This means that the register containing the name-to-content mapping needs to be updated as well (in the file system itself).
    Now imagine that you save the file, the new address is assigned and updated in the file system (so your file name points to the new location on the disk), BUT your new file content has not yet been flushed to the disk. The system crashes, you restart, open your file, and what you see is the content at the new location (which has some random data or all NULs, for example).
    In that case your file’s old content is perhaps still somewhere on the disk, but the mapping from the file name to that location is gone.

    Software like Recuva might help by trying to find where that previous location was before the save and the crash but unfortunately it will not always succeed.
    As far as I remember there are some options you need to set in Recuva to maximize your chances for success but I can’t remember them for sure.
    If you haven’t done that already, go through Recuva settings and turn on all of them that imply something like “deep” or “thorough” or “aggressive” scan. It will give you more results you need to check yourself but it might help.
    If it doesn’t help then perhaps there is nothing else you can do.

    Good luck!



  • @pnedev
    I agree with your analysis and reasoning. Thanks for the detailed response. I used Recuva with deep scan. It did not help. As you said, there is nothing much to do at this point.
    Lessons learnt.



  • And at last, as a safety measure, I started using Google Drive Backup and Sync. I am a PHP developer, so all the files I edit are in the htdocs folder. Backup and Sync monitors the folder for file changes and starts to upload a file to Drive as soon as its contents or its modified date change; I don’t know which one it is. But this has been serving me well.