Number of lines NP++ and Excel shows won't match



  • Hi all,
    I have two files (“IATE_de.txt” and “IATE_en.txt”) whose line counts differ between NP++ and Excel. Since yesterday I’ve been trying to align the lines in these two files based on the numbers at the very beginning of each line in NP++ (namely the numbers before the very first tab), which appear in the first column in Excel. These numbers are unique within each file. After going back and forth between NP++ and Excel, and after numerous operations to delete the lines whose numbers occur in only one file and keep the lines whose numbers are also in the other file, I believe I have come very close to aligning the files on those numbers. Now there is a problem: the number of lines that NP++ shows and the number that Excel shows won’t match! See the screenshots. Even after repeating those operations for finding, deleting, and keeping unique and/or duplicate numbers, and hence their respective lines, the line counts still won’t match. I believe the number of lines Excel shows for each file is correct; besides, the two files’ counts match each other in Excel, which is what I expected and tried to achieve. NP++, however, doesn’t agree and shows different numbers of lines.

    Screenshots for the file “IATE_de.txt”

    Screenshots for the file “IATE_en.txt”

    Download:
    IATE_en.txt: https://easyupload.io/uzkda6
    IATE_en.txt: https://easyupload.io/z0tayl
    Could it be a bug in NP++ or could there be a problem with my clipboard when copying and pasting between NP++ and Excel?

    Could someone please have a look at the files and confirm whether the numbers of lines that NP++ and Excel show match for the respective file? I use NP++ ver. 7.9.3 64-bit and Excel 2016.

    Thank you so much in advance!
    glossar.



  • Addendum:

    1- I’ve sorted the lines in both files in NP++ (Ascending ignoring case) as well as with Excel’s sorting function. The result in both cases is the same: the number of lines won’t match in NP++ and Excel for the respective file.

    2- Excel can’t find:

    • any more duplicate cells in column A, corresponding to the numbers before the very first tab in NP++
    • any unique cells in column A when treating the whole content of a file as a table in Excel (i.e. columns A, B and C) and making Excel compare the two tables (i.e. the whole content of both files, in columns A, B, C and D, E, F) based on columns A and D, which hold the numbers

    which suggests that both files must have reached the state that I’ve tried to achieve: the lines of both files aligned on the said numbers. But the fact that NP++ shows a different number of lines for each file than Excel does confuses me.
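
    For readers attempting the same alignment outside Excel, the following is a minimal Python sketch of the idea, not the procedure actually used above. It assumes tab-separated text with a numeric ID in the first column; the helper names (`read_records`, `align`) and the sample rows are hypothetical. It reads records with the standard `csv` module rather than line by line, so a quoted field that contains a line break stays inside one record.

```python
import csv
import io

def read_records(text):
    """Parse tab-separated text into {id: row}, honouring quoted fields."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return {row[0]: row for row in reader if row}

def align(de_text, en_text):
    """Keep only the records whose first-column ID occurs in both inputs."""
    de = read_records(de_text)
    en = read_records(en_text)
    # Assumption: IDs are plain integers, so they can be sorted numerically.
    shared = sorted(de.keys() & en.keys(), key=int)
    return [de[i] for i in shared], [en[i] for i in shared]

# Hypothetical sample data, not taken from the IATE files.
de_sample = "1\tLAW\tVertrag\n2\tLAW\tGesetz\n4\tLAW\tUrteil\n"
en_sample = "2\tLAW\tlaw\n3\tLAW\tcourt\n4\tLAW\tjudgment\n"

de_rows, en_rows = align(de_sample, en_sample)
for de_row, en_row in zip(de_rows, en_rows):
    print(de_row[0], de_row[2], en_row[2])
```

    Only IDs 2 and 4 occur in both samples, so only those two records survive, in the same order on both sides.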



  • I really think it is on you to do what you’ve asked someone else to do here.

    After you do your work, if you do decide that there’s a bug in Notepad++ behavior, by all means return here for further discussion of it.



  • @Alan-Kilborn

    You seem not to have read what I wrote above. I repeated over and over again and am still repeating the process/operations. I wish I could screen-record it. Just now, I’ve re-produced it - the number of lines Excel shows/has and the number of lines NP++ shows after I simply copy the whole content from Excel and paste it to NP++ won’t match.



  • Hi, @glossar and All,

    I correctly downloaded your files, which contain 402,132 lines for IATE_en.txt and 417,213 lines for IATE_de.txt

    Unfortunately, with my old XP SP3 machine, my Excel version cannot open tables with more than 65,536 lines :-((

    However, I can affirm that line 65,536, the last one my Excel can show, is strictly identical in Notepad++ and in Excel for both files:

    • Line 65,536 : 1109427 medical science safety factor, in IATE_en.txt

    • Line 65,536 : 1173731 AGRICULTURE, FORESTRY AND FISHERIES Holzwirtschaft, in IATE_de.txt

    Best Regards,

    guy038



  • @glossar said in Number of lines NP++ and Excel shows won't match:

    You seem not to have read what I wrote above. I repeated over and over again and am still repeating the process/operations. I wish I could screen-record it. Just now, I’ve re-produced it - the number of lines Excel shows/has and the number of lines NP++ shows after I simply copy the whole content from Excel and paste it to NP++ won’t match.

    I stand by my previous statement:

    “After you do your work, if you do decide that there’s a bug in Notepad++ behavior, by all means return here for further discussion of it.”



  • @guy038 said in Number of lines NP++ and Excel shows won't match:

    Hi, @glossar and All,

    I correctly downloaded your files, which contain 402,132 lines for IATE_en.txt and 417,213 lines for IATE_de.txt

    Unfortunately, with my old XP SP3 machine, my Excel version cannot open tables with more than 65,536 lines :-((

    However, I can affirm that line 65,536, the last one my Excel can show, is strictly identical in Notepad++ and in Excel for both files:

    • Line 65,536 : 1109427 medical science safety factor, in IATE_en.txt

    • Line 65,536 : 1173731 AGRICULTURE, FORESTRY AND FISHERIES Holzwirtschaft, in IATE_de.txt

    Best Regards,

    guy038

    Hello Guy,
    Thank you so much for jumping in!

    The fact that you’ve seen 402,132 lines for “IATE_en.txt” and 417,213 lines for “IATE_de.txt” supports my suspicion that there might be something wrong with Notepad++, because, again, my Excel shows the same number of lines for both files (which, again, is expected). Also, the first 65,536 lines out of 400K or so may not be a large enough sample to claim otherwise, i.e. that everything works as expected in both NP++ and Excel.

    Greetings,
    glossar



  • Hello, @glossar and All,

    Seemingly, your English and German EXCEL files contain exactly 402,128 records / rows / lines! And, when opened in N++, you get more lines. This means that a single line in Excel is sometimes displayed as two or more consecutive lines in Notepad++


    Thus, just slice your initial files into smaller parts and check, each time, whether the number of lines differs when opened in Excel and Notepad++

    And, little by little, decrease the selection … till you get a file with, let’s say, only 10 records, which still has a different number of lines in both applications. Then, it shouldn’t be very difficult to verify which characters force the N++ text to be displayed on several lines, instead of a single line in Excel!
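
    As an alternative to slicing by hand, the search could be automated. The following is a minimal Python sketch, assuming the files are tab-separated and that double quotes wrap the fields containing line breaks; the function name `multiline_records` and the sample text are hypothetical. It relies on the `line_num` attribute of the standard `csv` reader, which counts physical lines read, to report every record that spans more than one line.

```python
import csv
import io

def multiline_records(text, delimiter="\t"):
    """Yield (first_line, last_line, row) for records spanning several lines."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    previous_end = 0
    for row in reader:
        # reader.line_num is the number of physical lines consumed so far.
        start, end = previous_end + 1, reader.line_num
        if end > start:
            yield start, end, row
        previous_end = end

# Hypothetical sample: three records stored on four physical lines.
sample = (
    '1\tLAW\tcontract\n'
    '2\tLAW\t"first part\nsecond part"\n'
    '3\tLAW\tcourt\n'
)

for start, end, row in multiline_records(sample):
    print(f"lines {start}-{end}: ID {row[0]}")  # prints "lines 2-3: ID 2"
```

    For a real file, the text would come from something like `open(path, newline="", encoding="utf-8").read()`, where the encoding is an assumption that has to match the file.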

    BR

    guy038



  • @guy038 said in Number of lines NP++ and Excel shows won't match:

    Seemingly, your English and German EXCEL files contain exactly 402,128 records / rows / lines! And, when opened in N++, you get more lines. This means that a single line in Excel is sometimes displayed as two or more consecutive lines in Notepad++

    @glossar

    @guy038 sounds right. Presumably some cells contain a carriage return / line feed, probably inside quotes, which Excel respects and keeps in a single cell, whereas Notepad++, as a text editor, puts the text after it on a new line, thus increasing the line count in N++.

    Cheers.



  • @guy038 @Michael-Vincent

    Thank you both for the solution!

    Michael - I’ve just removed all quotes in both files and the numbers of lines now match in NP and Excel!

    Cheers,
    glossar



  • @glossar said in Number of lines NP++ and Excel shows won't match:

    number of lines of which won’t match in NP++ and Excel respectively

    Because the lines of a plain text file are not equivalent to the rows of an Excel spreadsheet. A row in a spreadsheet can contain one or more newline sequences, whereas a line in a text document in a text editor ends with a newline sequence by definition.

    If you don’t understand the difference between a text file and a spreadsheet, I suggest you start studying these subtle differences.

    I’ve just removed all quotes in both files and the numbers of lines now match in NP and Excel!

    Congratulations. You just fixed your file by breaking the data. I hope this is not critical data that you are breaking.

    "a1 line1
    a1 line2","b1 line 1
    b1 line 2","c1","d1 line 1
    d1 line2
    d1 line3"
    

    That is a CSV file that represents exactly one row in the spreadsheet, but is obviously five lines of text.

    If you were to just remove the quotes, then open the CSV in a spreadsheet, it would fill in a1, a2, b2, a3, b3, c3, a4, a5 – which is a completely different data structure, with way too many cells being populated, in the wrong rows and columns.
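
    The effect can be checked with a few lines of Python. This is a minimal sketch using the standard `csv` module as a stand-in for a spreadsheet's CSV import (Excel's own parser may differ in details); the variable and function names are only illustrative. It parses the five-line example above once intact and once with every quote deleted.

```python
import csv
import io

# The five-line CSV example from above, as one string.
csv_text = (
    '"a1 line1\n'
    'a1 line2","b1 line 1\n'
    'b1 line 2","c1","d1 line 1\n'
    'd1 line2\n'
    'd1 line3"\n'
)

def parse(text):
    """Parse comma-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))

intact = parse(csv_text)                       # quotes kept
stripped = parse(csv_text.replace('"', ""))    # quotes blindly deleted

print(len(intact), [len(row) for row in intact])      # 1 [4]
print(len(stripped), [len(row) for row in stripped])  # 5 [1, 2, 3, 1, 1]
```

    With the quotes in place there is one row of four cells; without them there are five rows holding 1, 2, 3, 1 and 1 cells, which is the a1, a2, b2, a3, b3, c3, a4, a5 layout described above.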

    If this data is anything other than a personal hobby you are doing for yourself with no outside implications, then please reconsider just blindly deleting the quote marks without understanding the consequences – because it could cost you or someone else their job, their money, or worse! PLEASE UNDERSTAND THIS!

    If you do not understand the differences between a spreadsheet and a text file, then please just use a spreadsheet for manipulating spreadsheet data until such time as you have understood the sometimes subtle differences.

    OTHER READERS: Please do not follow the example of blindly deleting quotes in a CSV to get the number of rows in a spreadsheet to match with the number of lines shown in a text editor



  • @PeterJones said in Number of lines NP++ and Excel shows won't match:

    OTHER READERS: Please do not follow the example of blindly deleting quotes in a CSV to get the number of rows in a spreadsheet to match with the number of lines shown in a text editor

    YES, what he said. My “advice” above was more of a diagnosis than a course of treatment. The problem was probably quotes to capture newlines. I never meant that the fix was to remove the quotes! As @PeterJones says, this CHANGES your data!

    Cheers.

